Research Reports
DL Riddle, PT, PhD, is Associate Professor, Department of Physical Therapy, Medical College of Virginia Campus, Virginia Commonwealth University, 1200 E Broad St, Richmond, VA 23298-0224 (USA) (driddle{at}hsc.vcu.edu).
JK Freburger, PT, PhD, is NRSA Postdoctoral Research Fellow, Cecil G Sheps Center for Health Services Research, and Assistant Professor, Division of Physical Therapy, University of North Carolina at Chapel Hill, Chapel Hill, NC.
Address all correspondence to Dr Riddle.
Submitted May 2, 2001;
Accepted April 3, 2002
| Abstract |
|---|
Key Words: Kappa, Measurement, Reliability, Sacroiliac joint
| Introduction |
|---|
Cibulka and colleagues11 defined sacroiliac joint (SIJ) region dysfunction as being present if at least 3 of the following 4 tests were positive: the standing flexion test, the prone knee flexion test, the supine long sitting test, and palpation of posterior superior iliac spine (PSIS) heights in a sitting position. Two therapists with an unspecified amount of training in the test procedures examined 26 patients with low back pain or buttock pain. Intertester agreement for determining the presence of SIJ region dysfunction was high (κ=.88).
Cibulka and colleagues implied in their 1988 and 1999 articles11,12 that tests were classified simply as positive or negative, regardless of whether the tests indicated dysfunction on the right or left side and regardless of the type of asymmetry present (ie, whether the tests indicated the possibility of an anteriorly or posteriorly rotated innominate). For example, the supine long sitting test could be graded as positive for any 1 of the following 4 conditions: right innominate posteriorly rotated, left innominate posteriorly rotated, right innominate anteriorly rotated, and left innominate anteriorly rotated. Therapists, therefore, may have agreed that 3 or more tests were positive without agreeing on the side involved or the type of asymmetry present. Cibulka and colleagues did not describe whether these types of disagreements were addressed. We suspect that tests were graded simply as positive or negative because the manipulative intervention advocated by Cibulka and colleagues was designed for use regardless of the type of asymmetry that was present.13
Several authors11,14–19 have suggested that examination and management of people with SIJ region dysfunction sometimes require identification of the involved side, type of asymmetry present, and correction of the asymmetry. For example, mobilization techniques designed to treat what is thought to be a posteriorly rotated innominate on the right side are different from those techniques designed to treat a suspected left posteriorly rotated innominate.20 It would appear to be important to know the degree of agreement, not only for judgments of the presence or absence of SIJ region dysfunction, but also for the type of asymmetry thought to be present.
The study of Cibulka and colleagues11 is especially important because it provides the only evidence suggesting that assessments of innominate alignment or motion, when used in combination, have clinical utility. In our experience, tests requiring the assessment of innominate bone symmetry or movement are commonly done in practice. We believe that a study that is more generalizable than that of Cibulka et al would provide clinicians with additional information that could be used to determine appropriate examination strategies for SIJ region dysfunction. The purposes of our study were: (1) to replicate the study of Cibulka and colleagues11 on a larger group of patients and with a larger group of therapists and (2) to examine the degree of agreement between therapists by taking into account the side of the presumed dysfunction and the type of asymmetry present.
| Method |
|---|
Subjects
A total of 65 patients participated in the study. To be included in the study, patients had to: (1) be between 18 and 65 years of age, (2) be referred for treatment of a low back problem, (3) have unilateral or bilateral low back pain, (4) be a new patient or a patient who was currently receiving treatment for a low back problem, (5) have discomfort reported in the area of the buttock at the time of admission to the study, and (6) be able to reach at least the level of the patellae with their fingertips when flexing the lumbar spine while standing with the knees extended. This motion was necessary to complete 2 of the tests that were studied. The region of the buttock was defined as having the following boundaries: the iliac crest superiorly, the gluteal fold inferiorly, the sacral spinous processes medially, and the greater trochanter laterally. Pain also could be reported anywhere in the involved lower extremity. We admitted only patients with unilateral buttock pain so that therapists could describe their test results relative to the symptomatic side. Patients were excluded if they: (1) had lumbar surgery within the year prior to the study or (2) reported lower-extremity paresthesias or muscle weakness. Characteristics of the patients are presented in Table 2.
Data Analysis
Results of the 4 tests advocated by Cibulka and colleagues11,12 were combined, and if at least 3 of the 4 tests were positive, the patient was considered to have SIJ region dysfunction. The Figure illustrates the 3 approaches we used for examining composite scores from the 4 tests. First, the test results can be dichotomized and rated as positive or negative, independent of whether they indicate that the same impairment is present on the same side. This is the method that Cibulka et al11 appeared to use. For our first analysis, we collapsed all positive ratings (independent of the side and type of asymmetry determined to be present) and determined the extent of agreement when paired therapists rated 3 or more tests as positive or negative.
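The dichotomized composite used in the first analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the finding labels, and the function names are hypothetical and are not taken from the study's data forms. The sketch shows how 2 therapists can agree on a positive composite while disagreeing on side and type of asymmetry.

```python
# Hypothetical data layout: each therapist's findings are a dict keyed by test name.
# Any finding other than "negative" is treated as a positive test.
TESTS = (
    "standing_flexion",
    "prone_knee_flexion",
    "supine_long_sitting",
    "sitting_psis",
)


def composite_positive(findings, threshold=3):
    """Return True when at least `threshold` of the 4 tests are positive,
    ignoring the side and the type of asymmetry that each test indicated."""
    positives = sum(1 for test in TESTS if findings[test] != "negative")
    return positives >= threshold


def percent_agreement(pairs):
    """Percentage of patients for whom both therapists reached the same
    dichotomized composite (positive or negative)."""
    agreements = sum(
        1 for a, b in pairs if composite_positive(a) == composite_positive(b)
    )
    return 100.0 * agreements / len(pairs)


# Illustrative (made-up) findings for one patient.
therapist_a = {
    "standing_flexion": "left",
    "prone_knee_flexion": "left_posterior",
    "supine_long_sitting": "left_posterior",
    "sitting_psis": "negative",
}
therapist_b = {
    "standing_flexion": "right",
    "prone_knee_flexion": "right_anterior",
    "supine_long_sitting": "right_anterior",
    "sitting_psis": "negative",
}

# Both composites are positive, so the pair counts as an agreement here
# even though the therapists disagree on side and type of asymmetry.
print(percent_agreement([(therapist_a, therapist_b)]))
```

Under this scoring the pair above counts as full agreement, which is the kind of hidden disagreement that the side-specific analyses are meant to expose.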
In our third analysis, we determined the extent of agreement for a 5-category scale (anteriorly rotated on the right side, anteriorly rotated on the left side, negative, posteriorly rotated on the right side, and posteriorly rotated on the left side). We chose this scale because 3 of the 4 tests we examined are used to determine whether an innominate was rotated relative to the other innominate (Tab. 3). Therefore, if 3 tests are positive, at least 2 of the 3 tests will be indicative of a rotated innominate. For the third analysis, therapists had to agree on the side involved (right, left, or none) and the type of asymmetry that was present (anteriorly or posteriorly rotated innominate) for at least 3 tests. For example, if a therapist concluded that the supine long sitting test indicated the presence of a posteriorly rotated innominate on the left side, the sitting PSIS test was positive on the left side (indicating the presence of a posteriorly rotated innominate on the left side), and the standing flexion test was positive on the left side (indicating a hypomobile left SIJ), the composite score was posteriorly rotated innominate on the left side.
Percentages of agreement and Cohen kappa statistic (κ) coefficients were calculated for the individual tests and for the 3 composite test results. Because we suspected that the distribution of our data would be skewed, we also calculated the maximum kappa (κmax) and kappa/kappa maximum (κ/κmax) values.21 The latter value indicates the proportion of agreement achieved by the therapists, taking into account the maximum kappa value possible.21 The maximum kappa value can be useful when the kappa value is low despite a high observed proportion of agreement.22 In our study, we suspected a proportionally large number of negative findings because 3 of the 4 tests needed to be positive to indicate a positive composite result. A large proportion of negative results would increase the likelihood of agreement by chance and subsequently reduce the kappa value.
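A minimal sketch of these calculations follows. It assumes the usual definitions: κ = (Po − Pe)/(1 − Pe), and κmax obtained by replacing Po with the largest observed agreement that the marginal totals allow (the sum over categories of the smaller of the 2 marginal proportions). The source does not spell out the formula from reference 21, so treating it this way is an assumption; the function name and the cell counts are made up for illustration.

```python
def cohen_kappa_stats(table):
    """Compute agreement statistics from a square contingency table.

    table[i][j] is the number of patients placed in category i by the first
    therapist and in category j by the second therapist.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    row_p = [sum(table[i]) / n for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    p_obs = sum(table[i][i] for i in range(k)) / n          # observed agreement
    p_exp = sum(row_p[i] * col_p[i] for i in range(k))       # chance agreement
    # Largest observed agreement attainable with these marginal totals.
    p_obs_max = sum(min(row_p[i], col_p[i]) for i in range(k))

    kappa = (p_obs - p_exp) / (1 - p_exp)
    kappa_max = (p_obs_max - p_exp) / (1 - p_exp)
    return {
        "percent_agreement": 100.0 * p_obs,
        "kappa": kappa,
        "kappa_max": kappa_max,
        "kappa_over_kappa_max": kappa / kappa_max,
    }


# Made-up 2x2 table (rows: first therapist, columns: second therapist;
# order of categories: positive, negative).
stats = cohen_kappa_stats([[20, 5], [10, 65]])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Because κmax depends only on the marginal totals, the ratio κ/κmax separates how much of a low κ is due to unequal marginals and how much is due to disagreement between the raters.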
We calculated the observed proportion of positive agreement (Ppos) and the observed proportion of negative agreement (Pneg).23 These indices indicate whether disagreements are more likely for positive or negative judgments, thus helping to resolve the paradoxical results of a high proportion of agreement but a low kappa.
Cicchetti and Feinstein23 provided several examples to illustrate how Ppos and Pneg can help clarify the meaning of a low kappa coefficient and a high percentage of agreement. In one example, the percentage of agreement for a set of dichotomous data was 85% and the kappa coefficient was .70. The corresponding Ppos was .84, and the Pneg was also high at .86. A second example had an identical percentage of agreement of 85%, but the kappa coefficient was .32. The corresponding Ppos was .91, but the Pneg was much lower at .40. One reason for the relatively low kappa coefficient in the second example was that the raters frequently disagreed on judgments of negative test results.
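The 2 examples can be reproduced with a short sketch, assuming the standard formulas Ppos = 2a/(2a + b + c) and Pneg = 2d/(2d + b + c), where a and d are the counts of agreed positive and agreed negative ratings and b and c are the 2 kinds of disagreement. The cell counts below are an assumption: they were chosen because they reproduce the percentages and coefficients quoted above, and they are not given in the text itself.

```python
def agreement_indices(a, b, c, d):
    """Agreement indices for a 2x2 table.

    a: both raters positive, d: both raters negative,
    b and c: the two kinds of disagreement.
    """
    n = a + b + c + d
    p_obs = (a + d) / n
    # Chance agreement from the marginal totals of each rater.
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return {
        "agreement": p_obs,
        "kappa": (p_obs - p_exp) / (1 - p_exp),
        "p_pos": 2 * a / (2 * a + b + c),
        "p_neg": 2 * d / (2 * d + b + c),
    }


# Assumed cell counts that match the figures quoted for the 2 examples.
balanced = agreement_indices(a=40, b=9, c=6, d=45)
skewed = agreement_indices(a=80, b=10, c=5, d=5)

for label, result in (("example 1", balanced), ("example 2", skewed)):
    print(label, {name: round(value, 2) for name, value in result.items()})
```

With identical observed agreement in both tables, only the second shows a low Pneg, which is what pulls its kappa coefficient down.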
Typically, a generalized kappa statistic is used to describe the degree of agreement corrected for chance when many potential pairs of raters participate in the study, a scenario consistent with our study.24 We chose to calculate the Cohen kappa coefficients because we found no methods in the literature for calculating a maximum kappa coefficient from a generalized kappa coefficient. Cicchetti (personal communication, 2001) also suggested that the Cohen kappa statistic should be used when calculating Ppos and Pneg.
To determine whether the use of the Cohen kappa statistic in place of the generalized kappa statistic was appropriate, we calculated both a Cohen kappa coefficient and a generalized kappa coefficient for each of the 3 composite analyses. If these coefficients were essentially equal for each of the analyses, we believed it was acceptable to use the Cohen kappa statistic in place of the generalized kappa statistic. The 2 forms of kappa coefficients were identical for the first composite analysis (κ=.18) and the second composite analysis (κ=.11) and differed by .04 for the third composite analysis (Cohen κ=.23, generalized κ=.27). We therefore considered it appropriate to use the Cohen kappa statistic in place of the generalized kappa statistic for all analyses.
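The comparison can be illustrated with a small sketch. The text does not state which generalized kappa formula reference 24 describes, so the version below assumes a Fleiss-type statistic, in which chance agreement is computed from category proportions pooled over all raters; that choice, the function names, and the ratings are assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(pairs):
    """Cohen kappa for a list of (first_rating, second_rating) tuples."""
    n = len(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    p_obs = sum(1 for a, b in pairs if a == b) / n
    p_exp = sum(first[c] * second[c] for c in set(first) | set(second)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


def generalized_kappa(ratings_per_patient):
    """Fleiss-type kappa; every patient must have the same number of ratings."""
    n_patients = len(ratings_per_patient)
    n_raters = len(ratings_per_patient[0])
    pooled = Counter()
    agreement_sum = 0.0
    for ratings in ratings_per_patient:
        if len(ratings) != n_raters:
            raise ValueError("each patient needs the same number of ratings")
        counts = Counter(ratings)
        pooled.update(counts)
        # Proportion of agreeing rater pairs for this patient.
        agreement_sum += (
            sum(c * c for c in counts.values()) - n_raters
        ) / (n_raters * (n_raters - 1))
    p_obs = agreement_sum / n_patients
    total = n_patients * n_raters
    p_exp = sum((c / total) ** 2 for c in pooled.values())
    return (p_obs - p_exp) / (1 - p_exp)


# Made-up composite ratings for 10 patients, 2 therapists per patient.
pairs = (
    [("pos", "pos")] * 3
    + [("neg", "neg")] * 4
    + [("pos", "neg")] * 2
    + [("neg", "pos")]
)
print(round(cohen_kappa(pairs), 2), round(generalized_kappa(pairs), 2))
```

For these made-up ratings the 2 coefficients are close to each other, which mirrors the kind of check described above.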
| Results |
|---|
| Discussion |
|---|
Reliability exists along a continuum from no agreement (eg, κ=0) to perfect agreement (eg, κ=1). Landis and Koch25 suggested that kappa values from .21 to .40 indicate "fair" agreement, an admittedly arbitrary label that does not take into account how a measurement is used and the consequences of a wrong decision. Although our data indicate that agreement for the individual tests exceeded that expected due to chance, we contend that reliability is too low for making treatment decisions on individual patients.
Many of the various interventions proposed for patients with SIJ region dysfunction typically require the therapist to identify the type of dysfunction present or the side of involvement.11,16–20 We believe that therapists who use the 4 tests we examined to identify the type of dysfunction or the side of involvement are likely to deliver interventions to individuals who do not have a dysfunction or to deliver interventions incorrectly (either the proper technique will not be chosen or the intervention will be applied to the wrong side). In the latter case, the individual's problem, theoretically, could be exacerbated following the intervention. For example, if the cause of the individual's buttock pain is an anteriorly rotated innominate on the left, but the therapist determines that the individual's innominate is posteriorly rotated on the left, interventions to correct the posteriorly rotated innominate, theoretically, could exacerbate the problem.
More research is needed to guide clinicians on the choice of examination procedures and interventions for patients with pain that may be arising from the SIJ region. Until that research is done, alternative test procedures such as pain provocation tests would likely provide therapists with more reliable and, theoretically, more useful information than tests of SIJ alignment or movement.
We also found what we consider to be poor reliability for the composite results from the 4 tests classified as positive or negative. Our kappa coefficient for these dichotomized judgments was .18, and the kappa/kappa maximum value was 20.2%.
In contrast, Cibulka et al11 reported a kappa coefficient of .88. One factor that can lower the kappa coefficient is a low prevalence of the condition of interest. In our study, a relatively small number of patients had a composite score of positive, indicating the presence of SIJ region dysfunction based on Cibulka and colleagues' criteria. A total of 38% of all dichotomous composite judgments in our study were rated as positive. However, as can be seen by the kappa maximum, this relatively small percentage of positive test results was not the primary reason for the low kappa value. One likely explanation for the low kappa value was the very low Ppos of 49%. That is, when one therapist rated a composite score of positive, the other therapist rated the same patient as positive 49% of the time, a number essentially equal to chance. Reliability for the composite scores also appears to us to be too low for clinical use.
It is not clear why our results differed so dramatically from those of Cibulka et al.11 One potential explanation is that only 2 therapists participated in the study of Cibulka et al, and these therapists worked together and practiced the procedures prior to the study. The therapists also developed the approach. Cibulka et al did not describe the nature and quality of the therapists' training, so it is unclear how this training may have influenced reliability. The therapists in our study did not undergo extensive training. They were instructed to practice the procedures on each other and on patients until they felt ready to use the procedures on patients. The spectrum of patients was different between the study of Cibulka et al and our study. The majority of patients in the study of Cibulka et al reportedly had pain localized to the lumbar area. No patients reportedly had pain below the knee. Patients were admitted to our study only if they reported unilateral buttock pain, a symptom commonly associated with patients thought to have SIJ region dysfunction.5,6 In addition, approximately 20% of our patients reported pain below the knee, a complaint that is apparently not unusual in patients with SIJ region dysfunction.5 It is unclear what effect differences in patients' pain distribution may have had on the results of the 2 studies.
We believe our data are more generalizable than those of Cibulka et al.11 We had 34 therapists participating in our study, whereas Cibulka et al had 2 examiners. We examined 65 patients, whereas Cibulka et al studied 26 patients. Finally, we contend that most therapists who use these techniques likely apply the methods in ways that are similar to those used by the therapists in our study.
The general background and experience of the therapists who participated in our study were extensive (Tab. 2). They had a mean of 10.1 years (SD=6.6, range=1–28) of experience treating patients with low back pain, and they estimated that on average 11.6% (SD=10.0%, range=0% to 50%) of their caseload consisted of patients suspected of having dysfunction of the SIJ region. In addition, therapists reported attending a mean of 3.1 (SD=1.8, range=0–8) continuing education courses that were solely on the evaluation and treatment of the SIJ or that included a section on the SIJ. We believe it is likely that most therapists in our study had seen or had used the tests examined in the study because 3 of the 4 tests are commonly described in many textbooks and, in our experience, are commonly used in practice. However, we did not collect these data. It is our contention that these tests are well defined and that therapists with clinical experiences similar to those of the therapists in our study should be able to conduct these procedures reasonably well.
We examined our data to determine whether we could account for the large amount of error. We examined the pain intensity data to determine whether the patients' reported pain intensity varied between repeated tests. Pain that varies could result in the patient performing repeated tests differently and in therapists finding different results. We calculated an intraclass correlation coefficient (ICC [2,1])26 to describe the reliability of visual analog scale pain ratings taken by each therapist just prior to taking measurements on a patient. The ICC (2,1) was .97 (95% confidence interval=.95-.98). These data indicate that pain intensity did not vary appreciably between repeated tests and was not a source of error.
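For readers unfamiliar with the coefficient, the sketch below computes an ICC (2,1) from a patients-by-raters table using the 2-way analysis-of-variance mean squares: (BMS − EMS)/(BMS + (k − 1)EMS + k(JMS − EMS)/n), where BMS, JMS, and EMS are the between-patient, between-rater, and residual mean squares, n is the number of patients, and k is the number of raters. The function name and the visual analog scale values are hypothetical; the study's own pain ratings are not reproduced here.

```python
def icc_2_1(scores):
    """ICC (2,1) for a complete table: scores[i][j] is the rating of
    patient i by rater j (2-way random effects, single rating)."""
    n = len(scores)          # patients
    k = len(scores[0])       # raters per patient
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Mean squares from the 2-way analysis of variance.
    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    jms = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ems = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    ) / ((n - 1) * (k - 1))

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Made-up visual analog scale pain ratings (mm), one row per patient,
# one column per therapist's pre-test rating.
vas_ratings = [
    [62, 60],
    [35, 38],
    [80, 78],
    [15, 18],
    [50, 49],
]
print(f"ICC (2,1) = {icc_2_1(vas_ratings):.2f}")
```

Ratings that differ by only a few millimeters between therapists, relative to large differences between patients, yield a coefficient near 1, which is the pattern the reported value of .97 describes.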
We also determined whether reliability differed for patients who were overweight. When patients are overweight, bony landmarks around the pelvis may be more difficult to palpate and could lead to additional error. A total of 31 of our patients had a body mass index (BMI) higher than 25, the criterion for grade 1 obesity.27 The kappa value for these patients was .21 (SE=.18) for composite judgments of positive or negative test results (composite test 1). The kappa value for patients who were not obese (BMI<25) was .14 (SE=.17). These data strongly suggest that being overweight was not a source of error in the study.
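One way to gauge how far apart the 2 subgroup coefficients are is a simple large-sample comparison of 2 independent kappa values, z = (κ1 − κ2)/√(SE1² + SE2²). This check is not part of the reported analysis; it is offered here only as an illustrative sketch, and the function name is hypothetical.

```python
import math


def kappa_difference_z(kappa_1, se_1, kappa_2, se_2):
    """Approximate z statistic for the difference between 2 kappa
    coefficients estimated from independent groups of patients."""
    return (kappa_1 - kappa_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)


# Values reported in the text for patients with BMI above and below 25.
z = kappa_difference_z(0.21, 0.18, 0.14, 0.17)
print(f"z = {z:.2f}")
```

A z value this far below the conventional threshold of 1.96 indicates that the 2 coefficients cannot be distinguished from each other with these sample sizes.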
One limitation of our study was the inclusion of data from 4 patients who apparently did not report buttock pain prior to testing. In addition, data indicating pain distribution were missing for 7 patients. Pain distribution was important because therapists were instructed to interpret the supine long sitting test and the prone knee flexion test results relative to the painful side. We conducted an a posteriori analysis of patients with documented unilateral buttock pain (n=54) to determine whether the 11 subjects who did not have confirmed buttock pain influenced the results. The kappa values for the patients with confirmed buttock pain were .17 (SE=.13) for the first composite test, .11 (SE=.12) for the second composite test, and .27 (SE=.12) for the third composite test. Reliability was not appreciably affected by inclusion of data from the 11 subjects who may not have had unilateral buttock pain (Tab. 5).
In reliability studies, researchers attempt, among other things, to reduce the error associated with measurements.23 We were unable to attribute the substantial error in our study to either the therapists or the patients. We believe the most likely source of error related to the nature of the phenomena these measures are designed to assess. The magnitude of rotatory movement in the SIJ is, on average, on the order of only a few degrees.28–31 We contend that this small amount of movement combined with the inherent variability in size and shape of the innominate bone landmarks32,33 makes it highly unlikely that most therapists can make reliable judgments based on palpation of bony landmarks on the pelvis.
Although we question whether therapists can make reliable judgments given the variability in bony anatomy and the small amount of SIJ motion, the findings of Cibulka et al11 suggest that training may have contributed to the high reliability they reported. Unintentional therapist bias is also a possible explanation for their findings. In our study, we used multiple combinations of therapists. We contend that the use of many therapists may have decreased the potential effects of therapist bias on the results. Multiple combinations of paired therapists, however, also limit, to some degree, conclusions about intertester reliability. We conducted a multicenter study, and we randomly paired therapists at each clinic. For practical reasons, we did not examine all possible intertester combinations (ie, not every therapist who participated in the study evaluated every patient). The results of our study may have differed had we examined all possible combinations. We also controlled the order in which the 4 tests were conducted. Reliability may have differed with a different order of testing.
| Conclusion |
|---|
| Footnotes |
|---|
This study was approved by the Institutional Review Board of Virginia Commonwealth University.
This work was supported, in part, by a National Research Service Award Postdoctoral Traineeship from the Agency for Healthcare Research and Quality and sponsored by the Cecil G Sheps Center for Health Services Research Grant T32-HS00032.
* Participating clinics from the North American Orthopaedic Rehabilitation Research Network were Conroy Orthopaedic & Sports Physical Therapy, Flossmoor, Ill; Life Care Medical Center, Glassboro, NJ; Appalachian Physical Therapy Inc, Dahlonega, Ga; Physiotherapy on Bay, Toronto, Ontario, Canada; Pro Active Physiotherapy, Hamilton, Ontario, Canada; West End Physiotherapy Clinic, Hamilton, Ontario, Canada; Canadian Sport Rehabilitation Institute, Calgary, Alberta, Canada; Rehab Plus Associates, Midlothian, Va; Walser Physiotherapy, Thunder Bay, Ontario, Canada; Sooke Evergreen Physiotherapy, Sooke, British Columbia, Canada; and St Joseph's Hospital, Hamilton, Ontario, Canada.
| References |
|---|