PTJ
PHYS THER
Vol. 82, No. 5, May 2002, pp. 512-517


Letters and Responses

Roland-Morris Scale Reliability


Letter to the Editor:

There is much more research that describes the measurement properties of evaluative measures such as the Roland-Morris (RM) scale1 today than there was a decade ago. The greater volume of studies provides more data that can be used to shape clinical decisions. It also increases the chance that the results of some studies will conflict with the results of others. The study of Davidson and Keating2 seems to be an illustration of this phenomenon.

Davidson and Keating2 examined the reliability and responsiveness of 5 functional status questionnaires designed for patients with low back pain (LBP). One of the scales examined was the RM scale, a questionnaire that has been studied extensively by our group and many others. Davidson and Keating found that the reliability of RM scale measurements was low, with an intraclass correlation coefficient (ICC [2,1]) of .53 (95% confidence interval [CI]=.29, .71) for a sample of 47 patients with LBP who reported that their LBP was "about the same," "a little better," or "a little worse." For a smaller subgroup that reported their LBP was "about the same," the ICC (2,1) was lower at .42 (95% CI=–.07, .75). Based in part on these findings, the authors concluded that the RM scale "appeared to lack sufficient reliability and scale width for clinical application."2(p8)
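For readers unfamiliar with the statistic, the following is a minimal sketch of how an ICC (2,1) can be computed from test-retest scores, using the two-way random-effects, single-measurement form described by Shrout and Fleiss. The function name `icc_2_1` and the example scores are hypothetical illustrations; they are not data from any of the studies discussed, and this is not necessarily the software or procedure used in those studies.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, single measurement.

    scores[i][j] is the score of subject i on occasion j.
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of occasions (2 for test-retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-subjects mean square
    jms = ss_cols / (k - 1)               # between-occasions mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


if __name__ == "__main__":
    # Hypothetical (test, retest) scores on a 0-24 scale for 6 subjects.
    example = [(4, 5), (9, 8), (12, 14), (15, 15), (18, 16), (21, 22)]
    print(f"ICC(2,1) = {icc_2_1(example):.2f}")
```

Because the coefficient is a ratio of variance components, it depends on how much the subjects differ from one another as well as on how consistently each subject scores.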

In our opinion, these results are dramatically different from the large volume of evidence reported in the literature on the reliability of RM scale scores (Table).1,2,6–17 The evidence summarized in the Table was collected on diverse samples of patients from different countries with many different LBP diagnoses. Davidson and Keating2 attributed their findings, as compared with previous research, to a variety of "sample differences." For example, they suggested that, because their sample consisted of some patients who were self-referred for physical therapy, these patients added additional variability, leading to the low reliability. Several other researchers10,12,17 investigated the reliability of RM scale scores on samples that included self-referred and physician-referred patients. The ICCs reported in these studies varied from .79 to .88. Another potential explanation for the lower reliability could be related to the interval between assessments (6 weeks in the study of Davidson and Keating). Yet, other studies8,11 with reassessment intervals of equal or longer duration reported ICCs on the order of .66 to .86.


Table. Evidence Reported in the Literature on Reliability of Roland-Morris Scale Scores (table not reproduced here)

 
A commonly accepted statistical concept is that the greater the number of studies (or statistical tests) conducted on a specific issue, the more likely an aberrant finding will occur simply by chance alone.18 We believe, given the large number of studies summarized in the Table, that the low reliability reported by Davidson and Keating2 for the RM scale was likely to be due to random variation associated with making a point estimate, such as an ICC. A point estimate is a single value, whereas a confidence interval represents a range of likely values. The upper bounds of the 95% CIs reported by Davidson and Keating were .71 or higher for the RM scale. These upper-bound estimates are more in line with point estimates from the literature (Table).
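The chance argument above can be illustrated with a short calculation. This sketch assumes, purely for illustration, that the studies are independent and that each has the same small probability (`alpha`, a hypothetical parameter) of producing an unusually extreme estimate by chance; neither assumption is stated in the letter.

```python
def prob_at_least_one_extreme(n_studies: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent studies yields an
    extreme estimate, when each does so with probability alpha."""
    return 1.0 - (1.0 - alpha) ** n_studies


if __name__ == "__main__":
    # Illustrative study counts only; not taken from the Table.
    for n in (1, 5, 10, 15, 20):
        p = prob_at_least_one_extreme(n)
        print(f"{n:2d} studies: P(at least one extreme result) = {p:.2f}")
```

Under these assumptions the probability rises steadily with the number of studies, which is the sense in which an occasional discordant estimate is to be expected in a large literature.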

Why were the point estimates reported by Davidson and Keating2 for the RM scale so low relative to past research? Considering the relatively small sample size, especially for the subgroup self-classified as "about the same," we suspect that there were a few patients who had an unusual amount of variability in their scores. Large variability in a few subjects could lead to low reliability when sample sizes are small. The authors refer to a small group of subjects who had pain for greater than 6 months and who demonstrated "considerable variability" in RM scores despite reporting no change in their condition. Given the relatively small sample sizes in the 2 reliability analyses (n=16 and n=47), it would likely take only a few patients with unusual variability in their scores to skew the reliability data and produce point estimates that are atypical compared with the large amount of evidence that has already been published.
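The sensitivity of a small-sample ICC to a few highly variable patients can be sketched as follows. All scores here are invented for illustration (16 hypothetical subjects on a 0-24 scale); they are not the data of Davidson and Keating, and the helper `icc_2_1` is one standard way of computing an ICC (2,1), not necessarily the procedure used in their study.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, single measurement."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical baseline scores for 16 subjects (3, 4, ..., 18).
baseline = list(range(3, 19))

# Scenario A: every subject's retest score is within 1 point of baseline.
stable_retest = [b + (1 if i % 2 == 0 else -1) for i, b in enumerate(baseline)]

# Scenario B: identical, except that 3 of the 16 subjects change markedly.
variable_retest = list(stable_retest)
variable_retest[0] = 15    # baseline 3
variable_retest[7] = 19    # baseline 10
variable_retest[15] = 6    # baseline 18

icc_stable = icc_2_1(list(zip(baseline, stable_retest)))
icc_variable = icc_2_1(list(zip(baseline, variable_retest)))

if __name__ == "__main__":
    print(f"All 16 subjects stable:      ICC(2,1) = {icc_stable:.2f}")
    print(f"3 of 16 subjects variable:   ICC(2,1) = {icc_variable:.2f}")
```

In this constructed example, changing the retest scores of only 3 subjects lowers the coefficient substantially, which is the mechanism we suspect may be operating in the smaller reliability analyses.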

Given the extensive evidence that supports the reliability for RM scale scores,1,6–17 we disagree with the authors' recommendations that the RM scale should not be used as a measure of functional outcome in a general clinical population. Some clinicians may be tempted, based on the results reported by Davidson and Keating,2 to discontinue use of the RM scale or to consider other measures in lieu of the RM scale. We think this would be misguided when considering the evidence. We contend that the overwhelming majority of evidence supports use of the RM scale for routine clinical use or for research, and many experts agree with this view.3–5

Daniel L Riddle, PT, PhD, Associate Professor, and Paul W Stratford, PT, MSc, Associate Professor

Department of Physical Therapy
School of Allied Health Professions
Medical College of Virginia Campus
Virginia Commonwealth University
Box 980224
Richmond, VA 23298

School of Rehabilitation Science
Associate Member
Department of Clinical Epidemiology and Biostatistics
McMaster University
Hamilton, Ontario, Canada

References

  1. Roland M, Morris R. A study of the natural history of back pain, part I: development of a reliable and sensitive measure of disability in low back pain. Spine. 1983;8:141–144.
  2. Davidson M, Keating JL. A comparison of five low back disability questionnaires: reliability and responsiveness. Phys Ther. 2002;82:8–24.
  3. Bombardier C. Outcome assessments in the evaluation and treatment of spinal disorders. Spine. 2000;25:3100–3103.
  4. Kopec JA. Measuring functional outcomes in persons with back pain. Spine. 2000;25:3110–3114.
  5. Deyo RA, Battié M, Beurskens AJH. Outcome measures for low back pain research: a proposal for standardized use. Spine. 1998;23:2003–2013.
  6. Deyo RA. Comparative validity of the Sickness Impact Profile and shorter scales for functional assessment in low-back pain. Spine. 1986;9:951–954.
  7. Kopec JA, Esdaile JM, Abrahamowicz M, et al. The Quebec Back Pain Disability Scale: measurement properties. Spine. 1995;20:341–352.
  8. Stratford PW, Finch E, Solomon P, et al. Using the Roland-Morris scale to make decisions about individual patients. Physiotherapy Canada. 1996;48:107–110.
  9. Nusbaum L, Natour J, Ferraz MB, Goldenberg J. Translation, adaptation and validation of the Roland-Morris questionnaire: Brazil Roland-Morris. Braz J Med Biol Res. 2001;34:203–210.
  10. Stratford PW, Binkley JM. A comparison study of the Back Pain Functional Scale and Roland Morris Questionnaire. North American Orthopaedic Rehabilitation Research Network. J Rheumatol. 2000;27:1928–1936.
  11. Patrick DL, Deyo RA, Atlas SJ, et al. Assessing health-related quality of life in patients with sciatica. Spine. 1995;20:1899–1908.
  12. Johansson E, Lindberg P. Subacute and chronic low back pain: reliability and validity of a Swedish version of the Roland and Morris Disability Questionnaire. Scand J Rehabil Med. 1998;30:139–143.
  13. Wiesinger GF, Nuhr M, Quittan M, et al. Cross-cultural adaptation of the Roland-Morris questionnaire for German-speaking patients with low back pain. Spine. 1999;24:1099–1103.
  14. Underwood MR, Barnett AG, Vickers MR. Evaluation of two time-specific back pain outcome measures. Spine. 1999;24:1104–1112.
  15. Jacob T, Baras M, Zeev A, Epstein L. Low back pain: reliability of a set of pain measurement tools. Arch Phys Med Rehabil. 2001;82:735–742.
  16. Jensen MP, Strom SE, Turner JA, Romano JM. Validity of the Sickness Impact Profile Roland scale as a measure of dysfunction in chronic pain patients. Pain. 1992;50:157–162.
  17. Stratford PW, Binkley JM, Riddle DL. Development and initial validation of the Back Pain Functional Scale. Spine. 2000;25:2095–2102.
  18. Winer BJ. Statistical Principles in Experimental Design. New York, NY: McGraw-Hill; 1962:4–13.

 

Author Response:


We would like to thank Riddle and Stratford for raising a number of issues in response to our article. We found that measurements taken with the Roland Morris Questionnaire (RMQ) of subjects who were classified as unchanged were, as a proportion of the utilized scale, more variable than measurements taken using the Oswestry Disability Questionnaire, the Quebec Back Pain Disability Scale, the Physical Function Scale, and the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36). Riddle and Stratford asked why our correlation coefficients for repeated RMQ measurements were at the lower end of the range of reported values and whether a few patients with unusual variability in their scores were responsible for the results.

Correlation indices of reliability such as intraclass correlation coefficients (ICCs) indicate the error in measurements as a proportion of the total variance in scores.1 They are affected by sample variance (ie, the range of scores demonstrated by subjects) as well as inconsistency in measurements. In answering the question "Why are the ICCs lower?" we would like to examine the standard error of measurement (SEM), as we believe that expressions of error in the same scale units (in this case, RMQ units) provide a more useful basis of comparison than the correlation coefficients.
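The dependence of a correlation-type reliability index on sample variance can be shown with a simplified variance-components calculation. This sketch treats reliability as between-subject variance divided by the sum of between-subject and error variance, ignoring any systematic occasion effect; the function name and the numerical values are hypothetical and chosen only for illustration.

```python
def reliability_from_components(between_sd: float, error_sd: float) -> float:
    """Simplified reliability coefficient: the share of total score variance
    that is due to true differences between subjects."""
    between_var = between_sd ** 2
    error_var = error_sd ** 2
    return between_var / (between_var + error_var)


if __name__ == "__main__":
    error_sd = 2.5  # same measurement error (in scale units) in both samples
    for between_sd in (3.0, 6.0):
        r = reliability_from_components(between_sd, error_sd)
        print(f"between-subject SD = {between_sd}: reliability = {r:.2f}")
```

With identical measurement error, the more homogeneous sample yields the lower coefficient. This is why a comparison in scale units can be more informative than a comparison of coefficients across samples with different score ranges.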

Table 1 shows the SEMs reported for, or that we have calculated from, a number of studies that have reported ICCs and Pearson correlation coefficients.2–13 The SEM provides an indication of the extent to which the average respondent varies (in RMQ units) when retested at a time when his or her condition could reasonably be considered to be unchanged.14 Table 1 shows that SEMs for measurements taken using the RMQ ranged from 1.5 to 4.1 RMQ units. This means that, on average, subjects who are assumed to be unchanged typically could be 1.5 to 4.1 RMQ units to either side of an obtained score. We believe it is likely that the amount of expected error in self-report measurements of activity limitation varies with time between test and retest. Our study and that of Patrick et al12 identified comparable estimates of error. Patrick et al were the only other researchers to report on error associated with retesting a cohort at greater than 6 weeks from the first test. Other researchers who have conducted retests 6 weeks or more after an initial test have pooled these data with data obtained for subjects retested much closer to the first test. The RMQ measurements may display increasing variability as the time between tests increases. Clinicians should consider these findings in the light of the time frames over which they typically monitor patient progress. In contrast, the other instruments used in our study yielded scores that were relatively more stable for the same subjects.
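A commonly used way to derive an SEM from a published reliability coefficient is SEM = SD × √(1 − r), where SD is the standard deviation of the sample's scores and r is the ICC or Pearson coefficient. The sketch below uses that formula; whether each value in Table 1 was obtained in exactly this way is an assumption, and the function names, the 0-24 scale limits passed as defaults, and the example numbers are hypothetical.

```python
import math
from typing import Tuple


def sem_from_reliability(sample_sd: float, reliability: float) -> float:
    """Standard error of measurement, in scale units, from the sample SD
    and a reliability coefficient (ICC or Pearson r)."""
    return sample_sd * math.sqrt(1.0 - reliability)


def one_sem_band(score: float, sem: float,
                 scale_min: float = 0.0,
                 scale_max: float = 24.0) -> Tuple[float, float]:
    """Range one SEM to either side of an obtained score, limited to the
    bounds of the scale."""
    return (max(scale_min, score - sem), min(scale_max, score + sem))


if __name__ == "__main__":
    # Hypothetical sample: SD of 5 scale units, reliability of .75.
    sem = sem_from_reliability(sample_sd=5.0, reliability=0.75)
    low, high = one_sem_band(score=10.0, sem=sem)
    print(f"SEM = {sem:.1f} units; band around a score of 10: {low:.1f} to {high:.1f}")
```

Expressing error this way keeps the comparison in RMQ units, so it is less affected by differences in score range between samples than the coefficients themselves.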


Table 1. Comparison of Standard Error of Measurement (SEM) Across Studies (table not reproduced here)

 
There are, we believe, several reasons for doing a reliability study. One reason is to quantify the magnitude of error in order to assess the suitability of the measurements for application to a specific task. Another reason is to determine whether measurement consistency can be improved. This might be achieved by identifying the source of error and refining the way measurements are taken in order to reduce this error. When the purpose of a reliability study is to determine which of several instruments provides the most consistent measurements, we believe a head-to-head comparison (ie, having the same subjects complete the questionnaires at the same time) is the best method because it ensures that the variability between measures is not due to sample differences but is related to questionnaire characteristics.

Riddle and Stratford make the reasonable suggestion that an explanation for our observation is that a few subjects exhibiting extreme test-retest differences in a small sample distorted the results. In our group of 47 unchanged subjects, RMQ change scores ranged from –9 to +19 points, and 13 subjects (28%) had change scores of 5 points (21% of the scale width) or more. The interesting question for us is why those subjects who exhibited large variations in RMQ change scores had more stable scores on the other questionnaires (Table 2).


Table 2. Comparison of Change Scores for Subjects Who Are Stable With Roland-Morris Questionnaire Raw Score Change of 9 or More (table not reproduced here)

 
On the RMQ, subjects are asked to choose only the statements that apply to them "today." Back pain can vary considerably from day to day, whereas the overall rating of change in our study related to a 6-week period. However, the Waddell and Quebec questionnaires also refer to "today," so this in itself does not sufficiently explain why subjects were more variable in their responses to the RMQ in our study. Of respondents classified as unchanged, more than 30% changed their response to the statements "I sleep less well because of my back," "Because of my back, I go upstairs more slowly than usual," and "Because of my back, I try not to bend or kneel down." More than 20% of the respondents changed their minds on 15 out of the 24 RMQ items.

The size of the ICC does not tell us what items are more or less useful or whether the magnitude of error is acceptable for the intended use of the instrument. Close examination of patterns in the data, we believe, allows us to explore ways to refine instruments that we use to evaluate people with back pain. Publication bias, in our opinion, almost certainly confers an optimistic message about measurement utility. It is likely that some of the biases that result in underpublication of clinical trials with null findings15 also lead to underpublication of reliability studies with low reliability coefficient values. However, we are not aware of any studies that have explored the extent to which investigators fail to submit, or journals reject, such studies. The responsibility of researchers is to investigate and improve the instruments that we recommend for use in the examination of patients.

The incisive questions raised by Riddle and Stratford regarding our article are appreciated. Our data set nevertheless speaks for itself. We do not consider our findings or conclusions to be aberrant simply because they vary from previous findings. Most previous studies were based on different samples using shorter time frames for retesting. Indeed, the study by Patrick et al,12 in which longer retest periods than ours were used, produced findings that were similar to ours. Clinicians and researchers should weigh this evidence when considering examination instrument choice and should be prepared to change their choice of outcome measurement tools as better options present themselves.

Megan Davidson, Lecturer and Jennifer Keating, Senior Lecturer

School of Physiotherapy Faculty of Health Sciences
La Trobe University
Victoria 3086 Australia


References

  1. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420–428.
  2. Jacob T, Baras M, Zeev A, Epstein L. Low back pain: reliability of a set of pain measurement tools. Arch Phys Med Rehabil. 2001;82:735–742.
  3. Wiesinger GF, Nuhr M, Quittan M, et al. Cross-cultural adaptation of the Roland-Morris questionnaire for German-speaking patients with low back pain. Spine. 1999;24:1099–1103.
  4. Johansson E, Lindberg P. Subacute and chronic low back pain: reliability and validity of a Swedish version of the Roland and Morris Disability Questionnaire. Scand J Rehabil Med. 1998;30:139–143.
  5. Nusbaum L, Natour J, Ferraz MB, Goldenberg J. Translation, adaptation and validation of the Roland-Morris questionnaire: Brazil Roland-Morris. Braz J Med Biol Res. 2001;34:203–210.
  6. Stratford PW, Finch E, Solomon P, et al. Using the Roland-Morris scale to make decisions about individual patients. Physiotherapy Canada. 1996;48:107–110.
  7. Kopec JA, Esdaile JM, Abrahamowicz M, et al. The Quebec Back Pain Disability Scale: measurement properties. Spine. 1995;20:341–352.
  8. Roland M, Morris R. A study of the natural history of back pain, part I: development of a reliable and sensitive measure of disability in low back pain. Spine. 1983;8:141–144.
  9. Stratford PW, Binkley JM, Riddle DL. Development and initial validation of the Back Pain Functional Scale. Spine. 2000;25:2095–2102.
  10. Jensen MP, Strom SE, Turner JA, Romano JM. Validity of the Sickness Impact Profile Roland scale as a measure of dysfunction in chronic pain patients. Pain. 1992;50:157–162.
  11. Stratford PW, Binkley JM. A comparison study of the Back Pain Functional Scale and Roland Morris Questionnaire. North American Orthopaedic Rehabilitation Research Network. J Rheumatol. 2000;27:1928–1936.
  12. Patrick DL, Deyo RA, Atlas SJ, et al. Assessing health-related quality of life in patients with sciatica. Spine. 1995;20:1899–1908.
  13. Davidson M, Keating JL. A comparison of five low back disability questionnaires: reliability and responsiveness. Phys Ther. 2002;82:8–24.
  14. Keating J, Matyas T. Unreliable inferences from reliable measurements. Australian Journal of Physiotherapy. 1998;44:5–10.
  15. Song F, Eastwood AJ, Gilbody S, et al. Publication and related biases. Health Technol Assess. 2000;4:10.

Copyright © 2002 by the American Physical Therapy Association.