Research Reports
JM Beneciuk, PT, DPT, FAAOMPT, is currently enrolled in the Rehabilitation Sciences Doctoral Program (PhD), Department of Physical Therapy, University of Florida, PO Box 100154, Gainesville, FL 32610-0154 (USA).
MD Bishop, PT, PhD, is Assistant Professor, Department of Physical Therapy, University of Florida.
SZ George, PT, PhD, is Assistant Professor, Department of Physical Therapy, Brooks Center for Rehabilitation Studies, University of Florida. Mailing address: Department of Physical Therapy, University of Florida, PO Box 100154, Gainesville, FL 32610-0154 (USA).
Address all correspondence to Dr Beneciuk at beneciuk{at}phhp.ufl.edu
Address all correspondence to Dr George at: szgeorge{at}phhp.ufl.edu
Submitted August 7, 2008;
Accepted October 30, 2008
| Abstract |
|---|
Methods: Relevant databases were searched up to June 2008. Studies were included in this review if the explicit purpose was to develop a CPR for conditions commonly treated by physical therapists. Validated CPRs were excluded from this review. Study quality was independently determined by 3 reviewers using standard 18-item criteria for assessing the methodological quality of prognostic studies. Percentage of agreement was calculated for each criterion, and the intraclass correlation coefficient (ICC) was determined for overall quality scores.
Results: Ten studies met the inclusion criteria and were included in this review. Percentage of agreement for individual criteria ranged from 90% to 100%, and the ICC for the overall quality score was .73 (95% confidence interval=.27–.92). Criteria commonly not met were adequate description of inclusion or exclusion criteria, inclusion of an inception cohort, adequate follow-up, masked assessments, sufficient sample sizes, and assessments of potential psychosocial factors. Quality scores for individual studies ranged from 48.2% to 74.0%.
Discussion and Conclusion: Validation studies are rarely reported in the literature; therefore, CPRs derived from high-quality studies may have the best potential for use in clinical settings. Investigators planning future studies of physical therapy CPRs should consider including inception cohorts, using longer follow-up times, performing masked assessments, recruiting larger sample sizes, and incorporating psychological and psychosocial assessments.
| Introduction |
|---|
McGinn and colleagues3 recommended a 3-step process in the development and testing of a CPR. The first step involves the derivation of the CPR (derivation studies), and the second step involves the validation of the CPR (validation studies). The third step involves assessment of the impact of the rule on clinical behavior, also referred to as an "impact analysis." For CPRs for physical therapy interventions, steps 2 and 3 are not routinely performed. Although it has been suggested that a validated CPR can be applied in various settings with confidence in its accuracy,4 our impression is that most CPRs reported in the physical therapy literature are derivation studies. Furthermore, the lapse in time before validation occurs can be extensive. This situation presents clinicians with the dilemma of whether they should incorporate the results of a derivation study into their clinical practice.
Our opinion is that the quality of a derivation study is one factor that should be considered before a CPR is implemented into clinical practice. This interpretation can be difficult, however, because the quality of CPR derivation studies pertaining to interventions has not been reported. Assessing the quality of derivation studies has potential advantages for physical therapy practice and research. First, a quality assessment will assist clinicians in deciding whether a given CPR is appropriate for implementation into clinical practice. Second, a quality assessment will assist future researchers in the design of high-quality studies for the development of new CPRs.
Therefore, the purpose of this systematic review was to determine the quality of CPRs developed for interventions used by physical therapists. Studies were included in this review if the explicit purpose was to develop a CPR related to a specific intervention approach for conditions commonly treated by physical therapists. Previously validated CPRs were excluded from this review because there is less debate over the clinical application of validated CPRs2 and because methodological limitations of a derivation study are of less concern once a validation study has been reported.
| Method |
|---|
The original list of criteria7 was altered slightly by removing the criterion related to the rate of response of potential study participants because this item is not commonly reported in the physical therapy literature. Additionally, we added an important criterion by including masking of outcome assessors and treating clinicians.6,8 In our opinion, the resulting criteria are similar to those that have been suggested for evaluating the quality of prognostic studies for patients receiving physical therapy care8 and are consistent with the process of evaluating prognostic variables.10
The 18 criteria used to assess quality in this systematic review represented 8 categories: study population, response information, follow-up, intervention, outcome, masking, prognostic factors, and data presentation. A description of each criterion is provided in the Appendix. The criteria could be scored as positive, negative, or unclear. A positive score indicated that the criterion was identified in the study and met specific requirements consistent with a high-quality prognostic study. A negative score indicated that the criterion was identified in the study but did not meet specific requirements. A score of "unclear" meant that the study provided insufficient information regarding that criterion. To obtain a conservative estimate of quality, negative and unclear ratings were collapsed when study quality was rated. A total quality score was determined by adding positive scores, providing a potential high score of 18 (100%).
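The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' tooling: the function name `quality_score` and the rating labels are assumptions introduced here. Negative and unclear ratings both contribute nothing, which mirrors the conservative collapsing described in the text.

```python
from typing import Iterable, Tuple

CRITERIA_COUNT = 18  # number of criteria used in this review
ALLOWED = {"positive", "negative", "unclear"}  # hypothetical labels


def quality_score(ratings: Iterable[str]) -> Tuple[int, float]:
    """Return (count of positive ratings, percentage of the maximum of 18)."""
    ratings = list(ratings)
    if len(ratings) != CRITERIA_COUNT:
        raise ValueError(f"expected {CRITERIA_COUNT} ratings, got {len(ratings)}")
    unknown = set(ratings) - ALLOWED
    if unknown:
        raise ValueError(f"unknown rating labels: {sorted(unknown)}")
    # Negative and unclear ratings are collapsed: only positives add to the score.
    positives = sum(1 for r in ratings if r == "positive")
    return positives, 100.0 * positives / CRITERIA_COUNT
```

For example, a study rated positive on 11 criteria, negative on 4, and unclear on 3 would receive a score of 11 of 18.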
Ratings of individual studies to determine quality scores were independently assigned by the 3 reviewers before a meeting on the interpretation of the 18-item list of criteria (time 1). The meeting provided an opportunity for reviewers to assess agreement and discuss criteria that resulted in high disagreement. In addition, components for a given criterion that may have been overlooked by reviewers were clarified. Where appropriate, a quality criterion was revised to reflect the clarified interpretation. For example, there were differences in opinion about what constituted a prospective study (criterion E). During the meeting, an operational definition for a prospective study was approved. After the meeting, the 3 reviewers again independently assigned ratings (time 2), this time using the final guidelines provided in the Appendix.
Data Synthesis and Analysis
Statistical pooling of results was not performed because of the obvious heterogeneity among studies in populations used, interventions applied, and outcome measures administered. Reliability analyses were performed with SPSS 15.0 for Windows* and Excel.
Percentage of agreement was calculated for individual items. Negative and unclear ratings were collapsed into one variable so that ratings could be dichotomized into 2 categories. Interrater reliability was reported for the total quality score by use of the intraclass correlation coefficient (ICC [2,1]) and respective 95% confidence intervals (CI).11
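As a rough illustration of these two reliability statistics, the sketch below computes a percentage of agreement for one dichotomized item and an ICC (2,1) for total scores. The analyses in this review were run in SPSS and Excel; this Python version is a stand-in, and the function names are hypothetical. Two assumptions are made explicit: agreement is counted over all reviewer pairs within each study (the text does not state how agreement among 3 reviewers was tallied), and the ICC uses the standard two-way random-effects, single-measure, absolute-agreement formula.

```python
from itertools import combinations


def percent_agreement(item_ratings):
    """Pairwise agreement (%) for one item.

    item_ratings: one tuple per study holding each reviewer's dichotomized
    rating (True = criterion met, False = negative or unclear).
    Assumption: agreement is counted over every pair of reviewers.
    """
    agree = total = 0
    for study in item_ratings:
        for a, b in combinations(study, 2):
            total += 1
            agree += (a == b)
    return 100.0 * agree / total


def icc_2_1(scores):
    """ICC (2,1): two-way random effects, single rater, absolute agreement.

    scores: one row per study, one column per reviewer (total quality scores).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)              # between-study mean square
    ms_cols = ss_cols / (k - 1)              # between-reviewer mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

The confidence interval reported alongside the ICC is not computed here; it would require the F-distribution and is left to a statistics package.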
To determine a single overall quality score for an individual study (Tab. 1), the 3 reviewers' scores for a particular study were averaged to account for the possibility that interrater agreement was less than 100% after time 2. Therefore, the overall quality scores solely reflect the results after time 2 ratings. As in other reviews using this scoring system, high-quality studies were operationally defined as those that had average quality scores of greater than 60%.7
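The averaging step and the 60% threshold can be expressed as a short calculation. The sketch below is illustrative only; the function name `overall_quality` and the example reviewer totals are hypothetical and are not taken from Table 1.

```python
MAX_SCORE = 18          # maximum possible total quality score
HIGH_QUALITY_PCT = 60.0  # threshold used in this review (strictly greater than)


def overall_quality(reviewer_totals):
    """Average the reviewers' time 2 totals and classify the study.

    Returns (mean score, mean score as a percentage of 18, is_high_quality).
    """
    mean_score = sum(reviewer_totals) / len(reviewer_totals)
    pct = 100.0 * mean_score / MAX_SCORE
    return mean_score, pct, pct > HIGH_QUALITY_PCT
```

With hypothetical totals of 13, 13, and 14 from the 3 reviewers, the mean is about 13.33 and the study falls above the threshold; totals of 10, 11, and 11 give a mean of about 10.67, which falls just below it.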
| Results |
|---|
The remaining 8 studies were included in this review. Two additional publications were included after a review of reference lists and related articles, resulting in the analysis of 10 publications in this review. Five studies involved CPRs for responses to spinal manipulation.29–33 The other studies predicted responses to lumbar stabilization,34 hip mobilization,35 patellar taping,36 multimodal interventions for cervical radiculopathy,37 and trigger point therapy for headache38 (Tab. 1).
Methodological Criteria
Percentage of agreement on ratings of individual items ranged from 70% to 100% at time 1; items B, E, H, K, L, and P had the lowest levels of agreement (70%–86.7%) (Tabs. 2 and 3). After a meeting on the interpretation of the 18-item list of criteria (time 2), percentage of agreement on ratings of individual items ranged from 90% to 100% (Tabs. 2 and 3). Individual items commonly rated as low quality (ie, not meeting the criteria in greater than 50% of the studies) were items A, B, F, K, M, and R. Among these items, the inclusion of an inception cohort (item A), description of inclusion and exclusion criteria (item B), and follow-up of at least 6 months (item F) were met in 10% of the studies. The results indicated that for 6 items, all 3 reviewers were in absolute agreement that a given criterion was met across all studies reviewed (eg, all 3 reviewers were in absolute agreement that item B was met in 3% of all studies reviewed) (Tab. 4).
Overall quality scores (mean=11.13; 61.8%) ranged from 8.67 to 13.33 (48.2%–74.0%). Five studies29,30,34,37,38 were rated at greater than 60% (range=61.1%–74.0%). Four studies31–33,36 were rated at 50% to 60% (range=53.7%–59.3%), and one study35 was rated at less than 50% (48.2%).
| Discussion |
|---|
Studies that met this quality index included CPRs for determining factors associated with responses to stabilization exercises,34 responses to muscle trigger point therapy for tension-type headaches,38 the inability of patients with low back pain to show improvement with spinal manipulation,29 manipulation of the thoracic spine in patients diagnosed with mechanical neck pain,30 and a multimodal intervention approach for cervical radiculopathy.37 The lower-quality studies included CPRs for predicting favorable responses to cervical manipulation in patients with neck pain,31 the management of cervicogenic headache,32 patellar taping,36 lumbopelvic manipulation in patients with patellofemoral pain syndrome,33 and hip mobilization for knee pain indicative of osteoarthritis.35 Quality scores can assist clinicians in deciding whether to use these nonvalidated CPRs. However, quality scores are not a substitute for CPR validation studies. Validation studies provide more-definitive information for clinical applications because they are independent studies of new subjects and involve a variety of clinicians and patients.1,3,6,40
An important factor to consider for a methodologically sound CPR derivation is the risk-to-benefit ratio associated with its application. Risk was not empirically assessed in the CPRs considered in this review, a fact that is not surprising given the status of the physical therapy literature.41 Clinical prediction rules were originally developed in the medical profession for decisions involving higher associated risks, such as those associated with traumatic injuries.42–45 The risk associated with the interventions used in the CPRs considered in this review is believed to be minimal in comparison with the risk associated with emergency medicine. For example, consider the risk associated with the use of stabilization exercises for low back pain34 in comparison with the risk of not ordering radiographs for traumatic injuries.42–44 Failure to detect a fracture is associated with a risk higher than that associated with the use of stabilization exercises when those are not indicated. Unfortunately, the current physical therapy literature does not allow a quantitative consideration of the risk-to-benefit ratio, so clinical decisions must be based on qualitative factors.
Individual items that received low-quality ratings were similar to previously suggested areas of methodological concern for CPR studies.2,6,40 Specifically, masking of outcome assessors and treating clinicians (item K) did not occur in a majority of the studies reviewed. Masking of outcome assessors and treating clinicians is important for limiting the measurement bias of potential predictor variables.1,3,8 Additional areas of concern identified in this review included the use of an inception cohort and definition of the duration of symptoms in eligibility criteria. To limit potential error in establishing a prognosis, subjects should be enrolled in a common time frame with regard to their current condition.8 This criterion was not a component in a majority of the studies used to develop CPRs. Therefore, samples used to develop CPRs may lack homogeneity, thereby increasing the potential for bias in predictor variables and outcomes. Another area lacking in CPR derivation studies published to date was the follow-up period, which was suggested to be at least 6 months. Immediate effects were commonly reported; such immediate effects might be beneficial only in demonstrating evidence of responsiveness.8 Longer follow-up times are needed to demonstrate valid clinical implications for the use of a given intervention. Finally, an assessment of potential psychosocial prognostic factors (item M) was commonly not included. Psychological factors, such as kinesiophobia, catastrophizing, anxiety, and depression, might have strong influences on outcomes related to musculoskeletal conditions.46–49 Incorporating these factors into the development process has important clinical implications for future CPRs.
Limited sample sizes have been reported to be common methodological flaws in CPR studies.2 It has been suggested that 10 to 15 subjects are required for each prospective predictor variable in CPR studies.50 Not meeting this requirement may lead to inaccurate statistical results because of overfitting of regression models.50 It is important to note that our sample size determination was based on the final CPR model and not on initial prospective variables. The result was that only 40% of studies had an adequate sample size, and this was a liberal estimate. If we had elected to use initial prospective variables, then no studies would have met the criterion for sample size, a result suggesting that previously noted concerns about small sample sizes used in CPR derivation studies are legitimate. We suggest that future studies include larger sample sizes to account for derivation regression models in addition to the final, more parsimonious models.
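The subjects-per-predictor guideline lends itself to a quick check. The sketch below is a minimal illustration, assuming the lower bound of 10 subjects per predictor as the default; the function names and the example numbers are hypothetical and do not describe any study in this review.

```python
def max_predictors(n_subjects, subjects_per_predictor=10):
    """Largest number of predictor variables the sample can support."""
    return n_subjects // subjects_per_predictor


def sample_size_adequate(n_subjects, n_predictors, subjects_per_predictor=10):
    """True if the sample provides the required subjects for every predictor."""
    return n_subjects >= n_predictors * subjects_per_predictor
```

For a hypothetical study of 80 subjects, a final model with 5 predictors would pass the check, whereas an initial pool of 12 candidate predictors would not. This is the contrast between the liberal estimate based on final models and the stricter one based on initial prospective variables.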
Several limitations of this systematic review should be considered in the interpretation of the results. Although the results may be relevant to the decision-making process for implementing a CPR in practice, our findings should not be viewed as definitive. Our data provide complementary information on which CPRs to use in clinical practice, but the ultimate decision must be made in the context of a clinician's experience and factors specific to the encounter with a patient. These factors include, but are not limited to, whether patients seen in clinical practice are similar to those enrolled in the respective CPR study and whether a quantitative assessment of the risk-to-benefit ratio is available.
Another limitation is that the quality criteria used in this review were developed for prognostic studies, not specifically for CPR derivation studies involving interventions. Although these study designs are similar, there may be subtle differences with regard to quality determinations. The quality criteria used in this review did not include certain statistical elements that may have important implications for CPRs. For example, the criteria did not include the consideration of a quantitative risk-to-benefit analysis,41 reporting of potential predictor variable reliability,2,6 or reporting of CIs2,6 and effect sizes.51 Furthermore, the quality scores were equally weighted so that the "methodological importance" of a category was equally distributed among all of the criteria. This decision was made because we did not have clear evidence to follow for weighting decisions. Another relevant issue is that randomized designs have been suggested to be appropriate for CPR studies involving intervention selection.6,40 Although this may be true, it appears that the use of cohort studies is much more common in the physical therapy literature, because only one study included in this review used a randomized design.32 Therefore, future assessments of the quality of CPR derivation studies should include the development of a standardized rating system with a more-specific statistical criterion, consideration of weighting of quality scores on the basis of the methodological importance of particular categories, and the development of a criterion that is sensitive enough to determine the overall quality of a study design (such as distinguishing between cohort and randomized designs).
There was substantial agreement among the raters on individual items and overall quality scores; however, it is clear that agreement can be improved. Improvement can be accomplished by providing quality criteria more explicit than previously reported criteria, especially with regard to inception cohort, responders versus nonresponders, frequency of outcome measures, and sample size determination. The reliability estimates were also imprecise (large 95% CIs); we believe that this result may have been attributable to the relatively small number of studies included in this review. Additionally, we opted to collapse negative and unclear ratings, a strategy that may have influenced the percentage of agreement among the reviewers. However, this decision to collapse the data was made a priori and, even if the data had not been collapsed, the overall quality scores would not have been affected (Tab. 1). These scores considered only positive ratings because negative and unclear ratings were equally weighted as "0" when the overall quality scores were determined.7
| Conclusion |
|---|
| Appendix. |
|---|
| Footnotes |
|---|
Dr Beneciuk was supported by a National Institutes of Health T-32 Neural Plasticity Research Training Fellowship (grant T32HD043730). Dr Bishop and Dr George were supported by grant R21 AT002796 awarded to Dr George from the National Institutes of Health/National Center for Complementary and Alternative Medicine.
* SPSS Inc, 233 S Wacker Dr, Chicago, IL 60606.
Microsoft Corp, One Microsoft Way, Redmond, WA 98052-6399.
| References |
|---|