To the editor:
The authors are to be complimented for a strong paper describing a methodologically complex process in an understandable manner. Although others have used computerized adaptive testing (CAT) applications in outpatient rehabilitation for several years,1 Jette et al2 are the first to publish results of a practical application of a CAT in a peer-reviewed journal.
The strength of the work by Jette et al lies in the process used to develop the product. Item response theory (IRT) methods and CAT applications have the potential to be the foundation of outcomes measurement development in rehabilitation just as they were in educational measurement.3 We should not forget that IRT and CAT are not new; they are just new to rehabilitation and medicine. Jette et al discuss in the current paper and in earlier work how these methods can be used to develop a new outcomes scale, assess the strengths and weaknesses of the scale, and discuss how a scale can be improved when scale deficits are identified via practical application in busy clinics. These methods are sorely needed for many common paper-and-pencil instruments that are so popular in rehabilitation.
The study is not without limitations, many of which are detailed nicely by Jette et al. One psychometric issue not discussed relates to differential item functioning (DIF).4 Differential item functioning occurs when patients from different groups (for example, patients with hip versus knee impairments) who have the same level of functional ability have different probabilities of endorsing item response categories. In clinical terms, this could mean, for example, that patients with knee impairments perceive the act of squatting as more difficult than do patients with hip impairments, which is clinically logical and important.5 Differential item functioning is common in patients treated in outpatient rehabilitation.5,6 When DIF is present and of practical importance, the lack of control for DIF can erode the validity of the outcomes measure.3
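To make the DIF concept concrete, the following is a minimal sketch, not the method used by Jette et al. It assumes a dichotomous Rasch model (a simplification of the multi-category items discussed above) and uses hypothetical item difficulties for a squatting item; the function name, ability value, and calibrations are illustrative assumptions only.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of endorsing a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical calibrations (in logits) for one squatting item, estimated
# separately by body part treated. These numbers are invented for illustration.
squat_difficulty = {"hip": 0.0, "knee": 1.0}

# Two patients with the SAME functional ability, one from each group.
ability = 0.5

probabilities = {
    group: rasch_probability(ability, difficulty)
    for group, difficulty in squat_difficulty.items()
}

# The DIF contrast is the difference between the group-specific difficulties.
# A nonzero contrast means equal-ability patients respond differently.
dif_contrast = squat_difficulty["knee"] - squat_difficulty["hip"]

for group, p in probabilities.items():
    print(f"{group}: P(endorse) = {p:.2f}")
print(f"DIF contrast (knee - hip) = {dif_contrast:.2f} logits")
```

In this sketch, the patient with a knee impairment has a lower probability of endorsing the item than the equally able patient with a hip impairment, which is the pattern DIF analyses are designed to detect. Whether a contrast of a given size is of practical importance is a separate judgment.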
One of the strengths of IRT techniques is the ability to detect and possibly control for DIF. However, when DIF is identified but of no practical importance, DIF can be ignored when calibrating items.7,8 Discussion of DIF at least by body part treated would have strengthened the Jette et al paper, particularly because differences in item calibrations by body part treated6 have been published for the physical functioning items of the SF-36,9 which appear to be included in the AM-PAC-CAT item bank.10 From previous evidence,5,6 it would not be unexpected to find DIF between patients with hip, knee, or foot/ankle impairments, between patients with shoulder compared with elbow/wrist/hand impairments, and between patients with lumbar compared with cervical impairments for body mobility and activity item banks.
The authors rely on earlier factor analytic work10 that identified the mobility and daily activity constructs. Although the conceptual foundation identifying these 2 factors appears sound, there is evidence that these factors might not be distinctly separate. In the original sample that did not contain patients treated in outpatient facilities,10 factor loadings supported grouping items into the mobility and activity factors. However, the current study applied CATs developed from a sample that may not have included outpatients to a sample of outpatients, and its results provide some evidence supporting the need for further unidimensionality testing. Specifically, in the current study, mobility measures were most responsive for patients with lower-extremity impairments compared with patients with spine or upper-extremity impairments. The lowest effect size for both scales and all impairments was for patients with upper-extremity impairments using the mobility scale. However, the greatest effect size for the activity scale was recorded for the patients with upper-extremity impairments.
These results are clinically logical, given the items and sample. However, do the activity and mobility items really describe different constructs for patients treated in outpatient clinics? Could the mobility and activity items be combined into one item bank that is “essentially unidimensional,” where one dimension is dominant, possibly in the presence of one or more minor dimensions,11 without erosion of the scale psychometrics? Do patients' impairments demand different scales in order to assess the most appropriate construct of interest to the patient, that is, mobility for lower-extremity impairments versus activity for upper-extremity impairments? Do more difficult items (assessed using item calibrations) describe a separate construct compared with easier items, regardless of construct (mobility versus activity)?
If payers were to reimburse outpatient therapy services for value (unit of functional improvement per dollar cost),12 which construct should be used, that is, should we assess mobility for patients with lower-extremity impairments, and activity for patients with upper-extremity impairments? Which construct is more important for patients with cervical or lumbar impairments? The Jette et al results combined with the results of Hart and colleagues5,7,8 suggest the need for further assessment of item unidimensionality in patients receiving outpatient therapy. In addition, given that CATs are continuously evolving, how do developers, journal editors, and users keep current with pertinent CAT changes?
The results describing the responsiveness and construct validity of the AM-PAC-CAT measures support previous work using CATs applied in outpatient rehabilitation. For example, the effect size for prospectively collected data using body part–specific CATs on average was 0.92 in an earlier study,1 which is similar to the highest effect size for patients with lower-extremity impairment in the Jette et al study. However, Stratford and Riddle13 recommend using an external standard to assess sensitivity to change in a sample of patients who are likely to change at different rates. Such analyses are recommended for future AM-PAC-CAT investigations. Furthermore, construct validity results using CATs in the Hart and Connolly1 report are similar to the results reported by Jette et al. Taken together, these results suggest that CAT administrations produce responsive and valid estimates of ICF activity measures in patients receiving outpatient therapy.
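For readers less familiar with responsiveness statistics, the sketch below shows one common definition of effect size: mean change from intake to discharge divided by the standard deviation of the intake scores. This is an assumption for illustration; neither this letter nor the text above states which formula produced the effect sizes cited, and the scores are invented.

```python
from statistics import mean, stdev


def effect_size(intake_scores, discharge_scores):
    """Mean change divided by the standard deviation of intake scores.

    This is one common responsiveness index; other definitions (for example,
    the standardized response mean) divide by the SD of the change scores.
    """
    changes = [d - i for i, d in zip(intake_scores, discharge_scores)]
    return mean(changes) / stdev(intake_scores)


# Hypothetical functional status scores for three patients.
intake = [40.0, 50.0, 60.0]
discharge = [50.0, 60.0, 70.0]

es = effect_size(intake, discharge)
print(f"effect size = {es:.2f}")
```

Because every patient in this toy sample improves by the same amount, the index cannot distinguish patients who change at different rates, which is one reason an external standard of change, as recommended by Stratford and Riddle, is informative.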
Jette et al describe in detail the content balancing performed by their CAT. However, is content balancing, which was developed for educational tests, as important in outpatient rehabilitation as it is in educational testing? The answer may be “probably.” Given that the primary advantage of CAT applications is reduced respondent burden without erosion of measure precision and validity, the answer may be that providers should take advantage of the efficiency of CATs and collect more data. In this way, providers in busy clinics can assess multiple constructs of interest efficiently, such as mobility, activity, and fear-avoidance.14 CATs should save clinicians time assessing multiple constructs.
Finally, the collaboration of good researchers and a proprietary database management company (eg, CRE Care LLC) facilitated the implementation of the current study. The integration of good science, electronic application of psychometrically sound outcomes instruments in busy clinics, and a journal's need to publish scientifically sound material produced a result that may affect clinical practice positively. As payers progress toward new methods of payment that may include value-based purchasing,2,12,15,16 proprietary database management companies may become more important, as they manage large databases of scientifically sound outcomes measures without undue political pressures. Jette et al and the editors of PTJ have taken the “high road” by publishing this paper, and the readers will be the beneficiaries.
Thank you for the opportunity to contribute to this important discussion.
This letter was posted as a Rapid Response on February 22, 2007, at www.ptjournal.org.
- Physical Therapy