Perspectives
L Resnik, PT, PhD, OCS, is Research Health Scientist, Providence VA Medical Center, and Assistant Professor, Department of Community Health, Brown University, 2 Stimson Ave, Providence, RI 02912 (USA)
D Liu, PhD, is Assistant Professor, Department of Community Health, Brown University
DL Hart, PT, PhD, is Director of Consulting and Research, Focus On Therapeutic Outcomes Inc, White Stone, Virginia
V Mor, PhD, is Professor and Chairperson, Department of Community Health, Brown University
Address all correspondence to Dr Resnik at: Linda_Resnik{at}Brown.edu
Submitted November 1, 2007;
Accepted June 9, 2008
| Abstract |
|---|
| Introduction |
|---|
Several health care quality indicators are based on measurement of a patient's performance or self-reported performance of functional tasks. For example, several of the nursing home quality indicators used by the Centers for Medicare and Medicaid Services are based on performance measurement data available in the Minimum Data Set assessments.6 As another example, the Health Outcomes Survey conducted by the Centers for Medicare and Medicaid Services uses the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36), a patient self-assessment tool, to assess health outcomes of Medicare beneficiaries in managed care settings and to measure performance of health plans.7
Provider profiles can provide practitioners with meaningful performance information, which can be used to help improve service quality and efficiency. Provider profiles can be used by purchasers to stimulate providers to reduce undesirable variation in services, control costs, and ensure that patients and payers are getting the best value for their money.1 Provider profiles, when publicly reported, also can aid consumer and purchaser decision making.12
Many strategies to evaluate quality of care or profile provider performance rely on analysis of observational data collected in administrative or outcomes databases. The advantage of using observational research is that many such studies use pre-existing data sets, making their implementation much more cost-effective compared to implementing an experimental design. Furthermore, studies using observational data, collected prospectively or analyzed retrospectively, can examine questions that are not feasible or ethical to ask using other research designs.
Although the use of observational study designs may be the most feasible approach to evaluating quality and comparing provider performance, these designs have limitations and require analytical approaches to strengthen the internal validity of the study. Threats to internal validity in these study designs include potential confounding and patient selection bias caused by nonrandom assignment of patients to providers,13 lack of independence of patients within providers,14 and missing data.
Many physical therapy providers are beginning programs to collect and interpret their own outcomes,15,16 and there is a strong likelihood that payers will expand efforts to redesign reimbursement policy based on provider performance.2 Thus, it is critically important for those involved in these endeavors, such as clinicians, researchers, administrators, and policy makers, to appreciate some of the underlying challenges and methodological issues related to outcomes assessment. The purposes of this perspective article are: (1) to discuss the advantages and limitations of using observational research to measure health care quality and evaluate provider performance and (2) to inform clinicians, researchers, administrators, and policy makers who want to use evidence to guide practice and policy or critically appraise observational studies. We discuss key limitations of observational research designs and highlight statistical methods to address these limitations. We concentrate on managing potential confounding by patient demographic variables, patient selection bias, and missing data. Finally, we use an example from a recent research study to compare physical therapy clinic performance, in terms of patient outcomes and service utilization, with and without the use of these methods.
| Potential Confounding |
|---|
Confounding occurs if there are extraneous, or uncontrolled, variables associated with both the independent and dependent variables. A confounding variable, or confounder, can cause, prevent, or modify the outcome of interest by producing an association between independent and dependent variables when none exists or can mask a relationship that does exist. Observational studies compare pre-existing groups that may differ not only on the interventions they receive but also on many factors that could influence response to treatment. Thus, if an observational study shows a difference between 2 pre-existing groups, that difference may be due to the fact that they received different types of interventions, or it may be due to the differences in the characteristics of the groups.
Therefore, in an observational design, the researcher needs to establish group comparability through control of confounders during the data analysis phase to minimize potential confounding.18 Observational research designs should not use crude (ie, unadjusted) estimates of relationships between independent and dependent variables because crude estimates do not account for potential confounders. Instead, observational research designs should use statistical methods to make the groups more mathematically equivalent on baseline characteristics that may influence response to treatment. These methods often include variables that are potential confounders as covariates (ie, control variables that are added as independent variables) in the analyses.
In contrast, in a randomized controlled trial, patients are randomly assigned to treatment and control groups. Randomized controlled trials are the gold standard of research designs and provide the highest level of evidence on the effect of an intervention because the randomization process, if successful, creates comparability of patient characteristics as patients are randomly assigned to intervention and control groups during the design phase of study. If randomization works, the groups formed by random assignment differ only on the type of intervention they receive.
In observational studies, where patients are not randomly assigned to specific groups, we can attempt to control the influence of confounders by using statistical adjustment of patient characteristics.17,19 Because the complexity and diversity of patients, also called the "case mix," vary across clinicians and institutions, and because differences in patient outcomes are associated with type and severity of impairments as well as other factors, multivariable statistical analyses are used. For example, improvement in patients undergoing rehabilitation can be attributed to many variables, including, but not limited to, age, symptom acuity, surgical history, and comorbid conditions.20–22 Clearly, these risk variables must be controlled before a meaningful interpretation of outcomes of treatment can be made.23 Iezzoni23 has written extensively about risk-adjustment methods for rehabilitation because risk-adjustment methods help explain variation in outcomes and service utilization related to patient factors so that the remaining differences in outcomes can be attributed to the care delivered. Without appropriate adjustment, outcomes cannot be interpreted in a meaningful manner or attributed to therapeutic interventions.16,22–25 Some authors24 have shown that comparisons between health care providers have been compromised by inadequate assessment of case mix.
Risk-adjustment methods typically use statistical models. Statistical models provide a way to evaluate the relationship between a dependent variable and one or more independent, or explanatory, variables. Using linear modeling, the dependent variable is expressed as a linear function of the explanatory variables and a random error term, which represents the measurement error or unaccounted random noise in the model. A regression coefficient corresponding to an independent variable in the model represents the rate of change of the dependent variable as a function of change in an independent variable. The works of Jette and colleagues,21,25,26 Freburger,27 and Horn et al,28 among others, provide examples of observational research in the physical therapy literature that have used statistical modeling as a risk-adjustment technique for physical therapy data.
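As a minimal sketch of the linear modeling described above, the following example fits an ordinary least squares model in which a discharge score is expressed as a linear function of two patient characteristics. The function name `fit_linear_model`, the covariates (intake score and age), and the data are hypothetical and are used only for illustration; they are not taken from any of the cited studies.

```python
def fit_linear_model(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows; each row starts with 1.0 for the intercept.
    Returns the list of regression coefficients.
    """
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical patients: (intake score, age, discharge score).
# The discharge scores were constructed as 20 + 0.8*intake - 0.1*age,
# with no random error, so the fit should recover those coefficients.
patients = [
    (40.0, 30.0, 49.0),
    (50.0, 60.0, 54.0),
    (60.0, 45.0, 63.5),
    (45.0, 70.0, 49.0),
    (55.0, 25.0, 61.5),
]
X = [[1.0, intake, age] for intake, age, _ in patients]
y = [discharge for _, _, discharge in patients]

intercept, b_intake, b_age = fit_linear_model(X, y)
print(round(intercept, 3), round(b_intake, 3), round(b_age, 3))
```

Each coefficient is read as the expected change in discharge score per unit change in that covariate, holding the other covariate constant. Real data would include a random error term, so the estimates would not match the generating values exactly.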
There are a number of other techniques that can be used to control for group differences in observational studies. These techniques involve creating comparable groups through matching of subjects. Matching techniques may be relatively simple, such as that used by Hart and Dobrzykowski,29 who matched each member of the comparison group to a member of the group treated by orthopedic clinical specialists based on basic demographic characteristics. Matching also can involve more sophisticated approaches, such as the use of propensity scores to create comparison groups of individuals with an equal probability of being in the intervention group or in the control group.30
Regardless of best efforts to adjust for differences in case mix, the most serious limitation of observational research is that researchers can control only for potential confounders that have been measured. There also may be unmeasured confounders over which the providers have little or no control, such as motivation- or socioeconomic status-related adherence (eg, having the money to purchase the needed equipment or get transportation or time off work to attend therapy). When observational data are analyzed retrospectively, there may be little control over the choice of data elements, and thus important covariates may not have been measured. For example, most outcome data sets used for physical therapy do not include information that would allow classification of patients according to movement signs and symptoms or specific diagnosis thought to affect outcome.31 While patient classification is one example of a potential confounder, no risk-adjustment approach can control for every factor affecting outcomes of care.14 The reality is that there are always unobserved or unmeasured confounders.
| Selection Bias |
|---|
Furthermore, patients who go to the same clinic likely share certain characteristics. This phenomenon is called "clustering." As examples, patients might cluster in clinics because they share similar primary sources of referral (eg, a specific group of primary care physicians, surgeons, neurologists, or other specialists), or clustering may be related to the location of a clinic within a similar geographic area consistent with people who share specific demographic characteristics, such as age, ethnicity, or socioeconomic status.
In addition, patients seen by the same therapist or provider may be clustered. Like patient clustering in clinics, therapist-level clustering in rehabilitation can occur for a variety of reasons. In some facilities, patients are assigned to certain therapists based on their clinical expertise, resulting in groups of patients who have similar characteristics (eg, patients with chronic pain, patients who are pregnant) within a therapist's practice. Provider-level clustering also might occur in rehabilitation because of therapist behavior, which often is individualized and consistent across patients and may or may not be similar to that of other therapists within the same clinics. Thus, variation in patient outcomes and service utilization might result from individual therapist practice patterns and behaviors. If any of these factors were associated with outcomes, the association could affect provider outcomes.
Furthermore, outcomes from clustered patients generally are not independently distributed. Failure to account for this dependence will result in biased estimates of the regression coefficients and their standard errors and may lead to incorrect conclusions about statistical significance. Greenfield et al19 demonstrated that comparisons of physician groups may be inaccurate if adjustment for both patient case mix and provider-level clustering is not performed before comparing quality measures of physician groups. Thus, statistical analyses of patient outcomes and service utilization by providers should control for patient clustering within providers. Failure to account for clustering generally will cause standard errors of regression coefficients to be underestimated.32
Control for clustering is relatively new to observational research in the physical therapy literature. However, Jewell and Riddle33 recently used a robust variance estimation method in their multiple logistic regression models to predict clinically meaningful improvement for patients with sciatica. Robust statistical methods are methods that are not unduly affected by departures from statistical assumptions, such as lack of independence of observations.
Another approach to control for clustering, especially when a data set has a hierarchical structure, is to use hierarchical linear models (or multilevel statistical models) to separate the within- and between-cluster variability and estimate cluster-specific parameters.14,34,35 Hierarchical linear models can incorporate the individual, patient-level information and attend to the dependency to higher-level groupings (ie, clustering). Resnik et al36 used hierarchical linear models, similar to those described in detail in the case example below, in research on the impact of state regulation of physical therapy services.
| Missing Data |
|---|
Besides the above ad hoc methods, there are a variety of other analytical approaches that can be used to handle missing data. These approaches include methods of imputing values based on the results of regression, multiple imputation, inverse probability weighting, and likelihood-based methods.39 Inverse probability weighting is the approach demonstrated in the research example we present below. It involves giving different weights to subjects depending on their likelihood of being included in the sample of complete data, which is analogous to using survey weights, where subjects more likely to be selected into the study are given less weight in the analysis. Although detailed discussion of each of the techniques to handle missing data is beyond the scope of this article, it is critical that researchers evaluate the impact of missing data in their analysis and, if necessary, take steps to minimize bias related to missing data or acknowledge that, in some cases, excessive missing data may fundamentally invalidate a study's findings.
| Application of Statistical Methods to Enhance Validity in an Observational Study |
|---|
Data Source
We conducted a secondary analysis of previously collected data from the Focus On Therapeutic Outcomes, Inc. (FOTO) database.* Our analytical sample was selected from a larger data set of clinics (N=358) treating patients with a variety of syndromes that participated in the FOTO database in 2000–2001. Clinics were eligible for inclusion in our study if: there was at least one physical therapist on staff, clinician and facility registration forms had been completed, the clinic had entered intake data for at least 40 patients with a low back pain syndrome (LBPS), and follow-up data were available on at least two thirds of the clinic's patients with LBPS. Our final study sample consisted of 114 outpatient clinics with 1,058 therapists who treated 16,281 patients with LBPS.
Outcome Measurement: Overall Health Status Measure Scores at Discharge
The patient outcome measure we used was the FOTO overall health status measure (OHS), a health-related measure of quality of life derived from the SF-36 that assesses both mental and physical dimensions of health. Internal consistency of items in the OHS constructs with 2 or more items has been reported (α=.57–.91).41,42 Test-retest reliability of data obtained with the OHS was good (intraclass correlation coefficient [2,1]=.92).42 Responsiveness in the treatment of patients with LBPS (effect size=0.83) and the validity of the OHS measure to discriminate expert therapists from average therapists have been reported.43
Methods for Risk Adjustment and Control for Selection Bias
We used statistical risk-adjustment, also known as case-mix–adjustment, techniques to control effects of confounding variables seen in patient populations.23,44 Because we expected that improvement in patients undergoing rehabilitation could be confounded by many variables, including—but not limited to—patient demographic and financial variables,36,43,45 we considered the following patient-level variables in our analyses (Tab. 1).
At level 1, patients' discharge OHS scores were assumed to be normally distributed and were modeled as a linear function of patient-level covariates (ie, the variables that were considered potential confounders). Thus, all measured potential confounders were added into the model as covariates at level 1. The regression coefficients at this level are assumed to be directly related to the therapists at the next level. Because we did not include any therapist-level covariates, the second-level model was an intercept term, which represents the mean therapist effect within a clinic, plus a random error term representing the therapist-specific effect within that clinic. The mean therapist effect within each clinic is assumed to be related to characteristics at the clinic level. This technique enables an estimate of the effect of each therapist separately. Because selection of the clinic, or the reason why a patient receives treatment in one specific clinic and not another, might be related to expected outcomes, we added 2 sets of variables at the clinic level in a further attempt to control for selection bias: volume of new patients per month and variables representing the proportion of patients referred by physician type (see below for description of variables).46 Thus, at level 3, these therapist-specific intercept terms were modeled using the clinic-level covariates and a random clinic error term. Table 1 shows all of the variables in the multilevel model.
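One way to write down the 3-level structure described above is sketched here. The notation is our own illustration rather than the notation of the original analysis, and it assumes a random-intercept model in which the patient-level slopes are fixed across therapists and clinics. Here $i$ indexes patients, $j$ therapists, and $k$ clinics; $X_{pijk}$ are the patient-level covariates and $W_{qk}$ the clinic-level covariates.

```latex
\begin{align*}
\text{Level 1 (patient):}\quad
  Y_{ijk} &= \beta_{0jk} + \sum_{p} \beta_{p} X_{pijk} + e_{ijk} \\
\text{Level 2 (therapist):}\quad
  \beta_{0jk} &= \gamma_{00k} + u_{jk} \\
\text{Level 3 (clinic):}\quad
  \gamma_{00k} &= \delta_{000} + \sum_{q} \delta_{q} W_{qk} + v_{k}
\end{align*}
```

In this sketch, $Y_{ijk}$ is the discharge OHS score, $e_{ijk}$ is the patient-level error, $u_{jk}$ is the therapist-specific effect within clinic $k$, and $v_{k}$ is the random clinic error term.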
Methods to Address Bias From Missing Data
Because approximately 34% of follow-up data were missing, we assumed the missing data were missing at random47 and used the previously mentioned technique of inverse probability weighting to control for patient selection bias due to missing follow-up data.48 Inverse probability weighting was accomplished by performing a 2-step procedure. In step 1, we fit a logistic regression model where the dependent variable took the value of 1 if the observation was complete and the value of 0 if missing and where all patient baseline variables were included as covariates. In step 2, we used the inverse of the predicted probabilities of being complete as weights for the patient data.48 Thus, patients who, based on their data, were unlikely to have complete data were given more weight in estimating the effect model than were those who were likely to have complete data.
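The 2-step procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis performed in the study: the data are hypothetical, there is a single binary baseline covariate (`chronic_onset`), the logistic regression is fit with a small hand-written gradient ascent routine (`fit_logistic`), and the weights are applied to a simple mean rather than to a multilevel effect model.

```python
import math


def predict_prob(beta, row):
    """Predicted probability of having complete data for one patient."""
    return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))


def fit_logistic(X, y, lr=1.0, n_iter=5000):
    """Fit a logistic regression by gradient ascent on the mean log-likelihood."""
    k = len(X[0])
    n = len(y)
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        for row, yi in zip(X, y):
            p = predict_prob(beta, row)
            for j in range(k):
                grad[j] += (yi - p) * row[j] / n
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta


# Hypothetical patients: (chronic_onset, discharge score or None if missing).
patients = [
    (0, 70.0), (0, 72.0), (0, 68.0), (0, 74.0), (0, None),
    (1, 60.0), (1, 62.0), (1, None), (1, None), (1, None),
]

# Step 1: model the probability that follow-up data are complete (1) or missing (0).
X = [[1.0, float(chronic)] for chronic, _ in patients]
complete = [0 if score is None else 1 for _, score in patients]
beta = fit_logistic(X, complete)

# Step 2: weight each complete case by the inverse of its predicted probability.
weights = [
    1.0 / predict_prob(beta, row) if is_complete else 0.0
    for row, is_complete in zip(X, complete)
]

observed = [(w, score) for w, (_, score) in zip(weights, patients) if score is not None]
unweighted_mean = sum(score for _, score in observed) / len(observed)
weighted_mean = sum(w * score for w, score in observed) / sum(w for w, _ in observed)
print(round(unweighted_mean, 2), round(weighted_mean, 2))
```

In this toy data set, patients with chronic onset are less likely to have complete data and also have lower scores, so the unweighted mean of complete cases is pulled upward. Weighting the complete cases by the inverse of their predicted probability gives the underrepresented group more influence.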
Classifying Clinics Into Performance Groups: Profiling
To determine the mean patient outcomes per clinic, we aggregated the patient residual scores within each clinic after fitting the 3-level model to form a clinic-specific residual score for patient outcomes. Residual scores are the difference between actual discharge scores and the predicted scores after modeling. The rationale for using residual scores to estimate provider performance is that residuals represent the amount of variance unexplained by the models that could potentially be explained by factors other than patient characteristics, including type of treatment given or some other aspect of service delivery. Thus, we consider residual scores to be "risk-adjusted" outcomes. The use of residuals to estimate provider performance is a method described by experts on provider profiling and risk adjustment and has been used in previous studies of nursing homes, hospitals, and physical therapists that classified expert and average physical therapists.43,45,49,50
We used the clinic-specific aggregated residual score after modeling of OHS score to classify clinics into 3 effectiveness groups, which were determined based on the ranking of the residual scores. We considered the 76th–100th percentile of residual scores to denote the "best" effectiveness group, the 26th–75th percentile to be the middle group, and the 1st–25th percentile to be the worst group. Thus, those patients whose observed scores exceeded expectations had more positive scores and were in the upper percentiles, whereas those patients whose scores did not meet expectations had more negative scores and were in the lower percentiles. The mean, standard deviation, and range of residual scores for each group are shown in Table 2.
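The profiling step can be illustrated with a short sketch. It assumes that patient residuals are aggregated by taking the clinic mean and that a clinic's percentile is its rank divided by the number of clinics; the text does not specify either detail. The function `classify_clinics`, the clinic labels, and the scores are hypothetical.

```python
from collections import defaultdict


def classify_clinics(records):
    """Group clinics by mean residual (observed minus predicted score).

    records: iterable of (clinic_id, observed_score, predicted_score).
    Returns (mean residual per clinic, effectiveness group per clinic).
    """
    residuals = defaultdict(list)
    for clinic, observed, predicted in records:
        residuals[clinic].append(observed - predicted)
    mean_residual = {c: sum(r) / len(r) for c, r in residuals.items()}

    # Rank clinics from lowest to highest mean residual.
    ranked = sorted(mean_residual, key=mean_residual.get)
    n = len(ranked)
    groups = {}
    for rank, clinic in enumerate(ranked, start=1):
        percentile = 100.0 * rank / n
        if percentile <= 25:
            groups[clinic] = "worst"
        elif percentile <= 75:
            groups[clinic] = "middle"
        else:
            groups[clinic] = "best"
    return mean_residual, groups


# Hypothetical records: (clinic, observed discharge score, model-predicted score).
records = [
    ("A", 70.0, 65.0), ("A", 72.0, 66.0),
    ("B", 60.0, 61.0), ("B", 62.0, 60.0),
    ("C", 55.0, 60.0), ("C", 58.0, 62.0),
    ("D", 66.0, 64.0), ("D", 65.0, 64.0),
]
mean_residual, groups = classify_clinics(records)
print(mean_residual)
print(groups)
```

A positive mean residual indicates that a clinic's patients did better than the model predicted from their characteristics, and a negative value indicates that they did worse.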
Patient-level factors that were associated with more visits (results available upon request) included: female sex; greater age; higher OHS intake scores; diagnoses of herniated disk, pain, deformity, or "other"; onset of condition other than acute; greater number of surgeries; a history of not exercising regularly; and any employment status other than working full-time or modified duty. Patient-level factors associated with fewer visits (ie, more efficient utilization) included payer source of Medicare, self-pay, workers' compensation, and health maintenance organization or preferred provider organization.
Clinic-level variables associated with discharge OHS were proportions of referrals from primary care, orthopedic surgery, and neurologists. No clinic-level variables were associated with number of visits per treatment episode.
Impact of Applying Statistical Methods to Enhance Validity
Unadjusted OHS discharge scores were 71.9 (SD=18.6) for the best clinic group, 68.2 (SD=19.4) for the middle clinic group, and 63.1 (SD=20.9) for the worst clinic group. The difference between unadjusted mean scores of the highest and lowest quartiles of clinics was 8.8 points. On average, patients in the upper quartile improved almost 20 OHS points during therapy, and patients in the lower quartile improved an average of 11.2 OHS points. The predicted OHS scores (after modeling) for each of the groups are shown in Table 4. The data show that outcomes within clinics were largely what would have been expected, given their case mix. The models predicted that there would be a mean difference of 5.8 points between the highest and lowest quartiles of clinics. The mean difference in residual scores (ie, difference between observed [unadjusted] and predicted values) was only 3 points among groups beyond what was expected from the case-mix adjustment.
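The arithmetic behind these comparisons can be laid out explicitly. The symbols below are our own shorthand, and the values are those reported in the paragraph above for the best and worst clinic groups.

```latex
\begin{align*}
\Delta_{\text{observed}}  &= 71.9 - 63.1 = 8.8 \\
\Delta_{\text{predicted}} &= 5.8 \\
\Delta_{\text{residual}}  &= \Delta_{\text{observed}} - \Delta_{\text{predicted}}
                           = 8.8 - 5.8 = 3.0
\end{align*}
```

In other words, roughly two thirds of the unadjusted gap between the best and worst groups (5.8 of 8.8 points) was anticipated by the model from case mix and the other covariates, leaving about 3 points as the risk-adjusted difference.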
| Discussion |
|---|
We demonstrated 2 methods to control for selection bias. First, we used a hierarchical linear model (HLM) to account for clustering of patients by therapists and clinics. Second, we used 2 clinic-level variables that we hypothesized would be related to patient selection: volume (ie, number of new patients in the clinic, as a measure of reputation) and proportion of patients referred by each physician type. Using these variables, we demonstrated that patients treated in clinics that have a higher proportion of referrals from primary care physicians and orthopedic surgeons tend to report superior discharge outcomes compared with patients treated in clinics with higher proportions of patients referred by neurologists, physiatrists, and occupational medicine physicians. Our analyses support the contention that there is potential patient selection bias related to the type of physician referring patients to specific clinics. This hypothesis appears logical given that physician generalists and specialists typically serve different types of patient populations and specific specialties tend to manage their patients in a similar fashion.
| Implications and Conclusions |
|---|
Although it is generally accepted that the methods used to obtain a general estimate of provider performance for the purposes of internal quality improvement can be less rigorous than methods used to develop quality profiles that might be publicly reported, we believe that some effort needs to be made to adjust for the case mix of the population before comparing provider outcomes. Otherwise, the information gained through outcomes analysis will not be a true gauge of the performance of the provider and may lead to invalid conclusions. Furthermore, assessments of provider performance that are tied to public reporting or financial incentives that are based on unadjusted outcomes may penalize providers treating the sickest patients who fail to show enough improvement or require more visits in a treatment episode.52 Physical therapists, policy makers, and researchers should be aware of the threats to internal validity in these types of study designs.
| Footnotes |
|---|
Funding was provided by the National Institute of Child Health and Human Development (grant 1RO3HD051475-01).
A poster presentation of this work was given at the Combined Sections Meeting of the American Physical Therapy Association; February 4–8, 2004; Nashville, Tennessee.
Dr Hart is an employee of and investor in Focus On Therapeutic Outcomes, Inc, the database management company that manages the data analyzed in this study.
* Focus On Therapeutic Outcomes, Inc, PO Box 11444, Knoxville, TN 37939-1444.