To the Editor:
We write to comment on the article by Goldstein et al titled “The Development of an Instrument to Measure Satisfaction With Physical Therapy,” published in the September 2000 issue of Physical Therapy. Our concern is that faulty procedures in that research quite likely led to incorrect conclusions. We contend that use of the instrument described in that article may perpetuate misconceptions and may mislead those who are interested in patient satisfaction.
The introduction of the article professes an interest in the domains of patient satisfaction. The second paragraph on page 856 implies that no psychometric analyses have been conducted in the area of patient satisfaction in physical therapy. This is puzzling because the previous year we published an extensive study on outpatient satisfaction in this same journal.1 We reported on 3 samples with a total of 607 subjects from 21 different facilities. Our research was designed to cross-validate the Physical Therapy Outpatient Satisfaction Survey (PTOPS) across separate samples and by means of both exploratory and confirmatory factor analyses. We provided evidence to support cross-validation by replicating findings from phase 2 (model B) on a new sample in phase 3 through confirmatory factor analysis. We identified 4 domains of outpatient satisfaction: “Enhancers,” “Detractors,” “Location,” and “Cost.”
We believe that there were several methodological flaws in the Goldstein et al study. First, the researchers used 20 items to assess 11 hypothesized satisfaction domains. The rationale for proposing several of these domains, in our view, was lacking. What is more important, however, is that there was no way to test the presence of 11 domains, because 6 of the hypothesized domains were assessed with only a single item. Therefore, we argue that it was impossible to develop any intra-domain variance. Because variance is a reflection of “the extent to which the scores in a set differ among themselves”2(p17) and each set in this case consisted of only one item, it follows that there can be no variance when there is only one item per domain. While certainly there is intersubject variance, there can be no inter-item variance within domains. In the absence of this type of variance, those 6 items would gravitate to a domain with a degree of similarity and reliability. Some authors have argued that a general rule in psychometrics is that hypothesized domains should contain a minimum of 3 items for purposes of domain reliability.3 This was true of only 2 of the 11 hypothesized domains. Lacking sufficient alternatives, we believe the items clumped around a single factor.
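The single-item problem can be illustrated with Cronbach's alpha, one common index of domain reliability. This is a minimal sketch only: the choice of alpha is an assumption made for illustration and is not necessarily the statistic used in either study, and the function name and scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one domain.

    items: list of k lists; each inner list holds one item's scores
    for the same subjects, in the same order.
    """
    k = len(items)
    if k < 2:
        # With one item there is no inter-item variance to compare,
        # and the k / (k - 1) term is undefined.
        raise ValueError("alpha needs at least 2 items per domain")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("every item needs a score from every subject")
    # Each subject's total score across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


# Hypothetical 5-point ratings from 5 subjects on a 3-item domain.
three_item_domain = [
    [4, 5, 3, 5, 4],
    [4, 5, 3, 4, 4],
    [5, 5, 3, 5, 4],
]
alpha = cronbach_alpha(three_item_domain)
```

Calling the function on a domain with a single item raises an error rather than returning a value, which mirrors the point that no within-domain reliability estimate is available for the 6 single-item domains.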
Another concern about the study results relates to the obtained item means. Table 3 on page 859 indicates that all of the items had extremely high means; that is, subjects were very favorably inclined toward physical therapy. Roush and Sonstroem1 have discussed the unfavorable effects of high endorsement on test development. The article by Goldstein et al illustrates this well. Dividing the scale means in Table 3 by 19 (the number of items) indicates that each of the 20 items had a mean between 4.6 and 4.7 on a 5-point scale. With means this close to the scale maximum, the items are subject to a ceiling effect. For example, anyone who wants to test the effect of a particular program on patient satisfaction in a pretest-posttest design would find it difficult to show appreciable gain. The extremely high scores at entry limit the degree of gain that can be demonstrated. Unfortunately, the authors did not address this important issue. The Table 3 data, in our view, indicate the presence of both compressed variance and ceiling effect.
The Table 3 data indicate that the 20-item scale had a mean of about 93 (out of a possible 100 points). It is difficult to see how the standard deviation can be larger than 2, if that large. In the presence of this lack of variance and pronounced ceiling effect, it is difficult to believe that this scale can discriminate across levels of patient satisfaction.
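The arithmetic behind the ceiling-effect concern can be sketched as follows. The sketch assumes that the Table 3 scale means are totals over the 19 remaining items when one item is set aside, which is one reading of the division by 19; the value 88.3 is a hypothetical illustration, not a figure taken from Table 3, and the function names are likewise hypothetical.

```python
N_ITEMS = 20        # items in the instrument
ITEM_MAX = 5        # top of the 5-point response scale
SCALE_MAX = N_ITEMS * ITEM_MAX  # 100 possible points


def per_item_mean(scale_mean, items_in_scale=N_ITEMS - 1):
    """Average item score implied by a scale mean over the given items."""
    return scale_mean / items_in_scale


def item_headroom(item_mean, item_max=ITEM_MAX):
    """Room left between an item's mean and the top of its scale."""
    return item_max - item_mean


def max_average_gain(total_mean, scale_max=SCALE_MAX):
    """Largest mean pretest-to-posttest gain the total score allows."""
    return scale_max - total_mean


# Hypothetical 19-item scale mean, chosen only for illustration.
illustrative_item_mean = per_item_mean(88.3)            # about 4.65
illustrative_headroom = item_headroom(illustrative_item_mean)
# A 20-item total with a mean of about 93 leaves about 7 points of gain.
illustrative_gain = max_average_gain(93.0)
```

Under these assumptions, an intervention study that starts from a mean near 93 could show an average improvement of no more than about 7 points on the 100-point total, however effective the program.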
Finally, we have concerns about how the survey items were phrased. All of the survey items were phrased in a positive manner; no negative statements were included in the scale. The presence of positive statements exclusively tends to create an acquiescence response set.4,5 Using only positive items in the study quite probably promoted the high scores shown in Table 3, reduced the item pool variance, and led to the finding of a single factor.
We are pleased with the interest being manifested in measuring patient satisfaction in physical therapy; understanding patient satisfaction, in our opinion, would be an outstanding asset for the profession. Additionally, we appreciate that different investigators may develop different tools to measure patient satisfaction, and we look forward to more research in this area. It is hoped that these endeavors will utilize sound test development procedures from the psychometric literature. We believe, however, that the issues presented above concerning the effort published in the September 2000 issue of the Journal discount the use of that instrument because of the failure to utilize such procedures.
- Physical Therapy
In response to Dr Roush and Dr Sonstroem, we contend that the use of the term “faulty procedures” in the first paragraph of their letter is too strong. We do not agree that the procedures were faulty, nor do we believe that they led to incorrect conclusions. We welcome the chance to respond to the questions and criticisms raised in the letter.
Although it is clear that the authors of the letter believe that we blithely ignored their own article on patient satisfaction, published in Physical Therapy, this is not what happened. First, their article was cited in our report. Although the citation appeared later in our report, it was referred to and clearly appears in the list of references. In fact, we believe that the authors misinterpreted the statement in our article to which they refer. The statement that instruments generally have not been subjected to rigorous psychometric review is legitimate and refers to those instruments in the Patient Satisfaction Instruments: A Compendium.1 The instruments listed in the Compendium, which are typical of those currently used by physical therapists, have not been subjected to psychometric analysis. We did not imply that all instruments in use have not been evaluated, and we are pleased that Dr Roush and Dr Sonstroem developed an instrument that was rigorously tested.
The third paragraph of the letter begins a series of statements about methodological flaws that Dr Roush and Dr Sonstroem perceive existed in our analysis of data. They state that we used 20 questions to assess 11 hypothesized domains of patient satisfaction. They misstated our intent and ignored what we wrote in several places. We never stated that items were selected to assess domains. Rather, we stated (paragraph 6, page 856) that 20 items were selected to represent domains. We assessed the construct validity of the domains (see Tab. 4, page 860), but we, ourselves, stated (paragraph 2, page 859) that this type of analysis is not possible for domains with only 1 item.
The letter goes on to raise the issue of lack of variance and the presence of a ceiling effect. We concur with this comment, and our concurrence is evident in the article. The second paragraph of the “Limitations” section stated that “a broad range of ratings for satisfaction is not available on which to establish and evaluate the psychometric properties of the instrument.” We invite replication of the study. It hardly seems appropriate to say “the issues presented above… discount the use of that instrument” when we, in fact, have pointed out the limitations of the analysis ourselves.
We also do not agree with the comments about our decision to phrase all items in a positive direction. The authors cite references, stating that the “presence of positive statements exclusively tends to create an acquiescence response set.” Survey instrument design is not an exact science. Shuman and Presser2 contend that changing the wording of some items to a negative direction does not have a great effect on responses. Furthermore, having all questions phrased in a similar direction reduces response measurement error among respondents who do not read each question carefully. It may also serve to decrease respondent bias and increase return rate. A changing response set may very well tax the limited reading abilities of some patients/clients. Similarly, mental rotation of a visually presented response sequence might prove very challenging to individuals with certain kinds of learning disabilities.
Our purpose in making all scales consistent was a conscious decision on our part to reduce respondent burden. At some point, clinical utility must be an important consideration in questionnaire design. Respondents—either patients/clients or clinicians—will not complete an instrument that is time-consuming. Consideration of optimal response rate must be taken into account when designing a survey instrument.
In fact, despite the extensive (and even dazzling) battery of psychometric testing to which Dr Roush and Dr Sonstroem subjected their own instrument, they perhaps failed to consider the most pertinent question in applied research. That is, having built the “better” mousetrap, will anyone actually use it, given the items it contains? Given the paucity of the physical therapy literature on patient satisfaction, and the essential need to collect such data, we share Dr Roush's and Dr Sonstroem's concern that satisfaction instruments have not been widely incorporated into daily clinical practice. But, can one really imagine that all patients and clients will easily risk their relationships with physical therapists to whom they may have to return by plainly stating their dissatisfaction? It is quite difficult to challenge a perceived authority when one also perceives that same authority is responsible for his or her return to health. Although we still have much to learn about the psychometric properties of our instrument, and fully anticipate that a better one can and will be developed, we do believe that our instrument can and will be easily adopted into practice.
Overall, we believe that the authors have created a straw man by claiming that we used wording that in fact we did not use. They then proceeded to knock it down with carefully calculated arguments. Our major problem with their approach is that much of their criticism is therefore directed at claims we did not make.
In summary, we believe that the instrument described in our article should and will be used in practice. We further believe that (1) the points alleged by the authors to be oversights were actually addressed in our article, (2) we cited the limitations of our study and made no grandiose claims about the instrument or our findings, (3) we stated there was little variance, (4) we acknowledged that we cannot establish construct validity on domains represented by a single item, and (5) we indicated that our results demonstrating that a single factor underlies the instrument were based solely on this sample. We invite additional testing of the instrument, and we applaud the expansion of the profession's body of knowledge that such testing will produce. It is hardly collegial, however, to recommend discounting the use of the instrument.