Introduction. Tests of upper-extremity motor function used for people following a stroke have been described, but reliability and validity (psychometric properties) of measurements obtained with these tests have not been consistently established. This investigation was performed: (1) to review literature relative to upper-extremity motor function testing during rehabilitation following a stroke, (2) to develop selection criteria for identifying these tests in the literature, and (3) to rate the tests relative to their psychometric properties. Method. Literature searches were done using 2 databases. Reports of 4 psychometric properties were sought: interrater reliability, test-retest reliability, convergent validity or concurrent validity, and predictive validity. Results. Nine tests met the inclusion criteria of having psychometric properties reported in the literature. No test had evidence for all 4 psychometric properties. Only the Nine-Hole Peg Test was supported by 3 out of 4 properties. Most tests had 2 properties supported. Concurrent validity or convergent validity was most frequently described; test-retest reliability was least frequently described. Conclusions. More complete psychometric support is needed for upper-extremity motor function tests applied following a stroke. The absence of psychometric support, however, does not mean that a test has no value. Clinicians are cautioned not to generalize psychometric evidence across patient populations.
Measurements used to evaluate outcome and performance during and following rehabilitation should have acceptable reliability and validity.1–4 Although clinicians and researchers have been using and reporting outcome measures for more than 20 years, many upper-extremity motor function tests used for adults following a stroke have not been shown to yield data with acceptable reliability and validity.1
Wade,1 in 1989, was the first author to review upper-extremity motor function tests commonly used in stroke rehabilitation. He reported that adequate tests existed and that these tests should be applied under research and clinical circumstances to establish their strengths and weaknesses. In his review, he indicated whether the degree of validity, reliability, and sensitivity of tests was known; which test domains were covered (eg, impairment or specific [focal] disability); and whether the test was composed of a battery of tasks. His recommendations for using a test were based on the amount of time required to administer the test.
Several other review articles, not specific to stroke rehabilitation, have considered hand and arm function tests.2–4 McPhee2 reviewed hand tests, emphasizing test characteristics and the importance of choosing appropriate tests for specific diagnoses. Bear-Lehman and Abreu,3 in 1989, demonstrated the continued need to provide estimates of the validity and reliability of instruments used to measure hand function. They described instruments that measure hand function in terms of range of motion, edema, performance, sensation, dexterity, and physical capacity evaluation. Rudman and Hannah4 reviewed 4 tests of hand function. They characterized what they believed was the clinical utility of tests and rated the strength of available evidence to support psychometric properties as “initial,” “limited,” or “questionable.”4 Although Rudman and Hannah's review offered a method to judge the strength of the literature describing psychometric properties, it did not include all hand function tests, nor did it delineate psychometric evidence obtained exclusively from patients who have had strokes. None of these previously described reviews demonstrated an extensive search for all available upper-extremity motor function tests, and the tests included were not selected based on a set of predetermined inclusion criteria.
In 1995, the US Department of Health and Human Services published clinical practice guidelines for “post-stroke rehabilitation.”6 Each guideline was based on the “level of research evidence” or the “degree of consensus” among experts. Using a modification of Sackett's procedures (Tab. 1),7 the guide described research evidence to support many areas of stroke rehabilitation. Sackett's levels of evidence are most readily applied to areas of research with large numbers of studies and with trials of interventions. Therefore, the group that developed the guide characterized available research designs as either randomized controlled trials or quasi-experimental designs. They also considered the amount of published research available and then made recommendations for the use of standardized assessment tools. Tables in the guide listed investigations in which the validity, reliability, and sensitivity of measurements obtained with various tests of function were examined. Measures of disability and overall motor function were included in these tables, but measures specific to upper-extremity motor function were not included. Evidence supporting the use of upper-extremity motor function outcome measures, therefore, has not been systematically reviewed.
In this article, a review is presented of published upper-extremity motor function tests used with patients who have sustained a stroke. Our literature review was restricted to studies of assessment tools that involve direct interpretation of the motor abilities of the affected upper extremity. In our clinic, we have observed that disability-oriented upper-extremity tests can allow for compensatory movements by the unaffected upper extremity. Therefore, these types of tests were excluded. Tests were considered only if they were functional limitation oriented or measured functional limitations and impairments.*
In this review, the importance of having an assessment tool supported by published accounts of validity and reliability is discussed. We have identified the presence or lack of psychometric support for a number of upper-extremity assessment tests based on published reports of interrater reliability (IRR), test-retest reliability (TRT), convergent validity (CVV) or concurrent validity (CCV), and predictive validity (PV) (Appendix 1).1,2,8
The purposes of this investigation were: (1) to review all available literature relative to upper-extremity motor function tests used for people during rehabilitation following a stroke, (2) to develop and use criteria to select tests and relevant literature, and (3) to rate tests relative to available psychometric evidence that supports the use of upper-extremity motor function testing following a stroke.
Article and Test Selection
A preliminary PubMed search by a physical therapist (EC) was completed using a variety of key words related to testing of upper-extremity motor function. The purposes of this search were to become familiar with relevant literature, to review the quantity and quality of relevant literature, and to make adjustments in key word selection. Two PubMed searches were then conducted independently by 2 physical therapists (EC and CB). The literature was searched from 1965 to 1999. Only English-language articles were requested. Combinations of the following key words were used: “arm function,” “assessment,” “cerebral vascular accident,” “function,” “functional assessment,” “neuroplasticity,” “stroke,” and “upper extremity.” A professional librarian assisted with searching the Cumulative Index to Nursing and Allied Health Literature (CINAHL) database for relevant articles indexed from 1983 to December 1999.
Approximately 2,200 article titles were identified in the 3 searches. Of this original group, 170 articles were selected to receive more detailed examination. Selection was based on article inclusion criteria of being published in a peer-reviewed journal and having at least one of several design objectives related to upper-extremity motor function tests. A few discrepancies occurred between the lists generated in the 2 searches. In these cases, we conferred until agreement to include or exclude the article was reached. Appendix 2 describes in more detail the inclusion criteria for this stage of the literature review.
After examination of these 170 articles, we identified 31 different tests that were used for upper-extremity motor function. A set of inclusion criteria related to the nature of the tests and how they were applied was then used to identify a subset of 13 articles. These articles not only described the use of upper-extremity motor function tests on people who had strokes, but also presented data that could be used to evaluate the psychometric properties of the tests. Details of the inclusion criteria are given in Appendix 2. Additional searches were performed using the 31 named tests as key words to be sure that other publications had not been missed in the 2 main literature searches. Tests that did not meet the inclusion criteria are listed in Table 2. Nine tests that qualified are described in Table 3.
Summaries of the final 13 articles that reported psychometric evidence for the 9 tests meeting the inclusion criteria are shown in Appendix 3.9–21 Data from these articles were recorded in a nonrelational database using Filemaker Pro 3.0 software.† Database fields included title, author, year, number of subjects, age of subjects, and whether the subject groups were people with or without stroke. Tests described in the articles were recorded in a checklist data field. Therefore, all tests referenced or used in the studies described in the articles could be added as the project continued. Inclusion criteria were applied to each test from the checklist to establish the list of appropriate tests. Comment fields were used to collect information regarding psychometric properties and statistical analyses.
Levels of Evidence
Upper-extremity motor function tests examined for this review were assigned to ordinal categories (levels I, II, and III) according to how many psychometric properties had demonstrable (published) results for those tests within a group of patients with strokes. If data for all 3 of the psychometric properties—IRR, TRT, and CVV/CCV—were reported for subjects following a stroke and produced significant correlations (P<.05) between repeated evaluations or between final evaluation and a reference instrument, the test was assigned to level I. If 2 of the 3 psychometric properties were supported by significant correlations, the test was assigned to level II. If only one of the psychometric properties was supported by a significant correlation, the test was assigned to level III. For some studies, both correlations and significance were reported, and, for some studies, the principal author (EC) calculated significance values based on reported correlations and the number of subjects in a study.8
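The significance calculation described above, deriving a P value from a reported correlation and the number of subjects, can be sketched as follows. This is a generic illustration, not the authors' actual computation: the function name is ours, and a normal approximation to the t distribution is used (adequate for moderate n; an exact answer requires the t distribution with n − 2 degrees of freedom).

```python
import math
from statistics import NormalDist

def p_from_r(r: float, n: int) -> float:
    """Approximate two-tailed P value for testing H0: rho = 0.

    Converts the correlation to t = r * sqrt((n - 2) / (1 - r**2)),
    then approximates the t distribution with the standard normal.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

# Example: Parker et al reported r = .82 with n = 187; a correlation
# that large with that many subjects is significant far below .05.
print(p_from_r(0.82, 187) < 0.001)  # True
```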
If the PV of a test is established, the clinical utility of that test is improved. Therefore, to give credit to tests for which PV was determined, this finding also was reported. Predictive validity could have been examined to predict placement after discharge, use of assistive devices, functional independence during inpatient stay, and so on. These predictions may be projected to a year after stroke onset. Predictive validity, however, reflects only conditions at the time of testing; many other factors may influence the long-term performance capabilities of a patient.
This section describes the tests according to the levels to which they were assigned, from the highest level (most psychometric support or level I) to the lowest level (least psychometric support or level III). Table 3 documents the psychometric evidence for each test. Table 4 summarizes levels to which the tests were assigned. Only the Nine-Hole Peg Test was assigned to level I. Most other tests ranked in level II because evidence was available to support only 2 psychometric properties, most frequently CVV/CCV and IRR.
Level I: Established by Evidence for 3 Psychometric Properties (IRR, TRT, and CVV/CCV)
Nine-Hole Peg Test.
The Nine-Hole Peg Test had evidence in 3 psychometric categories: IRR, TRT, and CVV/CCV.13,18 In 1987, Heller et al13 claimed to have established the IRR and TRT of measurements obtained with the Nine-Hole Peg Test and reported correlation coefficients as a range of values (Spearman r=.68–.99, n=10) with associated probability values of P<.025 to P<.001. However, which r values were specifically for IRR (between different raters [ie, clinicians]) or TRT (comparing first scores of raters with second scores of the same raters for the same patients) could not be determined from the article. In 1986, CVV of measurements obtained with the Nine-Hole Peg Test was first established against the Motricity Index, an impairment- and function-oriented test (r=.82, n=187).18
Level II: Established by Evidence for at Least 2 Psychometric Properties (IRR, CVV/CCV Plus PV)
Motor Assessment Scale.
Loewen and Anderson16 and Poole and Whitney19 separated the Motor Assessment Scale into 3 upper-extremity components or subscales and determined IRR for each subscale. Both pairs of investigators used the same subscales: upper arm function, hand movement, and advanced hand activities. Poole and Whitney described IRR between 2 raters using 24 subjects and Spearman correlation coefficients. The Spearman correlation coefficients were 1.00, 1.00, and .98 for the respective subscales. Loewen and Anderson described IRR among 14 raters using 7 subjects and percentage of agreement with kappa coefficients. Percentages of agreement were 96.2%, 100%, and 100%, with kappa coefficients of .93, 1.00, and 1.00, respectively. Both pairs of investigators concluded that high IRR could be attained using the Motor Assessment Scale. In addition to IRR, Poole and Whitney described the CVV of data obtained with the Motor Assessment Scale using Spearman correlations. They estimated CVV values by comparing the Motor Assessment Scale and Fugl-Meyer Sensorimotor Assessment scores for 30 subjects. The CVV values were r=.89 (proximal component of upper-extremity motor function), r=.92 (distal component of upper-extremity motor function), and r=.91 (total upper-extremity motor function).
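The agreement statistics reported above pair raw percentage of agreement with the kappa coefficient, which corrects for the agreement expected by chance alone. A minimal two-rater sketch of Cohen's kappa follows; it is a generic illustration (the function name and data are ours), and the multi-rater procedure used by Loewen and Anderson is analogous but not identical.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same subjects.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance from each
    rater's marginal frequencies. Undefined when p_e = 1 (both raters
    always assign the same single category).
    """
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Perfect agreement across varied categories yields kappa = 1.0;
# agreement no better than chance yields kappa near 0.
print(cohens_kappa([1, 2, 3, 1, 2, 3], [1, 2, 3, 1, 2, 3]))  # 1.0
print(cohens_kappa([1, 1, 2, 2], [1, 2, 1, 2]))              # 0.0
```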
In 1990, Loewen and Anderson,17 using Spearman correlations, performed additional psychometric testing to examine the predictive ability of the Motor Assessment Scale. When using the combined arm score of the Motor Assessment Scale, scores at 1 month following stroke correlated with arm scores at discharge (r=.94, P<.0001, n=50).
Motricity Index.
The Motricity Index had evidence for CVV18 as early as 1986, for PV in 1989,21 and for IRR in 1990.10 Collin and Wade,10 using Spearman statistics for 20 subjects, reported IRR of the Motricity Index to be r=.88. Results were obtained by examining scores obtained for the following components of the upper-extremity subscale of the test: grip, elbow flexion, and shoulder abduction. Hsieh et al,14 using intraclass correlation coefficients (ICCs), described the CVV of Motricity Index values by comparing them with Action Research Arm Test scores (r=.87, n=50). Parker et al,18 using linear correlation, also described the CVV of Motricity Index values by comparing them with Nine-Hole Peg Test scores (r=.82, n=187). In 1990, Collin and Wade10 sought to establish the CVV of data obtained with 3 different tests: the Motricity Index, the Trunk Control Test, and the upper-extremity subscale of the Rivermead Motor Assessment. They believed the Motricity Index and the Trunk Control Test were the tests requiring comparison, and the Rivermead Motor Assessment was used as the “established” measure. No evidence, other than that reported by Collin and Wade, was found to establish the Rivermead Motor Assessment as having validity or reliability for patients following a stroke. The correlations between the Motricity Index upper-extremity subscale scores and the Rivermead Motor Assessment upper-extremity subscale scores across 3 time periods (6, 12, and 18 weeks after stroke) were .76 (n=27), .73 (n=25), and .74 (n=14), respectively.
Sunderland et al,21 in 1989, also used the Motricity Index to describe the PV of what they called “grip strength” as determined with an electronic dynamometer. They found that the Motricity Index was better than the Nine-Hole Peg Test at identifying subjects who would score above zero on the Frenchay Arm Test 6 months after initial assessment.
Level II: Established by Evidence for IRR and CVV/CCV
Action Research Arm Test.
The IRR of data obtained with the Action Research Arm Test was established by Hsieh et al14 and Lyle.22 Only Hsieh et al, however, reported studying patients with strokes exclusively. The ICC value was established at .98 using 50 patients.
In an effort to establish CCV/CVV of data obtained with the Action Research Arm Test, Hsieh et al14 compared the Action Research Arm Test with 3 other tests: the Motor Assessment Scale, the Modified Motor Assessment Chart, and the Motricity Index. Only the upper-extremity subscales of these 3 tests were used. Concurrent validity of data obtained with the Action Research Arm Test was assessed by using the Motor Assessment Scale. Hsieh et al examined 50 subjects and found that the Motor Assessment Scale and the Action Research Arm Test were closely associated (r=.96). Convergent validity of data obtained with the Action Research Arm Test, as compared with data obtained with the Modified Motor Assessment Chart and the Motricity Index, was established at .94 and .87, respectively, using Pearson correlations.
Chedoke-McMaster Stroke Assessment.
The Chedoke-McMaster Stroke Assessment has both an impairment inventory and a disability inventory. In 1993, Gowland et al12 published a study of the reliability of data obtained for independent components of the impairment inventory. Impairment inventory components included measures of shoulder pain, postural control, and arm (upper-extremity), hand, leg, and foot function. Interrater reliability was established for the arm (upper-extremity) subscale (ICC=.88) and for the hand subscale (ICC=.93). Concurrent validity (r=.95) was reported for the combined arm (upper extremity) and hand components correlated with the combined Fugl-Meyer Sensorimotor Assessment shoulder, elbow, forearm, wrist, and hand scores.
Fugl-Meyer Sensorimotor Assessment.
Interrater reliability and CVV have been established for the Fugl-Meyer Sensorimotor Assessment multiple times. Duncan et al11 performed the first analysis of Fugl-Meyer Sensorimotor Assessment intrarater reliability and IRR on subjects whose mean time from onset of stroke was 51 months and found r=.98 to .99. Sanford et al20 repeated Fugl-Meyer Sensorimotor Assessment reliability studies on patients during rehabilitation 6 days to 6 months following a stroke and found an ICC value of .97 for the upper-extremity component of the test. Sanford et al did not report the mean time from onset of stroke. Of the tests included in this review, the upper-extremity portion of the Fugl-Meyer Sensorimotor Assessment has been compared against the Chedoke-McMaster Stroke Assessment (r=.95)12 and the Motor Assessment Scale (r=.88).19
Level III: Evidence Established by CVV/CCV Plus PV
Modified Motor Assessment Chart.
The Modified Motor Assessment Chart utilizes subscales of upper- and lower-extremity function and standing leg movements; it is a modified version of the Fugl-Meyer Sensorimotor Assessment. This test requires the patient to perform one-handed activities; both arms are evaluated separately.15 Lindmark and Hamrin15 examined the validity of data obtained with the Modified Motor Assessment Chart. Because they did not differentiate between the upper- and lower-extremity portions of the Fugl-Meyer Sensorimotor Assessment, their article did not demonstrate support for CVV/CCV as deemed adequate for this review. Hsieh et al,14 however, compared the Modified Motor Assessment Chart (upper-extremity portion) and the Action Research Arm Test and found a close association (Pearson r=.94). As defined by criteria in this article, PV has been adequately described for the Modified Motor Assessment Chart.15 Using regression analysis, Lindmark and Hamrin15 reported that the Modified Motor Assessment Chart provides prognostic information relative to survival, discharge destination, and functional score on discharge.
Level III: Evidence Established by IRR
Motor Club Assessment.
The Motor Club Assessment was first described as an upper-extremity motor function test for people following a stroke by Ashburn.9 No statistical analyses were performed. To describe IRR, Ashburn reported the number of disagreements among 15 paired observations and noted “minimal error.” Although Ashburn did not report kappa values or specific percentages, this was the only article found that described how reliable data obtained with the Motor Club Assessment might be when the test is used between raters.
Sunderland et al21 used the Motor Club Assessment to examine grip force as a prognostic tool. They reported using the Motor Club Assessment to establish grip force as a predictive measure of outcome at 6 months following a stroke, but a secondary finding also was reported when they used the Motor Club Assessment to predict performance on the Frenchay Arm Test. The Motor Club Assessment classified 3% of the “cases” incorrectly, while the Motricity Index had identified all “cases” correctly.
Level III: Evidence Established by CVV/CCV
Rivermead Motor Assessment.
This instrument was demonstrated to have CVV with the Motricity Index (r=.88).10 Collin and Wade's objective was to establish CVV among 3 tests and to determine the validity and reliability of data obtained with the Motricity Index and Trunk Control Test.10 Only upper-extremity subscales were used for the correlations. Collin and Wade's article was the first to report use of the Rivermead Motor Assessment (upper-extremity subscale) with this diagnosis.
Disability-oriented tests were not included in our review because these tests permit use of compensatory activity. We believe a measurement of motor recovery is difficult to obtain when an uninvolved extremity can compensate for deficits of the involved side. Twelve of the 31 tests were excluded because bilateral activity was allowed during testing (Tab. 2). Although tests including bilateral activity may be closely related to activities at home and may be appropriate measures of function, their use, in our opinion, would not allow them to reflect unilateral motor recovery. Some clinicians may feel that the tests of functional limitations that we describe are not as useful as tests of disabilities because the latter more closely relate to activities of daily living. Indeed, disability-oriented tests may contribute more than tests of functional limitation to evaluation during rehabilitation.
Some potentially eligible articles may have been overlooked in our review. The inclusion criteria for this review were intended to identify articles that had the purpose of examining evidence to support the psychometric properties of tests, either as the primary objective or while examining upper-extremity motor function following a stroke. We therefore reviewed articles that alluded in their title, abstract, or key words to examining psychometric properties. Some articles may have been excluded inappropriately because reporting on psychometric properties was not part of the primary investigation or the title did not clearly refer to psychometric testing. This problem is inherent in any computerized literature review that is dependent on key words. We believe the chances of overlooking articles were reduced by using follow-up searches where the key words included the names of the tests being investigated.
Articles were retrieved even if the title did not specify an investigation of test psychometric properties (Appendix 2). In the article by Parker et al,18 for example, “loss of arm function after stroke: measurement, frequency, and recovery” does not imply an examination of psychometric properties. However, there are data in this article about the psychometric properties of the Motricity Index. This article was retrieved because of our broad article inclusion criteria. If this article were omitted upon initial title review, it would have been identified during subsequent searches based on test name.
Some tests may have more than one accepted designation (eg, the Jebsen Hand Test is the same as the Jebsen-Taylor Hand Test), and some tests may not have been coded as key words in the PubMed or CINAHL databases. Multiple combinations of test names were used to search for supporting literature, but some variants may have been missed. Redundant literature searches and broad inclusion criteria for articles helped to reduce the possibility of missed articles (Appendix 2).
The quality of observational data depends on partitioning of data variance into a true component (the theoretical exact value) and error components, which may include subject variations, rater variations, and any number of additional environmental variations.23 Because of the latter variations, strong psychometric properties of measurements for one population should not be considered supportive for populations with different diagnoses. Portney and Watkins8 provided a discussion of how this concept (“generalizability theory,” originally proposed by Cronbach et al in 197224) may be applied to clinical practice. This criterion, which required that psychometric evidence be obtained from patients following a stroke, excluded much of the available evidence. For example, the Box and Block Test and the Jebsen Hand Test were reported to be well-established tests,25,26 but no evidence, based on our criteria, was found to support their use with patients following a stroke.
The Box and Block Test met our inclusion criteria and had been used to test upper-extremity motor function in a study of patients following a stroke.25 The authors, however, referenced a previously published article (ie, Desrosier et al26) for reliability and validity. Desrosier et al26 had performed reliability tests with individuals having a variety of diagnoses. Therefore, based on our criteria, this reference did not support the use of the Box and Block Test for patients following a stroke.
Similarly, the Jebsen Hand Test was studied and described in an article about subjects with strokes. Results of an investigation of TRT of data obtained with the Jebsen Hand Test were published.27 In that study, however, only 5 subjects diagnosed with strokes out of a total of 26 subjects with various diagnoses were used. Therefore, we believe this evidence could not be considered definitive for testing psychometric properties in groups of people with the diagnosis of stroke.
A method to examine the quality of evidence has been proposed by Rudman and Hannah.4 According to Rudman and Hannah's definitions or criteria, all of the articles found would have demonstrated “initial support” or “limited support”‡ for each of the psychometric properties. Given Rudman and Hannah's criteria, the rank of the Jebsen Hand Test or the Box and Block Test would be elevated from no support to limited support (“limited support” meaning “inadequacies in the research designs” of studies conducted to investigate reliability or validity).4 Intervention strategies such as the use of therapeutic exercise for people after stroke and clinical assessments that have resulted in extensive literature would be more suitable for examination according to Rudman and Hannah's criteria. Inclusion criteria for this review required that the studies (reporting psychometric properties) had used only subjects who had had a stroke. Consequently, evaluation of the studies in this review is based on quantitative criteria. Operational definitions of Rudman and Hannah's levels of support could be refined for future reviews, but further methodological discussion extends beyond the scope of this article.
The statistics selected for this review are commonly used to measure reliability and validity. They directly address measurement questions of clinical interest. There may be questions in the scientific community as to whether some psychometric properties might be more important than others and which statistical tests would be most appropriate. We have made no attempt to argue for a value system to differentiate the statistics used. We have attempted to describe the psychometric properties and supportive statistical tests reported in the literature with minimum bias. Clinicians and researchers are encouraged to become familiar with the meanings of the different psychometric properties and to decide which are more important for their applications and whether the statistics used in a given study were appropriate.
When the preliminary searches were conducted for this review, there were limited data to describe most psychometric properties of the tests. Therefore, we placed more emphasis on categorizing the available evidence than on rating its quality.
All correlation coefficients reported in this review exceeded critical values (ie, were significantly different from a correlation of zero).8 Colton28 recommended stratifying r values into categories of “poor,” “fair,” and “excellent.” This categorization may allow for subjective and perhaps inaccurate classification. Whether a “significant” correlation coefficient is acceptable should rely on the judgment of both the clinician and the investigator. Error in measurement is unavoidable, given multiple sources of variability such as human factors and environment. The amount of error that is acceptable will depend on the purpose and specific clinical circumstances surrounding test use, and authors should report reliability relative to these issues.29 In addition, Portney and Watkins described the following: “[C]orrelation coefficients cannot be interpreted as proportions…. The difference in the degree of relationships between .50 and .60 is not necessarily the same as the difference between .80 and .90.”8(p494) Even values less than .50 can represent strong relationships if the number of subjects is sufficient. No articles were found in this review that reported nonsignificant findings.
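The point that even correlations below .50 can reach statistical significance with enough subjects can be illustrated by inverting the t test for a correlation: the smallest |r| that exceeds the critical value shrinks as n grows. The sketch below is a generic illustration (the function name is ours) and uses a normal approximation to the t distribution, so the values for small n are slightly lower than the exact t-based thresholds.

```python
import math
from statistics import NormalDist

def critical_r(n: int, alpha: float = 0.05) -> float:
    """Approximate smallest |r| significantly different from zero
    (two-tailed), from inverting t = r * sqrt((n - 2) / (1 - r**2))
    with a normal approximation to the t distribution."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    return z / math.sqrt(z * z + n - 2)

# The threshold falls steeply as sample size rises:
for n in (10, 50, 187):
    print(n, round(critical_r(n), 2))  # roughly .57, .27, and .14
```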
Some authors,30,31 however, do not accept linear correlations as appropriate tools for reporting reliability data. Intraclass correlation coefficients are perhaps more appropriate because they describe agreement of the scores and not just covariation (or association). Linear correlation coefficients often can overestimate the reliability of data obtained with a test because the relationship between true variance and observed variance may be overlooked. However, only a few groups of authors12,14,20 used ICC analysis.
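The distinction between covariation and agreement can be shown with a small example: two raters whose scores differ by a constant offset are perfectly correlated yet do not agree. The sketch below uses a one-way random-effects ICC(1,1); this is one of several ICC forms chosen here for brevity, and the function names and data are ours, not taken from the reviewed studies.

```python
import math

def pearson_r(x, y):
    """Linear (Pearson) correlation: covariation only, blind to offsets."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def icc_1_1(scores):
    """One-way random-effects ICC(1,1) for an n-subjects x k-raters table;
    unlike Pearson r, it penalizes systematic differences between raters."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(scores, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

rater_a = [1, 2, 3, 4, 5]
rater_b = [3, 4, 5, 6, 7]                    # same ranking, constant +2 offset
print(pearson_r(rater_a, rater_b))           # 1.0: perfect covariation
print(icc_1_1(list(zip(rater_a, rater_b))))  # about .43: poor agreement
```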
Sample size may become an issue when considering psychometric properties. In Appendix 3, the samples used for the reviewed studies are listed. Counts ranged from a low of n=7 in the study by Loewen and Anderson16 to a high of n=231 in the study by Lindmark and Hamrin.15 Summation of individual samples from many studies would be inappropriate for statistical interpretation. However, a study with a large sample size generally has greater statistical power than a study with a smaller sample size, reducing the likelihood of failing to detect a true relationship (a type II error). Clinicians and researchers utilizing the tests described in this review might want to attribute greater weight to supportive studies with larger sample sizes.
Conclusions and Implications
Of the 9 tests evaluated, only 5 studies10,12,14,17,20 describing psychometric properties of upper-extremity motor function tests used for people following a stroke have been published since 1989, when Wade1 encouraged investigation of the psychometric properties of tests already in print. Many studies of patients following a stroke lacked evidence to support the test used, either because analyses were not performed or because the references cited to support psychometric properties were too generalized. Scientific evidence should be the basis for recommendations to use specific tests. Only for the Nine-Hole Peg Test was evidence found in this review to support TRT, IRR, and CVV. This finding does not imply that the Nine-Hole Peg Test is the “best” test. Other tests may be equally applicable for testing upper-extremity motor function, but psychometric support must be established and reported first.
In 1991, Physical Therapy published a document titled “Standards for Tests and Measurements in Physical Therapy Practice.”32 This document provides guidance to clinicians and investigators to help ensure the development of useful and meaningful measurements and describes standards for “ensuring integrity in measurement standards.” We agree with the point made in the document that, although tests may not necessarily meet all of the standards set forth in the document, test users incur the responsibility of knowing the limitations of measurements and making logical arguments to support their test selection.
Mrs Croarkin and Dr Danoff provided concept/idea/design. All authors provided writing and data analysis. Ms Barnes provided project management. Mrs Croarkin and Ms Barnes provided data collection and consultation (including review of manuscript before submission).
↵* As in the article by Rudman and Hannah,4 definitions of rehabilitation domains (eg, impairments and functional limitations as described by the National Center for Medical Rehabilitation Research5) were used in the current investigation.
↵† FileMaker Inc, Corporate Headquarters, 5201 Patrick Henry Dr, Santa Clara, CA 95054-1171.
↵‡ Initial support, according to Rudman and Hannah,4 would indicate that some studies have had positive results supporting validity or reliability. Limited support would indicate inadequacies in the research design of studies conducted.
- Received March 15, 2002.
- Accepted July 10, 2003.
- Physical Therapy