Research Reports
HH Wang, RPT, MSc, is Pediatric Physical Therapist, Country Hospital, Taipei, Taiwan
HF Liao, RPT, MPH, is Associate Professor, School and Graduate Institute of Physical Therapy, College of Medicine, National Taiwan University, No. 17, Syujhou Rd, Taipei City, Taiwan, Republic of China
CL Hsieh, OTR, PhD, is Professor, School of Occupational Therapy, College of Medicine, National Taiwan University
Address all correspondence to Ms Liao at: hfliao{at}ntu.edu.tw
Submitted August 18, 2005;
Accepted May 2, 2006
| Abstract |
|---|
Key Words: Cerebral palsy; Measurement: applied; Measurement: basic theory and science; Reliability of results
| Introduction |
|---|
Cerebral palsy (CP) describes a group of disorders of the development of movement and posture, causing activity limitation, that are attributed to nonprogressive disturbances that occurred in the developing fetal or infant brain.7 To evaluate the effectiveness of treatment for the motor domain, clinicians need a motor evaluative tool. The Gross Motor Function Measure (GMFM)8 and the Peabody Developmental Motor Scales (PDMS)9 are the 2 most well-known motor instruments for children with CP. However, the GMFM measures the gross motor domain only.8 For measurement of the fine motor domain, the GMFM is inadequate as an evaluative tool.
A previous responsiveness study with the gross motor (GM) composite of the PDMS (PDMS-GM) for infants with CP showed that the PDMS-GM had limitations when used as an evaluative measure for infants with CP.10 The PDMS has been revised to the Peabody Developmental Motor Scales–Second Edition (PDMS-2), with new norms, revised testing materials, more precise scoring criteria, and more information on norm samples.11 Each item of the PDMS-2 was evaluated with both conventional item analyses and modern differential item functioning analyses to select the appropriate items. New normative data on the PDMS-2 were collected through 1997 and 1998 for a sample of 2,003 children residing in the United States and Canada; not only children without disabilities but also 10% of children with various types of disabilities were included in the sample. There are also more reliability and validity data for the PDMS-2 than for the PDMS.11 Therefore, the PDMS-2 is potentially appropriate for investigating the progress of the gross and fine motor domains for children with CP because it assesses both GM and fine motor (FM) composites and incorporates both quantitative and qualitative rating criteria.
The concurrent validity studies of the standard scores on the PDMS-2 showed high correlations with the PDMS or the Mullen Scales of Early Learning: AGS Edition in the GM or the FM composite (r=.80–.91) for children for whom detailed information on health conditions was not available.11 For children with developmental delays, although the developmental quotients (DQ) of the PDMS-2 were significantly correlated with the Bayley Scales of Infant Development–Second Edition, the classification agreement between these 2 tests was poor.12 The construct validity of the PDMS-2 was established by confirmatory factor analyses, and the results showed that the GM and the FM composites are 2 separate constructs within general movement. Another construct validity study of the PDMS-2 demonstrated high correlations between age and subtest raw scores.11 One recent study13 showed that the overall diagnostic accuracy of the PDMS-2 was high, with an area under the receiver operating characteristic curve of 0.98 for children with motor disabilities. These results indicate that clinicians could diagnose motor disabilities correctly 98% of the time with the test results of the PDMS-2.14 One of the purposes of the PDMS-2 is to evaluate a child's progress after intervention.11 However, norm-referenced motor assessments should not be used as evaluative measures until they are validated to have acceptable responsiveness for children with motor dysfunction.4 Because the responsiveness of the PDMS-2 for children with CP is still unknown, one purpose of this study was to investigate the responsiveness of the PDMS-2 for children with CP.
The reliability of PDMS-2 scores was investigated by the test developers. In their study, 3 types of error variance (internal consistency, test-retest reliability, and interscorer reliability) were investigated. All of the reliability coefficients for 3 composites and 6 subtests of the PDMS-2 (Cronbach α=.89–.97, test-retest r=.82–.93, and interscorer r=.96–.99) showed that the PDMS-2 is a reliable tool for the assessment of motor development in children.11 However, only children without disabilities were recruited for that reliability study. Because the reliability levels may vary for different populations,15 the reliability of PDMS-2 scores for children with CP needs further investigation. The test-retest reliability coefficients are often thought of as stability coefficients; however, they do not reveal how much variability should be expected on the basis of measurement error.16 Thus, for estimating the confidence intervals of test scores, the standard error of measurement (SEM) of the PDMS-2 for children with CP needs to be calculated. Therefore, the other purpose of this study was to examine the test-retest reliability and SEM of the PDMS-2 for children with CP.
| Method |
|---|
48 mo) and severity (mild or severe) levels evenly distributed in the present study sample, a quota sample of 32 children with CP was recruited. Children were recruited from 2 developmental centers and 7 hospitals in the northern and eastern areas of Taiwan. To be eligible to participate in the study, children had to meet the following criteria: a confirmed medical diagnosis of CP from the attending pediatrician, age ranging from 24 to 65 months at the first evaluation, receiving physical therapy or occupational therapy at least twice per month during the study period, and written informed consent of the caregivers or guardians. The underpinnings of the therapy approaches that the children received were based on the patient/client management model,19 the International Classification of Functioning, Disability and Health model,20 the family-centered approach,21 and motor learning strategies.22 The International Classification of Functioning, Disability and Health model can be used to evaluate possible influencing factors (impairment, environmental, and personal factors) for motor disabilities and mobility for children and then to set treatment plans and goals. The exclusion criteria were as follows: a medical problem that might prevent participation in therapy programs and progressive neurological disorders or medical conditions for which progress in motor development would not be expected over a 3-month period. In epidemiology studies, ages ranging from 2 to 10 years have been chosen as the age of ascertainment for CP diagnosis.23 Furthermore, the upper limit of the suitable age for testing children with the PDMS-2 is 71 months. Therefore, we set the minimum age for children at 24 months and the maximum age at 65 months.
The severity of CP was classified according to the Gross Motor Function Classification System (GMFCS)24; it was rated by the physical therapists treating the children and confirmed by one senior physical therapist before the PDMS-2 assessment. In this study, children at GMFCS levels I and II were classified as having mild CP, and those at GMFCS levels III to V were considered to have severe CP.
The mean age, body height, body weight, CP severity level, and sex of the children at the first evaluation are shown in Table 1. Their ages ranged from 27 to 64 months. The clinical types of CP in these children were spastic hemiparesis (n=5), spastic diplegia (n=14), spastic triplegia (n=4), spastic quadriplegia (n=6), and ataxia (n=3). The ages (mean±SD) of the fathers and mothers were 37±5.8 and 34±5.6 years, respectively. The education levels of the fathers and mothers, respectively, were graduate school (n=2 and 0), university or college (n=11 and 9), senior high school (n=11 and 17), junior high school (n=4 and 3), and primary school (n=2 and 2). The occupations of the fathers and mothers, respectively, were professional or central administrators (n=6 and 1), semiprofessional workers (n=10 and 6), technical workers (n=10 and 2), and semitechnical or nontechnical workers (n=4 and 22).25 For 1 child, information on the social or economic status of his family was not available.
A previous study27 indicated that responsiveness studies involving children with CP need to be at least of 3 months' duration. In line with this finding, we set the duration between the first and third measurements at 3 months.
Usually a measure must be sensitive to change before it can be responsive.6 Sensitivity to change is the capacity of a measure to assess change over time.6 Thus, 2 types of change indexes were used in this study; one was the sensitivity-to-change coefficient, and the other was the responsiveness coefficient. The responsiveness coefficient can be calculated from the differences of score change between groups of subjects who have and subjects who have not experienced "clinically important change" on the basis of retrospective judgment.28 To examine responsiveness in this study, a caregivers' rating scale to detect a retrospective global rating of change was designed on the basis of a previous study.29
Assessment and Instruments
The PDMS-2 is a standardized, norm-referenced test.11 The GM composite of the PDMS-2 includes 151 items from 4 subtests: reflexes, stationary, locomotion, and object manipulation. The FM composite comprises 98 items from 2 subtests: grasping and visual-motor integration. The total motor (TM) composite includes 249 items from all subtests. Items of the PDMS-2 are scored with a 3-point score (0, 1, and 2); a score of 2 is assigned when the child performs the item according to the specified item criterion, a score of 1 indicates that the behavior is emerging but that the criterion for successful performance is not fully met, and a score of 0 indicates that the child cannot or will not attempt the item or that the attempt does not show that the skill is emerging. Therefore, the maximum raw scores of the subtests are different, ranging from 16 to 198.
From the results of the raw scores on each subtest of the PDMS-2, the standard scores and developmental age equivalents on each subtest can be obtained from the norms in the manual for the PDMS-2. The DQs for the GM, FM, and TM composites then are derived by summing the subtest standard scores and converting them to a quotient with a mean of 100 and a standard deviation of 15.11 Folio and Fewell11 suggested that to make important decisions about diagnosis and placement for children, the clinician should rely primarily on the results of composites rather than subtests. Therefore, this study focused on composite scores only. For the 3 composites of the PDMS-2, only the raw scores, percentile scores, and DQs can be obtained from the PDMS-2 manual. In clinics, the percentile scores and DQs for the 3 composites of the PDMS-2 can be used to share the test results with others and to identify the risk for or severity of the motor developmental delay. Clinicians should know the possible magnitudes of the measurement errors of these scores. Therefore, we analyzed the test-retest reliability coefficients and SEMs of the raw scores, percentile scores, and DQs for the 3 composites.
Change indexes for measures usually were calculated from raw scores, percentage scores, and scaled scores in previous responsiveness studies for children.27,30,31 Percentile scores and DQs are scores adjusted by age and are not suitable to be used as outcome indexes.27 The PDMS-2 does not provide information on scaled scores; therefore, only raw scores on the 3 composites of the PDMS-2 were used to calculate change indexes in this study.
The caregivers' rating scale is composed of 3 items that are closed-ended questions that ask the main caregivers' perception about overall change in GM, FM, or TM areas in the previous 3 months and are based on a 7-point Likert scale (much better, better, somewhat better, about the same, somewhat worse, worse, and much worse). The rating scale was self-administered by the main caregiver of the child (usually the mother) at the time of the third measurement. The test-retest reliability of the caregivers' rating scale for motor change within 1 week was analyzed by the quadratic weighted kappa coefficient test.32 The test-retest reliability (kappa coefficient) values of the caregivers' rating scale were .63 (GM), .43 (FM), and .54 (TM), indicating moderate reliability.14 Because of the moderate test-retest reliability of the caregivers' rating scale scores, we administered the caregivers' rating scale 2 times with a 1-week interval to achieve more stable ratings. For calculating the responsiveness coefficient, only children who were rated "somewhat better," "better," or "much better" at both times were classified as having clinically important change.
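The quadratic weighted kappa used for the caregivers' rating scale can be illustrated with a minimal sketch. The function name, the coding of the 7 response options as integers 0 to 6, and the example ratings are assumptions made for illustration; they are not the study data or the SPSS procedure used by the authors.

```python
from collections import Counter


def quadratic_weighted_kappa(first, second, n_categories):
    """Quadratic weighted kappa for two ordinal ratings coded 0..n_categories-1."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    observed = Counter(zip(first, second))  # joint counts of (rating 1, rating 2)
    row_totals = Counter(first)
    col_totals = Counter(second)
    max_dist = (n_categories - 1) ** 2
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / max_dist  # 0 on the diagonal, 1 at the extremes
            obs_disagreement += weight * observed[(i, j)]
            exp_disagreement += weight * row_totals[i] * col_totals[j] / n
    if exp_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - obs_disagreement / exp_disagreement


# Hypothetical ratings on a 3-category scale, for illustration only.
kappa_example = quadratic_weighted_kappa([0, 1, 2], [0, 1, 1], n_categories=3)
```

For the 7-point scale described above, `n_categories` would be 7, with the two administrations 1 week apart passed as `first` and `second`.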
Procedure
All caregivers of the children tested were informed of the procedure and purposes of the study and signed consent forms. The PDMS-2 assessments were administered by following the standard procedures outlined in the test manual.11 At the third assessment, the caregivers' rating scale was administered. All of the testing was performed by a physical therapist (with 2 years of working experience with children with CP) who was familiar with the PDMS-2 and had good interrater reliability with a senior physical therapist (intraclass correlation coefficient [ICC]=.99–1.00 for raw scores or DQs for the 3 composites). Most assessments were performed in the places at which the children received treatments regularly. A few assessments were performed at a local child assessment laboratory because of a lack of appropriate space for assessments at the original treatment area. Each child received 3 assessments at the same time during the day.
Data Analysis
To give each subtest an equal contribution to its composite, the raw score on each subtest was transformed to a percentage score, as has been done for GMFM-88 scores.17 For example, the percentage score on the stationary subtest equaled the raw score on the stationary subtest divided by the maximum raw score on the stationary subtest and multiplied by 100. The percentage score on the GM composite was the average of the percentage scores on 4 subtests (reflexes, stationary, locomotion, and object manipulation). The percentage score on the FM composite was the average of the percentage scores on 2 subtests (grasping and visual-motor integration). The percentage score on the TM composite was the average of the percentage scores on all 6 subtests.
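The percentage-score transformation described above can be sketched as follows. The function names are hypothetical, and the maximum raw scores in the example are placeholders chosen for easy arithmetic; they are not the PDMS-2 maxima, which should be taken from the test manual.

```python
GM_SUBTESTS = ("reflexes", "stationary", "locomotion", "object_manipulation")
FM_SUBTESTS = ("grasping", "visual_motor_integration")
TM_SUBTESTS = GM_SUBTESTS + FM_SUBTESTS


def percentage_score(raw, max_raw):
    """Subtest raw score expressed as a percentage of the subtest maximum."""
    return raw / max_raw * 100.0


def composite_percentage(raw_scores, max_scores, subtests):
    """Unweighted mean of the subtest percentage scores in one composite."""
    percentages = [percentage_score(raw_scores[s], max_scores[s]) for s in subtests]
    return sum(percentages) / len(percentages)


# Placeholder maxima and raw scores (NOT the PDMS-2 values), for illustration only.
example_max = {"reflexes": 10, "stationary": 20, "locomotion": 40,
               "object_manipulation": 20, "grasping": 20,
               "visual_motor_integration": 50}
example_raw = {"reflexes": 5, "stationary": 10, "locomotion": 10,
               "object_manipulation": 5, "grasping": 10,
               "visual_motor_integration": 25}

gm = composite_percentage(example_raw, example_max, GM_SUBTESTS)
fm = composite_percentage(example_raw, example_max, FM_SUBTESTS)
tm = composite_percentage(example_raw, example_max, TM_SUBTESTS)
```

Averaging percentages rather than summing raw scores keeps a long subtest from dominating a composite, which is the stated reason for the transformation.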
Statistical analyses were performed with SPSS (Statistical Package for the Social Sciences) version 10.0.* Test-retest reliability and change indexes were computed as follows.
Test-retest reliability
The ICC(2,1) was used to analyze the test-retest reliability of the raw scores, the percentage scores, the percentile scores, and the DQs for the 3 composites between the first and second assessments.33 In general, values of ICC of less than .5 can be interpreted as indicating poor reliability, those between .5 and .75 can be interpreted as indicating moderate reliability, and those above .75 can be interpreted as indicating good reliability.14 The SEMs for the different scales of the 3 composites of the PDMS-2 also were calculated.15
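As an illustration of the two reliability statistics, the sketch below computes ICC(2,1) from the two-way analysis-of-variance mean squares (the Shrout and Fleiss formulation) and an SEM. The article cites a reference for the SEM without printing the formula, so the common form SEM = SD × √(1 − reliability) is an assumption here, as are the function names and the choice of which SD to supply.

```python
import math


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: list of rows, one row per child, one column per test session.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between children
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def standard_error_of_measurement(sd, reliability):
    """Assumed form: SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1.0 - reliability)
```

For the test-retest analysis described above, each row would hold one child's scores at the first and second assessments.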
Sensitivity to change
Four statistical analyses were performed to calculate the sensitivity-to-change coefficient: the t value of the paired t test, the effect size (ES), the standardized response mean (SRM), and the Guyatt responsiveness index (GRI) for sensitivity to change (GRI-S). The t value of the paired t test is used to analyze data originating from a 1-group repeated-measures design and indicates whether a statistically significant change in the measures over time exists. The ES is a standardized measure of change obtained by dividing the average change between initial and follow-up measurements by the SD of the initial measurement.26 In this study, the ES was calculated by dividing the average change between the first and third tests by the pooled SD of the first and third tests. The value of the ES is interpreted as trivial (ES of <0.2), small (ES of ≥0.2 and <0.5), moderate (ES of ≥0.5 and <0.8), or large (ES of ≥0.8) according to the well-known thresholds of Cohen.34 The SRM equals the mean change in scores divided by the SD of subjects' difference scores.35 Therefore, in this study, it was calculated by dividing the average change between the first and third tests by the SD of the score differences between the first and third tests. To interpret the value of the SRM for each composite, the ES thresholds (0.2, 0.5, and 0.8) proposed by Cohen34 were converted to SRMs according to the correlation coefficients between the scores on the first and third tests in this study and the formula proposed by Middel and van Sonderen36; then, the magnitude of the SRM was interpreted as trivial, small, moderate, or large according to the derived values. The GRI represents the ratio of observed change (or clinically important difference, if it is known) in a group of subjects expected to undergo a change to the variability in stable subjects.37 For sensitivity to change, in this study, the GRI-S was calculated by dividing the average change between the first and third tests by the standard deviation of the score differences between the first and second tests.26
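The four sensitivity-to-change indexes can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name is hypothetical, the pooled SD is taken as the square root of the mean of the two variances (appropriate for equal group sizes), and the GRI denominator is passed in separately as the score differences from a period in which the children are assumed to be stable, following the GRI definition of variability in stable subjects.

```python
import math
import statistics


def sensitivity_to_change(first, third, stable_diffs):
    """Paired t, ES, SRM, and GRI-S for scores at the first and third tests.

    stable_diffs: score differences over an interval in which no real change
    is expected (an assumption of this sketch).
    """
    diffs = [after - before for before, after in zip(first, third)]
    mean_change = statistics.mean(diffs)
    sd_change = statistics.stdev(diffs)
    pooled_sd = math.sqrt(
        (statistics.variance(first) + statistics.variance(third)) / 2
    )
    return {
        "paired_t": mean_change / (sd_change / math.sqrt(len(diffs))),
        "es": mean_change / pooled_sd,
        "srm": mean_change / sd_change,
        "gri_s": mean_change / statistics.stdev(stable_diffs),
    }


# Hypothetical scores for 4 children, for illustration only.
indexes = sensitivity_to_change(
    first=[10, 20, 30, 40],
    third=[12, 23, 33, 46],
    stable_diffs=[1, -1, 0, 2],
)
```

Because the paired t value equals the SRM multiplied by the square root of the sample size, it grows with the number of children even when the standardized change is unchanged.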
Responsiveness
One of the GRIs,37 which reflects the extent to which change in a measure relates to corresponding change in a reference measure of clinical or health status,35 is referred to as the GRI for responsiveness (GRI-R) in this study. The GRI-R is calculated by dividing the change in the group expected to undergo a change by the variability in a stable group.26 We calculated the GRI-R by dividing the mean change score between the first and third tests for subjects classified as having a clinically important change on the basis of the caregivers' rating scales by the standard deviation of the score differences between the first and second tests for the entire group.26
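A minimal sketch of the GRI-R calculation follows, assuming the classification rule stated for the caregivers' rating scale (rated "somewhat better," "better," or "much better" at both administrations). The function names and the example values are hypothetical and are not the study data.

```python
import statistics

IMPROVED_RATINGS = {"somewhat better", "better", "much better"}


def clinically_important_change(rating_1, rating_2):
    """True only if the caregiver reported improvement at both administrations."""
    return rating_1 in IMPROVED_RATINGS and rating_2 in IMPROVED_RATINGS


def gri_r(change_1_to_3, ratings, diff_1_to_2):
    """Mean first-to-third change in improved children divided by the SD of
    first-to-second score differences in the entire group."""
    improved = [
        change
        for change, (r1, r2) in zip(change_1_to_3, ratings)
        if clinically_important_change(r1, r2)
    ]
    if not improved:
        raise ValueError("no child was classified as having important change")
    return statistics.mean(improved) / statistics.stdev(diff_1_to_2)


# Hypothetical values for 4 children, for illustration only.
example_gri_r = gri_r(
    change_1_to_3=[4, 2, 6, 0],
    ratings=[("better", "much better"),
             ("better", "about the same"),
             ("somewhat better", "better"),
             ("about the same", "about the same")],
    diff_1_to_2=[1, -1, 0, 2],
)
```

Only the first and third children in the example meet the rule, so the numerator is the mean of their change scores.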
| Results |
|---|
Sensitivity to Change
The mean percentage scores on the 3 composites of the PDMS-2 at the first and second tests are shown in Table 2, and those at the third test are shown in Table 3. The percentage scores were significantly different between the first and third tests, with t(df=31) values of 4.98 to 7.35 (P<.001). The ES value was 0.2 for all 3 composites; this value met the minimum standard proposed by Cohen for indicating a small change.34 The correlation coefficients of the percentage scores on the GM, FM, and TM composites between the first and third tests were .978, .976, and .986, respectively. Therefore, the values of the SRMs were interpreted as trivial (SRM of <1.0), small (SRM of ≥1.0 and <2.4), moderate (SRM of ≥2.4 and <3.8), or large (SRM of ≥3.8) for the GM composite; trivial (SRM of <0.9), small (SRM of ≥0.9 and <2.3), moderate (SRM of ≥2.3 and <3.7), or large (SRM of ≥3.7) for the FM composite; and trivial (SRM of <1.2), small (SRM of ≥1.2 and <3.0), moderate (SRM of ≥3.0 and <4.8), or large (SRM of ≥4.8) for the TM composite according to previously described methods.36 The SRM values of the percentage scores on the PDMS-2 were 1.3 for the TM composite, indicating a small change, 0.9 for the GM composite, indicating a trivial to small change, and 1.0 for the FM composite, indicating a small change in children with CP. The GRI-S values ranged from 1.6 to 2.1 (Tab. 3).
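The SRM cutoffs reported above are consistent with converting Cohen's ES thresholds by SRM = ES / √(2(1 − r)), where r is the correlation between the first and third tests. The sketch below assumes that this is the conversion intended by the cited method; the function names are hypothetical.

```python
import math

COHEN_ES_THRESHOLDS = (0.2, 0.5, 0.8)  # small, moderate, large


def srm_thresholds(r):
    """Convert Cohen's ES thresholds to SRM thresholds for a given
    first-to-third test correlation r (assumed conversion formula)."""
    factor = math.sqrt(2.0 * (1.0 - r))
    return tuple(es / factor for es in COHEN_ES_THRESHOLDS)


def interpret_srm(srm, r):
    """Label an SRM as trivial, small, moderate, or large."""
    small, moderate, large = srm_thresholds(r)
    if srm < small:
        return "trivial"
    if srm < moderate:
        return "small"
    if srm < large:
        return "moderate"
    return "large"


# Correlations reported for the GM, FM, and TM composites.
gm_cutoffs = [round(x, 1) for x in srm_thresholds(0.978)]
fm_cutoffs = [round(x, 1) for x in srm_thresholds(0.976)]
tm_cutoffs = [round(x, 1) for x in srm_thresholds(0.986)]
```

With correlations this close to 1, the SD of the change scores is small relative to the between-child SD, which is why the SRM cutoffs are several times larger than the ES cutoffs.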
| Discussion |
|---|
Reliability is particularly important for developmental tests, either as a diagnostic test to evaluate the severity of developmental delay in clinics16 or as an evaluative test to detect the progress of a child after intervention.26 Usually, DQs and percentile scores can be used to evaluate the severity of developmental delay,4,38 and raw scores and percentage scores can be used for quantifying the effect of intervention.10,27 In this study, the reliability of the DQs, percentile scores, raw scores, and percentage scores of the PDMS-2 was investigated, and high levels of test-retest reliability were demonstrated for children with CP. A previous study showed that the test-retest reliability coefficients of the DQs for the PDMS-2 were .73 to .89 for children developing typically and aged 2 to 11 months and .93 to .96 for those aged 12 to 17 months.11 Because of differences in samples, the reliability coefficients are not directly comparable. Previous studies did not provide information on test-retest reliability for children with CP. As indicated by the results of this study, various scales of the PDMS-2 are reliable for use in clinics for motor skill acquisition or development for children with CP.
We found that the SEMs of the PDMS-2 obtained in this study were rather small, indicating that the error band of the observed scores was limited. Compared with the SEMs for the DQs of the norm samples of the PDMS-2 for children aged 24 to 72 months (3–4 for GM, 2–5 for FM, and 2–3 for TM),11 the SEMs for children with CP in this study were lower. Because the SEM is inversely related to the reliability coefficient, a relatively higher reliability coefficient may yield a lower SEM. The value of the SEM for a measure is useful for interpreting whether a change or difference in scores is beyond measurement error (ie, reaching real change or difference) in clinical settings. A higher criterion (1.96 × SEM) has been suggested for determining whether a change for a child with CP is real (ie, beyond measurement error).39 For example, the raw score on the TM composite for a child with CP should change by more than 9.2 (ie, 1.96 × 4.7) for the change to be claimed as a real change with a 95% confidence level. On the other hand, if a child with CP has a change in the TM composite raw score of less than 9.2, it cannot be interpreted as a real improvement because such a change may be caused by measurement error.
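The 1.96 × SEM criterion can be expressed as a small check. The function names are hypothetical; the SEM of 4.7 for the TM composite raw score is the value used in the worked example above.

```python
def real_change_threshold(sem, z=1.96):
    """Smallest score change treated as beyond measurement error (z * SEM)."""
    return z * sem


def exceeds_measurement_error(change, sem, z=1.96):
    """True if the absolute change is larger than z * SEM."""
    return abs(change) > real_change_threshold(sem, z)


# TM composite raw score, using the SEM quoted in the example above.
tm_threshold = round(real_change_threshold(4.7), 1)
```

A change of 10 raw-score points on the TM composite would pass this check, whereas a change of 9 points would not.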
Note that a change beyond measurement error does not necessarily indicate clinical relevance. Change beyond measurement error is the minimum level representing meaningful change. A clinically relevant change on a scale can be determined by combining both distribution-based methods (eg, SEM) and anchor-based methods (eg, parents' or clinicians' judgments).40 Although caregivers' perceptions about overall change on the GM, FM, or TM composites were determined with an anchor-based method in this study, the lack of clinicians' judgment and the modest sample size in this study limit the data for determining minimal clinically important change in the PDMS-2. Future studies to determine the benchmarks of minimal clinically important change in the PDMS-2 are warranted for clinicians to interpret their data.
This study also revealed that the percentage scores on the 3 composites of the PDMS-2 could be used for evaluating motor change in children with CP and receiving therapy. According to Liang,41 sensitivity to change is a necessary but insufficient condition for responsiveness. For a test to be relevant or meaningful to the decision maker, the responsiveness of the test should be provided.41 Our study revealed not only acceptable sensitivity to change but also acceptable responsiveness of the PDMS-2 for children with CP. The GRI-R values of the PDMS-2 for children with CP were 1.7 to 2.3 in our study. The magnitude of these statistics is comparable to that of values obtained for other outcome measures. For example, the GRI-R value of the motor component of the Functional Independence Measure for stroke was 1.29.42 Few previous responsiveness studies for children with CP used many of the change indexes suggested by Husted et al35 to determine the validity of an evaluative tool.10,29 To select proper outcome instruments, clinicians should consider the child's age and diagnosis, the purpose of testing, the reliability and responsiveness of the instruments, and the interpretability of the outcomes of the instruments.27 The previous study with the GM composite of the first edition of the PDMS (PDMS-GM) for infants with CP showed that the PDMS-GM had limitations when used as an evaluative measure for infants with CP.10 However, the change score on the PDMS-GM was not significantly different from that on the GMFM-88 for infants with CP over a 6-month period.27 Previous studies did not examine the change indexes of the PDMS-2. Our study provides sensitivity-to-change and responsiveness coefficients for the GM composite of the PDMS-2 as well as for the FM and TM composites. Our study also provides evidence for clinicians and researchers to confidently use the percentage scores of the PDMS-2 to detect a motor change for children with CP.
For sensitivity-to-change coefficients, the SRM may be preferred over the paired t value and the ES because the paired t value is influenced by sample size26,35 and the SRM, which uses the between-subject variability of individual change scores over time, provides more appropriate standardization than does the ES.37 Although a high between-subject variability of individual scores may have caused low ES values for children with CP in our study, the ES values for 3 composites of the PDMS-2 still met the minimum criterion (0.2) of Cohen.34 Guyatt et al37 suggested that the GRI was the most appropriate measure of responsiveness because it used the variability of change scores in stable subjects to standardize the clinically important difference; however, its assumption that the variance in stable subjects is approximately equal to the variance in an improved subject may induce biased estimation.41 At present, no single change index is superior. We provided 4 sensitivity-to-change coefficients and 1 responsiveness coefficient for the percentage scores of the PDMS-2 for children with CP in this study.
Advance knowledge of the responsiveness coefficient of an instrument would permit the accurate estimation of the sample size needed for adequate statistical power.37,43 If the GRI for the PDMS-2 is known, then the sample size needed for any experiment in which change over time in the PDMS-2 is the end point can be chosen immediately. According to the table in the report by Guyatt et al,37 for example, to detect a 3.2% mean change in the GM composite score (GRI-R=1.7), approximately 11 children per group would be required for a study with unpaired observations or 7 per group would be required for a study with paired observations. To detect a 4.4% mean change in the FM composite score (GRI-R=2.3), the required sample size would be approximately 7 for unpaired observations or 5 for paired observations. To detect a 3.4% mean change in the TM composite score, the required sample size would be similar to that for the FM composite score.
The sample size in this study was modest, although it is reasonable for a clinical study. To improve the representation of the study sample, we tried to recruit children with different types of CP in this study. Furthermore, our evidence regarding test-retest reliability and responsiveness was acceptable. With the purposes that we proposed and the results that we obtained, our results might not be threatened by the modest sample size. In addition, we used a retrospective global rating scale with moderate reliability to calculate the responsiveness coefficient in this study. Although we used the scores from 2 repetitive rating scales to confirm the clinically important score change in this study, a retrospective computation of responsiveness has been criticized.28 The retrospective global rating scale was valued lower than the prognostic global rating scale by Stratford and Riddle.1 However, the prognostic global rating scale can be used only by clinicians and not by caregivers. The retrospective global rating scale used to assess the importance and magnitude of a measured change is critical if health status measures are to have an effect on patient care.41 Further studies are needed to develop a valid external criterion significant for both clients and clinicians. The further study of psychometric properties (eg, minimal clinically important change) is warranted to fully explore the utility of the PDMS-2 for children with different types of CP.
| Conclusion |
|---|
| Footnotes |
|---|
This study was reviewed and approved by the Ethics Committee of National Taiwan University Hospital.
This study was supported by the Department of Health, Executive Yuan, Taiwan, Republic of China (DOH 92TD1016).
* SPSS Inc, 233 S Wacker Dr, Chicago, IL 60606.
| References |
|---|