Research Reports |
YP Chiu, PT, MHS, is a doctoral candidate in rehabilitation science, Department of Physical Therapy, College of Public Health and Health Professions, University of Florida, Box 100154, UFHSC, Gainesville, FL 32611-0154 (USA)
SL Fritz, PT, PhD, is Clinical Assistant Professor, Department of Exercise Science, University of South Carolina, Columbia, SC; Graduate Student, Department of Physical Therapy, College of Public Health and Health Professions, University of Florida, Gainesville, Fla; and Pre-Doctoral Fellow, VA Brain Rehabilitation Research Center, Malcom Randall VA Medical Center, Gainesville, Fla
KE Light, PT, PhD, is Associate Professor, Department of Physical Therapy, College of Public Health and Health Professions, University of Florida
CA Velozo, OTR/L, PhD, is Associate Professor, Department of Occupational Therapy, College of Public Health and Health Professions, University of Florida, and Research Health Scientist, Rehabilitation Outcome Research Center, North Florida/South Georgia Veterans Health System, Gainesville, Fla
(yipaoc{at}ufl.edu). Address all correspondence to Mr Chiu
Submitted October 11, 2004;
Accepted January 4, 2006
| Abstract |
|---|
Key Words: Balance, Gait, Psychometric measurement
| Introduction |
|---|
The Dynamic Gait Index (DGI)9(pp405-406),13 is a standardized clinical assessment that aids in evaluating a person's ability to modify gait in response to changing gait task demands. The DGI is a performance-based test developed as part of a profile of tests and measurements that are effective in predicting the likelihood of falls in community-dwelling older adults.14 The DGI has been shown to yield ratios of subject variability to total variability with excellent interrater reliability (.96) and test-retest reliability (.98) when rated by physical therapists.15 The DGI correctly classifies 59% of people with a history of falls (sensitivity) while correctly classifying 64% of those without a history of falls (specificity).16 The DGI rates performance from 0 (severe impairment) to 3 (normal) on 8 different gait tasks. The 8 tasks, administered from item 1 to item 8, consist of gait on even surfaces, gait when changing speeds, gait and head turns in a horizontal direction, gait and head turns in a vertical direction, gait with pivot turns, stepping over obstacles, stepping around obstacles, and ascending and descending steps.9(pp405-406) Scores on the DGI range from 0 to 24. Although a recent study by Boulgarides et al14 showed that the DGI (along with 4 other commonly used balance assessments) cannot predict falls in a sample of community-dwelling, active, independent older adults, the DGI has been shown to be correlated with falls in other populations.9(p401),15 Shumway-Cook et al15 showed that a score of 19 or less, out of 24, indicates an increased risk of falling in older adults.
Many rehabilitation specialists believe that balance assessment under multitask conditions (ie, performing more than one activity at the same time, such as walking forward and simultaneously looking up and down) may be a more sensitive indicator of balance problems and falls than balance assessment in a single-task context.17–20 This belief is attributable to the fact that elderly people often fall when they try to perform 2 activities at once.21 Given that the DGI has many tasks that allow for testing under multitask conditions (eg, walking with head turns or stepping over obstacles), it should be a more sensitive indicator of balance problems than other commonly used balance assessments that do not incorporate multiple tasks into the evaluation. Resnick22 reported that 63% of falls occurred while walking, which is the key factor used across items in the DGI. Furthermore, the DGI has been shown to be a sensitive assessment tool for identifying people who are at risk for falls because of vestibular disorders.23,24 However, one component of the DGI scale not addressed so far in the literature is the hierarchy of item difficulty. No explicit hierarchy was intended when the DGI was developed, although it is usually administered in a standardized order (Anne Shumway-Cook, PT, PhD; verbal communication with YP Chiu; February 2003). Nevertheless, knowing the hierarchy of item difficulty can be an asset to both the researcher and the clinician.
Administering items by starting with the easiest and moving to the most difficult may be a logical progression in testing clients. Furthermore, if the hierarchical structure of the DGI is validated, the selective administration of items depending on an individual's ability level may prove to be efficient. For example, if a client is functionally ambulatory, instead of testing "gait on level surface," a more challenging item, such as "gait and pivot turn," could be administered initially. On the basis of the importance of the DGI as a clinical tool and research instrument in the assessment of balance and in the identification of people who are at risk for falls, it is worthwhile to evaluate further the item characteristics of the instrument by use of the Rasch measurement model. A number of articles recently published in the physical therapy literature support the use of Rasch analysis to clinically validate functional assessments.25–28
Although traditional psychometric approaches focus on the total score of a given instrument, the Rasch measurement model allows analysis of instruments at the item and rating scale levels. First, Rasch analysis converts ordinal raw-score data, such as the scale from 0 to 3 on the DGI, into an interval-based measure, the log-odds metric, or logit. Second, the analysis allows the determination of whether the rating scale is used in the expected manner (eg, people with lower balance ability would be expected to use lower item ratings, whereas people with higher balance ability would be expected to use higher item ratings). Third, the Rasch measurement model provides a connection between a person's total score and the items of the instrument by placing the person's ability (person measure) and item difficulty (item measure) on the same linear continuum. Ceiling and floor effects are revealed when person ability and item difficulty fail to match at the extremes of the continuum. Item goodness-of-fit statistics provided by the analysis determine the extent to which each item fits the construct it is intended to measure. High fit statistics may indicate that the item is mismarked, poorly worded, or misinterpreted. In combination with principal components analysis (PCA), high fit statistics may identify a subset of items that measure a unique construct.29
The purpose of this study was to use Rasch measurement theory to examine: (1) whether the DGI rating scale meets suggested psychometric guidelines, (2) whether the hierarchical order of DGI tasks is consistent with a clinically logical testing procedure (ie, moving from easy items to more difficult items), and (3) whether the DGI represents a unidimensional construct (ie, all items reflect a single latent trait [balance] rather than multiple constructs [both balance and endurance]).
| Method |
|---|
mean=5.5, median=4). The Timed "Up & Go" Test scores12,30 ranged from 7.58 seconds to 43.66 seconds (mean=18.47, median=16.53), and the Berg Balance Scale scores ranged from 29 to 56 (mean=42.65, median=43).31
Analyses
The Rasch measurement model with the WINSTEPS program32 was used in this study because it offers distinct advantages over traditional psychometric approaches. As stated above, Rasch analysis focuses on the psychometric properties of the item, person, and rating scale categories. Two values are used throughout the analysis: logit measures and fit statistics. Logits, or log-odds units, convert ordinal raw scores into linear interval measures.33(pp17-19) The logit is the natural logarithm of the odds of a person being successful at a specific task or an item being successfully carried out.34 For the person category, logit measures indicate whether one person is more able than another (eg, Does one person have better balance ability than another?); for items, logit measures indicate whether one item is more difficult than another (eg, Is stepping over an obstacle more difficult than walking on a level surface?); and for rating scale categories, logit measures indicate whether one rating scale category is greater or less than another in degree (eg, Does a rating of 2 [mild impairment] represent less impairment than a rating of 1 [moderate impairment] in the DGI?).
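The relation among odds, logits, and the probability of success can be sketched in a few lines. This is a minimal illustration that assumes the dichotomous (success/failure) form of the Rasch model, not the rating scale model fitted with WINSTEPS in this study; the function names and the example values are hypothetical.

```python
import math

def logit(p):
    """Natural logarithm of the odds p / (1 - p) for a success probability p."""
    return math.log(p / (1.0 - p))

def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: probability of success for a person measure
    (ability) and an item measure (difficulty), both expressed in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical values, for illustration only.
print(logit(0.5))                    # even odds correspond to 0 logits
print(rasch_probability(1.0, 1.0))   # ability equal to difficulty gives 0.5
print(rasch_probability(2.0, 0.0))   # ability 2 logits above the item difficulty
```

Because person ability and item difficulty enter only through their difference, the two can be placed on the same linear continuum.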
Fit statistics33(p208) monitor the compatibility of the raw data with the Rasch measurement model. Fit to the Rasch measurement model requires that high ratings on more difficult items are accomplished by people with higher ability and that people have a greater probability of attaining higher scores on easier items than on more difficult ones.35 In general, mean square (MnSq) fit statistics, which are used to identify item and person ratings that deviate from expectations, range from 0 to positive infinity. The MnSq fit statistics value is the ratio of observed variance (variance attributable to the data) to expected variance (variance estimated by the Rasch measurement model). Ideally, the ratio will be 1.0, so that observed variance equals expected variance. When the MnSq fit statistics value is greater than 1.0, for example, 1.70, there is 70% more variation in the observed data than the Rasch model predicted. When the fit statistics value is less than 1.0, there is less variation in the observed data than the Rasch model predicted (ie, overfit).33(p177) Two types of fit statistics are provided in this study: outfit and infit statistics.33(p208) Both are mean squares of standardized residuals. A standardized residual is the difference between the observed score and the Rasch estimated score divided by the square root of the Rasch model variance.36 Outfit statistics are unweighted and are therefore affected more by unexpected responses far from the person, item, or rating scale category measure (eg, a person of low ability unexpectedly having a normal score on a difficult item). Infit statistics are weighted by the model variance and are therefore affected more by unexpected responses close to the person, item, or rating scale category measure (eg, a person of low ability unexpectedly having a score indicating severe impairment on an easy item).
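The difference between the two statistics can be illustrated with a short sketch. It assumes dichotomous (0/1) responses for simplicity, whereas the DGI uses a 4-category rating scale, and the function names and data are hypothetical. Outfit is computed as the unweighted mean of the squared standardized residuals; infit is computed as the sum of squared raw residuals divided by the sum of the model variances.

```python
import math

def rasch_probability(ability, difficulty):
    """Dichotomous Rasch probability of success (measures in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def item_fit(responses, abilities, difficulty):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    squared_residuals = 0.0   # sum of squared raw residuals
    model_variance = 0.0      # sum of model variances p * (1 - p)
    squared_z = 0.0           # sum of squared standardized residuals
    for x, ability in zip(responses, abilities):
        p = rasch_probability(ability, difficulty)
        variance = p * (1.0 - p)
        squared_residuals += (x - p) ** 2
        model_variance += variance
        squared_z += (x - p) ** 2 / variance
    infit = squared_residuals / model_variance
    outfit = squared_z / len(responses)
    return infit, outfit

# Hypothetical person measures; the item difficulty is 0 logits.
abilities = [-3.0, 0.0, 0.0, 3.0]
expected_pattern = [0, 0, 1, 1]   # the least able person fails
surprising_pattern = [1, 0, 1, 1]  # the least able person unexpectedly succeeds

print(item_fit(expected_pattern, abilities, 0.0))
print(item_fit(surprising_pattern, abilities, 0.0))
```

In the second pattern the single unexpected response comes from a person far from the item difficulty, so the outfit value rises more sharply than the infit value.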
Rating scale analysis was accomplished by determining whether the DGI 4-point rating scale met Linacre's 3 essential criteria for optimizing rating scale category effectiveness.37 The criteria are as follows: at least 10 observations are obtained per rating scale category, category logit measures advance (eg, the average logit measure for the rating scale category "mild impairment" is greater than the average logit measure for the rating scale category "moderate impairment"), and the outfit MnSq value for each rating scale category is less than 2.0. In the present study, the frequency of each of the 4 rating scale categories in the DGI was computed. Average logit measures for the 4 rating scale categories were used to determine whether the rating scale categories of the DGI advance monotonically. As proposed by Linacre,37 the outfit MnSq value for each rating scale category was compared with the threshold value of 2.0. Values of greater than 2.0 suggest that more unexplained variance than explained variance is found.
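The 3 criteria lend themselves to a simple check. The sketch below is illustrative only: the function name is hypothetical, the category counts, average measures, and outfit values are invented and are not the results of this study, and the minimum of 10 observations per category is the usual reading of the first criterion.

```python
def check_rating_scale(counts, average_measures, outfit_mnsq,
                       min_count=10, max_outfit=2.0):
    """Check 3 rating scale criteria for categories listed from lowest
    (0, severe impairment) to highest (3, normal)."""
    enough = all(count >= min_count for count in counts)
    # Average measures must increase strictly from one category to the next.
    advancing = all(low < high for low, high
                    in zip(average_measures, average_measures[1:]))
    acceptable_fit = all(value < max_outfit for value in outfit_mnsq)
    return {
        "enough_observations": enough,
        "measures_advance": advancing,
        "outfit_below_threshold": acceptable_fit,
    }

# Invented values for the 4 DGI categories (0, 1, 2, 3); not study data.
result = check_rating_scale(
    counts=[40, 120, 150, 60],
    average_measures=[-2.1, -0.6, 0.9, 2.4],
    outfit_mnsq=[1.1, 0.9, 1.0, 1.2],
)
print(result)
```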
The hierarchical order of the DGI items also was determined with the WINSTEPS program. The items of the DGI were arranged from the least difficult to the most difficult according to their corresponding logit measures. Item hierarchy can be used to investigate construct validity (ie, support or refute the expectation that "stepping over obstacles" is more challenging than "walking on a level surface" in the DGI). Furthermore, the comparison of item difficulty with person ability (ie, item-person map) can be used to determine whether the items of an instrument cover the range of person abilities in the sample (ie, reveal ceiling or floor effects).
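Arranging items by logit measure and comparing the result with the administration order amounts to a sort. In the sketch below, the logit values are invented placeholders chosen only to follow the qualitative ordering described in this report; they are not the calibrations produced by WINSTEPS, and the variable names are hypothetical.

```python
# (administration number, item name, hypothetical logit measure)
items = [
    (1, "level surface", -2.0),
    (2, "speed change", -1.2),
    (3, "horizontal head turns", 1.6),
    (4, "vertical head turns", 0.9),
    (5, "pivot turn", 0.3),
    (6, "over obstacles", -0.2),
    (7, "around obstacles", -0.7),
    (8, "steps", 1.3),
]

# Least difficult (lowest logit) to most difficult (highest logit).
hierarchy = sorted(items, key=lambda item: item[2])

for rank, (number, name, measure) in enumerate(hierarchy, start=1):
    print(rank, number, name, measure)

# Items whose administration position differs from their difficulty rank.
moved = [name for rank, (number, name, _) in enumerate(hierarchy, start=1)
         if rank != number]
print(moved)
```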
Next, Rasch fit statistics in combination with PCA were used to test the unidimensionality of the DGI.38 Reasonable MnSq fit values lie between 0.6 and 1.4, with standardized z values of less than 2.0.39 Recent studies40–44 suggested that fit statistics alone are inadequate for determining unidimensionality. Therefore, to test further for unidimensionality, a PCA based on residuals45 was conducted.41–43 The PCA transforms correlated items into principal components. In the determination of unidimensionality, it is expected that after the removal of the Rasch dimension (eg, the trait that the DGI intends to measure), the residuals for pairs of items should be uncorrelated and normally distributed.42 That is, no substantial principal component should remain in the residuals. When the first principal component has an eigenvalue of less than 1.4, the measure is considered unidimensional.44
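The residual-based check can be sketched as follows: correlate the item residuals, extract the largest eigenvalue of the correlation matrix, and compare it with the 1.4 cutoff. This is a simplified illustration, not the WINSTEPS procedure; the residual table is invented, the function names are hypothetical, and power iteration is used here only to keep the example free of external libraries.

```python
import math

def correlation_matrix(rows):
    """Pearson correlations between the columns of a persons-by-items table."""
    n, k = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in rows) / n)
           for j in range(k)]
    corr = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            cov = sum((row[a] - means[a]) * (row[b] - means[b])
                      for row in rows) / n
            corr[a][b] = cov / (sds[a] * sds[b])
    return corr

def largest_eigenvalue(matrix, iterations=500):
    """Power iteration for a small symmetric matrix, followed by a
    Rayleigh quotient to read off the eigenvalue."""
    k = len(matrix)
    vector = [1.0 + 0.1 * i for i in range(k)]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(k))
                   for i in range(k)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(k))
               for i in range(k)]
    return sum(v * p for v, p in zip(vector, product))

# Invented standardized residuals (4 persons by 3 items); the first 2 items
# share residual variance, and the third is unrelated to them.
residuals = [
    [1.0, 1.0, 1.0],
    [-1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0],
    [-1.0, -1.0, -1.0],
]

first = largest_eigenvalue(correlation_matrix(residuals))
share = first / len(residuals[0])  # proportion of residual variance
print(first, share, first < 1.4)
```

In this invented table, the first residual component exceeds the cutoff because 2 items remain correlated after the main dimension is removed.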
Finally, the WINSTEPS program provides several summary statistics for person ability and item difficulty logit measures. Person separation and person separation reliability are indicators of how well the items of the instrument separate or spread out the subjects in the sample. Person separation is an index of the sample standard deviation in terms of standard error units.46(p106) Person separation reliability is the proportion of observed sample variance that is not attributable to measurement error.46(p106) This value is analogous to the Cronbach alpha.33(p207) Similarly, item separation and item separation reliability are indicators of how well the subjects in the sample separate or spread out the items of the instrument. Item separation is an index of the item standard deviation in terms of calibration error units.46(p92) Item separation reliability is the proportion of observed item variance that is not attributable to estimation error.46(p92)
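The separation and reliability indexes can be sketched from a set of measures and their standard errors. The example assumes the usual definitions: error variance is the mean of the squared standard errors, true variance is observed variance minus error variance, separation is the ratio of the true standard deviation to the root mean square error, and reliability is the ratio of true variance to observed variance. The function name and values are hypothetical, and the use of the population variance is an assumption.

```python
import math
import statistics

def separation_and_reliability(measures, standard_errors):
    """Return (separation, reliability) for person or item logit measures."""
    observed_variance = statistics.pvariance(measures)
    error_variance = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    # Guard against a negative estimate when error exceeds observed variance.
    true_variance = max(observed_variance - error_variance, 0.0)
    separation = math.sqrt(true_variance / error_variance)
    reliability = true_variance / observed_variance
    return separation, reliability

# Invented person measures (logits) and standard errors; not study data.
separation, reliability = separation_and_reliability(
    [-2.0, -1.0, 0.0, 1.0, 2.0],
    [0.5, 0.5, 0.5, 0.5, 0.5],
)
print(separation, reliability)
```

Under these definitions, reliability equals the squared separation divided by 1 plus the squared separation, so either index can be recovered from the other.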
| Results |
|---|
Hierarchical Order of the DGI Tasks
Table 2 shows the DGI item administration order compared to the Rasch analysis-derived item difficulty order. The left-most column of Table 2 shows the original DGI item administration order (1–8). Item difficulty order was determined by use of the Rasch logit measures in the second column. "Level surface," "speed change," and "around obstacles" appeared to be the easiest items (lowest logit measures), whereas "vertical head turns," "steps," and "horizontal head turns" appeared to be the most difficult tasks (highest logit measures). Although the first 2 items routinely administered represent the easiest items (ie, "level surface" and "speed change"), the order of administration of the remainder of the items does not match the item difficulty order. For example, the item "around obstacles" is the next-to-last item administered, although it represents a fairly easy item, and the item "horizontal head turns" is the third item administered, even though it represents the most difficult item of the DGI.
The PCA of the DGI showed that the residual component (ie, the component beyond the single latent trait) has an eigenvalue of 1.8, representing only 22.5% (1.8/8) of the residual variance. In simulation studies, Smith and Miao44 reported that eigenvalues of less than 1.4 are at the random level. Therefore, the DGI items are essentially unidimensional. Table 3 shows the factor loadings of 8 items for the secondary dimension in the DGI. Three items (items 3, 4, and 5) with head turns ("horizontal head turns," "vertical head turns," and "pivot turn") load in the direction opposite that of the remaining 5 items.
| Discussion |
|---|
Rating Scale
The sound psychometric properties of the rating scale of the DGI may reflect the consistency of the scale with typical clinical observations and language. The use of each of the rating scale categories was distributed normally. The middle categories "mild impairment" and "moderate impairment" were the most frequently used responses, and the 2 extreme categories "normal" and "severe impairment" were used the least. Furthermore, the use of each rating scale category was connected to person ability level. That is, as subject ability increased, there was a clear tendency for higher ratings to be used.
These findings challenge the suggestions of Krishnan et al48 that the DGI would be improved by expanding its rating scale categories by adding either extra timing components or time for completing tasks. They claimed that without mutually exclusive and exhaustive rating scales, an evaluator would have difficulty accurately assigning scores of 2 (mild impairment) and 1 (moderate impairment) because some people may demonstrate certain characteristics from more than one category. The DGI rating scale categories met or exceeded Linacre's guidelines for optimizing rating scale category effectiveness.37 The present rating scale analysis suggests that evaluators had no difficulty in differentiating between these 2 ratings. The psychometric stability of the ratings may have resulted from the use of the universally accepted clinical terms and explicit definitions provided in the DGI. Terms such as "severe impairment," "moderate impairment," "mild impairment," and "normal" are rooted in clinical training and are used widely across a variety of clinical instruments. Furthermore, explicit definitions, such as "normal: performs head turns smoothly with no change in gait," provide clear guidance with which to grade a person's performance.
Hierarchical Order of the DGI Tasks
The results of Rasch analysis of the DGI revealed the underlying hierarchical order of item difficulty. "Gait with horizontal head turns," "steps," and "gait with vertical head turns" were the most difficult items, whereas "gait on level surface," "change in gait speed," and "step around obstacles" were the easiest items. The degree of sensory interference, novelty, and required effort may explain the item order demonstrated. The difficulty of the items "gait with horizontal head turns" and "gait with vertical head turns" may be attributed to vestibular influences and the novelty of the tasks. In addition, tasks such as "steps" (walk up stairs, at top turn around and walk down) may have been challenging because of musculoskeletal demands. In contrast, items that have fewer sensory demands and require less effort were shown to be the least difficult items, that is, "gait on level surface" and "change in gait speed."
The hierarchical structure of the DGI may have implications for modifying the current clinical administration of the DGI. At present, several of the most difficult tasks in the DGI—that is, "gait with horizontal head turns," "gait with vertical head turns," and "pivot turn"—are presented very early in the typical administration sequence, third, fourth, and fifth, respectively. Requiring people with severe impairments to perform these relatively challenging tasks early in the assessment may lead to frustration, insecurity, and safety concerns. In addition, asking people to perform easier tasks, such as "step over obstacle" and "step around obstacles," later in the assessment (sixth and seventh items administered in the DGI) deviates from the standard administration in which tasks progress from easy tasks to challenging tasks.
Information on the item difficulty hierarchy could lead to more dramatic administration modifications. For example, on the basis of the Rasch measurement model, a person who is capable of "climbing steps" will have a high probability of being successful at "walking on a level surface." The above scenario suggests that if a person is successful at "climbing steps," a challenging item, then it would be unnecessary to test the person on "walking on a level surface," an easier item. This "modern measurement" approach of selective item administration is commonly used in developmental testing49–51 and is the basis for computerized adaptive testing.52 The selective administration of items on the basis of ability could dramatically reduce the burden of testing on the individual and therapist time in test administration.53,54
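A selective administration rule of this kind can be sketched as choosing, at each step, the unadministered item whose difficulty lies closest to the current estimate of the person's ability. The sketch assumes the dichotomous Rasch model for simplicity (the DGI is rated from 0 to 3); the item difficulties, the ability estimate, and the function names are hypothetical, and this is not a procedure used in the present study.

```python
import math

def rasch_probability(ability, difficulty):
    """Dichotomous Rasch probability of success (measures in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def next_item(ability, difficulties, administered):
    """Pick the not-yet-administered item with difficulty closest to ability."""
    candidates = {name: value for name, value in difficulties.items()
                  if name not in administered}
    return min(candidates, key=lambda name: abs(candidates[name] - ability))

# Invented item difficulties in logits; not study data.
difficulties = {"level surface": -2.0, "pivot turn": 0.3, "steps": 1.3}
ability = 1.5  # provisional estimate for a relatively able person

first_choice = next_item(ability, difficulties, administered=set())
second_choice = next_item(ability, difficulties, administered={first_choice})
print(first_choice, second_choice)

# Probability that this person succeeds on the easiest item.
print(rasch_probability(ability, difficulties["level surface"]))
```

With these invented values, the easiest item carries a very high probability of success for the able person, which is the reasoning behind skipping it.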
Unidimensional Construct
The unidimensionality of the DGI is supported by both the fit statistics and the PCA.38 The infit and outfit values from overall person ability and item difficulty were both close to the ideal value of 1.0. Because of the low eigenvalue, the PCA further supports the integrity of the DGI for this sample. Often, in an effort to make an instrument all encompassing, multiple dimensions of a function or skill are combined. For example, the Functional Independence Measure combines motor and cognitive items.55,56 This combination can lead to challenges in making clear, clinical inferences. For example, improvement in Functional Independence Measure scores may be attributable to improvement in the motor construct, the cognitive construct, or both. In contrast, the unidimensionality reflected in the present form of the DGI will support interventions that focus on a single construct representing dynamic balance.
This investigation of the dimensionality of the DGI may provide some insight into the elemental components that comprise balance. Although the PCA eigenvalue was insufficient to support multiple constructs, the factor loadings suggest that a secondary construct may be embedded in the DGI. That is, all 3 items that have significant vestibular involvement (items 3, 4, and 5) had a tendency to load in directions opposite that of the remainder of the items. Tasks with vestibular involvement represent 3 of the 4 most difficult items, suggesting that with more challenging balance tasks, the multidimensionality of dynamic balance may emerge. Furthermore, it is possible with a larger number of subjects and less variance that the vestibular factor could form a separate construct.
Several limitations in this study may have influenced the psychometric findings presented. The subjects included were community-dwelling elderly people with identified balance deficits. Furthermore, the sample consisted solely of male veterans. The homogeneity of this sample may have favored the strong psychometric outcomes in this study.57,58 Replication of this study with a more diverse sample is warranted.
| Conclusion |
|---|
| Footnotes |
|---|
This research was presented as a poster presentation at the Combined Sections Meeting of the American Physical Therapy Association; February 12–16, 2003; Tampa, Fla.
| References |
|---|