Research Reports |
JL McGinley, PT, BAppSc (PT), is a doctoral student in the School of Physiotherapy, La Trobe University, Victoria, Australia, and Senior Research Physical Therapist, Geriatric Research Unit, Kingston Centre Rehabilitation and Aged Services Program, Southern Health, Melbourne, Australia.
PA Goldie, PT, PhD, is Associate Professor, School of Physiotherapy, La Trobe University.
KM Greenwood, PhD, is Associate Professor, School of Psychological Science, La Trobe University.
SJ Olney, PhD, is Professor and Director, School of Rehabilitation Therapy, and Associate Dean (Health Sciences), Queen's University, Kingston, Ontario, Canada.
Address all correspondence to Ms McGinley at Geriatric Research Unit, Kingston Centre, Warrigal Rd, Cheltenham, Victoria, Australia 3192 (j.mcginley{at}latrobe.edu.au)
Submitted February 22, 2002;
Accepted September 2, 2002
| Abstract |
|---|
Key Words: Accuracy Gait Observational assessment Reliability Validity
| Introduction |
|---|
|
|
|---|
Despite its widespread use, there is a scarcity of evidence to support the accuracy (validity) of clinical observational analysis (Tab. 1). In 3 studies,3–5 comparison of observational accuracy with various criterion measures revealed mixed results. In a study of gait following stroke, Pearson product moment correlations (r) between observation of selected gait components and waveform indexes from a foot force measurement device varied from .32 to .72, with a mean correlation of .55.3 Clinicians observing prosthetic alignment detected only 22% of the deviations predicted by biomechanical analysis of gait.4 Observations of a mixed subject group were compared with kinematic angles obtained from simple protractor measurements of videotaped gait, with variable results.5 Although the accuracy of determining foot placement was high, observers were generally inaccurate at judging joint angles, with an average score of 1.2 (12) out of a maximum of 5. Results from these studies contrast with those of a more recent study of gait after stroke,6 which determined high Pearson correlations (r=.88) between physical therapists' judgments of timing symmetry and those concurrently obtained with an instrumented footswitch measurement device. Physical therapists also were highly accurate in judgments of other movements, including single-leg stance steadiness7 and upper-limb tasks after stroke.8,9 These are encouraging findings and suggest that therapists have the capacity to make accurate judgments under certain conditions.
|
A range of factors have been identified and described as potential influences contributing to the generally poor to moderate reliability of OGA measurements reported.2 These factors include the potential error associated with observation of live gait performances and the lack of both operational definitions and uniform observer instruction and training. Furthermore, the majority of studies of OGA have encompassed a wide variety of gait variables, thus requiring multiple and complex judgments. The low reliability found may relate to these design factors, rather than accurately reflecting the underlying agreement of the observers. Systematic studies that minimize variability due to live gait performances and reduce the number of observed variables provide insight into the capacity of therapists to reliably observe gait.8 Although the current study deviates from the role of OGA in clinical practice, it represents an initial step in the exploration of the potential accuracy and reliability of observation under optimized circumstances.
The selection of gait variables in observational studies also warrants careful attention.2,8 Variables selected should have clinical significance and be representative of gait capacity in the desired population.2 Examination of the variables previously studied reveals wide diversity in both the type and number of selected variables. Almost all of the variables were spatiotemporal or kinematic in nature and reflect those variables typically considered in OGA and appearing on common gait evaluation forms.18–20 Although therapists have traditionally focused OGA on these gait features, a rapid expansion in measurement technology has allowed insight into other important aspects of gait. A review of current biomechanical knowledge of gait suggests strongly that kinetic variables also should be considered in any form of gait analysis.21
A comprehensive biomechanical description of the kinetic forces underlying walking in subjects without known impairments or pathology21–23 and in subjects with hemiplegia24,25 has been provided by instrumented gait analysis. In walking, concentric and eccentric muscle contractions create moment forces across the lower-limb joints. Joint mechanical power is the product of the moment of force and the angular velocity across the joint.21 The visible kinematic gait pattern observed by the clinician is simply an outcome of these invisible kinetic forces. Knowledge of these underlying patterns of power generation and absorption, therefore, provides an explanation of any kinematic deviations that can be seen. Herbert et al26 have argued that clinical analysis of movement dysfunction requires more than a simple description of kinematic deviations. Rather, they suggested that therapists may be able to observe the visible features of gait and combine these observations with biomechanical knowledge to make inferences about the muscle forces that occur. This approach can provide insight into the nature of gait deficits and direct guidance toward appropriate intervention strategies.
A review of kinetic features of gait suggests that ankle power generation in late stance is an important gait variable that is impaired in subjects with gait dysfunction following a stroke.21,24,25 Ankle plantar-flexor muscles contract rapidly in late stance phase to provide the single largest burst of power generation in the gait cycle of adults without impairments.21 The magnitude of ankle power generated by individuals after stroke also is highly correlated with gait speed, a measure of gait performance for this population.24,27 Reduced ankle power also is associated with reduced peak knee flexion in the swing phase, which is assumed to be related to reduced walking efficiency and increased risk of tripping.25 This action in the late stance phase is commonly described as "push-off" and has been considered 1 of 6 critical features of human gait.28
To date, only 3 studies3,5,10 have included push-off as a component of OGA, and they had different methods and highly variable results. Goodkin and Diller10 reported 23 of 30 possible agreements across 3 therapists when observing subjects with hemiparesis. However, the absence of statistical analysis accounting for chance agreement limits conclusions from this study. Miyazaki and Kubota3 reported interobserver agreement of .63 among 4 observers who viewed 48 subjects with hemiparetic gait. A Pearson correlation of .59 also was obtained between the observations of push-off and the data from a foot-force measurement device. Although accuracy and reliability were low, the use of live rating sessions, multiple rating variables, and a limited ordinal scale may have adversely influenced the results. In marked contrast, kinesiology students achieved high reliability on a nominal scale (average agreement: 4.5 out of 5) when judging push-off in videotaped performances by 8 subjects with varied pathologies.5 However, because 2 subjects had amputations, it is not surprising that the observers were consistent in judging push-off as either normal or abnormal. No decisive evidence has emerged that indicates whether therapists can reliably or accurately use OGA to evaluate kinetic aspects of gait such as push-off. Further investigation with optimal methods, therefore, is warranted.
Some authors1,29 have proposed that OGA requires much practice, combined with an understanding of biomechanics. This reflects what we believe is a common assumption by physical therapists that superior OGA skills may be associated with experience and practice. The few researchers30–33 who have examined the relationship between experience and reliability of data obtained with assessment techniques have not provided convincing evidence to support the assumption that experience has a positive influence on reliability. The only study14 of the relationship between therapist experience and reliability of OGA data demonstrated no consistent difference between experienced and less experienced therapists. In our research, therefore, we examined the relationship between the accuracy and reliability of therapists' observations and factors such as length and type of clinical experience.
In summary, the primary aim of this study was to investigate, after eliminating error due to variability in subject performance through the use of videotapes: (1) the accuracy, or criterion-related validity, of observations of push-off in videotaped gait performances of individuals after stroke, (2) the intraobserver and interobserver reliability of such observations, and (3) the relationship between clinical experience and the accuracy and reliability of OGA measurements.
| Development of a New Observational Rating Scale |
|---|
|
|
|---|
We believe the literature describing ankle power generation in adults without impairment also supports the inclusion of an 11-point (0–10) range for grading "normal" push-off. This range contrasts with the single category typically found in existing studies of OGA.3,5,10,12,14,15 A wider scale enables the full range of normal abilities to be incorporated, recognizing that it may be desirable to detect clinical change in an individual throughout the normal range of ability. Scales that allow discrimination within the normal range of performance values permit continued measurements beyond the lower values of the normal range. For example, normal gait speed for adults may be described as a range of values between 1.2 and 1.5 m/s.34 We believe that clinicians continue to measure gait speed beyond the lower threshold limit when evaluating change in an individual during gait rehabilitation after a stroke. Older subjects (±60 years of age) without impairments also have demonstrated a wide range of values for push-off power.21 Accordingly, a range of 0 to 10 was chosen for the normal scale to reflect the variability of ankle power generation within the population of people without impairments. Together, these two 11-point scales provide a model of ankle power generation as a continuum of human performance values (Fig. 1).
|
| Method |
|---|
|
|
|---|
X̄=6.1, SD=3.0), with "neurology-specific" experience ranging from 0.8 to 11.0 years (X̄=4.8, SD=3.1). All therapists reported always using OGA when assessing gait after stroke. These therapists frequently assessed stroke gait, with 16 therapists assessing stroke gait daily.
Subjects after stroke.
Eleven subjects with hemiplegia following a single stroke were recruited using the subject database of the Human Motion Laboratory, Queen's University, Kingston, Ontario, Canada. All subjects walked independently or with an assistive device or an orthosis. Eleven walking trials were selected from a wider database of 32 subjects with hemiparesis, each of whom completed 3 walks. The 11 walking trials were selected to provide a sample with the widest range and most even spread of ankle power generation values. Characteristics of the individuals and their gait variables are described in Table 2. All participants provided informed consent prior to the commencement of the study.
|
A digitizer (GTCO Datatizer) was used to extract the coordinates of the body and background markers from each cine film frame for each stride. After scaling and correction for parallax error, the coordinate data were digitally filtered using a second-order low-pass Butterworth recursive filter. These processed coordinates then were combined with anthropometric and forceplate data using biomechanical software adapted from Winter21 to determine the kinematic and kinetic variables. Anthropometric data in the biomechanical modeling were obtained by combining individual subject measurements with standard constants35 to estimate segment masses and inertias. The data collection process is further described in detail in Olney et al.24
Ankle joint power (Pa) was calculated as the product of the equation:
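The equation itself is not legible in this copy. Going by the definition given in the Introduction (joint mechanical power is the product of the moment of force and the angular velocity across the joint), it presumably has the form sketched below. The symbols $M_a$ (net moment of force at the ankle) and $\omega_a$ (ankle angular velocity) are notation assumed here, not taken from the original.

$$
P_a = M_a \, \omega_a
$$

With $M_a$ in N·m and $\omega_a$ in rad/s, $P_a$ is in watts; dividing by body mass gives the W/kg values reported in the Results.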
|
|
Construction of the gait performance videotape.
Videotapes for each of the 2 testing occasions were produced, with each videotape containing 11 randomly ordered images of subjects walking. Gait was viewed from a sagittal perspective closest to the affected side, as the subjects walked at their self-selected speed. Each walking trial was edited to include 4 repetitions of a single gait stride, from foot contact of the affected lower extremity through to early stance of the affected lower extremity in the subsequent stride. An audible tone was inserted at foot contact to alert the observers to watch the subsequent stance and push-off action. The total duration of the videotape was approximately 7 minutes, with individual subject performances varying from 12 to 40 seconds.
Testing procedure.
All observers participated in a 15-minute standardized instruction/familiarization session at the beginning of each testing occasion. This session included a brief review of normal ankle joint kinematics and kinetics21 and an operational definition of push-off. Push-off was defined as "a component of gait late in stance phase where the plantarflexors generate a concentric 'explosive' burst of energy, causing the foot to rapidly plantarflex" (adapted from Winter21). The rating scale was introduced, and the anchor descriptors were defined (Fig. 1). Observers practiced rating 2 example gait performances to become familiar with both the rating scale and the observational task. No specific training of OGA was provided, and no feedback was given regarding practice ratings, as this study aimed to investigate the observers' baseline ability.
Observers were tested in small groups at the rehabilitation centers on 2 occasions. The entire videotape was first shown to allow observation of the total range of subject variability. This step aimed to minimize a potential contraction bias effect.36 This form of bias occurs when observers use a range of responses smaller than the range of stimuli, such as clustering scores into the center of a scale. The videotape was then replayed and rated by the observers, who recorded their scores for each walk by simply circling a number on either the abnormal or normal scale (test 1). The videotape was played at normal speed to reflect the self-selected gait speed of each subject. Stopping, repeating, or slow-speed viewing of the videotape did not occur.
Observers attended an identical second testing session (test 2) approximately 4 weeks after the first testing session and viewed a second test videotape with a randomly altered order of image presentation. The 4-week time delay was used in an effort to maximize the possibility that the observers had forgotten their previous allocation of ratings. The altered order of subjects aimed to minimize the influence of a rating order effect. Observers also completed questionnaires about their clinical practice and experience.
Data Reduction and Analysis
Data available for analysis included both the observational rating scale data and the corresponding criterion measure data. The observational data comprised both categorical data (abnormal/normal) and ordinal data (the ratings). First, the ratings on the 2 scales were considered as parts of a single scale measuring push-off. This continuous scale extended from a "zero" point on the abnormal scale to 10 on the normal scale, with a total range of 22 points. Observers' ratings were converted to numbers on a single scale, with potential values of 0 to 21. In this manner, the values of any ratings of the 11-point (0–10) abnormal scale were unchanged, with a rating of 2 on the abnormal scale represented in the analysis as a score of 2. Ratings of 0 to 10 on the normal scale were converted to numbers 11 to 21, with a rating of 0 on the normal scale converted to a score of 11, a rating of 1 converted to a score of 12, and so on. In view of the interval construction and high number (22) of categories used to measure this continuous variable, parametric statistical methods were selected for data analysis involving the rating scale numbers. Second, data from the categorical judgments of normality were available to enable comparison with other studies of OGA that used categories of normal or abnormal in their scales. These data were analyzed using nonparametric techniques.
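The conversion of the two 11-point scales into a single 22-point score can be sketched as follows. This is a minimal illustration of the mapping described above; the function name and the string labels "abnormal" and "normal" are assumptions made for the example.

```python
def to_combined_score(scale: str, rating: int) -> int:
    """Map a 0-10 rating on the abnormal or normal scale to the single 0-21 scale.

    Abnormal-scale ratings are kept unchanged (0-10);
    normal-scale ratings are shifted up by 11 (11-21).
    """
    if scale not in ("abnormal", "normal"):
        raise ValueError("scale must be 'abnormal' or 'normal'")
    if not 0 <= rating <= 10:
        raise ValueError("rating must lie between 0 and 10")
    return rating if scale == "abnormal" else rating + 11


def is_rated_normal(combined_score: int) -> bool:
    """Recover the categorical judgment (normal vs abnormal) from a combined score."""
    return combined_score >= 11
```

Keeping the categorical judgment recoverable from the combined score means the same recorded data can feed both the parametric and the nonparametric analyses.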
Criterion-related validity: individual judgment models and accuracy.
The relationships between the recorded ratings and the associated criterion measure data were plotted for each observer, and we visually examined the plots to confirm the linearity of the relationships. Each scatterplot and regression equation, therefore, reflected a unique model of judgment.8 To investigate accuracy, individual Pearson product moment correlations were calculated using each observer's scores of push-off and the measurements of ankle power generation. These correlations were then transformed to Z scores using the Fisher r to Z transformation, averaged, then converted back to Pearson r indexes to provide a group mean correlation value for test 1.
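The averaging step can be illustrated with a short sketch. It applies the Fisher r-to-Z transformation (the inverse hyperbolic tangent), averages the Z values, and transforms the mean back to r. The function name and the input values in the usage note are hypothetical and are not data from this study.

```python
import math


def mean_correlation(correlations):
    """Average correlation coefficients via the Fisher r-to-Z transformation."""
    if not correlations:
        raise ValueError("at least one correlation is required")
    # Fisher transformation: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))
    z_values = [math.atanh(r) for r in correlations]
    mean_z = sum(z_values) / len(z_values)
    # Back-transform the mean Z value to the r metric
    return math.tanh(mean_z)
```

For example, `mean_correlation([0.6, 0.9])` gives a value slightly above the arithmetic mean of .75, because the transformation stretches the scale for correlations near 1 before averaging.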
The precision of the rating scale was investigated by calculating the standard error of estimating the peak ankle power generation (s_est). This was determined for each of the observers, using the test 1 results. The s_est was calculated according to the following formula.37
|
|
These calculations are in the same units (watts per kilogram) as the criterion measure, thus providing a direct method of interpreting the accuracy of the observers' use of the 22-point rating scale. The s_est quantifies the amount of judgment error (in watts per kilogram) for each observer using the rating scale.
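As the formula is not legible in this copy, the sketch below uses one common textbook form of the standard error of estimate, the standard deviation of the criterion values multiplied by the square root of (1 - r squared). This is an assumption; the formula in reference 37 may differ, for example by including a small-sample correction. All names and values are illustrative.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standard_error_of_estimate(ratings, criterion):
    """Assumed form: s_est = s_y * sqrt(1 - r^2), in the units of the criterion."""
    r = pearson_r(ratings, criterion)
    s_y = statistics.pstdev(criterion)  # population SD of the criterion values
    # max() guards against a tiny negative value caused by floating-point rounding
    return s_y * math.sqrt(max(0.0, 1.0 - r * r))
```

Because the result is expressed in the units of the criterion measure, an observer whose ratings track ankle power perfectly would have an error of 0 W/kg.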
The discrimination between normal and abnormal push-off.
The categorization of each walk as abnormal or normal was inspected in relation to the associated measurements of ankle power generation. This inspection provided us with insight into the judgment patterns of the observer group. Each of the 18 judgment models was explored to determine the range of measurements of ankle power generation at which observers divided the gait performances into each of the abnormal or normal categories. Regression equations from each judgment model enabled prediction of this division or cutoff point, which reflected the intersection point of the 2 scales. This point was determined in the following manner. A value of 10.5 was selected to substitute into each regression equation as the "x" rating value. This value represented a midline point that may arbitrarily separate the abnormal scale range of points (0–10) and the normal scale range of points (11–21). From this "x" midline point, a corresponding "y" criterion measure value could be predicted, representing the threshold value of the criterion measure at which each observer's judgment model divided the gait performances as normal or abnormal. An example of this prediction process is illustrated in Figure 2. Values for these predicted thresholds of normality can be compared with the range of ankle power generation values attained by older subjects without impairments.
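The threshold prediction can be sketched as an ordinary least-squares regression of the criterion measure (y) on the rating (x), evaluated at x = 10.5. The function names are hypothetical, and the use of simple least squares is an assumption about how each regression equation was fitted.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def normality_threshold(ratings, ankle_power, midpoint=10.5):
    """Predict the ankle power (W/kg) at the midpoint between the two scales."""
    slope, intercept = fit_line(ratings, ankle_power)
    return intercept + slope * midpoint
```

With illustrative ratings of 0, 7, 14, and 21 paired with ankle power values of 0.0, 1.0, 2.0, and 3.0 W/kg, the fitted line passes through the origin with a slope of 1/7, so the predicted threshold at a rating of 10.5 is 1.5 W/kg.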
|
|
|
The SEM is provided in the same units as the 22-point observational rating scale, thus providing a direct method of interpreting the reliability of the scale. In this instance, the SEM reflects how much error there was across the observer group when rating the gait performances.
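A minimal sketch of the SEM calculation follows, assuming the commonly used form in which the standard deviation of the ratings is multiplied by the square root of (1 - ICC). This matches the later description of the SEM as being derived from the standard deviation of the ratings and the ICC (2,1) values, but the exact formula used is not legible in this copy. The input values in the usage note are illustrative only.

```python
import math


def standard_error_of_measurement(sd_ratings: float, icc: float) -> float:
    """Assumed form: SEM = SD * sqrt(1 - ICC), in rating scale units."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc is expected to lie between 0 and 1")
    return sd_ratings * math.sqrt(1.0 - icc)
```

For instance, a standard deviation of 5.0 rating scale units combined with an ICC of .75 yields an SEM of 2.5 units.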
Interobserver reliability of the categorical data was investigated using the Cohen Kappa statistic. Kappa is a corrected measure of agreement between observers that takes into account the effect of chance agreement.40 A value of 1 represents perfect agreement, and a value of 0 indicates agreement that is no better than would be expected by chance. Kappa was calculated for each pair of observers for test 1. Kappa values were transformed to Z scores, averaged, and converted back to Kappa values to provide an average value.
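For one pair of observers, the Kappa calculation can be sketched as follows. Observed agreement is compared with the agreement expected by chance from each observer's own category frequencies. The labels "A" (abnormal) and "N" (normal) and the function name are assumptions made for the example.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen Kappa for two raters' categorical judgments of the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)
```

If one observer rates 4 walks as A, A, N, N and a second rates them A, A, N, A, observed agreement is .75 and chance agreement is .50, giving a Kappa of .50.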
Intraobserver reliability.
An ICC (2,1) also was calculated to determine the agreement consistency of each observer across the 2 test occasions. Individual ICC (2,1) values were determined for each observer and then averaged by the previously described method. The SEM was calculated for each clinician to reflect the error involved when a single observer repeated the rating on 2 occasions.39 The SEM for each observer then was calculated and averaged by the method described previously, using the standard deviation of the ratings of individual observers (averaged over test 1 and test 2) and the individual observers' ICC (2,1) values. Percentage of agreement and the Cohen Kappa statistic also were determined to evaluate the agreement of the categorical data.
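An ICC (2,1) can be computed from the mean squares of a two-way analysis of variance. The sketch below follows the widely used Shrout and Fleiss formulation; whether the authors used exactly this computation is an assumption. Rows are the rated items (here, the walks) and columns are the raters or, for intraobserver reliability, the 2 test occasions.

```python
def icc_2_1(scores):
    """ICC (2,1): two-way random effects, absolute agreement, single rating.

    scores: list of rows, one row per rated item, one column per rater/occasion.
    """
    n = len(scores)       # number of rated items
    k = len(scores[0])    # number of raters or occasions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-items mean square
    jms = ss_cols / (k - 1)               # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
```

Because the between-raters mean square appears in the denominator, a systematic shift in one observer's ratings lowers this coefficient even when the rank ordering of the walks is preserved.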
Experience.
The relationship between the observers' length of experience and individual values of accuracy and intraobserver reliability was investigated by calculation of Pearson product moment correlation coefficients. Prior to calculating the correlation, the individual ICC (2,1) or Pearson product moment values were again transformed to Z scores, and the data were inspected to confirm a linear relationship. The relationship between experience and interobserver reliability also was investigated by selecting the 5 most experienced therapists (mean experience=8.3 years) and the 5 least experienced therapists (mean experience=2.8 years) and then comparing these 2 groups. Intraclass correlation coefficients (2,1) were obtained for the "most experienced" and "least experienced" groups by the method previously described, and confidence intervals around these values were determined.
| Results |
|---|
|
|
|---|
|
|
|
1.0 W/kg were rated by all observers as abnormal. At higher values, there was a progressive trend to rate performances as normal. This trend indicates a systematic scoring process relative to power generation.
|
Interobserver Reliability
Interobserver agreement was moderately high for both test occasions, with ICC (2,1) values ranging between .75 and .77 (Tab. 4). The average SEM (between all observers) was reasonably consistent across observers, at around 2.7 rating scale units.
|
Intraobserver Reliability
Observers were consistent in their individual use of the rating scale, with a mean ICC (2,1) of .89 obtained (Tab. 5). Variability was apparent, with ICC (2,1) values ranging from .64 to .96. The SEM also varied across observers, with a relatively low average of less than 2 rating scale units (Tab. 5). Consistency of the categorical judgment of normal versus abnormal across the 2 tests was similarly high, with average agreement of 89.5%. An averaged Kappa value of .79 for the observers (Tab. 5) indicated substantial to almost perfect agreement of categorical judgment.41
|
| Discussion |
|---|
|
|
|---|
The indexes of accuracy of the observational ratings in this study were high. We believe that this is an important result because clinicians report including push-off as a component of OGA when observing gait.42 The importance of the therapists' individual accuracy coefficients in relation to clinical practice is difficult to define. It cannot be stated with certainty what values of criterion-related validity are acceptable for such a clinical measure to be used with confidence. This study, therefore, provides evidence that therapists are able to accurately discriminate between levels of ankle power generation in gait after stroke, when observing videotaped gait performances.
These high correlations between the observational ratings and the criterion measure values are encouraging. We contend that further insight into observational accuracy can be achieved by examining the metric estimates of the accuracy of the therapists' ratings. These estimates provide an indication of whether therapists are able to detect what we would consider clinically meaningful change. The mean standard error of estimating the criterion measure values was found to be 0.51 W/kg, compared with the range of 3.16 W/kg in the sample. This means that in using the rating scale, therapists could be 68% confident that an individual rating score would fall plus or minus 0.51 W/kg from the true criterion value. A 95% CI would result from approximately 2 times this range, or plus or minus 1.02 W/kg. The mean error range of 1.02 W/kg is large, covering approximately one third of the range of the criterion measure. This error range implies that therapists would only be able to accurately discriminate changes of approximately 1 W/kg. Therapists frequently use measures to quantify changes in push-off across time, thereby evaluating a series of small changes at intervals during rehabilitation. Unfortunately, there are no group data for sequential measurements of ankle power generation after stroke, and the amount of change in ankle power generation that occurs during the rehabilitation phase is unknown. It is impossible, therefore, to know how precise or accurate therapists' observational measures need to be. Further research is needed to quantify the timing and amount of change in ankle power generation after stroke.
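The arithmetic behind the error range can be written out as a short worked calculation, using only the values reported above and the approximate multiplier of 2 for a 95% interval:

$$
\pm\, 2 \times 0.51\ \text{W/kg} = \pm\, 1.02\ \text{W/kg}, \qquad \frac{1.02}{3.16} \approx 0.32 \approx \tfrac{1}{3}
$$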
This typical error size of 0.51 W/kg can be considered in relation to the ability to discriminate between older people without known impairments and people following stroke. Olney et al24 reported the mean maximum ankle power generation of a group of 30 subjects following stroke as 0.60 W/kg (SD=0.51). This value is substantially different from the reported mean of 2.48 W/kg (SD=0.46) of elderly subjects without known impairments.21 This typical error of estimation of ankle power generation, in our opinion, would allow the use of observational judgments to differentiate among subjects with performance values near to the means of these groups.
Further insight into clinical judgment can be gained by consideration of the predicted normality values. Although therapists commonly decide whether a movement is normal, it is not known how this judgment is made. In the literature, it is common to find individual subject performance measures compared with an appropriate normative reference group using a statistical framework. For example, gait speed may be considered abnormal when it is beyond 1.65 standard deviations of the mean of a reference group.43 Similarly, the measurements of ankle power generation following stroke in our study can be compared with those obtained from older subjects without impairments. The most appropriate reference group is that described by Winter,21 who collected gait data from 18 older subjects without impairments in an identical manner to that of the collection of gait data in our study. This provides what we consider the most valid comparable normative reference group. An estimate of the fifth percentile of this group's performance indicates that 95% of these subjects obtained ankle power generation values of 1.72 W/kg or greater. These values lie within the range of predicted normality values determined from the therapist judgment models and are remarkably close to the average of 1.8 W/kg. This finding indicates that, on average, the group of 18 therapists were placing the cutoff defining normality at around the fifth percentile mark.
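The fifth-percentile estimate is consistent with a normal-distribution approximation applied to the reference group mean of 2.48 W/kg and standard deviation of 0.46 W/kg quoted earlier in this Discussion. The use of this approximation is an assumption here, as the method of estimation is not stated:

$$
P_{5} \approx \bar{x} - 1.645\, s = 2.48 - 1.645 \times 0.46 \approx 1.72\ \text{W/kg}
$$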
Reliability of the Therapists' Observational Judgments
The indexes of agreement achieved in this study indicate that the therapists were able to make highly stable individual agreements about observation of a single kinetic gait variable on 2 occasions when viewing videotaped gait performances. These observers also demonstrated moderately consistent agreement with each other about the push-off ability of the subjects during gait.
In addition to measures of agreement, metric indexes of error are useful to provide a meaningful indication of the size of measurement error in units of the rating scale. The SEM (within a single observer) was relatively low, at around 1.8 units of the 22-point scale. This value allows estimation of the error size that could be expected when an observation is repeated by the same observer on another occasion. In this instance, a change of plus or minus 3.6 units would be required to be 95% confident that change had occurred beyond that potentially due to measurement error. The average SEM (among the group of observers) was larger, at around 2.5 units. This finding means that change of around 5 units would be necessary to demonstrate change beyond measurement error if an observation were repeated by a group of observers. These error values provide a framework based on measurement error against which true change can be evaluated. Because the magnitude of typical change in ankle power generation in recovery after stroke is not known, the clinical significance of these error sizes cannot easily be determined. However, relative to the 22-point scale, these error sizes appear to be small.
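The change thresholds quoted above follow from multiplying each SEM by approximately 2 for 95% confidence. The multiplier is inferred from the reported figures rather than stated explicitly:

$$
2 \times 1.8 = 3.6\ \text{units (same observer)}, \qquad 2 \times 2.5 = 5.0\ \text{units (different observers)}
$$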
The higher reliability values for individual intraobserver reliability compared with interobserver reliability were expected. It is more likely that an individual observer will agree more consistently with himself or herself than with other therapists. This finding is consistent with previous investigations of OGA data reliability15,16 and other studies of movement observation.7,8 In current clinical practice, patients are commonly assigned to an individual therapist, who is responsible for their treatment for the duration of intervention. The findings of our study confirm that this OGA measurement is more consistent and has lower error when repeated by the same therapist, as compared with OGA measurements obtained by multiple therapists. This finding suggests that in clinical practice an individual therapist should endeavor to be responsible for OGA measures in a single patient across multiple sessions whenever possible. This practice should enhance reliability and minimize between-observer error.
Experience
The poor relationship between therapist experience and intraobserver reliability we demonstrated is comparable to that reported by Eastlack et al.14 Furthermore, we showed poor correlations between the obtained accuracy indexes and the experience of the therapists. More experienced therapists were not more accurate and did not obtain more reliable measurements than therapists with less experience. In addition, specialized neurology-specific experience did not enhance either reliability or accuracy. This finding contrasts with the belief that clinicians with more specialized experience are likely to be more accurate and to obtain more reliable measurements. These poor correlations strongly suggest that experience is not a factor that influences accuracy or reliability of measurements obtained for this rating task. We examined single observations only, and we did not examine the way in which these observations are interpreted or how the information is used. It is possible that expert-novice differences may lie more in the complex decisions arising from observations, rather than the observations themselves.
Methodological Considerations
The reliability and accuracy indexes obtained in this study were higher than the majority of those previously reported in the OGA literature (Tab. 1). Our higher values may relate to the method used and the design chosen. Provision of an operational definition, the selection of a single variable, and the simple scoring system allowing discriminating judgments are likely to have enhanced the accuracy of the observers and the reliability of their measurements. Our inclusion of these design features and the favorable results achieved support the premise that observers can be accurate and can obtain reliable measurements if the task is structured with specific guidelines.
The selection of a single videotaped stride as a rating stimulus may have enhanced the achieved reliability by reducing error due to patient variability. Live gait performance raises a strong question of performance inconsistency between gait trials, which is potentially accentuated by fatigue. The poor reliability reported by authors such as Miyazaki and Kubota3 and Goodkin and Diller10 thus may have been influenced by the inclusion of live rating sessions. Designs including videotaped performances have attempted to control performance variability, but reliability has not necessarily improved.5,12,14,16 Such studies, however, have included subjects who walked a number of strides across varied distances over a number of trials. In such instances, videotapes are likely to have controlled the variation in gait performance due to subject fatigue. However, the videotapes, in our opinion, may not have controlled variability due to stride-to-stride or trial-to-trial gait fluctuations, which may have contributed to increased error in rating scores when combined with inadequately defined rating instructions. For example, one observer may observe the best, worst, or a random stride performance for a single variable and then rate the stride performance accordingly. A second observer may seek to rate an average stride after viewing a number of strides. Thus, unless instructions are very specific, error in rating due to stimulus variability is likely and may result in lower reliability. All previous OGA studies have included a number of strides and thus may have been subject to this type of increased rating error.
The higher accuracy and reliability achieved with a wider 22-point scale in our study counters the contention that discriminative grading of OGA is very demanding and that use of broader scales may adversely influence reliability.2,5 Other studies of movement observation have described successful use of similarly discriminating scales, with comparable reliability outcomes.7,8 An expanded scale with clear descriptors may be easier to use than a narrow scale with descriptors relating to ill-defined categories of abnormality, such as "noticeably," "moderate," or "severe." In our study, observers were able to demonstrate discrimination of rating that is more comparable to the refined judgments of movement quality that may be necessary in clinical settings.
In nearly all studies of OGA, multiple variables were rated, with up to 50 variables considered by the observer.5 Furthermore, frequently cited OGA tools, such as the Rancho Los Amigos Medical Center form,20 demand that multiple joints be assessed over the many phases of the gait cycle, with more than 100 decisions to be made. We contend that the larger the number of variables to be judged, the greater the attentional demands on the observer. As the number of demands, or attention required, increases, the potential for error and inconsistency may also increase. It is not known how observers rated multiple variables in previous reliability studies. In clinical practice, it is suggested that a systematic approach to observation should be adopted, with observers attending to a single variable at a time.29,44 When observation was structured in this manner in Bernhardt and colleagues' study of 3 kinematic tasks, high accuracy and reliability were demonstrated.8 The results of our study and those of the study by Bernhardt et al8 suggest that it may be desirable for observers to attend to single variables sequentially when observing gait. Use of a videotape for analysis may augment the reliability of data obtained with this decision process. Indeed, videotaping of performance has frequently been advocated as a method to optimize movement analysis.1,29,44
Our study represents an initial step in the exploration of the accuracy of observations of a single variable in OGA by use of videotape. We believe the method we used enhanced observer accuracy but also limits the immediate generalizability of the findings. The findings indicate that therapists are able to make accurate and reliable judgments of a single kinetic gait variable when viewing videotaped gait performances in a quiet environment. This finding confirms that therapists may be able to provide reliable and accurate decisions about observed movement when the task is clearly defined and focused on a single judgment and their attention is focused on preselected videotaped segments. The findings are limited to the context of this study and cannot be immediately translated into clinical practice, where patients are observed directly and multiple variables are considered. These favorable results, however, establish that further research into clinical observation is appropriate. In view of the fundamental role of OGA within clinical decision making, it is important that further research define and direct the clinical utility of such observations.
Considerations for Future Research
The widespread continuing use of OGA to assess gait after stroke suggests that clinicians value information gained from these observations. Despite the emergence of, and emphasis on, other measures such as gait speed, clinicians continue to observe their patients walking and focus on these "qualitative" aspects of gait. We therefore believe that research should focus on exploring and defining the potential value of such observations. Key questions for clinical practice have emerged from this and other studies of movement observation after stroke.8
| Conclusion |
|---|
| Footnotes |
|---|
This research was approved by the La Trobe University Faculty of Health Sciences Human Ethics Committee.
This research was presented at the Sixth International Physiotherapy Congress of the Australian Physiotherapy Association; May 24–27, 2000; Canberra, Australian Capital Territory, Australia.
* Redlake Corp, 1711 Dell Ave, Campbell, CA 95008.
Advanced Medical Technology Inc, 141 California St, Newton, MA 02158.
GTCO Corp, 1055 First St, Rockville, MD 20850.
| References |
|---|