Research Reports
M Blackburn, PT, MCSP, is Lecturer, Division of Physiotherapy Education, School of Community Health Sciences, University of Nottingham, Clinical Sciences Building, Hucknall Road, Nottingham, United Kingdom NG5 1PB (Marjan.Blackburn{at}Nottingham.ac.uk). She was Research Physiotherapist, Division of Stroke Medicine, School of Community Health Sciences, University of Nottingham, at the time of the study
P van Vliet, PhD, is Post Doctoral Research Physiotherapist, Centre for Vascular Research, School of Medical and Surgical Science, Division of Stroke Medicine, University of Nottingham
SP Mockett, MPhil, MCSP, is Lecturer and Deputy Head, Division of Physiotherapy Education, School of Community Health Sciences, University of Nottingham
Submitted December 8, 2000;
Accepted July 11, 2001
Abstract
Key Words: Cerebrovascular disorders, Hemiplegia, Hypertonicity, Measurement, Muscle tone, Reliability.
Introduction
Increased resistance to passive stretch (hypertonus) is a common motor disorder.3,4 Hypertonus has been assumed to be due primarily to an increased reflex response. However, Dietz et al5 presented evidence that altered mechanical properties of muscle may contribute to hypertonia in patients. Further evidence is provided by a study of 24 patients in which hypertonia was associated with contracture but not with reflex hyperexcitability.6
Despite recent evidence suggesting that weakness may contribute more to disability than increased reflex responses in patients with stroke,7–9 management of increased muscle tone is still a major component of some rehabilitation protocols. Physical therapy texts3,4 and Kidd et al10 have advocated methods directed at normalizing these responses. It can be argued, therefore, that many therapists would value a tool that reliably measures reflex responses. Pederson11 and Katz and Rymer12 have reported the need to develop scales that yield valid and reliable measurements of increased reflex responses, so that the effectiveness of treatment techniques can be evaluated.
There are a number of different means of assessing muscle tone. Functional scales can contain a component for muscle tone (eg, the Fugl-Meyer Assessment13), although the connection between function and muscle tone is not well established.14 Electromyography, the Wartenberg pendulum test,15 and electrophysiological tests16 have also been used to characterize muscle tone. These measures, however, are not readily accessible to clinicians or easily administered by them, whereas other measures, such as tendon jerks16 and rating scales10 (measures of behavior that involve an evaluation based on a checklist of criteria17), are more practical in clinical use.
Of these measures, we believe that rating scales are probably the most suitable for measuring muscle tone in the clinical setting. Scales with categories of "mild," "moderate," and "severe" increased reflex responses have been shown to yield unreliable measurements with large interrater disagreements.10 The Ashworth scale is a 5-point rating scale for measuring muscle tone (whether of mechanical or neural origin), with ratings from 0 ("no increase in tone") to 4 ("limb rigid in flexion or extension").18 It has been suggested that grade 0 of the Ashworth scale could cover patients with "low tone" as well as "normal" muscle tone.19 Bohannon and Smith,1 in an early investigation of the reliability of measurements obtained using the Ashworth scale, found a clustering of scores at its lower end. To increase the sensitivity of the scale, they added an extra grade (1+) at the lower end, creating the modified Ashworth scale (MAS). Bohannon and Smith then tested the reliability of MAS measurements of elbow flexor muscle tone in 30 patients with intracranial lesions and found that 2 raters agreed on 86.7% of the ratings. The Kendall tau correlation between the ratings was .847 (P<.001).1 The reliability of measurements obtained with the Ashworth scale and the MAS has been evaluated in a variety of patient groups.1,20–23 In this article, we discuss the studies of patients with stroke, head injury, and multiple sclerosis.
Nuyens et al23 tested the interrater reliability of Ashworth scale measurements in patients with multiple sclerosis. The patients were tested bilaterally, and Kendall tau coefficients of .55 or lower were found for the hip adductor and medial (internal) rotator muscles. For the other muscles, the Kendall tau coefficients were .70 and .77 for the soleus muscle, .67 and .72 for the gastrocnemius muscle, and .86 and .71 for the psoas major muscle. For the quadriceps femoris muscles, they were .63 (P=.0001) and .36 (P=.0297).
In patients with stroke or head injury, measurements obtained with the MAS have been shown to have good interrater reliability for the elbow (Kendall tau=.847, P<.001)1 and wrist (Kendall tau=.847, P<.001),20 but poorer reliability for the lower limb.23 Sloan et al,22 using the MAS in the lower limb, found a poor correlation (Spearman r=.37, P<.01) between 2 physicians acting as testers for knee flexion without reinforcement (when reinforcement was requested, subjects were asked to clench their teeth). Correlations between the second physician and a physical therapist (.32 with reinforcement and .26 without reinforcement) were not significant.
Pandyan et al24 suggested that the lower reliability found for the lower limb, compared with the upper limb, may be attributable to the difficulty examiners have in perceiving "reflex-mediated stiffness" when moving the heavier shank and foot segments, or more generally to differences in the mass of the limb segments. Pandyan et al24 also reported that testing procedures have not always been described in detail, which makes the reasons for reduced reliability across studies difficult to determine. Better standardization of procedures might lead to improved reliability; Nuyens et al23 also contended that reliability might be improved with additional standardization.
The purpose of our study was to examine the intrarater and interrater reliability of measurements obtained with the MAS in the lower limb of patients with stroke using a standardized procedure.
Method
Subjects were excluded if they had musculoskeletal conditions that prevented the test procedure from being carried out or if they were unable to understand simple instructions. Other researchers have either included1 or excluded20,23 subjects with normal tone. In our study, subjects with normal muscle tone were not excluded a priori, because the purpose was to assess the reliability of measurements of muscle tone, which includes determining whether normal muscle tone is present.
The subjects had a mean age of 76.1 years (SD=7.89, range=62–90). Seventeen subjects were male, and 19 were female. Eighteen subjects had right hemiplegia, and 18 had left hemiplegia. As defined by the Bamford Scale,25 1 subject had a total anterior circulation infarct, 23 subjects had a partial anterior circulation infarct, 6 subjects had a lacunar infarct, 3 subjects had a posterior circulation infarct, and 3 subjects were unclassified.
Procedure
Twenty subjects were tested 2 weeks after their stroke, and 20 subjects were tested at 12 weeks. Because of the limited availability of subjects, 4 of the subjects tested at 2 weeks were also tested at 12 weeks. Thus, the total number of subjects in the study was 36. Two testing times were included to increase the likelihood of recruiting subjects with both high and low muscle tone, not to assess reliability between 2 occasions 10 weeks apart. Subjects recruited at 12 weeks after stroke were expected to have higher levels of muscle tone.
A researcher (MB) and an independent examiner examined each subject. Both testers were physical therapists, each with more than 10 years of experience in handling the limbs of people with stroke. As in previous studies,19,20,23 no extensive training in the procedure was provided, whereas Bohannon and Smith,1 Lee et al,21 and Sloan et al22 used a period of training. Although such training might, in theory, have improved the reliability of the measurements, we wanted the procedure used in this study to resemble how the scale would be used in clinical practice. After consultation with a group of local hospital and community-based physical therapists, we concluded that usual clinical practice could accommodate written guidelines but was unlikely to involve extensive training.
The level of muscle tone can be affected by the emotional status (fear, anxiety, or apprehension) and general unwellness of the subject, the environment, the temperature, and fatigue.26 Therefore, for the interrater reliability component of the study, tests were performed 1 hour apart. A 1-hour interval was considered long enough to allow for the testing effect of the first test to disappear and short enough to prevent major changes in the subject or environment that may affect muscle tone. The order of testing by the 2 examiners was reversed for half of the subjects to control for order effects, such as familiarity with the examiner and the testing procedure. For the intrarater reliability component of the study, the researcher (MB) repeated the test 1 week later. Although muscle tone may change over this period, it may also change from day to day, and there is no evidence to suggest that the change will be consistently in one direction or the other. We believed that a week was sufficient time to prevent exact recall of the initial grade by the examiner.
The subjects tested at 2 weeks were tested on the ward. At 12 weeks, subjects were tested either on the ward or in the place to which they were discharged.
The muscles tested were the gastrocnemius, soleus, and quadriceps femoris muscles on the hemiplegic side. They were chosen because they are lower-limb muscle groups important for rehabilitation and are said to be among the muscles most commonly affected by increased muscle tone.3,6 Previous investigations using the Ashworth scale on the calf muscles have either grouped the soleus and gastrocnemius muscles together as the plantar flexors27 or tested these muscles separately,23 but the investigators did not explain their reasons for either approach. When the extensibility of these 2 muscles is tested, it is common practice to test them separately because of their different anatomical attachments.28 The soleus muscle crosses only one joint (talocrural), whereas the gastrocnemius muscle crosses 2 joints (talocrural and knee). The changes in passive muscle stiffness that are associated with disuse29 and that can occur following a stroke are likely to affect each of these muscles differently, because of their different attachments and depending on their pattern of use after stroke. Because the resistance felt when testing muscle tone derives partly from passive muscle properties, we reasoned that we should attempt to test the soleus and gastrocnemius muscles separately. At the time of this study, we could find no evidence for or against the idea that the reflex response may differ between the 2 muscles. The positions described in Table 1 were based on the testing positions for extensibility of the 2 muscles,28 but evidence that the muscles can be differentiated during any form of testing is lacking.
The standardized procedure was as follows. Each subject was put in a resting position for 5 minutes, with socks and shoes removed. The handling and positioning of the subject's limbs by the tester are described in Table 1.
Each test movement was performed over a duration of about 1 second (by counting "one thousand one"), as described by Bohannon and Smith.1 The movement was repeated 3 times, because a single movement may not be sufficient for a rater to assign a score.23 After performing the 3 test movements, the tester graded the resistance felt with a single score on the MAS,1 using the criteria described in Table 2.
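The MAS grades in Table 2 are ordered categories rather than numbers, and grade 1+ sits between grades 1 and 2. For analysis, one common approach is to code the six grades as consecutive ordinal ranks. The Python sketch below assumes that coding; the names `MAS_GRADES` and `mas_to_rank` are hypothetical and were not part of the study.

```python
# Minimal sketch (assumption): code Modified Ashworth Scale grades as ordinal
# ranks for analysis. Only the order of the codes matters for rank statistics.
MAS_GRADES = ["0", "1", "1+", "2", "3", "4"]  # lowest to highest resistance
MAS_RANK = {grade: rank for rank, grade in enumerate(MAS_GRADES)}


def mas_to_rank(grade: str) -> int:
    """Return the ordinal rank of an MAS grade, e.g. '1+' -> 2 (hypothetical helper)."""
    try:
        return MAS_RANK[grade.strip()]
    except KeyError:
        raise ValueError(f"not a valid MAS grade: {grade!r}") from None


if __name__ == "__main__":
    # Hypothetical single scores, each recorded after 3 test movements.
    recorded = ["0", "1+", "1", "2"]
    print([mas_to_rank(g) for g in recorded])  # [0, 2, 1, 3]
```

Because rank-based statistics such as the Kendall tau-b depend only on the order of the codes, coding 1+ as 1.5 instead would give the same result.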
Data Analysis
Scores for each subject from both testers were assembled in tables of agreement for each muscle and for the combined muscle groups. The scores obtained during the 2- and 12-week tests were pooled for analysis. The numbers of assignments and percentage of agreement for each muscle were calculated.
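The tables of agreement and the percentage of agreement described above can be sketched as follows, assuming each examiner's grades for one muscle are stored as parallel lists in subject order. The helper names are hypothetical and the scores are invented for illustration; this is not the SPSS procedure used in the study.

```python
from collections import Counter

MAS_GRADES = ["0", "1", "1+", "2", "3", "4"]


def agreement_table(rater_a, rater_b):
    """Cross-tabulate paired MAS grades: rows = rater A, columns = rater B."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same cases")
    counts = Counter(zip(rater_a, rater_b))  # (grade A, grade B) -> count
    return [[counts[(ga, gb)] for gb in MAS_GRADES] for ga in MAS_GRADES]


def percent_agreement(rater_a, rater_b):
    """Percentage of cases on which both raters assigned the same grade."""
    agree = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * agree / len(rater_a)


if __name__ == "__main__":
    # Hypothetical scores for one muscle; not data from this study.
    examiner_1 = ["0", "0", "1", "1+", "0", "2", "1"]
    examiner_2 = ["0", "0", "1+", "1+", "0", "1", "1"]
    for grade, row in zip(MAS_GRADES, agreement_table(examiner_1, examiner_2)):
        print(grade.ljust(3), row)
    print(f"{percent_agreement(examiner_1, examiner_2):.1f}% agreement")
```

The diagonal of the table holds the cases on which the examiners agreed; off-diagonal cells show where they disagreed, for example a 1 from one examiner and a 1+ from the other.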
Because the MAS is an ordinal-level measure of resistance to passive movement,24 reliability was tested statistically with the Kendall tau-b, a nonparametric measure of rank correlation that corrects for tied ranks. This statistic also allowed our results to be compared with those of other studies.1,20,27 Kappa coefficients were not calculated because the prevalence of scores differed greatly among the categories, and Altman30 suggested that kappa is an inappropriate statistic under such circumstances.
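For reference, the Kendall tau-b can be written as τb = (nc − nd) / √((n0 − n1)(n0 − n2)), where nc and nd are the numbers of concordant and discordant pairs, n0 = n(n − 1)/2 is the total number of pairs, and n1 and n2 are the numbers of pairs tied on the first and on the second rating. The tie correction matters here because many cases share grade 0. The plain-Python sketch below computes the statistic directly from this definition, assuming grades have already been coded as ordinal ranks; it is an illustration, not the SPSS routine used in the study, and `kendall_tau_b` is a hypothetical name.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall tau-b for paired ordinal scores, corrected for tied ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1  # pair tied on x (also counts pairs tied on both)
        if dy == 0:
            tied_y += 1  # pair tied on y (also counts pairs tied on both)
        if dx * dy > 0:
            concordant += 1  # both ratings order the pair the same way
        elif dx * dy < 0:
            discordant += 1  # the ratings order the pair in opposite ways
    n0 = len(x) * (len(x) - 1) // 2  # total number of pairs
    denom = math.sqrt((n0 - tied_x) * (n0 - tied_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one rating is constant")
    return (concordant - discordant) / denom


if __name__ == "__main__":
    # Hypothetical MAS grades coded as ranks (0, 1, 1+, 2, 3, 4 -> 0..5).
    rank = {"0": 0, "1": 1, "1+": 2, "2": 3, "3": 4, "4": 5}
    examiner_1 = ["0", "0", "1", "1+", "0", "2", "1"]
    examiner_2 = ["0", "0", "1+", "1+", "0", "1", "1"]
    tau = kendall_tau_b([rank[g] for g in examiner_1], [rank[g] for g in examiner_2])
    print(f"Kendall tau-b = {tau:.3f}")
```

The pairwise loop is O(n²), which is negligible for a data set of 120 cases.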
Because scores of 0 were very frequent, the data were skewed. Therefore, a further analysis was performed that excluded all test movements to which both examiners (for interrater reliability), or one examiner on both occasions (for intrarater reliability), assigned a score of 0, in order to examine the other categories of the scale more closely. Statistical calculations were performed with the software package SPSS for Windows, release 6.0.*
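The further analysis that excludes double zeros can be sketched as below, assuming paired grades are stored as strings. For interrater reliability the two lists are the two examiners; for intrarater reliability they are one examiner's two occasions. A case is dropped only when both ratings are 0, so a 0 paired with a 1 is kept. The function names and scores are hypothetical.

```python
def exclude_double_zeros(first, second, zero="0"):
    """Drop cases in which both ratings are grade 0 (hypothetical helper).

    Interrater: `first` and `second` are the two examiners.
    Intrarater: they are one examiner's scores on the two occasions.
    """
    kept = [(a, b) for a, b in zip(first, second) if not (a == zero and b == zero)]
    return [a for a, _ in kept], [b for _, b in kept]


def percent_agreement(first, second):
    """Percentage of identical paired grades (NaN if no cases remain)."""
    agree = sum(a == b for a, b in zip(first, second))
    return 100.0 * agree / len(first) if first else float("nan")


if __name__ == "__main__":
    # Hypothetical scores, not data from this study.
    occasion_1 = ["0", "0", "0", "1", "1+", "2", "0"]
    occasion_2 = ["0", "0", "0", "1+", "1+", "1", "1"]
    print(f"all cases: {percent_agreement(occasion_1, occasion_2):.0f}% of {len(occasion_1)}")
    kept_1, kept_2 = exclude_double_zeros(occasion_1, occasion_2)
    print(f"0-0 pairs excluded: {percent_agreement(kept_1, kept_2):.0f}% of {len(kept_1)}")
```

In this invented example, agreement falls from about 57% across all 7 cases to 25% across the 4 remaining cases, illustrating how frequent 0-0 pairs can inflate overall agreement.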
Results
When cases in which both examiners assigned a score of 0 were excluded from the interrater reliability analysis (Tab. 8), 71 of the 120 cases remained, and agreement between examiners was considerably lower. When cases in which the examiner assigned a score of 0 on both occasions were excluded from the intrarater reliability analysis (Tab. 8), 48 of the 120 cases remained, and agreement between occasions was again considerably lower than when these scores were included.
Discussion
Because people with stroke might be expected to develop increased tone gradually, and because the mechanical properties of muscle will change over time, we had expected that including subjects at 12 weeks would allow the whole range of the scale to be represented. This was not the case, however: a substantial number of assignments were scored as 0.
Because most agreement was found for assigned scores of 0 (n=121), it appears that the MAS, when used with this standardized procedure, can yield reliable measurements to establish whether normal or low muscle tone is present. The remaining 119 assigned grades represent a substantial number of opportunities for assessing agreement on the higher grades of the scale. Of these, only 21 were in agreement, 17 of which were at grades 1 and 1+. There were very few scores of grades 3 and 4; this pattern has been found in other studies and therefore seems typical of people with stroke.1,20,27
The poor agreement on grades 1, 1+, and 2 in our study is comparable to the results of Bohannon and Smith1 and Bodin and Morris,20 who also found noticeably poor agreement on grade 1+ and slight disagreement on grade 2. In a review, Pandyan et al24 noted that much of the reduction in reliability of measurements obtained with the MAS appears to center on disagreements at the lower end of the scale (ie, between grades 1 and 1+). Pandyan et al24 suggested that the lower reliability observed with the MAS, compared with the Ashworth scale, could be attributed to the extra level of classification, which increases the probability of error. The descriptors for grades 1 and 1+ differ mainly in the range of movement over which resistance is felt (see testing criteria in Tab. 2). Pandyan et al24 therefore suggested that the resting limb posture before stretch should be standardized to ensure that the limb is moved through the same range on each testing occasion. Our procedure specified the start position; however, disagreement on grades 1 and 1+ remained.
Both testers in our study were physical therapists who were experienced in handling the limbs of people with stroke and were familiar with, though did not regularly use, the MAS. To make testing conditions similar to those of clinical practice, the therapists did not undergo extensive training in the use of the scale. Other researchers1,27 ensured that their testers had extensive practice before their studies, and they found good interrater reliability (Kendall tau=.847, P<.001, in the study by Bohannon and Smith1 and Kendall tau=.647, P<.05, in the study by Allison et al27). This finding suggests that training before testing can provide greater reliability and that a standardized procedure in itself is not enough.
The standardized procedure included written instructions regarding a 5-minute rest period prior to testing and specified positions for both subject and tester in addition to the guidelines used by Bohannon and Smith.1 Although this procedure was sufficient to ensure good agreement within one examiner on scores obtained for the grade of 0, it was insufficient to ensure acceptable intrarater reliability on other grades or interrater reliability on any grade. Other reasons for poor interrater reliability could be the number of movements performed to establish the grade, the method of scoring, and extraneous factors.
The examiners in our study commented that testing each muscle 3 times, as in the study by Bohannon and Smith,1 was not always enough to establish the appropriate grade. However, more testing may alter the muscle tone, and Pandyan et al24 recommended that repeated movements be kept to a minimum. Previous studies varied in this part of the procedure: Lee et al21 measured each muscle group 5 times, and Sloan et al22 measured each movement 4 times.
Scoring methods also varied among previous studies. Lee et al21 recorded the lowest score, whereas Sloan et al22 gave individual scores for each movement. Lee et al21 and Nuyens et al23 summed individual muscle scores, but Pandyan et al24 discouraged this practice, arguing that summing masks any unreliability in the individual muscle scores.
We recognize that muscle tone can fluctuate because of extraneous factors such as anxiety, depression, fatigue, ambient temperature, concurrent urinary tract infection or constipation, and drug use.26 These factors might have been another reason for our poor interrater reliability, but this explanation seems unlikely because the tests were done only 1 hour apart. The unstable nature of the reflex response is problematic when considering the reliability of measurements obtained with procedures designed to measure this response. In the design of a study such as ours, several strategies could be incorporated to improve the stability of the measure. One is to ensure agreement between raters at the start of the study through training, but we have explained why we did not choose this strategy. The stability of the population studied could be improved, for example, by selecting subjects with chronic stroke. This method of subject selection, however, would limit the relevance of the results to clinical practice, as the scale is also used for people with acute stroke. In addition, the number of raters could be increased, which would increase the number of comparisons between raters. The number of measurements could also be increased by increasing the number of subjects. The number of subjects in our study was greater than in other studies,1,20,23,27 and all of these studies used only 2 raters. We recommend that future studies of this kind increase both the number of subjects and the number of raters.
Of the muscles tested, the reliability of scores obtained for the quadriceps femoris muscle was highest, both for interrater and intrarater reliability. No other study has tested this muscle group using the MAS in subjects with stroke, so it is not possible to compare this result with the results of other studies.
The 2 separate testing positions for the soleus and gastrocnemius muscles used in our study resulted in different reliability values. This finding suggests that altered mechanical properties of muscle should be given more consideration in testing muscle tone and that 1- and 2-joint muscles performing the same joint action should be tested separately, as was done in our study and in the study by Nuyens et al.23
Our study, in common with others, could not adequately test the reliability of scores at the higher grades of the MAS because grades 3 and 4 occurred infrequently. This aspect needs further investigation in a study that deliberately selects subjects with moderate to severe increased reflex responses.
Conclusion
Comparison of the results of our study with the results of other studies suggests that reliability of measurements obtained using the MAS is greatly improved by extensive training of test users. Training may be required of all test users in a clinical setting to ensure adequate reliability between testers. Further research into the reliability of measurements obtained with the MAS for the lower limb in people with stroke is indicated to assess whether combining a standardized procedure with extensive training can improve interrater reliability for this patient group. In addition, studies with greater numbers of examiners are needed.
Footnotes
Ethical approval was provided by the Nottingham City Hospital.
This research was supported by the Hospital Saving Association.
The main findings were presented at the Physiotherapy Research Conference; Leeds, West Yorkshire, England; April 15, 1999.
* SPSS Inc, 444 N Michigan Ave, Chicago, IL 60611.
References