Background and Purpose. To improve the utility of the Berg Balance Scale (BBS), the aim of this study was to develop a short form of the BBS (SFBBS) that was psychometrically similar (including test reliability, validity, and responsiveness) to the original BBS for people with stroke.
Subjects and Methods. A total of 226 subjects with stroke participated in this prospective study at 14 days after their stroke; 167 of these subjects also were examined at 90 days after their stroke. The BBS, Barthel Index, and Fugl-Meyer Motor Test were administered at these 2 time points. By reducing the number of tested items by half or more relative to the original BBS (ie, making 4-, 5-, 6-, and 7-item tests) and simplifying the scoring system of the original BBS (ie, collapsing the 5-level scale into a 3-level scale [BBS-3P]), we generated a total of 8 SFBBSs.
Results. The distributions of scores for all 8 SFBBSs were acceptable but featured notable floor effects. The 4-item BBS, 5-item BBS, 5-item BBS-3P, and 7-item BBS-3P demonstrated good reliability. The subjects’ scores on the 6-item BBS, 6-item BBS-3P, 7-item BBS, and 7-item BBS-3P showed excellent agreement with those on the original BBS. The 6-item BBS-3P and 7-item BBS-3P exhibited great responsiveness. Only the 7-item BBS-3P demonstrated psychometric properties that were both satisfactory and similar to those of the original BBS.
Discussion and Conclusion. The 7-item BBS-3P was found to be psychometrically similar to the original BBS. The 7-item BBS-3P, compared with the original BBS, is simpler and faster to complete in either a clinical or a research setting and is recommended. [Chou CY, Chien CW, Hsueh IP, et al. Developing a short form of the Berg Balance Scale for people with stroke.
A balance measure that is deemed useful in a clinical setting must be both psychometrically sound and not lengthy to administer.1–3 The Berg Balance Scale (BBS)4 has been used widely in order to evaluate balance performance for people with stroke.5–7 The BBS was previously shown to be psychometrically sound (including having high interrater reliability, high concurrent validity, and satisfactory responsiveness).5,6 However, 3 issues have been hampering the widespread utility of the BBS. First, the BBS may take about 20 minutes to complete8; such a procedure is quite time-consuming for daily clinical use and may place unreasonable demands upon respondents, especially in instances in which they may be seriously unwell, as in the case of people with stroke. Second, the BBS consists of 14 five-level items with scoring criteria varying from item to item. Such an inconsistency in scoring criteria could lead to difficulties for raters when making judgments about their patients’ conditions, especially for raters with less training. Third, the extremely high internal consistency of the BBS (the Cronbach α coefficient has been found to be as high as .98)6 indicates, to some extent, item redundancy. These observations suggest that the BBS needs to be simplified in order to improve its utility.
The simplification of a measure may include reducing the number of items or shortening the levels of scaling, or both.2,9–12 It has been revealed that certain measures simplified by one or both of these methods are psychometrically similar to the original measures.2,5,9,11 Therefore, the purpose of this study was to develop a short form of the BBS (SFBBS) that was psychometrically equivalent to the original BBS. We hypothesized that at least half of the items on the original BBS could be omitted and that the 5-level scaling could be reduced without sacrificing any psychometric properties. Thus, several SFBBSs are proposed here, and the psychometric properties of the SFBBSs were compared with those of the original BBS for a cohort of subjects who had had a stroke and who were evaluated from 14 days to 3 months after their stroke.
Data were retrieved from a prospective study (the Quality of Life After Stroke Study in Taiwan) initiated on December 1, 1999.6 For that study, subjects were recruited if they met the following criteria: diagnosis (clinical modification codes from the International Classification of Disease, 9th revision13) of cerebral hemorrhage (431), cerebral infarction (434), or other categories (430, 432, 433, 436, or 437); first onset of cerebrovascular accident without other major diseases; stroke onset within 14 days prior to hospital admission; ability to follow commands; and ability to provide informed consent personally or by proxy. Subjects were excluded if they had another stroke or other major disease(s) during the follow-up period or lived more than 64 km (40 miles) from the participating hospital.6
The BBS has 14 items, including 1 sitting item and 13 standing items.4,6 These items are based on a 5-level scale (0–4). Its total score ranges from 0 to 56. The BBS was originally developed to screen elderly people who are at risk for falling. The psychometric properties of the scale have been found to be satisfactory for people with stroke.5–7
A simplified BBS with a 3-level scale (BBS-3P)5 was developed by collapsing the second, third, and fourth levels of the original scale into a single level. This collapsed level was scored when subjects met the criteria for the original second or higher level of the scale but not when subjects met the criteria for the highest level of the scale. The BBS-3P was found to feature psychometric properties similar to those of the original BBS. Thus, in the present study, both the BBS and the BBS-3P were used in the development of short forms with shortened scaling. For use of the BBS-3P in this study, the data retrieved for this study were recoded as 0-2-4 by collapsing the 3 middle levels of the original 5-level scale.
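The 0-2-4 recoding described above can be sketched in a few lines of Python. This is only an illustration under the stated rule (original levels 1, 2, and 3 collapse into the single middle level); the function name and the example scores are hypothetical and are not study data.

```python
def recode_to_3_level(score: int) -> int:
    """Collapse a 5-level BBS item score (0-4) into the 0-2-4 coding."""
    if score not in (0, 1, 2, 3, 4):
        raise ValueError(f"BBS item scores must be 0-4, got {score!r}")
    if score == 0:
        return 0
    if score == 4:
        return 4
    return 2  # original levels 1, 2, and 3 share the collapsed middle level


# Hypothetical item scores for one subject (illustrative only).
original_scores = [0, 1, 2, 3, 4, 4, 2]
recoded_scores = [recode_to_3_level(s) for s in original_scores]
print(recoded_scores)  # [0, 2, 2, 2, 4, 4, 2]
```

Because the recoded items keep the 0 to 4 range, a recoded total has the same possible range as the corresponding 5-level total.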
The Barthel Index (BI) was developed to measure the severity of disability.14 The BI evaluates 10 basic activities of daily living items: feeding, transferring, grooming, toileting, bathing, ambulation, stair climbing, dressing, bowel control, and bladder control.13 The total possible score of the BI ranges from 0 to 100. The BI was previously shown to yield scores with good interrater reliability (intraclass correlation coefficient [ICC]=.94) and high convergent validity (Spearman ρ≥.92) for people with stroke.5,6,15,16 The BI was used to examine the convergent validity and predictive validity of data for the SFBBSs proposed in this study.
The Fugl-Meyer Motor Test (FM)17 has been used to measure motor impairment following stroke. The FM consists of 50 items of upper- and lower-extremity motor function. Each item is graded on a 3-level scale. Its total possible score ranges from 0 to 100 points, and it has been shown to yield data with good interrater reliability (ICC≥.92) and high concurrent validity (r≥.99) for people with stroke.5,18,19 The FM was used to test the convergent validity of data for the SFBBSs proposed in this study.
Subjects consecutively enrolled in the Quality of Life After Stroke Study were examined at 14 days after the onset of stroke and reassessed at other specific time points (eg, 90 days) after stroke onset for up to 3 years after the stroke to characterize their recovery of neurologic function (eg, as measured by the FM), balance ability (eg, as measured by the BBS), functional abilities (eg, as measured by the BI), and health-related quality of life. The measures used in this study (ie, the BBS, the FM, and the BI) were administered by an occupational therapist who was not informed of the purpose of this study. The interrater reliabilities for the raters administering the BBS and the BI were satisfactory, with ICCs of .95 and .94, respectively.6,15,16
Development of SFBBSs
In this study, the method used to develop and validate the SFBBSs mainly followed that proposed by Hobart and Thompson.2 These authors selected items featuring the highest internal consistency (ie, minimizing measurement error) and the greatest responsiveness (ie, maximizing the ability to detect change). Thus, this method would appear to be especially useful for developing a measure for monitoring recovery after stroke and measuring outcome after treatment and was adopted in this study. The data retrieved for this study were randomly divided into 2 groups: a calibration group for developing the SFBBSs and a validation group for comparing the psychometric properties of the various SFBBSs with those of the original BBS.
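A random division into calibration and validation groups could be carried out as in the following sketch. The seed, the function name, and the use of consecutive integers as subject identifiers are assumptions made for illustration; the randomization procedure actually used is not described in the text.

```python
import random


def split_calibration_validation(subject_ids, seed=0):
    """Randomly split subjects into two halves (calibration, validation)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical identifiers for the 226 subjects examined at 14 days.
subject_ids = list(range(1, 227))
calibration, validation = split_calibration_validation(subject_ids)
print(len(calibration), len(validation))  # 113 113
```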
To develop the SFBBSs, the best items were determined by selecting the items with the lowest values from an overall item index of each item.2 The overall item index of each item is the product of the 2 rank orders (ie, the rank order of the corrected item total correlation for an item and the rank order of the effect size for an item). The corrected item total correlation for an item is the correlation between the scores of an individual item and the sum of the scores of all of the items on the scale minus that item. The rank of the corrected item total correlation is useful in removing test items that have a lower correlation with the overall construct measured in the BBS. Furthermore, the effect size for an item is the mean change score (14–90 days after stroke) divided by the standard deviation of the scores at 14 days after stroke. The rank of the effect size is useful in removing test items that show little sensitivity to change. Finally, the corrected item total correlation for each item and the effect size for each item were respectively ranked, and then the product of these rank orders was computed, that is, the overall item index of each item. For example, if the item total correlation rank of a given item is 1 and its effect size rank is 4, then its overall item index is 1×4=4. Lower values for the overall item index indicated better items.
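The item-selection steps above can be sketched as follows. The sketch assumes that rank 1 is assigned to the largest value, that tied values share the same rank, and that the sample standard deviation is used; none of these details is specified in the text. All names and scores are hypothetical.

```python
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_statistics(baseline, follow_up):
    """Return per-item corrected item total correlations and effect sizes.

    baseline / follow_up: one row per subject, one column per item.
    """
    citc, effect = [], []
    for i in range(len(baseline[0])):
        item = [row[i] for row in baseline]
        rest = [sum(row) - row[i] for row in baseline]  # total minus this item
        citc.append(pearson(item, rest))
        change = [f[i] - b[i] for b, f in zip(baseline, follow_up)]
        effect.append(mean(change) / stdev(item))
    return citc, effect


def descending_ranks(values):
    """Rank 1 = largest value; tied values share a rank (assumption)."""
    return [1 + sum(other > v for other in values) for v in values]


def overall_item_index(citc, effect):
    """Product of the two rank orders; lower values indicate better items."""
    return [a * b for a, b in zip(descending_ranks(citc), descending_ranks(effect))]


# Hypothetical scores for 5 subjects on 3 items (illustrative only).
baseline = [[0, 0, 1], [1, 0, 2], [2, 1, 2], [3, 2, 4], [4, 3, 4]]
follow_up = [[1, 1, 2], [3, 1, 3], [4, 2, 3], [4, 4, 4], [4, 4, 4]]
citc, effect = item_statistics(baseline, follow_up)
index = overall_item_index(citc, effect)
```

Items would then be retained in ascending order of `index`. An item with no score variance at baseline would cause a division by zero in this sketch and would need separate handling.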
We hypothesized that the use of 4 to 7 best items would be adequate for the SFBBSs. Four sets of SFBBSs were generated (ie, 4-item BBS, 5-item BBS, 6-item BBS, and 7-item BBS). We also used a technique to collapse the 3 levels in the middle of the BBS into a single level. Thus, we developed an additional 4 sets of SFBBSs (ie, 4-item BBS-3P, 5-item BBS-3P, 6-item BBS-3P, and 7-item BBS-3P). Therefore, a total of 8 SFBBSs were generated.
To compare the psychometric properties of the 8 SFBBSs and the original BBS, we linearly transformed the scores of the SFBBSs into the same score range as that for the original BBS (0–56). The psychometric properties tested in this study included acceptability, reliability, validity, and responsiveness.
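The linear transformation can be illustrated with a simple proportional rescaling. The sketch assumes that every short form has a minimum raw score of 0 and a maximum of 4 points per item (which holds for both the 5-level and the 0-2-4 coding); the function name is hypothetical.

```python
def rescale_to_bbs_range(raw_total, n_items, max_item_score=4, target_max=56):
    """Map a short-form total onto the 0-56 range of the original BBS."""
    raw_max = n_items * max_item_score
    if not 0 <= raw_total <= raw_max:
        raise ValueError(f"raw total must lie in 0-{raw_max}, got {raw_total!r}")
    return raw_total * target_max / raw_max


# A raw total of 14 on a 7-item form (range 0-28) sits at the midpoint.
print(rescale_to_bbs_range(14, n_items=7))  # 28.0
# A perfect raw total on a 4-item form (range 0-16) maps to the maximum.
print(rescale_to_bbs_range(16, n_items=4))  # 56.0
```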
Acceptability is a determination of whether the score distributions of a measure can match the distribution corresponding to the subjects intended to be measured.2 A measure exhibiting good acceptability should reveal observable scores spanning the entire range of the scale, with a mean score near the scale midpoint, and featuring small floor and ceiling effects, that is, less than 15% of the subjects achieving the lowest or the highest scores.2,20
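Floor and ceiling effects, as defined above, are the percentages of subjects at the lowest and highest possible scores. A minimal sketch, using hypothetical scores and the 15% criterion quoted in the text:

```python
def floor_ceiling_effects(scores, min_score=0, max_score=56):
    """Return (floor %, ceiling %) for a list of total scores."""
    n = len(scores)
    floor_pct = 100 * sum(s == min_score for s in scores) / n
    ceiling_pct = 100 * sum(s == max_score for s in scores) / n
    return floor_pct, ceiling_pct


# Hypothetical transformed totals for 10 subjects (illustrative only).
scores = [0, 0, 0, 0, 8, 16, 24, 40, 56, 56]
floor_pct, ceiling_pct = floor_ceiling_effects(scores)
print(floor_pct, ceiling_pct)  # 40.0 20.0
print(floor_pct < 15, ceiling_pct < 15)  # False False
```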
Test reliability reflects the degree of precision of a measure; that is, high reliability requires low measurement error.21,22 To estimate test reliability, Hobart and Thompson2 recommended examining the internal consistency of a test by use of Cronbach α coefficients, which summarize the intercorrelations among the items. It has been suggested that reliability estimates should exceed .80 for group comparison studies and .95 for individual patient clinical decision making.2,21 Confidence intervals for the α coefficients were computed.2,23 Confidence intervals for individual scores for subjects with stroke were computed by calculating the standard error of measurement (SEM).21 The SEM indicates the spread of observed scores that is attributable to measurement error.24 The following 2 formulas were used: SEM = (standard deviation of sample scores) × √(1 − reliability) and 95% confidence interval for an individual score = observed score ± 1.96 × SEM.
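The reliability calculations can be sketched as follows. The Cronbach α function uses the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), which is not written out in the text; the SEM and confidence interval follow the 2 formulas given above. Function names and all numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one row per subject, one column per item."""
    k = len(item_scores[0])
    item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def standard_error_of_measurement(sd, reliability):
    """SEM = SD x sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def individual_ci_half_width(sem, z=1.96):
    """Half-width of the 95% confidence interval for an individual score."""
    return z * sem


# Hypothetical item scores for 5 subjects on 3 items (illustrative only).
rows = [[0, 2, 0], [2, 2, 2], [2, 4, 4], [4, 4, 4], [0, 0, 2]]
alpha = cronbach_alpha(rows)

# Hypothetical sample SD of 20 points and reliability of .96.
sem = standard_error_of_measurement(sd=20.0, reliability=0.96)
half_width = individual_ci_half_width(sem)
print(round(sem, 2), round(half_width, 2))  # 4.0 7.84
```

With these hypothetical inputs, an observed score of 30 would carry a 95% confidence interval of roughly 22 to 38.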
Test validity indicates whether a measure actually determines what it has been constructed to determine.2,25 We examined the agreement between the results of the SFBBSs and the results of the original BBS at 14 days after stroke by using a random-effects model ICC and the method proposed by Bland and Altman,26 which involves plotting the scores of the difference between the original BBS and the SFBBSs against those of the average between the original BBS and the SFBBSs.26 Ideally, there should be no trend showing systematic bias in a Bland-Altman plot.26 These results are useful for determining whether the SFBBSs and the original BBS can be used interchangeably.
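The Bland-Altman quantities can be sketched as follows. The sketch assumes the conventional limits of agreement (mean difference ± 1.96 × standard deviation of the differences) and uses the squared correlation between differences and averages as a simple check for trend; the scores are hypothetical and the function names are illustrative.

```python
from statistics import mean, stdev


def bland_altman(original, short_form):
    """Return differences, averages, bias, and 95% limits of agreement."""
    diffs = [o - s for o, s in zip(original, short_form)]
    avgs = [(o + s) / 2 for o, s in zip(original, short_form)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return diffs, avgs, bias, (bias - spread, bias + spread)


def r_squared(x, y):
    """Squared Pearson correlation, used here to look for a trend."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical paired totals on the 0-56 range (illustrative only).
original = [10, 20, 30, 40, 50]
short_form = [12, 18, 32, 38, 52]
diffs, avgs, bias, (lower, upper) = bland_altman(original, short_form)
trend_r2 = r_squared(diffs, avgs)
```

A value of `trend_r2` near 0 suggests no systematic trend between the differences and the averages; narrower limits of agreement indicate that individual scores on the 2 measures are closer to each other.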
In addition, 3 validity indicators were examined for the comparisons of the 8 SFBBSs and the original BBS. First, the concurrent validity at 14 days after stroke was examined by computing the intercorrelations between the scores of the SFBBSs and those of the original BBS. Second, the convergent validity for the scores of the SFBBSs, the FM, and the BI at 14 days after stroke also was examined. Third, the predictive validity of scores for the SFBBSs was determined by examining the relationships between the scores of the SFBBSs at 14 days after stroke and those of the BI at 90 days after stroke.
Responsiveness reflects the effectiveness of a measure in detecting changes in the longitudinal follow-up of the participants.27,28 The extent of the responsiveness of the SFBBSs was investigated by calculating effect sizes.22,25,29 Effect sizes were determined by dividing the mean change in total score between 14 days and 90 days after stroke by the standard deviation of the total score at 14 days after stroke.16 Larger values suggest greater responsiveness. Finally, we cross-validated the main psychometric properties of the best SFBBS by using 20 samples that were randomly and repeatedly drawn from the full sample.
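The effect size calculation and the repeated random sampling can be sketched as follows. The text does not state the size of each reselected sample or whether sampling was done with replacement, so the 50% fraction, the sampling without replacement, and the seed are assumptions made for illustration only; the scores are hypothetical.

```python
import random
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(baseline)


def resampled_effect_sizes(baseline, follow_up, n_draws=20, fraction=0.5, seed=0):
    """Effect sizes for repeated random subsamples (assumed design)."""
    rng = random.Random(seed)
    size = max(2, int(len(baseline) * fraction))
    results = []
    for _ in range(n_draws):
        chosen = rng.sample(range(len(baseline)), size)  # without replacement
        results.append(
            effect_size([baseline[i] for i in chosen], [follow_up[i] for i in chosen])
        )
    return results


# Hypothetical totals at 14 and 90 days for 6 subjects (illustrative only).
day_14 = [0, 8, 16, 24, 32, 40]
day_90 = [8, 24, 32, 40, 48, 56]
overall = effect_size(day_14, day_90)
draws = resampled_effect_sizes(day_14, day_90)
```

Comparing the spread of `draws` with `overall` gives a rough indication of how stable the estimate is across samples.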
We examined 226 subjects at 14 days after stroke; 167 of these subjects were successfully examined at 90 days after stroke. The 226 subjects examined at day 14 were randomly divided into either a calibration group or a validation group, with each group consisting of 113 subjects. There was no significant difference between the ratios of male and female subjects for the calibration and validation groups, and the various scores for the BBS, BI, and FM proved to be very close for the 2 groups (Tab. 1).
Development of SFBBSs
Table 2 shows that the corrected item total correlations ranged from .72 to .96 and that the effect sizes ranged from .46 to .86 for individual items. According to the overall item index listed in Table 2, the 7-item BBS and 7-item BBS-3P were developed by including the 7 best items (in a hierarchical order): reaching forward with outstretched arm, standing with eyes closed, standing with one foot in front, turning to look behind, retrieving object from floor, standing on one foot, and sitting to standing. The 6-item, 5-item, and 4-item BBS and the BBS-3P were developed by sequentially removing the worst items from those 7 best items. Thus, a total of 8 SFBBSs were developed. Comprehensive evaluation of the psychometric properties of the 8 SFBBSs and the BBS revealed the following results.
All 8 SFBBSs investigated exhibited good variability, as the test scores spanned the full possible ranges of the scales. Mean scores (22.1–25.4) were slightly off the midpoint (28), and floor effects were notable (≥41.6% of the subjects) for the 8 SFBBSs (Tab. 3).
All 8 SFBBSs had very high α coefficients (≥.95), but only the 4-item BBS, 5-item BBS, 5-item BBS-3P, and 7-item BBS-3P had lower-limit confidence intervals that met the criterion of .80 (Tab. 3). The SEM of the 8 SFBBSs ranged from 3.6 to 4.7, values that were lower than 5.6 (ie, 10% of the highest possible score of 56, a threshold taken to indicate clinical importance).30
The ICCs for the original BBS and SFBBSs were high (≥.96) (Tab. 3), indicating excellent agreement between the SFBBSs and the original BBS. The limits of agreement of the 6-item BBS, 6-item BBS-3P, 7-item BBS, and 7-item BBS-3P were about half those of the other SFBBSs, indicating that their scores for individual subjects were closer to the scores of the original BBS than were the scores of the other SFBBSs. Figures 1 and 2 show that only the 6-item BBS-3P and 7-item BBS-3P demonstrated no obvious systematic bias relative to the BBS in the Bland-Altman plots (r² ≤ .04).
Table 4 shows that scores for all 8 SFBBSs demonstrated very high concurrent validity with scores for the original BBS (r≥.97). Moreover, scores for all of the SFBBSs exhibited equivalent and high convergent validity with scores for the BI (r=.84–.86) and with scores for the FM (r=.66–.68). The extent to which each of the 8 SFBBSs was able to predict the score of the BI at 90 days after stroke also was similar to that of the original BBS and satisfactory (r=.58–.60).
Table 4 shows that the 8 SFBBSs and the original BBS had similar and satisfactory effect sizes (.69–.85); the 6-item BBS-3P and 7-item BBS-3P both had large effect sizes (≥.8). We found that the 7-item BBS-3P was slightly superior to the 6-item BBS-3P in acceptability, reliability, and validity (Tabs. 3 and 4). Only the 7-item BBS-3P met all of the predefined psychometric criteria, with the exception of the floor effects. Furthermore, in the 20 randomly reselected samples, the 7-item BBS-3P again demonstrated satisfactory internal consistency, concurrent validity, and responsiveness relative to the original BBS (Tab. 5).
To the best of our knowledge, this study is the first to develop an SFBBS with psychometric properties that are very similar to those of the original BBS. We simplified the original BBS by developing 8 SFBBSs and comparing their psychometric properties with those of the original BBS. The 7-item BBS-3P did not appear to lose any psychometric properties compared with the original BBS; consequently, it is recommended for monitoring the recovery and measuring the outcome of patients with stroke.
Compared with the original BBS, the 7-item BBS-3P is improved in 3 significant aspects. First, the number of items is reduced by half. Second, the scoring levels are reduced from 5 to 3, thereby reducing the possibility of scoring inconsistency. Third, administration of the 7-item BBS-3P requires fewer assessment tools. For example, a stool was not necessary for the 7-item BBS-3P because of the removal of the item “placing alternate foot on stool.” All of these improvements allowed the raters to complete the SFBBS within half the time required to complete the original BBS (less than 10 of the original 20 minutes). This advantage of the 7-item BBS-3P decreases the possibility of incomplete data collection and contributes to efficiency in examination.
The use of the 7-item BBS-3P in clinical and research settings can be an improvement over the use of the original BBS given that the 7-item BBS-3P has excellent agreement with the original BBS. The Bland-Altman plot revealed that there was no notable trend between the difference and the average scores of the 7-item BBS-3P and the original BBS. Thus, the 7-item BBS-3P may be used interchangeably with the original BBS. The 7-item BBS-3P is especially useful when the time available for examination is short, such as at follow-up or when the clients are too weak to endure long examinations.
From the perspective of psychometric properties, up to 7 items (eg, standing unsupported and transferring) in the original BBS were found in our study to be redundant because their application did not provide any additional psychometric information. In earlier research, similar findings of item redundancy also were obtained for measures of some other domains, such as activities of daily living or quality of life.1,2,9,10,12,16 Therefore, it is worthwhile to explore in future studies whether there is any possibility of simplifying any of the other domains of conventional measures to decrease item redundancy in the measures and to promote the utility of clinical measures. However, from the clinical point of view, some important aspects of the balance performance of individual patients (eg, standing unsupported and transferring) are not recorded after the deletion of the items. Therefore, the 7-item BBS-3P may not be able to entirely replace the original BBS in the clinical setting, especially when the specific balance functions measured by the items deleted from the original measure are deemed to be treatment goals.
In this study, we used the method described by Hobart and Thompson2 to develop the 7-item BBS-3P. In the present study, 6 of the first 7 items selected from the original BBS were ranked as 1 (best) according to their corrected item total correlations, indicating that the corrected item total correlations were somewhat limited in discriminating the psychometric properties of the items of the BBS. Fortunately, this limitation did not interfere with the development of the 7-item BBS-3P. Future studies may add interrater reliability or test-retest reliability2 as supplementary criteria when too many items are ranked the same in the results for the corrected item total correlations.
A rather notable floor effect that was revealed for the 7-item BBS-3P also was found for the original BBS to a lesser extent. This notable floor effect may have resulted from the removal of the easiest item (unsupported sitting) from the 14 items of the BBS. Removing this item from the original BBS could reduce the ability of the 7-item BBS-3P to detect changes in sitting balance. As a result, the floor effect could weaken the ability of the 7-item BBS-3P to differentiate small balance function differences between people with severe stroke. Moreover, the presence of just such a floor effect may potentially damage the relative responsiveness of such a measure. However, we found the responsiveness of the 7-item BBS-3P to be satisfactory and very similar to that of the original BBS. Thus, the floor effect of the BBS-3P may not necessarily restrict the use of the 7-item BBS-3P for detecting balance improvement. From another point of view, the 7-item BBS-3P would benefit people who are able to attain or maintain upright stance without support, because testing easy tasks (eg, unsupported sitting) appears to be irrelevant for these people.
The psychometric properties of the 7-item BBS-3P were internally validated by use of 20 randomly reselected samples. The results of such validation testing provided strong evidence suggesting that the 7-item BBS-3P was psychometrically similar (including internal consistency, concurrent validity, and responsiveness) to the original BBS for people with stroke. Such results suggested that we did not “over fit” the results of the 7-item BBS-3P to this single data set and that the findings of this study were well supported.
The 7-item BBS-3P measure has sound psychometric properties and practical utility for use with people who have had a stroke. The 7-item BBS-3P, therefore, is suggested for use in people with stroke in both clinical and research settings.
Ms Chou, Mr Wang, and Dr Hsieh provided concept/idea/research design. Ms Chou, Mr Chien, and Ms Hsueh provided writing and data collection. Ms Hsueh and Dr Hsieh provided institutional liaisons, subjects, and project management. Mr Wang and Dr Hsieh provided fund procurement and clerical support. Dr Sheu and Dr Hsieh provided data analysis and consultation (including review of manuscript before submission).
This study was approved by an institutional review board of National Taiwan University Hospital.
This study was supported by research grants from the National Science Council (NSC-90-2815-C-002-022-B) and the National Health Research Institutes (NHRI-EX94–9204PP).
- Received December 14, 2004.
- Accepted July 18, 2005.
- Physical Therapy