Background: Current clinical balance assessment tools do not aim to help therapists identify the underlying postural control systems responsible for poor functional balance. By identifying the disordered systems underlying balance control, therapists can direct specific types of intervention for different types of balance problems.

Objective: The goal of this study was to develop a clinical balance assessment tool that aims to target 6 different balance control systems so that specific rehabilitation approaches can be designed for different balance deficits. This article presents the theoretical framework, interrater reliability, and preliminary concurrent validity for this new instrument, the Balance Evaluation Systems Test (BESTest).

Design: The BESTest consists of 36 items, grouped into 6 systems: “Biomechanical Constraints,” “Stability Limits/Verticality,” “Anticipatory Postural Adjustments,” “Postural Responses,” “Sensory Orientation,” and “Stability in Gait.”

Methods: In 2 interrater trials, 22 subjects with and without balance disorders, ranging in age from 50 to 88 years, were rated concurrently on the BESTest by 19 therapists, students, and balance researchers. Concurrent validity was measured by correlation between the BESTest and balance confidence, as assessed with the Activities-specific Balance Confidence (ABC) Scale.

Results: Consistent with our theoretical framework, subjects with different diagnoses scored poorly on different sections of the BESTest. The intraclass correlation coefficient (ICC) for interrater reliability for the test as a whole was .91, with the 6 section ICCs ranging from .79 to .96. The Kendall coefficient of concordance among raters ranged from .46 to 1.00 for the 36 individual items. Concurrent validity of the correlation between the BESTest and the ABC Scale was r=.636, P<.01.

Limitations: Further testing is needed to determine whether: (1) the sections of the BESTest actually detect independent balance deficits, (2) other systems important for balance control should be added, and (3) a shorter version of the test is possible by eliminating redundant or insensitive items.

Conclusions: The BESTest is easy to learn to administer, with excellent reliability and very good validity. It is unique in allowing clinicians to determine the type of balance problems to direct specific treatments for their patients. By organizing clinical balance test items already in use, combined with new items not currently available, the BESTest is the most comprehensive clinical balance tool available and warrants further development.

Balance deficits are one of the most common problems treated by physical therapists. Therapists need to identify who has a balance problem and then decide the best approach to rehabilitation. Current standardized clinical balance assessment tools are directed at screening for balance problems and predicting fall risk, particularly in elderly people.17 These tools identify which patients may benefit from balance retraining, but they do not help therapists decide how to treat the underlying balance problems. Besides not being aimed at guiding treatment, the current balance assessment tools were developed specifically for older adults with balance problems. This article presents a new balance assessment tool developed to help physical therapists identify the underlying postural control systems that may be responsible for poor functional balance so that treatments can be directed specifically at the abnormal underlying systems.

Although many clinical tests are designed to test a single “balance system,” balance control is very complex and involves many different underlying systems.8–11 Whereas previous motor control models assumed postural control consisted of hierarchical righting and equilibrium reflexes, we wanted to develop a clinical test of balance control based on Bernstein's concept that postural control results from a set of interacting systems.11–16 Consistent with this “systems model of motor control,” recent research in our laboratory and others has demonstrated how constraints, or deficits, in different underlying systems can impair balance.10,11,13,15,17–20

Constraints on the biomechanical system, such as ankle or hip weakness and flexed postural alignment, limit the ability of frail elderly people and patients with Parkinson disease (PD) to use an ankle strategy or compensatory steps for postural recovery.21,22 Constraints on the limits of stability (that is, how far the body's center of mass can be moved over its base of support) and on verticality (that is, representation of gravitational upright), affected by sensory deficits or by stroke in the parietal cortex, may result in inflexible postural alignment or precarious body tilt.23,24

Constraints on anticipatory postural adjustments prior to voluntary movements depend on interaction of supplementary motor areas with the basal ganglia and brain-stem areas and result in instability during step initiation or during rapid arm movements while standing.25,26 Constraints on short, medium, and long proprioceptive feedback loops responsible for automatic postural responses to slips, trips, and pushes include late responses in patients with sensory neuropathy or multiple sclerosis, weak responses in patients with PD, and hypermetric responses in patients with cerebellar ataxia.27–31

Constraints on sensory integration for spatial orientation result in disorientation and instability in patients with deficits in pathways involving the vestibular system and sensory integrative areas of the temporoparietal cortex when the support surface or visual environments are moving.27,32,33 Constraints on dynamic balance during gait result from impaired coordination between spinal locomotor and brain-stem postural sensorimotor programs when the falling body's center of mass must be caught by a changing base of foot support.34 In addition, cognitive constraints on executive or attentional systems can compound constraints in the other systems because each underlying neural control system for balance control requires cortical attention.12

Figure 1 shows the 6 interacting systems underlying control of balance that are targeted in our new Balance Evaluation Systems Test (BESTest). Each system consists of the neurophysiological mechanisms that control a particular aspect of postural control. Many of these systems are independent from each other in that different neural circuitry is involved, such that different pathologies may involve damage to different systems. For example, people with PD may have an abnormal system for stepping in response to external perturbations but a normal sensory orientation system, which allows them to stand with eyes closed on an unstable surface by relying upon vestibular information.27,35 In contrast, people with loss of peripheral vestibular inputs may have abnormal sensory orientation with eyes closed on an unstable surface but normal postural responses to external perturbations.36,37 In current practice, computerized, dynamic posturography is based on the concept that the sensory orientation and postural motor reactions systems underlying balance can be separately measured and represent separate systems underlying control of balance.38 Thus, each patient with balance problems is likely to fall because of deficits in different underlying systems and may consequently fall in different environments and while performing different tasks. Therapists need to be able to differentiate the underlying systems’ contribution to balance problems and fall risk in their patients in order to appropriately direct intervention.

Figure 1.

Model summarizing systems underlying postural control corresponding to sections of the Balance Evaluation Systems Test (BESTest).

Table 1 summarizes the performance tasks grouped under each postural system for the BESTest. The entire BESTest with scoring, examiner, and patient instructions is presented in the eAppendix. The performance tasks are grouped to reveal function or dysfunction of a particular system underlying balance control (see reviews by Horak and colleagues8–11). Here, we briefly summarize the role of these systems in balance control and how each task item is related to its system:

Table 1.

Summary of Balance Evaluation Systems Test (BESTest) Items Under Each System Category

I. Biomechanical Constraints: Biomechanical constraints for standing balance include the quality of the base of foot support (item 1), geometric postural alignment (item 2), functional ankle and hip strength (force-generating capacity) for standing (items 3 and 4), and ability to rise from the floor to a standing position (item 5).39

II. Stability Limits/Verticality: This system includes items for an internal representation of how far the body can move over its base of support before changing the support or losing balance, as well as an internal perception of postural vertical.40,41 The ability to lean as far as possible in a sitting position with eyes closed (item 6) provides a measure of lateral limits of stability in a sitting posture, and the ability to realign the trunk and head back to perceived vertical (item 6) provides a measure of internal representation of gravity. The ability to reach maximally forward and laterally while standing (items 7 and 8) represents the functional limits of stability, although this may not necessarily be correlated with how far a person can lean the body's center of mass when not reaching.43,44

III. Anticipatory Postural Adjustments: This system includes tasks that require an active movement of the body's center of mass in anticipation of a postural transition from one body position to another. For example, we include the transitions from a sitting to a standing position45 (item 9), from normal stance to stance on toes45 (item 10), and from 2-legged- to 1-legged stance46 (item 11). Item 12 involves repetitive weight shifting from leg to leg in anticipation of tapping a forefoot on a stool, and item 13 involves anticipatory postural adjustments prior to rapid, bilateral arm raises with a weight.47,48

IV. Postural Responses: Reactive postural responses include both in-place and compensatory stepping responses to an external perturbation induced by the examiner's hands using the unique “push and release” technique.49 To induce an automatic postural response with the patient's feet in place (ankle or hip strategy), the tester pushes isometrically against either the front (item 14) or back (item 15) of the patient's shoulders until either the toes or the heels just begin to raise without changing the initial position of the body's center of mass over the feet before suddenly letting go of the push. To induce compensatory stepping responses, the tester requires a forward (item 16) or backward (item 17) or lateral (item 18) lean of the patient's center of mass over the base of foot support prior to release of pressure, requiring a fast, automatic step to recover equilibrium.49,50

V. Sensory Orientation: This system identifies any increase in body sway during stance associated with altering visual or surface somatosensory information for control of standing balance. Item 19 is the modified Clinical Test of Sensory Integration for Balance51 (CTSIB), and item 20 involves standing on a 10-degree, toes-up incline with eyes closed.

VI. Stability in Gait: This system includes evaluation of balance during gait (item 21) and when balance is challenged during gait by changing gait speed52 (item 22), by head rotations53 (item 23), by pivot turns (item 24), and by stepping over obstacles54 (item 25). This section also includes the Timed “Get Up & Go” Test, which evaluates how fast a patient can sequence rising from a chair, walking 3 m, turning, and sitting back down again without (item 26) and with (item 27) a secondary cognitive task to challenge the patient's attention.55

Although several separate neural systems underlie control of balance, each task may involve more than one system that interacts with others. For example, the task of tapping alternate feet onto a stair (item 12) is placed in the “Anticipatory Postural Adjustments” system because it requires adequate anticipatory postural weight shifting from one leg to the other. However, it also requires an adequate base of support and strength in the hip abductors (“Biomechanical Constraints” system). Interactions among systems can be seen by how a single pathology, such as abnormal vestibular function, will likely affect several tasks, such as the ability to stand on foam with eyes closed (item 19 in the “Sensory Orientation” system) and the ability to rotate the head while walking (item 23 in the “Stability in Gait” system). Future studies are needed to determine the extent to which postural system problems cluster, such that disorders in each system can be differentiated in the clinic.

The purpose of this article is to present the BESTest, with its theoretical framework and its first interrater reliability and concurrent validity analysis. This is the first step in maximizing the psychometric properties of this new balance assessment tool.


Development of the BESTest

The conceptual framework for developing a balance assessment tool that separates control of balance into its underlying systems is based on the scientific literature about laboratory measures of postural disorders in elderly people and in people with neurological disorders.8–11 The principle of having physical therapists evaluate 6 subcomponent systems underlying balance function initially was suggested as a qualitative assessment by Horak and Shumway-Cook in their continuing medical education courses between 1990 and 1999.15–17,19,56,57 After Horak and Frank developed the BESTest, thousands of experienced physical therapy clinicians contributed to continued development of the BESTest by providing feedback about clarity, sensitivity, and practicality of items across 38 continuing education workshops delivered by Horak between 1999 and 2005. Following 2 days of didactic and observational training in the test, therapists in the workshops practiced performance of the test on each other and provided critical feedback to improve the clarity and specificity of instructions to patients and therapists. Some of the balance tasks in the test have been borrowed from current assessment tools, although they are now placed within our theoretical framework, and the therapist and patient instructions and most of the rating scales have been modified to improve consistency and reliability (Tab. 2). This is the first balance assessment tool to include a clinical method for assessing postural responses to external perturbations (section IV) and verticality (section II).

Table 2.

Balance Tasks in the Balance Evaluation Systems Test (BESTest) That Have Been Borrowed From Existing Clinical Tests

The BESTest consists of 27 tasks, with some items consisting of 2 to 4 subitems (eg, for left and right sides), for a total of 36 items. Each item is scored on a 4-level, ordinal scale from 0 (worst performance) to 3 (best performance). Scores for the total test, as well as for each section, are provided as a percentage of total points. Specific patient and rating instructions and stopwatch and ruler values are used to improve reliability (see the eAppendix for the full test).
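The percentage scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the scoring sheet from the eAppendix: the function name `percent_score`, the section lists, and the ratings are hypothetical, and the total is assumed to be computed over all rated items rather than as an average of the section percentages.

```python
def percent_score(item_scores):
    """Return the percentage of available points earned on 0-3 ordinal items."""
    if not item_scores:
        raise ValueError("at least one item score is required")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    # Each item contributes at most 3 points.
    return 100.0 * sum(item_scores) / (3 * len(item_scores))


# Hypothetical ratings for one subject (invented values, not study data).
example = {
    "IV. Postural Responses": [3, 2, 1, 1, 2, 2],
    "V. Sensory Orientation": [3, 3, 2, 1, 2],
}

section_pct = {name: percent_score(scores) for name, scores in example.items()}

# Assumption: the total is the percentage of all available points,
# not the mean of the section percentages.
all_items = [s for scores in example.values() for s in scores]
total_pct = percent_score(all_items)
```

Expressing each section as a percentage of its own maximum allows sections with different numbers of items to be compared on the same scale.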

Session 1: Raters and Subjects

To evaluate the interrater reliability and internal consistency of the original version of the BESTest (current sections II–VI), we recruited 12 ambulatory adults with a wide range of balance function. Subjects were recruited as a sample of convenience from individuals who previously had participated in research studies on balance and postural control. No subjects had completed the BESTest prior to the first session. However, subjects may have completed specific items that were adapted from other clinical tests such as the Dynamic Gait Index. For this session, we included 3 subjects with PD, 5 subjects with vestibular dysfunction (3 with bilateral loss, 2 with unilateral loss), 1 subject with peripheral neuropathy and a total hip arthroplasty, and 3 subjects who were healthy (controls) (Tab. 3). All subjects met the following inclusion criteria: (1) ability to follow 3-step commands, (2) ability to provide informed consent, (3) ability to ambulate 6 m (20 ft) without human assistance, and (4) ability to tolerate the balance tasks without excessive fatigue. Subjects were provided short rest breaks as needed. The subjects (5 female, 7 male) ranged in age from 50 to 80 years (X̄=63, SD=10). Descriptive information for the subjects who completed the BESTest is listed in Table 3. None of the subjects used an assistive device during the testing.

Table 3.

Descriptive Information on the Raters and Subjects

The 9 raters consisted of a convenience sample of 6 physical therapists from various practice settings and 3 Doctor of Physical Therapy students from Pacific University (mean age=33.1 years, SD=4.7; 3 male, 6 female; Tab. 3). Physical therapists were included if they had a valid Oregon physical therapist license, and physical therapist students were included if they had completed the relevant course work in relation to the evaluation and treatment of balance disorders.

Session 2: Raters and Subjects

After initial analysis of the first reliability data, a second testing session 18 months later evaluated the interrater reliability of a newly developed section I (“Biomechanical Constraints”) and a revised section VI (“Stability in Gait”). Section VI was revised due to a low intraclass correlation coefficient (ICC [2,1]=.54) obtained in the first session. The goals of this second testing session were to improve the reliability of section VI by modifying the criteria for scoring and requiring raters to view subjects from the front or back while walking and to add section I on biomechanical constraints affecting postural control. Testing session 2 involved 11 raters, including 3 raters from the first session (denoted by asterisks in Tab. 3). No students were included, although 2 raters were PhD researchers in human balance disorders without any physical therapy training or experience (Tab. 3). Eleven subjects, including 4 subjects from the first session, were administered 2 sections of the BESTest. As in the first session, subjects were a sample of convenience recruited from individuals who had previously participated in laboratory studies but who had no experience with the BESTest. Subjects in session 2 met the same inclusion criteria as in session 1. The subjects consisted of 6 subjects who were healthy (controls), 1 subject with unilateral vestibular loss, 1 subject with bilateral vestibular loss, 2 subjects with PD, and 1 subject with both peripheral neuropathy and bilateral hip arthroplasty. The subjects (5 female, 6 male) ranged in age from 67 to 88 years (X̄=75, SD=7.6).

The data and analysis from sections I through IV (current sections II–V) of session 1 and the new section I and revised section VI from session 2 are presented in this article. For both sessions, each subject completed an informed consent statement according to the Declaration of Helsinki.


All raters were provided with the BESTest and written instructions for administering the test approximately 1 week prior to the session. On the day of the study, the raters participated in a 45-minute training session with one of the developers of the BESTest (FBH). For training raters, each item of the BESTest was demonstrated on a subject who did not participate in the reliability study, and the rating criteria were discussed. The raters were allowed to ask questions regarding the scoring of the test. However, the raters were instructed to rate each outcome with no assistance or discussion with the other raters. The BESTest took 20 to 30 minutes to administer.

During the experimental sessions, the raters were asked to concurrently rate each of the subjects. In both sessions 1 and 2, one of the authors (FBH), who was not one of the raters, administered the BESTest once for each subject while the raters observed. The raters were allowed to position themselves around the area where the subjects were performing the test and to move about as needed in order to optimally view the subjects’ performance for recording the outcome. Only one opportunity was provided to view the performance of each test item. If a rater missed the performance of an item, the item was repeated (3 items for session 1 and 1 item for session 2), and all of the raters scored the second performance for consistency. Raters were provided with separate scoring sheets for each subject and did not discuss scoring among subjects. The raters were instructed to rate each outcome independently, with no assistance from or discussion with the other raters. The diagnoses of the subjects who completed the BESTest were masked from the raters.

To begin to describe concurrent validity, subjects completed the Activities-specific Balance Confidence (ABC) Scale.58 The ABC Scale quantifies how confident a person feels that he or she will not lose balance while performing 16 activities of daily living. The ABC Scale has demonstrated test-retest reliability (r=.92).59 Scores on the ABC Scale range from 0, indicating no confidence, to 100, indicating complete confidence in the person's ability to perform the task without losing balance. Scores on the ABC Scale have been correlated with ratings of older adults’ level of community function.60

Data Analysis

Interrater agreement for individual BESTest items was determined using the Kendall coefficient of concordance for ordinal data.61 Concurrent validity was assessed by analyzing the correlation of the BESTest total and subsection scores of the rater with the most exposure to the BESTest (DMW) with the ABC Scale scores using the Spearman correlation coefficient. Coefficients of .00 to .25 were interpreted to indicate little to no relationship, .25 to .50 as a fair relationship, .50 to .75 as a moderate to good relationship, and above .75 as a strong relationship.1,2,4,5,62 A Mann-Whitney U test was used on the ranking of BESTest total scores (of the rater with the most exposure to the BESTest) among subjects to determine whether the 3 controls scored better than the 9 subjects with balance problems.
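For readers unfamiliar with the Kendall coefficient of concordance (W), the following Python sketch shows one standard way to compute it for ordinal ratings with ties. It is an illustration under stated assumptions, not the analysis code used in this study: the function names and the layout of `ratings` (one row per rater, one column per subject) are hypothetical. The function returns `None` when no rater distinguishes among subjects, which is the situation in which concordance cannot be calculated.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendall_w(ratings):
    """ratings[r][s] is the score rater r gave subject s. Returns W or None."""
    m, n = len(ratings), len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[s] for row in rank_rows) for s in range(n)]
    mean_sum = m * (n + 1) / 2
    s_stat = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie correction: sum of (t^3 - t) over each group of tied scores per rater.
    tie_term = 0
    for row in ratings:
        for value in set(row):
            t = row.count(value)
            tie_term += t ** 3 - t
    denom = m ** 2 * (n ** 3 - n) - m * tie_term
    if denom == 0:
        return None  # no variability among subjects for any rater
    return 12 * s_stat / denom
```

With identical rankings from every rater, `kendall_w` returns 1.0; with two raters giving exactly opposite rankings, it returns 0.0.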


Interrater Reliability

Interrater reliability statistics for BESTest total and subsection scores are presented in Table 4. The interrater reliability of the BESTest total scores was excellent, with an ICC (2,1) of .91. Subsection ICCs ranged from .79 to .96, and Kendall coefficients ranged from .79 to .95, indicating good to excellent reliability. Reliability statistics for individual BESTest items are presented in Figure 2. Individual items demonstrated Kendall coefficients ranging from .46 to 1.00. Items based on stopwatch time, such as items in section V (“Sensory Orientation”), tended to show the highest concordance, whereas judgments of alignment, ankle strength, and sitting limits of stability and verticality tended to show the lowest concordance. Only 3 items could not have concordance measured accurately because of limited variability among subjects (denoted by asterisks in Fig. 2). All raters scored all subjects as excellent (score of 3) on standing arm raise, and they scored the majority of subjects as excellent on the alternate stair touch (92%) and stance with eyes open (98%).
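The ICC (2,1) reported above is the two-way random-effects, single-rater form of the intraclass correlation coefficient. As a hedged illustration of how such a value is obtained from a subjects-by-raters table, the sketch below computes it from the usual two-way analysis of variance mean squares. The function name `icc_2_1`, the data layout, and the example scores are hypothetical and are not the study data.

```python
def icc_2_1(scores):
    """scores[s][r] is the score of subject s from rater r (no missing values)."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[r] for row in scores) / n for r in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    # Single-rater, absolute-agreement form: rater bias lowers the coefficient.
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical total scores (%) for 4 subjects rated by 3 raters.
example_scores = [
    [92, 90, 94],
    [71, 75, 70],
    [83, 80, 85],
    [60, 64, 61],
]
example_icc = icc_2_1(example_scores)
```

Because this form treats raters as a random sample and penalizes systematic differences between them, it is a conservative choice when the aim is to generalize to other raters.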

Figure 2.

Kendall coefficient of concordance scores for individual items of the Balance Evaluation Systems Test (BESTest). Error bars indicate 95% confidence interval. Asterisk indicates Kendall coefficient of concordance unable to be calculated accurately due to lack of variance in the data. EO=eyes open, EC=eyes closed.

Table 4.

Interrater Reliability Statistics for Balance Evaluation Systems Test (BESTest) Section and Total Scores

The ICCs for the BESTest total scores were .94 among the 3 students and .87 among the 6 therapists. The ICCs for each item within section VI (“Stability in Gait”) improved in the second interrater testing session compared with the first session, after raters were instructed to view the subjects’ gait from the front or back rather than from the side. The ICC for section VI scores increased from .54 to .88, and the range of Kendall coefficients for individual items in that section increased from .51 to .72 in the first interrater testing session to .62 to .90 in the second.

Test Performance

The subjects showed wide variability in their performance of the test (Fig. 3). Figure 3 presents the median and interquartile ranges of BESTest total scores (expressed as percentages) across diagnostic categories. Median scores of all subjects ranged from 65% to 95%, with control subjects clustered at the high end and subjects with PD clustered at the low end. The Mann-Whitney U test showed that control subjects scored significantly higher (better) than the subjects with balance problems (P=.036).
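The group comparison above can be illustrated with an exact Mann-Whitney U calculation for small samples. The sketch below is an assumption about one way to carry out such a test (complete enumeration of group labels, one-sided); the article does not state whether an exact or an asymptotic P value was used, and the scores shown are invented for illustration rather than taken from the study.

```python
from itertools import combinations


def u_statistic(x, y):
    """Count pairs in which a score from x exceeds one from y; ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_one_sided_p(x, y):
    """Return (U, P) where P is the share of relabelings with U >= observed."""
    pooled = list(x) + list(y)
    observed = u_statistic(x, y)
    hits = total = 0
    # Enumerate every way of assigning len(x) of the pooled scores to group x.
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if u_statistic(group_x, group_y) >= observed:
            hits += 1
    return observed, hits / total


# Hypothetical BESTest total scores (%): 3 controls versus 9 other subjects.
controls = [95, 92, 90]
others = [88, 85, 80, 78, 75, 72, 70, 68, 65]
u_value, p_value = exact_one_sided_p(controls, others)
```

With 3 and 9 subjects there are only 220 possible group assignments, so complete enumeration is practical and avoids relying on a large-sample approximation.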

Figure 3.

Median and interquartile range of Balance Evaluation Systems Test (BESTest) total scores across diagnostic categories for sections II through VI in testing session 1. Note the variation in scores among subjects tested. UVL=unilateral vestibular loss, BVL=bilateral vestibular loss, PD=Parkinson disease, PNP=peripheral neuropathy.

Consistent with our theoretical construct, the scores for each BESTest section by diagnostic subgroup (Tab. 5) show that the subjects with unilateral vestibular loss scored the worst in section V (“Sensory Orientation”) (60%), whereas the subjects with PD scored the worst in section IV (“Postural Responses”) (50%). The 1 subject with neuropathy scored the worst on section III (“Anticipatory Postural Adjustments”). Although this score was similar to that of the subjects with unilateral vestibular loss (67% versus 69%), the subject with neuropathy could be distinguished by a much higher score on section V (“Sensory Orientation”) (93% versus 60%) and a higher BESTest total score (79% versus 73%).

Table 5.

Percentage Score in Each Balance Evaluation Systems Test (BESTest) Section and Total Score in Session 1 by Diagnostic Group

The most difficult items for our subjects were: single-limb stance, stance on foam with eyes closed, Timed “Get Up & Go” Test with a cognitive task, walk with horizontal head turns, backward in-place postural responses, and standing hip strength. In one item (standing arm raise), all subjects had perfect scores; the other least-difficult items included stance with eyes open, alternate stair touch, sitting verticality and leans, and stance with eyes closed. Sorted by difficulty, the mean score (SD) and the frequency of how often a score was given for an individual item are summarized in Table 6. Variability among subjects and raters provided a wide range of scores across BESTest items.

Table 6.

Means, Standard Deviations, and Distribution of Balance Evaluation Systems Test (BESTest) Scores Within Each Balance Item Listed by Item Difficulty

Concurrent Validity With ABC Scale

The BESTest total scores correlated significantly with each subject's balance confidence, as measured by the subject's average ABC Scale score (r=.685, r2=.47, P<.05; Fig. 4). The ABC Scale scores demonstrated moderate correlation with the BESTest section scores (r=.41–.78). Section II (“Stability Limits/Verticality”) scores had the best correlation with the ABC Scale scores (r=.78), and Section III (“Anticipatory Postural Adjustments”) scores had the worst correlation (r=.41).
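The relationship between these coefficients and the interpretation bands given in the Data Analysis section can be made explicit with a short sketch. The function name `interpret_coefficient` is hypothetical, and because the stated bands share their boundary values, the handling of exact boundaries (.25 and .50 assigned to the higher band, .75 to the lower) is an assumption.

```python
def interpret_coefficient(r):
    """Map a correlation coefficient to the bands used in the Data Analysis section."""
    size = abs(r)
    if size < 0.25:
        return "little to no relationship"
    if size < 0.50:
        return "fair"
    if size <= 0.75:  # "above .75" is strong, so .75 itself stays in this band
        return "moderate to good"
    return "strong"


# Coefficients reported in the text above.
total_r = 0.685
variance_explained = round(total_r ** 2, 2)  # proportion of shared variance (r2)

labels = {
    "BESTest total": interpret_coefficient(total_r),
    "Section II": interpret_coefficient(0.78),
    "Section III": interpret_coefficient(0.41),
}
```

Squaring r=.685 gives the reported r2 of .47, meaning that slightly less than half of the variance in the rank-ordered scores is shared between the two measures.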

Figure 4.

Correlation between subjects’ Activities-specific Balance Confidence (ABC) Scale mean scores and their Balance Evaluation Systems Test (BESTest) total scores (from testing session 1 raters’ median scores).

Discussion and Conclusions

This study presents a new clinical balance assessment tool that is the first tool aimed at distinguishing the underlying systems that may be contributing to balance problems in individual patients. By distinguishing which systems underlying balance control are affected, this is the first clinical balance assessment tool to help direct rehabilitation of people with balance disorders. The most important contribution of the BESTest is to provide a conceptual framework around which to evaluate and treat patients with different types of balance problems.

Most existing clinical balance tests are directed at predicting fall risk or whether a balance problem exists, rather than what type of balance problem exists.16 Although these tests have proven valid in predicting the likelihood of future falls, with sensitivity and specificity values of 80% to 90%, the test results do not help therapists direct treatment.63–65 Lord et al1 developed a different type of test, directed at identifying physiological impairments that could affect balance, such as impaired proprioception, visual function, or reaction time delays. Although the test is helpful for understanding the physiological reasons for balance problems, it is not apparent how to translate many of the impairments into specific balance exercise programs. Identification of impairments may help to identify the pathology, such as peripheral neuropathy or vestibular loss, that may be responsible for the balance problem. However, therapeutic exercise is not best designed based on pathology, because the functional ability of each patient is multifactorial and depends not only on the patient's pathology but also on the patient's compensation, experience, motivation, prior and concurrent pathologies, age, and so on.

It is especially critical, however, to stop conceptualizing balance as a single system so that treatment can be more specific than generalized “balance training” for a generalized “balance problem.” There is little evidence of carryover from learning one motor task to a different motor task, so practicing grapevine stepping in balance training is unlikely to improve functional limits of stability, postural responses to perturbations, or the ability to use vestibular information for balance. If a patient shows difficulty on a particular section of the BESTest, the therapist should not limit therapy to practicing the specific tasks that were difficult for the patient but should aim therapy at the underlying system deficit.66

If the BESTest is valid in supporting the conceptual framework that balance function can be divided into separate underlying systems, we would expect some patients to perform poorly in different subcategories compared with other patients. Even with our small sample of subjects, the 3 subjects with PD tended to perform poorly on items in section IV (“Postural Responses”), whereas the 3 subjects with vestibular loss performed poorly on items in section V (“Sensory Orientation”). Laboratory studies of postural responses and the ability to maintain equilibrium in stance under different sensory conditions in patients with PD, unilateral vestibular loss, or bilateral vestibular loss support these trends in our study.67–69 In contrast to the subjects with PD and vestibular loss, the one subject with peripheral neuropathy combined with bilateral total hip arthroplasty scored worst on items in section III (“Anticipatory Postural Adjustments”). Based on these differential results, therapists would direct the patients with PD to practice compensatory stepping in response to perturbations,70 the patients with unilateral vestibular loss to practice balancing in conditions requiring use of the remaining vestibular information,66 and the patient with peripheral neuropathy to practice moving from one stable posture to another.62 Of course, other patients with these same pathologies may show different profiles in the BESTest, depending on their compensation strategies, which may affect their ability to overcome limitations from physiological constraints to perform a task using an alternative strategy.

Although the categories of systems in the BESTest were selected from the current scientific understanding of neurophysiological systems underlying postural control, the systems are quite interdependent. For example, constraints on the base of foot support (item 1) will necessarily affect the forward limits of postural stability in standing (item 7), and difficulty using vestibular information to stand on foam with eyes closed (item 19D) may make it difficult to perform head turns during gait (item 23). Furthermore, the tasks selected to reveal function of each of the 6 postural systems may not be ideal; some tasks are likely too easy to be discriminatory. For example, the standing arm raise to look for anticipatory postural adjustments (item 13) and stance with eyes open to examine postural sway (item 19) may only be sensitive in a laboratory, where surface reactive forces or body kinematics can be measured to detect physiologically significant, but not clinically apparent, changes in postural control. All of our subjects also scored a perfect 3 on alternate stair touch (item 12), adapted from the Berg Balance Scale.⁶² This result may reflect the excessively long time criterion (within 20 seconds) for only 8 steps, so we recommend increasing the number of steps to 15 and determining the number of steps completed per second. Further psychometric testing on large groups of patients with a variety of balance problems will reveal which items naturally group together and may suggest that some items should be moved, eliminated, or altered, or even that a new system category should be added (ie, cognitive interference with balance performance).

With an ICC of .91 for BESTest total scores, the interrater reliability for the BESTest is excellent⁷¹ and just as good as, or better than, that of the current, shorter balance assessment batteries (Berg Balance Scale: ICC=.98⁷²; Tinetti Mobility Assessment: ICC=.75–1.0⁴²). Subsections of the BESTest adapted from established tests in the literature also show reliability similar to or better than that previously reported: Functional Reach Test ICC=.98⁷³ compared with BESTest section II ICC=.79; CTSIB ICC=.74⁷⁴ compared with BESTest section V ICC=.96; Dynamic Gait Index kappa=.64² and Timed “Get Up & Go” Test ICC=.99⁷ compared with BESTest section VI ICC=.88. The interrater reliability of each section of the BESTest is sufficiently strong to allow therapists to use an individual section if they are short on time or want to direct a balance test at a specific postural system.⁷⁵ An abbreviated test would be helpful because the BESTest takes about 30 minutes to carry out, even for an experienced therapist. Future studies are needed to identify and eliminate redundant and insensitive items that do not add value to the test.
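For readers who want to see how an interrater ICC of this kind can be obtained from a subjects-by-raters table, the following is a minimal sketch. It assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1); this section does not state which ICC form was used, so the choice is an assumption. The function name and the ratings are illustrative and are not data from this study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater."""
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 raters")
    if any(len(row) != k for row in ratings):
        raise ValueError("every subject must be scored by every rater")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when the ratings have no variance")
    return (ms_subjects - ms_error) / denominator


# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_2_1(example_ratings):.2f}")
```

In this form, systematic differences between raters (for example, one rater who scores every subject lower) reduce the coefficient, which is why it is a reasonable choice when the question is whether different raters would give the same score.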

Inexperienced raters, without physical therapy experience, were able to learn how to score the BESTest with prior review and 45 minutes of instruction with demonstration. Nevertheless, their unfamiliarity with the test may have caused some raters to be unsure of how to score a particular item or to make an error when recording a score. The reliability of Peabody Motor Developmental Scales-2 scores has been shown to increase as familiarity with the test increased.⁷⁶ Because some of the items are novel and require specific hand positions and instructions, actual demonstration and training may be necessary for excellent interrater reliability, as well as for safety. Specifically, the push and release technique to elicit automatic postural responses by suddenly releasing the subjects’ leans requires observation and practice, with at least a video demonstration. Because the compensatory stepping postural responses necessarily require moving the body's center of mass beyond the limits of the base of foot support, these items also are the most dangerous to test in patients with balance disorders and, therefore, require special training. Subjects who are judged likely to fall when attempting these items should automatically receive a score of 0 or not be tested. Some scores, such as those for section IV (“Postural Responses”), may have been even more reliable if the raters also were physically performing the BESTest, although other scores, such as those for functional reach (items 7–9), were likely more reliable because subjects could be viewed from a distance, without standby assistance for safety in our study. In this study, we found that it is important for raters to stand in front or in back of subjects, rather than beside them, while they are walking in items for section VI (“Stability in Gait”) in order to view potential lateral postural instability during gait. To improve reliability, we have since developed an educational DVD to train therapists how to administer and score the BESTest.*

The strong agreement between the BESTest total score and subjects’ rating of their balance confidence in the ABC Scale (r=.636, P<.01) suggests that the BESTest measures aspects of balance functionally relevant to patients. The ABC Scale has been shown to be related to patients’ actual unwillingness to engage in activities in the community due to fear of falling.⁵⁹ However, treatments cannot be designed based solely on the ABC Scale. A current study is investigating the relationship of the BESTest to the Berg Balance Scale and to prospective falls in patients with a wide range of pathologies and abilities.


This study had several limitations. It is possible that other systems important for balance control are missing from the test. Only the last item is related to cognitive constraints on balance, which may be inadequate. Whether the sections of the BESTest accurately detect dissociable balance deficits remains to be investigated to establish its construct validity. How well section IV (“Postural Responses”) and section V (“Sensory Orientation”) are related to similar measures using computerized posturography is unknown. Sections I and II should be revised to improve their interrater reliability. In addition, the test is quite long, so future clinimetric studies need to identify redundant or insensitive items to produce a more efficient clinical tool. We also do not know how sensitive the BESTest is to change with intervention.

Further psychometric testing is warranted for the BESTest to establish its construct and concurrent validity, sensitivity and specificity, and ability to direct effective treatment for people with balance disorders. The scale is quantitative, and scoring is reproducible both for the test as a whole and for its subsections, as demonstrated by agreement among raters with varying experience. The BESTest appears to be testing functionally relevant aspects of balance control as seen by the agreement with subjects’ self-reported balance confidence. However, success of the BESTest will depend on how useful it is in assisting therapists to organize their systematic assessment of balance disorders to develop specific treatments based on each individual's balance constraints.


  • All authors provided concept/idea/project design and writing. Dr Horak and Dr Wrisley provided data collection and analysis, project management, fund procurement, and subjects. Dr Horak provided facilities/equipment, institutional liaisons, and clerical support. Dr Horak and Dr Frank provided consultation (including review of manuscript before submission).

  • The authors thank Larry Meyer and Trent Thompkins for collecting data on the first interrater reliability study as part of their Doctor of Physical Therapy thesis, as well as all of the subjects and raters who participated in this study. The authors also are indebted to the physical therapists who provided helpful criticisms of early versions of the test in continuing education workshops by Dr Horak. Statistical support from Dr George Knafl and Dawn Peters also is appreciated.

  • This work was supported by the National Institute on Aging grant R0-1 AG006457.

  • Poster presentations of this research were given at the Combined Sections Meetings of the American Physical Therapy Association; February 4–8, 2004; Nashville, Tennessee; and February 23–27, 2005; New Orleans, Louisiana.

  • * The BESTest Training DVD is distributed through Oregon Health and Science University's Technology and Research Collaborations Office and is available via a nonexclusive license. See:

  • Received March 10, 2008.
  • Accepted January 30, 2009.

