PHYS THER
Vol. 88, No. 7, July 2008, pp. 854-856
DOI: 10.2522/ptj.20070211.ar

Research Reports

Author Response

Szilvia Geyh, Gerold Stucki and Alarcos Cieza


In his highly informative, cutting-edge commentary1 on our article,2 Jette takes up the theme of interrater reliability in the application of the Extended ICF Core Set for Stroke to point out and discuss the challenges of operationalization and measurement in relationship to the implementation of the World Health Organization's International Classification of Functioning, Disability and Health (ICF).3 Jette presents and explains with deep insight important facts about the ICF, the Extended ICF Core Set for Stroke, and the concept of the ICF qualifiers. He summarizes the value of the ICF and the ICF Core Sets and their different fields of use. In addition, he emphasizes the relevance of the ICF framework within and beyond the field of physical therapy.

In turning to our current study and the question of interrater reliability in the context of ICF implementation, Jette highlights the need for meeting methodological standards in measurement: "the ICF classification system must meet key methodological standards, one of which is acceptable reliability of the ratings when performed by different raters." He concludes that our study "raises serious questions about the methodological adequacy of the ICF Core Set ratings" and that the "methodological concerns appear inherent in the approach" of the ICF and ICF Core Sets.

We agree with Jette that moderate interrater reliability (overall kappa of .41) represents the bottom level of quality requirements for an outcome measure. Reliability for outcome measures definitely should be higher for clinical purposes (eg, detecting change during rehabilitation) and for far-reaching decisions about an individual patient's life (eg, post-rehabilitation discharge to the community or to an institutional setting).

In appraising the reliability of the Extended ICF Core Set for Stroke, first of all, one fundamental issue has to be taken into account: an ICF Core Set is not a measure at all, but is a classification-based tool. The key difference between a measure and a classification tool is that a classification can serve as the reference among different measures, different approaches, different perspectives at different facilities, and even different countries, thus being the framework and organizing principle where all information can flow together into the same common system. This means that a classification-based tool can be used to organize qualitative information (eg, "How are you doing today, Mrs Jones?") as well as quantitative information (eg, Life Satisfaction Scale4 scores), results of observer-rated (eg, Functional Independence Measure5) and patient-reported (eg, visual analog scale for pain) outcomes, physical measures (eg, grip strength dynamometer) and questionnaire scores (eg, Hand Function domain of the Stroke Impact Scale), and so on.

In the current study, the data collections using the ICF Core Sets relied on clinical judgments combining all different kinds of available information (observation, interviews, measurements, and records), summarizing them into one qualifier per ICF category. Considering clinical judgments in relationship to reliability, the question that arises is: Does the Extended ICF Core Set for Stroke make the clinical judgments of physical therapists more or less reliable? The results of the current study illustrate that using the ICF Core Set does not bring clinical judgments of experienced and well-trained physical therapists to a level of metric quality, as would be desirable for standardized quantitative measurements. However, the results document where we are now in terms of ICF operationalization and from what point we start on the way from expert clinical judgments, which can have very diverse levels of interrater reliability,6–8 to quantitative representation of patients’ functioning state by high-quality standardized measures. The results for interrater reliability of the single ICF categories can be taken as indications of (1) areas of functioning in which it seems to be justifiable to rely on clinical judgments and (2) areas of functioning in which efforts to develop elaborate operationalizations are necessary. In addition, if an ICF-based instrument or an ICF manual for the Extended ICF Core Set for Stroke is developed and tested for interrater reliability, then it might be possible to conclude, using the results of the current study, how much more reliable these developments are in contrast to simply mapping the clinical judgment into the ICF's qualifier scale.

In his commentary, Jette refers to measurement as a contrast or alternative to classification. However, it is important to note: To measure or to classify, that is not the question. The quantitative results of assessments are the objects that enter the classification, which is the ordering principle or the structure to store, retrieve, and convey any kind of information about patients’ functioning, disability, and health. In this way, the results of quantitative measurement are the flesh or the muscles attached to the bones of the ICF classification. Therefore, obviously, there is no antagonism between classification and standardized quantitative interval scale measurement. On the contrary, they are complementary approaches, each backing up the other. The ICF Core Set approach can indicate what ICF concepts to measure in a certain health condition9 or in a specific setting,10 and, in turn, the results of different measurements can be organized into the framework of the ICF.

In considering the results of our study, the overall reliability indexes should be interpreted with caution. A conclusive final judgment on interrater reliability should not rely only on kappa or percentage of exact agreement because of certain technical features of these indexes. Nominal kappa especially is not sensitive to other important factors (besides chance variation) that may influence reliability and that may lie outside the features of the rating tool. Thus, kappa analyses can be seen only as a first step in gaining a full picture about interrater reliability in the use of the Extended ICF Core Set for Stroke. Further studies are needed to arrive at a definitive conclusion regarding interrater reliability and, consequently, to arrive at well-founded recommendations about suitable strategies for its improvement.
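
To make the two indexes concrete, the following is a minimal sketch of how percentage of exact agreement and unweighted (nominal) Cohen's kappa can be computed for two raters. It is not the analysis code of the study: the function names and the ten pairs of qualifier ratings (on the ICF's 0 to 4 scale) are invented for illustration only.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Proportion of cases in which both raters chose exactly the same category."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(ratings_a)
    p_observed = percent_agreement(ratings_a, ratings_b)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement expected from each rater's marginal category frequencies.
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical qualifier ratings (0 = no problem ... 4 = complete problem).
rater_a = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
rater_b = [0, 1, 1, 1, 2, 3, 3, 3, 4, 2]

print(f"exact agreement: {percent_agreement(rater_a, rater_b):.2f}")
print(f"nominal kappa:   {cohen_kappa(rater_a, rater_b):.3f}")
```

Note that the unweighted index treats every disagreement alike: a rating of 0 versus 1 counts exactly as much against agreement as a rating of 0 versus 4.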

Additionally, it is to be noted that the usefulness of a tool with a specific reliability depends on the purpose for which it is applied. In principle, an instrument with moderate reliability could be used in large samples (eg, for survey purposes) if no better alternative is available. There is currently no alternative to the use of ICF-based tools and no other standard for describing the full scope and all relevant and specific aspects of disability and functioning in a universal, etiology-neutral, cross-disciplinary way.
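
The point about large samples can be illustrated with a short sketch under classical test theory, where reliability is taken as the ratio of true-score variance to observed-score variance. This is an assumption made for illustration: kappa is not identical to such a reliability coefficient, and the function name and numbers below are hypothetical.

```python
import math


def se_of_group_mean(true_sd, reliability, n):
    """Standard error of a sample mean when scores carry measurement error.

    Assumes classical test theory: reliability = true variance / observed variance,
    so the observed standard deviation is true_sd / sqrt(reliability).
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must lie in (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    observed_sd = true_sd / math.sqrt(reliability)
    return observed_sd / math.sqrt(n)


# Hypothetical comparison: a moderately reliable and a highly reliable tool.
for reliability in (0.4, 0.9):
    for n in (1, 100, 1000):
        se = se_of_group_mean(true_sd=1.0, reliability=reliability, n=n)
        print(f"reliability={reliability}, n={n}: SE of mean = {se:.3f}")
```

Under these assumptions the imprecision of a single rating is large when reliability is moderate, but the standard error of a group mean shrinks with the square root of the sample size, which is why a moderately reliable tool may still be serviceable for surveys while remaining inadequate for decisions about an individual patient.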

Our experience with the data collection for the current study showed that it was difficult to achieve exact agreement in ICF categories that are especially broad. Even the most precise ICF category contains and addresses several different aspects of functioning and disability at the same time. This breadth of the categories, in the sense of openness and nonrestrictiveness, is a major advantage of the classification, as it ensures the ICF's applicability in different contexts and for different purposes. Therefore, changing the ICF—as suggested in the commentary's final paragraph—might not be the most practical and beneficial way to enhance measurement standards in the implementation of the ICF. Instead, establishing accompanying operationalizations and their explicit linking to the categories and qualifiers of the classification seems to be a more appealing solution. A suitable approach has been developed by Cieza et al.11

Thus, we fully agree with Jette's commentary about the challenge of operationalization and the need to meet the methodological standards of objectivity, validity, reliability, and feasibility. Clearly, ICF-based measurement instruments need to be developed. Instruments that are based on the ICF's conceptual framework, such as the Burden of Stroke Scale,12 the Stroke Impact Scale,13 and so on, are already an important step in this direction. However, the next step to be taken is the development of measures based on the classification itself (ie, based on the categories of the ICF).

Without doubt, Jette's clear-sighted argument for the application of item response theory (IRT) can only be supported. Today, IRT is the method of choice for developing high-quality measurement instruments and for facilitating computer adaptive testing, the coming method of choice for assessment applications, as it has been realized for the Activity Measure for Post-Acute Care (AM-PAC)14 cited by Jette.

In line with Jette, it can be concluded that there is a need for measures that are consistent with the ICF as a model and as a classification, as well as for developing algorithms that make the relationship between different measures and the classification explicit to enhance the objectivity of the qualifier scaling and to facilitate the implementation of the ICF in physical therapists’ everyday clinical practice.


References

1. Jette AM. Invited commentary on "Interrater reliability of the Extended ICF Core Set for stroke applied by physical therapists." Phys Ther. 2008;88:851–853.
2. Starrost K, Geyh S, Trautwein A, et al. Interrater reliability of the Extended ICF Core Set for stroke applied by physical therapists. Phys Ther. 2008;88:841–851.
3. International Classification of Functioning, Disability and Health: ICF. Geneva, Switzerland: World Health Organization; 2001.
4. Fugl-Meyer A, Bränholm IB, Fugl-Meyer K. Happiness and domain-specific life satisfaction in adult northern Swedes. Clin Rehabil. 1991;5:25–33.
5. Granger CV, Hamilton BB, Linacre JM, et al. Performance profiles of the Functional Independence Measure. Am J Phys Med Rehabil. 1993;72:84–89.
6. Hicks GE, Fritz JM, Delitto A, Mishock J. Interrater reliability of clinical examination measures for identification of lumbar segmental instability. Arch Phys Med Rehabil. 2003;84:1858–1864.
7. Maher C, Adams R. Reliability of pain and stiffness assessments in clinical manual lumbar spine examination. Phys Ther. 1994;74:801–809; discussion 809–811.
8. McClure PW, Rothstein JM, Riddle DL. Intertester reliability of clinical judgments of medial knee ligament integrity. Phys Ther. 1989;69:268–275.
9. Cieza A, Ewert T, Ustun TB, et al. Development of ICF Core Sets for patients with chronic conditions. J Rehabil Med. 2004;44(suppl):9–11.
10. Grill E, Ewert T, Chatterji S, et al. ICF Core Sets development for the acute hospital and early post-acute rehabilitation facilities. Disabil Rehabil. 2005;27:361–366.
11. Cieza A, Hilfiker R, Boonen A, et al. Items from patient-oriented instruments can be integrated into interval scales to operationalize categories of the International Classification of Functioning, Disability and Health. J Clin Epidemiol. 2008. In press.
12. Doyle PJ, McNeil MR, Hula WD, Mikolic JM. The Burden of Stroke Scale (BOSS): validating patient-reported communication difficulty and associated psychological distress in stroke survivors. Aphasiology. 2003;17:291–304.
13. Duncan PW, Wallace D, Lai SM, et al. The Stroke Impact Scale Version 2.0: evaluation of reliability, validity, and sensitivity to change. Stroke. 1999;30:2131–2140.
14. Haley SM, Coster WJ, Andres PL, et al. Activity outcome measurement for postacute care. Med Care. 2004;42:I49–I61.

Copyright © 2008 by the American Physical Therapy Association.