Background and Purpose. The purposes of this article are to describe the process of developing the physical therapist (PT) and physical therapist assistant (PTA) Clinical Performance Instruments (CPIs) and to present the available information on the psychometric properties of each instrument. Subjects. Two hundred seventeen PTA students and 282 PT students participated in the pilot studies of the CPIs, and 181 PTA students and 319 PT students participated in field studies. Methods. To construct each instrument, content was first gathered from a variety of instruments and American Physical Therapy Association documents related to PT and PTA practice and education. Data compiled during the pilot and field study phases of the project led to the construction of the fourth (final) versions of the CPIs, which although not studied are currently in use. Results. Intraclass correlation coefficients (ICC [2,1]) measuring the interrater reliability of the CPI total score were good (ICC=.87) for the PT total score and moderate (ICC=.77) for the PTA total score. Construct validity was supported by the substantial differences in mean CPI score for students completing first as compared with final clinical experiences, by the correlation between CPI item scores and total days of clinical experience, and by the lack of correlation with the Social Skills Inventory score. Discussion and Conclusion. Sale of the fourth (final) versions of the PT CPI occurred in November 1997 and of the PTA CPI in March 1998. Data based on psychometric evaluation of the final version have not yet been collected and reported. In the task force's opinion, the third drafts can provide reliable and valid measurements of PT or PTA student clinical performance. The fourth versions were based on this iteration.
Physical therapist (PT) and physical therapist assistant (PTA) academic programs establish systems for evaluating students during their clinical experiences. Reasons for evaluation include, but are not limited to, determining whether the student's progress is satisfactory, assessing the student's readiness to enter practice, providing the student with feedback, and obtaining feedback on the education program relative to the currency, relevance, and application of content.1,2 Various systems and instruments have been developed for these purposes. These are theoretically based on the principles of competency-based education.3–5 Personnel at academic programs and clinical education sites have developed instruments, apparently with a goal of evaluating overall competence to practice as well as behavior specific to certain patient populations or clinical sites.5–12
Although many instruments had unique features, shared characteristics led some educators to seek consistency across educational programs (and students). Emergence of consortia, with personnel from multiple educational programs often working with a common group of clinical faculty, played a role in the development of uniform processes and instruments for assessing student clinical performance.6,7,9,13 These trends, the task force believes, set the stage for developing the American Physical Therapy Association (APTA) PT and PTA Clinical Performance Instruments (CPIs). The purposes of this article are to describe the development of 2 CPIs (one for PTs and one for PTAs) and to provide information about the psychometric properties of the third drafts of these instruments (field study versions). This article does not report evidence about the psychometric properties of the final, published versions of the CPIs, which are the versions currently in use.
In November 1993, a 10-person task force was charged by the APTA Board of Directors to develop clinical education evaluation instruments to measure student performance in PT and PTA clinical education.14 Task force members were appointed by the Board of Directors from a group of 75 individuals with a variety of backgrounds, experiences, and expertise who were nominated by personnel from PT and PTA academic programs. The task force began its work by agreeing on a context in which they believe clinical education is provided. This was done in an effort to ensure that the resulting instruments could be used with minimal training, were reflective of current practice expectations, and met the needs of academic institutions to comply with accreditation criteria.1,2
The task force began its work in 1994 by agreeing on 3 foundation assumptions to use as a guide for the development of the CPIs: (1) that clinical competence is based on multiple behaviors deemed essential to the role of the PT or PTA, (2) that the CPIs should be constructed to measure performance along a continuum from novice to at least entry level, and (3) that the instruments must be responsive to the needs of both academic and clinical communities.12,15 The task force believed that the design of the CPIs should allow measurement of behaviors, with multiple practitioners serving as educators. To achieve this goal, the task force attempted to develop instruments that could yield reliable scores as students provided patient/client care in a format appropriate for the environment to which they were assigned. Thus, the instruments would need to be psychometrically sound and provide useful information about student performance during clinical education.16
The work of the task force and the evolution of the CPIs was in 4 phases that culminated in the development of instruments currently in use. These phases were: (1) development of the first drafts, including initial selection of target behaviors and a scoring/instructional protocol, (2) conduct of pilot studies using the second drafts of the PT and PTA CPIs created in response to feedback from a group of 50 people who the task force believed were experts in clinical education and research based on information provided on the nomination forms and in individual résumés, (3) testing of the third drafts to determine reliability, validity, and feasibility via field studies, and (4) modification of the third drafts in response to the field studies and preparation of the final versions for adoption by the APTA Board of Directors and for sale by APTA.
Phase I: First Drafts of the CPIs
The first drafts of the CPIs contained 23 PT and 20 PTA performance criteria and sample behaviors describing observable indicators for each performance criterion. Criteria were developed in an effort to be consistent with documents such as the first draft of A Normative Model for Physical Therapist Education, the Guide to Physical Therapist Practice, Volume I: A Description of Patient Management,17 used by academic programs or clinical sites. A Normative Model for Physical Therapist Education describes practice expectations, educational outcomes, and content for the preferred PT curriculum. The Guide to Physical Therapist Practice describes the breadth and depth of PT practice, including patient or client management. Sample behaviors describing observable indicators for each performance criterion item were then identified. For example, in the first drafts of the CPIs, the performance criterion “Performs physical therapy treatment that achieves desired outcomes” included sample behaviors such as “performs treatment consistent with the plan of care,” “provides treatment in a manner minimizing risk to the patient and others involved in the delivery of the patient's care,” “adapts physical therapy treatment to meet the individual needs and responses of the patient,” and “provides treatment in a manner minimizing risk to self.”
A visual analog scale (VAS) was selected for educators to record the quality of observed behavior for each item on the CPIs. A horizontal line, 100 mm in length, was used to represent the continuum of points between the lowest level and the highest level of student performance that could be observed by a clinical instructor (CI). The line was anchored on the far left with the words “Novice Student Clinician” and on the far right with the words “Expert Clinician.” A mid-range anchor was placed at 60 mm and was labeled with the words “Entry-Level Clinician.” Use of the VAS as a recording format has been suggested to be appropriate when evaluating complex human performance that cannot (and perhaps should not) be divided into the type of discrete units of behavior easily recorded using other formats.18,19 In addition, because continuous scales such as the VAS can reflect degree of change20 better than categorical scales,21 the task force members believed that the VAS met the goals of clinical education assessment more effectively than would other approaches. Because of the large number of possible ratings available with a VAS, it may also decrease problems associated with end aversion bias.21 End aversion bias causes raters to avoid extreme rating categories. Because of end aversion bias, some authors21 argue that a 5-point Likert scale might actually be used as a 3-point scale. There is a belief that loss of response categories tends to decrease both efficiency and reliability.21 Members of the task force also felt that the use of a VAS would address the problem of raters adding plus or minus designations or decimals to categorical scales. This decision was based on the experience of task force members in using other instruments. 
Finally, the task force members feared that respondents may attach meaning to the numbers on a rating scale that are distinct from the verbal descriptors attached to those numbers.21 The task force felt that this phenomenon was a likely occurrence in an academic setting, where numbers are often associated with grades and with student success or failure.
The first drafts of the CPIs included performance criteria that the task force considered essential, minimum elements for clinical practice. Four performance items related to safety and professional behavior were identified. The task force agreed that a problem with one of these items would be a warning or “red flag” for serious problems with student performance. The first drafts also included a preamble to provide users with a rationale for developing the CPIs, the basic assumptions upon which the instruments were designed, and reasons for considering its use. Directions for use of the CPIs also were included with the drafts.
The first draft versions of the CPIs were reviewed by 50 people within and external to physical therapy who, in the task force's opinion, possessed expertise in academic and clinical education, outcome assessment, evaluation, and psychometric tests and measurements. Individuals who had been nominated but not appointed to the initial task force by APTA's Board of Directors formed the group. Feedback was gathered from this group on the structure of the instruments, clarity of directions, relevance and number of performance criteria, consistency between PT and PTA instruments, and mechanism for identifying problematic student performance on any single criterion. The group recommended that the task force clarify the directions in an effort to eliminate what the group perceived as ambiguity. The group suggested that the task force describe how assignment of grades would be made by an academic program. In addition, they suggested that the task force identify the purpose of sample behaviors, clarify that the list of sample behaviors was not exhaustive, and provide a mechanism for assessing performance during midterm and final clinical education experiences. They also suggested that the PT and PTA instruments, where appropriate, be consistent to make it easier for the CI rater who may evaluate PT and PTA students at the same time in their clinical facility, using the same instrument.
Phase II: Second Drafts of the CPIs (Pilot Study Versions)
Modifications From the Previous Versions
The format of the CPIs was modified according to suggestions made by the group of 50 experts. The second drafts, or pilot study versions, of the CPIs included 23 PT and 20 PTA criteria (Tabs. 1 and 2). Two additional items (items 24 and 25 of the PT CPI and items 21 and 22 of the PTA CPI) were added to allow rating of the student's overall performance relative to academic and clinical expectations and entry-level performance. Particularly important performance criteria were labeled as “red-flag” items (Figure), and a symbol of a flag was added to the left of those performance criteria that were identified as such. A “Significant Concerns/At-Risk” check box also was added to allow the CI to indicate at midterm or final evaluation when, in the estimation of the CI, a student's performance placed him or her at risk for failing the clinical experience (Figure). Additional CPI components included tables of contents to make it easier for the user to locate information within the CPI, glossaries to define terminology, and bibliographies to assist in user training that often occurs only through on-site materials reviewed by the CI. Members of the task force also added vignettes that depicted a hypothetical new graduate PT and a new graduate PTA who demonstrated competencies and deficiencies that might be observed in real life.
Information Collected From Physical Therapy Community
In preparation for conducting the pilot studies of the PT and PTA CPIs, feedback was sought from potential users on the second drafts of the PT and PTA CPIs through a survey and the conduct of regional forums. Beginning in October 1995, second drafts of the CPIs were disseminated to 434 physical therapy academic program directors, 454 academic coordinators of clinical education (ACCEs), and their respective clinical education sites and CIs in the United States and Canada. Notification of the opportunity to review these draft instruments was provided through the Education Division newsletter, R.E.A.D. (December 1995), and PT Magazine (October 1995). Approximately 50 people requested copies of the CPIs.
Through the survey, the task force requested feedback on issues such as content, format, and process issues. Examples of issues raised included: how to distinguish performance requirements of the PT and PTA CPIs, whether the instruments were sufficiently comprehensive and user-friendly, how to mark the VAS to indicate student performance levels, how VAS scores can be converted into grades for use by academic programs, and clarity of the directions.
Ten APTA-sponsored regional forums were held between February 1996 and June 1996. These forums were presented by members of the task force for members of academic programs and consortia composed of groups of academic and clinical educators throughout the United States and in Victoria, British Columbia, Canada. During each forum, participants were asked semistructured questions to obtain verbal feedback, and they were surveyed in writing to obtain further comments and opinions. The survey instruments were distributed during the forum and were collected from participants at the completion of the forum. More than 700 people attended these forums. An estimated 350 additional people provided feedback and comments to APTA staff by mail on the written survey that accompanied the evaluation instruments, resulting in feedback from approximately 1,050 people.
Pilot studies were conducted on the second drafts of the PT and PTA CPIs between October 1995 and April 1996 in an effort to provide information on the internal consistency, construct validity and interrater reliability of these versions of the instruments as well as on the factor structure of the PT CPI. The task force considered such analyses necessary to refine the CPIs and to test the use of the CPIs in the clinical setting.
Protection of human subjects.
The University of Miami Medical Subcommittee for the Protection of Human Subjects reviewed the pilot study protocol. Because the researchers did not know the identity of the students or the CIs and the students were required to mail the data to APTA, thereby consenting to participate, this protocol was exempted from obtaining written informed consent from the students and CIs.
In addition to the PT and PTA CPIs, several instruments were used in this study to examine characteristics and satisfaction of participants in the study. The Student Survey requested information about demographics, academic and clinical preparation, and satisfaction with the use of the CPIs. The CI Survey requested information on demographics, clinical setting, and satisfaction with the use of the CPIs. The ACCE Survey requested information on demographics and satisfaction with the use of the CPIs. User satisfaction was examined by having students, CIs, and ACCEs rate their level of satisfaction with various aspects of the CPIs, including time to complete, ease of use, and clarity of instructions. User satisfaction was measured using a 7-point Likert scale. A rating of 1 indicated that the respondent was “very dissatisfied,” and a rating of 7 indicated that the respondent was “very satisfied.”
The sample size identified by members of the task force for conducting the pilot studies was limited to 350 PT students and 350 PTA students. This sample size was set by the task force because the group believed it was a reasonable target sample that could be obtained without knowing in advance how many programs would consent to participate and would have students on clinical education experiences during the time that the study was to be conducted. The sample size also was set in an effort to ensure a sufficient sample size for statistical analyses that would result in data that could be used to make decisions regarding the next draft versions of the CPIs. To obtain this sample, a 2-page questionnaire was mailed to all accredited and developing PT and PTA education programs to determine: (1) the willingness of the programs to participate in a pilot study of the PT or PTA CPI and (2) when students would be completing each of their clinical education experiences during the 1995-1996 academic year. Representatives of PT and PTA programs that could participate in the pilot studies and who had students completing their clinical education experiences between October 1995 and February 1996 were then contacted by telephone and asked to participate. Where multiple programs from the same state were able to participate, the task force considered obtaining a sample that had both public and private institutions, degrees of different levels, and varied levels of clinical education experience (eg, first, intermediate, and final). In addition, the length of clinical education experiences was considered.
Personnel from academic programs that agreed to participate in the pilot study identified, in the aggregate, 350 potential PT students and 350 potential PTA students willing to participate in the study. The pilot studies actually included 282 PT students from 31 US accredited PT professional education programs (9 baccalaureate and 21 master's degree) representing 26 states and 24 students from 2 PT education programs in Canada (Quebec and Ontario). Two hundred seventeen PTA students from 23 US accredited PTA education programs representing 17 states also participated in the pilot studies. Physical therapist students were first-year, second-year, and, in some cases, third-year students engaged in all levels of clinical education, including first, intermediate, and culminating clinical experiences. Likewise, PTA students were both first- and second-year students engaged in all levels of clinical education. Lengths of clinical experiences for PT and PTA students ranged between 1 and 9 weeks for both part-time (<35 hours/week and ≤1 week) and full-time (≥35 hours/week and >1 week) experiences. A subset of pilot study participants, 70 pairs of PT CIs and 42 pairs of PTA CIs, each pair supervising one student, volunteered to be involved in the interrater reliability phase of the study.
On behalf of the academic programs, the ACCEs agreed in writing to identify possible student and CI subjects and to obtain verbal consent from those subjects to participate in the pilot study. The ACCE at each participating academic institution then obtained verbal agreement from students enrolled in the physical therapy program to participate in the study. When a student agreed to participate, the ACCE contacted the center coordinator of clinical education (CCCE) at the student's assigned clinical education site to determine whether the CCCE was willing to participate. If personnel at the clinical site were willing to participate, the CCCE obtained verbal agreement from the CI assigned to the student. If the CI agreed to participate, APTA's Department of Clinical Education was notified that a study pair (one supervising CI and one student) was available at that clinical education site during a specified time period. For the interrater portion of the study, clinical sites were identified where a second CI was available and willing to evaluate the student's performance without consulting with the PT who supervised the student.
The CIs and students were instructed to first complete the instrument that was typically used by the academic program for midterm and final evaluations to assess the students' performance. Once this was done, the CIs and students also completed the CPI for both the midterm and final evaluations. The students and CIs were instructed to compare their midterm and final evaluations on the CPI. In the interrater reliability portion of the study, both CIs were instructed to independently conduct midterm and final evaluations using the CPI and to not discuss or compare results. Neither the students nor the CIs were instructed to attach numbers to the vertical marks on the VAS. After the final evaluation was completed, the students and the CIs independently completed their respective survey questionnaires and placed both copies of the CPI and their survey questionnaires in stamped envelopes addressed to the researchers. The students then sealed the envelopes and had the prerogative to and responsibility for placing the completed CPIs and survey questionnaires in the mail. The students and CIs were linked as a pair through their identification numbers, but neither the students nor the CIs could be individually identified.
Upon receiving the completed CPIs, staff in APTA's Department of Clinical Education copied the CPIs completed by the CIs and used the code to identify the clinical education sites and academic institutions. Staff then forwarded copies of the completed CPIs from the supervising CIs to the ACCEs at the students' academic institutions. The ACCE Survey was included. Once the ACCEs received copies of the CPIs from all of their academic program's participating students, they were supposed to complete and return the ACCE Survey. Once all CPIs had been copied and forwarded to ACCEs, the master list was destroyed.
For purposes of our study, the marks on the VAS were measured with a ruler. The zero on the ruler was aligned with the lower anchor of the VAS such that scores ranged from a possible 0 to 100 mm for each scored CPI item. Data entry was performed at the Division of Physical Therapy, University of Miami. All data were managed and analyzed using a VAX mainframe computer and SAS version 6.1* statistical software.22 Analyses of internal consistency, interrater reliability, validity, and user satisfaction were conducted. Internal consistency, in the view of the task force, represents a psychometric characteristic of instruments that are designed to measure one overall construct. In the case of the CPIs, that construct is performance as a PT or PTA. If an instrument is designed to measure one construct, then all items of the instrument should be related to that construct. Internal consistency measures the degree to which items are related.23,24 A total score was generated by taking an average of the item scores.
The purpose of the CPIs is to measure quality of behavior, namely the ability of a student to perform as a PT or PTA. Conceptually, these levels of performance range from a novice level to an entry level. If an instrument is designed to measure various aspects of a single behavior (performance as a PT or PTA), then it is desirable that all the items included in the instrument measure different features of that behavior rather than different parts of dissimilar behaviors.21 Internal consistency of the PT and PTA CPIs was examined by calculating Cronbach alphas.
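As a minimal sketch of the internal consistency analysis, the Cronbach alpha can be computed directly from a matrix of item scores. The data below are simulated for illustration (one latent performance level plus rater noise on a 0 to 100 mm VAS-style scale); they are not the pilot study data.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach alpha for an (n_students x n_items) matrix of item scores."""
    n_items = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # per-item variances
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of summed scores
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Simulated VAS-style scores (0-100 mm) for 50 students on 10 items,
# all driven by a single latent "performance" level plus rater noise.
rng = np.random.default_rng(0)
ability = rng.uniform(20, 90, size=(50, 1))
scores = np.clip(ability + rng.normal(0, 8, size=(50, 10)), 0, 100)
print(round(cronbach_alpha(scores), 2))
```

Because every simulated item reflects the same latent level, alpha comes out high, mirroring the rationale for treating each CPI as a measure of one overall construct.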
Reliability can be demonstrated for a measurement only when the measurement is applied to a specific population.21 Interrater reliability was selected by the task force for analysis in this study because the task force was most interested in whether 2 raters could agree about the performance of students at a given time. Interrater reliability of the CPIs was examined by calculating type 2,1 intraclass correlation coefficients (ICCs)23 for each of the items and for a total score generated by taking an average of the item scores.
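The ICC (2,1) can be sketched from the mean squares of a two-way random-effects ANOVA. The paired ratings below are invented for illustration and are not the pilot study data.

```python
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC (2,1): two-way random effects, absolute agreement, single rater.
    ratings is an (n_subjects x k_raters) matrix."""
    n, k = ratings.shape
    grand = ratings.mean()
    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()  # subjects
    ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()  # raters
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                # between-subjects mean square
    msc = ss_cols / (k - 1)                # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Invented VAS scores (mm) from 2 CIs independently rating 5 students
pairs = np.array([[78.0, 74.0], [55.0, 61.0], [90.0, 88.0],
                  [40.0, 35.0], [67.0, 70.0]])
print(round(icc_2_1(pairs), 2))
```

With close agreement between the 2 raters, as in the invented pairs above, the coefficient approaches 1; disagreement or rater bias lowers it.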
Two hypotheses were generated to determine construct validity. The first was based on the belief that performance should differ between students on first clinical experiences and students on final clinical experiences. Evidence for construct validity was examined by calculating a Student t test to compare total CPI scores for students at the end of their first clinical experience with total CPI scores for students at the end of their last clinical experience. In the second hypothesis, the task force assumed that clinical performance of PT and PTA students should be related to the amount of prior clinical experience. Therefore, construct validity was examined by calculating Pearson correlation coefficients to determine the relationship between the CPI item scores and total days of clinical experience. The criterion for statistical significance was set at .002 to correct for multiple tests.
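Both construct-validity checks can be sketched with simulated scores; the group means, spreads, and slope below are illustrative assumptions, not values from the study. The sketch computes the pooled-variance Student t statistic and the Pearson correlation; in the study, the resulting p values were judged against the .002 criterion.

```python
import numpy as np

def pooled_t(a: np.ndarray, b: np.ndarray) -> float:
    """Student t statistic for 2 independent samples, pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))

rng = np.random.default_rng(1)
# Hypothesis 1: simulated total CPI scores (mm), final experiences higher
first_total = np.clip(rng.normal(45, 12, size=60), 0, 100)
final_total = np.clip(rng.normal(75, 10, size=60), 0, 100)
print(round(pooled_t(final_total, first_total), 1))

# Hypothesis 2: simulated item scores rising with days of prior experience
days = rng.integers(5, 200, size=120).astype(float)
item_score = np.clip(0.2 * days + rng.normal(50.0, 15.0, size=120), 0, 100)
r = np.corrcoef(days, item_score)[0, 1]
print(round(r, 2))
```

A large positive t statistic and a positive r are the patterns the two hypotheses predict; the .002 significance criterion guards against false positives across the many item-level tests.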
User satisfaction with the CPIs was measured on a 7-point Likert scale, applied to 17 items, with 1 indicating that the respondent was “very dissatisfied” and 7 indicating that the respondent was “very satisfied.” The median level of satisfaction with all 17 aspects of using the instruments was calculated separately for PT and PTA students, for PT and PTA CIs, and for ACCEs. Median levels of satisfaction were calculated because the calculation of medians rather than means is appropriate for ordinal data.
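As a small illustration of this choice of summary statistic, the median of a set of hypothetical 7-point Likert responses (not study data) can be computed directly:

```python
from statistics import median

# Hypothetical 7-point Likert responses (1 = very dissatisfied,
# 7 = very satisfied) for one aspect of CPI use; not study data.
# The median suits ordinal data: categories are rank ordered but
# the spacing between them cannot be assumed equal, so a mean
# would impose an interval interpretation the scale does not have.
ci_ratings = [6, 5, 7, 4, 6, 6, 5, 7, 3, 6]
print(median(ci_ratings))
```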
Principal components analysis.
A principal components analysis was performed to explore the number of distinct constructs represented in the CPIs. Because of the sample sizes, only data from the PT CPI were used for the analysis, and the PTA CPI was not studied in this way.
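A minimal sketch of this kind of analysis follows, assuming loadings extracted from the item correlation matrix and the standard SVD-based varimax algorithm. The data are simulated with 2 latent constructs (4 items each) and are not the pilot study data; communality estimates are the row sums of squared loadings and are unchanged by the orthogonal rotation.

```python
import numpy as np

def pca_loadings(X: np.ndarray, n_comp: int = 2) -> np.ndarray:
    """Unrotated principal-component loadings from the correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    vals, vecs = np.linalg.eigh(corr)
    order = np.argsort(vals)[::-1][:n_comp]      # largest components first
    return vecs[:, order] * np.sqrt(vals[order])

def varimax(L: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Varimax rotation of a loading matrix (standard SVD algorithm)."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        B = L @ R
        u, s, vt = np.linalg.svd(L.T @ (B ** 3 - B @ np.diag((B ** 2).sum(0)) / p))
        R = u @ vt
        if s.sum() < d * (1 + tol):
            break
        d = s.sum()
    return L @ R

# Simulated scores for 120 students on 8 items: items 1-4 reflect one
# latent construct, items 5-8 a second construct (illustration only)
rng = np.random.default_rng(3)
construct_a = rng.normal(size=(120, 1))
construct_b = rng.normal(size=(120, 1))
X = np.hstack([construct_a + 0.4 * rng.normal(size=(120, 4)),
               construct_b + 0.4 * rng.normal(size=(120, 4))])
unrotated = pca_loadings(X, 2)
rotated = varimax(unrotated)
communalities = (rotated ** 2).sum(axis=1)
print(np.round(communalities, 2))
```

After rotation, each simulated item loads predominantly on the component for its own construct, which is the pattern the task force examined when interpreting the 2 rotated components of the PT CPI.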
Results of the Pilot Studies
PT CPI internal consistency.
The Cronbach alpha for the PT CPI was .97, indicating a high level of internal consistency.
PT CPI interrater reliability.
Reliability estimates for the items on the PT CPI ranged from -.02 to .62, with the majority of items demonstrating what the task force would consider moderate reliability (Tab. 1).
PT CPI validity.
Pearson correlations between length of prior clinical experience and CPI item scores were as high as .49 for several items, including "Designs a physical therapy plan of care that integrates goals, treatments, and discharge plan." The lowest correlation between length of prior clinical experience and a CPI item was -.05 for the global rating item "Rate this student's overall performance relative to academic and clinical expectations." This result was anticipated by the task force because this item essentially asks the rater to adjust for prior clinical experience. This same item failed to demonstrate a difference in score between students on their first and final clinical experiences. All other items demonstrated differences between students on the first and final clinical experience.
PTA CPI internal consistency.
The Cronbach alpha for the PTA CPI was .96, indicating a high level of internal consistency.
PTA CPI interrater reliability.
Reliability estimates for the items on the PTA CPI ranged from .35 to .89 (Tab. 2). This level of reliability was much higher than the task force anticipated and led the group to have some concerns about the independence of the 2 raters.
PTA CPI validity.
As was true for the PT CPI, the correlation was very small for the PTA CPI (.15) between length of prior clinical experience and the global rating item score for the item “Rate this student's overall performance relative to academic and clinical expectation.” The correlation between length of prior clinical experience and a PTA CPI item score was much higher (.39) for items such as “Makes clinical decisions within the scope of PTA practice” and “Participates in modifying the plan of care.” There were differences in average CPI scores between students on first and final clinical experiences for all the PTA CPI items.
Principal components analysis.
The unrotated principal components analysis of the PT CPI produced 2 components, with the first component accounting for 19% of the variance and the second component accounting for only 2% of the variance. All 25 items of the CPI loaded more strongly on the first component. A Varimax rotation was then performed. The first component accounted for 12.1% of the variance, and the second component accounted for 8.7% of the variance, suggesting that the instrument might contain 2 constructs. Component 1 was defined by items 6, 8 through 20, 23, and 25, and component 2 was defined by items 1 through 5, 7, 21, 22, and 24. Items associated with the first construct represented physical therapy-specific clinical skills required for practice such as the physical therapy examination, diagnosis, plan of care, and treatment. The items associated with the second construct were less specific to physical therapy in that they represented behaviors required in all areas of clinical practice and applicable across all patient types. These items included such general clinical behaviors as safe practice, ethical behavior, professional behavior, and cultural considerations (Tab. 3). However, all items showed moderate to strong loadings on both components, with correspondingly high communality estimates, suggesting that the PT CPI is probably measuring one underlying construct. The only exception to this finding was for the global rating item "Rate this student's overall performance relative to academic and clinical expectations." This item had a communality estimate of only .23, probably reflecting the fact that it is the only item that rates overall or global performance relative to academic and clinical expectations rather than a specific performance criterion associated with a component of entry-level practice.
Limitations of the Pilot Study
The students, CIs, and ACCEs who participated in the pilot study were not trained in the use of the CPI or in the study protocol, other than being given written instructions included in the study protocol and the directions included with the CPI. Although these written instructions and directions described the use of the VAS, explained the distinction between criteria and behavioral criteria, and cautioned the evaluator about the risks of rater bias and inconsistency, reliability estimates may have been adversely affected by the lack of training and control. Although training is now available on the CPI, one cannot assume that all CIs using the CPI have received such training. Therefore, in those situations where training is not available or has not been provided, the situation under which the CPIs were tested may mimic the real world. Because of the design of the study, the researchers were unable to determine whether the subjects read, understood, or followed the study protocol or directions to complete the CPI.25 The task force acknowledges that wording and terminology problems could cause some items to be less reliable, but also that large-scale personal training of CI evaluators is not feasible. Data generated from this study were used to compare the relative performance of various items based on our assumptions that any methodological difficulties would affect all items equally. Therefore, items with lower reliability coefficients were examined for possible problems with wording and terminology.
Results of the pilot studies along with input from regional forums and other focus group meetings were used to revise the second versions of the CPIs. In addition, the task force's experience in working with raters, feedback from a community of interest (ie, academic faculty and clinical educators), and analyses from the pilot studies were used to design and conduct the field studies.
Phase III: Third Drafts of the CPIs (Field Study Versions)
Modifications From the Previous Versions
Information from PT and PTA academic faculty and researchers, PT and PTA clinical educators, and PT and PTA students; data from the pilot study; and consultation provided by a psychometrician were used by the task force to make changes to the PT and PTA CPIs. In the third drafts of the CPIs, the performance criteria were expanded, with the PT CPI increasing to 24 items and with the 20 PTA performance criteria retained (Tabs. 4 and 5). This draft introduced a modified VAS that eliminated the “Expert Clinician” anchor at the far right as used in the first and second versions. The result was the inclusion of 2 anchors on the VAS; the far left anchor was labeled “Novice Clinical Performance,” and the far right anchor was labeled “Entry-Level Performance.” The task force chose to make this modification based on (1) feedback offered by PT and PTA academic faculty and clinical educators about the likelihood that students would (or should) achieve entry-level performance status and (2) advice from the consultant that multiple VAS anchors may complicate or confuse rater behavior. Some physical therapy educators argued that a mechanism was needed to enable acknowledgment of excellence in student performance. Thus, a “With Distinction” box was added to recognize student performance that exceeded entry level on any criterion (Figure).
Wording changes in the performance criteria and instructions were made in an effort to enhance clarity and consistency with APTA documents, including the Evaluative Criteria for the Accreditation of Education Programs for the Preparation of Physical Therapists,1 Evaluative Criteria for the Accreditation of Education Programs for the Preparation of Physical Therapist Assistants,2 third draft version of A Normative Model for Physical Therapist Professional Education (to evolve into A Normative Model of Physical Therapist Professional Education: Version 9726), and preliminary findings from a consensus conference convened to develop A Normative Model for Physical Therapist Assistant Education: Version 98.27 Further modifications included adding new terms and their definitions to the glossary, clarification and expansion of the sample behaviors, clarification of the performance criteria and user instructions, and refinement of the case vignettes. New additions to this version included a box to indicate that the entire performance criterion was “Not Observed” (Figure) and an area for student and evaluator signatures.
Information Collected From the Physical Therapy Community
The availability of third draft versions of the CPIs for review and comment by request was announced in the December 1996 issue of PT Magazine; the November 8, 1996, issue of PT Bulletin; the November 1996 Education Division newsletter, R.E.A.D.; and the Fall 1996 Education Section newsletter. Approximately 80 copies of these instruments were requested for review. In addition, third draft versions of the CPIs were mailed to 493 physical therapy academic program directors, 516 ACCEs, and their respective clinical education sites and clinical educators throughout the United States and Canada. Task force members also presented the third draft versions of the CPIs at several national forums, which allowed nearly 500 people to provide feedback. For academic faculty and clinical educators who wanted to discuss the instruments at the local or regional level, the task force developed a questionnaire that was distributed with the instruments. This questionnaire posed closed-ended questions designed to obtain information about respondent demographics, opinions about whether performance criteria reflected entry-level expectations, whether sample behaviors accurately described each performance criterion, usefulness of the VAS in documenting student performance, ability to document excellence in student performance beyond entry-level expectations, ability to identify student problems in clinical performance, and management of student performance when the “Significant Concerns/At-Risk” box is checked or when the performance criterion is “red-flagged” or “non-red-flagged.” In addition, several open-ended questions were posed to (1) request that respondents identify criteria that needed to be revised, added, or deleted, (2) identify major advantages and disadvantages of the instruments, (3) identify questions or concerns, and (4) solicit any comments that would assist in refinement of the instruments.
People reviewing the instruments were encouraged to complete and submit the questionnaire for use in development of the fourth (final) versions of the CPIs.
Field studies were conducted between October 1996 and July 1997 to assess the psychometric properties of the third versions of the instruments. The field studies examined: (1) internal consistency, (2) interrater reliability, and (3) construct validity of the PT and PTA CPIs. The University of Miami Medical Subcommittee for the Protection of Human Subjects reviewed the field study protocol. Again, because of the subject anonymity guaranteed by the design, this protocol was exempted from informed consent.
In the field study, as in the pilot study, the research was designed to test the internal consistency, interrater reliability, construct validity, and student, CI, and ACCE satisfaction with the third drafts of the PT and PTA CPIs. The field study also examined the discriminant validity of the CPIs by examining the relationship between CPI item scores and social competence as measured by the Social Skills Inventory (SSI).28 The SSI was administered to a subset of 50 PT students and 50 PTA students to determine whether the CPI was measuring social skills rather than entry-level clinical performance. The researchers believed that although social competence is a desirable characteristic in a clinician, it is not the same as the ability to perform as a PT or PTA. Therefore, the task force expected little or no relationship between SSI score and item scores on the CPI.
Identification of this subset of students to complete the SSI was determined by contacting the academic programs participating in the study to inquire whether their students participating in the field studies were willing to complete the SSI. Additionally, programs were selected if students could be proctored by a faculty member while completing the SSI and agreed to return the completed SSIs to the task force. The SSI was selected to be included in the field studies because it had been shown to be a reliable and valid measure of social competence not restricted to any particular professional situation.28
The SSI28 is a 90-item research instrument that has been designed as a self-report measure of social communication skill in 6 domains that, when combined, comprise global social skill or social competence. Social competence is defined as follows: “… a multidimensional construct that includes skills in receiving, decoding, and understanding social information. It further involves social participation skills such as verbal and emotional expression, regulation of social behavior, and social role playing abilities.”29 The SSI is designed for use with adults at an eighth-grade reading level or above. By use of a 5-point Likert scale, the SSI can be used to measure communication skills on 2 levels—emotional and social skills. Communication skills are measured in 6 domains: emotional expressivity (EE) measures the skill with which individuals communicate nonverbally; emotional sensitivity (ES) measures skills in receiving and interpreting the nonverbal communication of others; emotional control (EC) measures ability to control and regulate emotional and nonverbal displays; social expressivity (SE) assesses skill in verbal expression and the ability to engage others in social discourse; social sensitivity (SS) assesses the ability to interpret the verbal communication of others; and social control (SC) assesses skill in role playing and social self-presentation.
The SSI yields a score that is supposed to reflect overall social competence. The majority of subjects who participated in initial validity studies of the SSI29 were undergraduate and graduate college or university students from multiple disciplines, including the health care professions. Intraclass correlation coefficients for test-retest reliability of the SSI scores ranged from .81 to .96 on a small sample of subjects (N=40) who completed the SSI twice with a 2-week interval between tests to test the stability of scores on attributes that should not vary much within that period of time. The SSI scores demonstrated convergent and discriminant validity in a series of validity studies reported by Riggio.29 Factor analysis, based on a sample of 629 undergraduate students and conducted using a 6-factor solution and the principal axis method with Varimax rotation, revealed that all 6 of the predicted factors emerged28(p8),29 in the 6 domains.
Subjects volunteered and were identified using the same procedures as those used in the pilot studies except that there were no limitations placed on the number of PT and PTA programs, CIs, and students who could participate. Students participating in the field studies had to complete their clinical education experiences between October 1996 and May 1997. Instructions provided to the CIs and students were the same as those provided in the pilot studies. For the interrater portion of the study, 35 groups for the PT students and 35 groups for the PTA students were identified for which a second CI was available and willing to participate. In addition, participating students, CIs, and ACCEs completed surveys on subject characteristics, academic and clinical preparation, and satisfaction with the use of the CPIs.
A subset of 50 PT students and 50 PTA students who participated in the field studies also completed the SSI.28 Once coded, data were to be forwarded to APTA's Department of Clinical Education for subsequent scoring and analysis.
Representatives of 44 PT academic programs and 27 PTA academic programs agreed to participate in the field studies. Of the PT programs in the United States, 54.5% (n=24) were from public institutions and 45.5% (n=20) were from private institutions. Twenty-eight (63.6%) of the PT programs awarded professional master's degrees, and 16 (36.3%) of the PT programs awarded professional baccalaureate degrees. Programs were from all geographic regions of the United States, with PT programs located in 27 states and PTA programs located in 20 states. In the aggregate, physical therapy academic programs were located in 35 different states. In addition, there were 2 physical therapy programs in Quebec and Toronto, Canada (one representing a public institution and one representing a private institution), with both awarding the professional baccalaureate degree.
Three hundred clinical education sites were involved in the PT study, and 225 clinical education sites were involved in the PTA study. They were in rural, suburban, and metropolitan locations. Physical therapist student clinical experiences ranged from part-time (<35 hours per week) clinical experiences of 2 to 5 weeks in length to full-time (≥35 hours per week) clinical experiences of 2 to 12 weeks in length. These experiences included first, intermediate, and culminating clinical experiences. Physical therapist assistant clinical experiences ranged from part-time to full-time clinical experiences of 2 to 10 weeks in length (Tab. 6).
In the aggregate, participating CIs were providing care to patients across the life span, at various levels of disease acuity, and in a broad range of venues, including school systems, inpatient care and outpatient facilities, home health care settings, private practices, subacute rehabilitation settings, government-based facilities, industry, and extended care and nursing facilities.
Data from the field studies were analyzed using the same approach as was used in the pilot studies. Data entry was performed at the Division of Physical Therapy, University of Miami. All data were managed and analyzed using a VAX mainframe computer and SAS version 6.1 statistical software.22 As in the pilot studies, all VAS items were measured and assigned a number from 0 to 100 mm. The “With Distinction” box was not used in calculating the numerical score. “Red-flag” items were scored in the same manner as other items. Some experts contend that the interrater reliability of individual items is typically lower than the interrater reliability of a composite score.21 Interrater reliability of a composite score was determined by first calculating an average score for all items rated by both raters, excluding the items requiring judgment of overall performance. The reliability between these 2 composite scores was examined using an ICC (2,1). One additional analysis of validity was performed based on the assumption that item scores on the CPI should be unrelated to a student's social skill or social competence. Pearson correlation coefficients were calculated to determine the relationship between CPI item scores and SSI total score. The criterion for statistical significance was set at P=.0002 to correct for multiple tests.
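The ICC (2,1) used for the composite-score analysis was computed in SAS; as a minimal sketch of the same coefficient (Shrout-Fleiss two-way random effects, absolute agreement, single rater), the following Python function, applied here to hypothetical two-rater composite scores rather than the study's data, shows how the coefficient is formed from the analysis-of-variance mean squares.

```python
import numpy as np

def icc_2_1(ratings):
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute
    agreement, single rater. `ratings` is an (n students x k raters)
    array, e.g. the composite CPI score assigned by each of 2 CIs."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # one mean per student
    col_means = ratings.mean(axis=0)   # one mean per rater
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # students
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # raters
    sse = ((ratings - row_means[:, None]
            - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                        # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Perfect agreement between 2 hypothetical raters yields an ICC of 1;
# a constant offset between raters lowers the absolute-agreement ICC.
assert abs(icc_2_1([[70, 70], [80, 80], [90, 90], [60, 60]]) - 1) < 1e-12
assert icc_2_1([[70, 75], [80, 85], [90, 95], [60, 65]]) < 1
```

Because the rater term appears in the denominator, this absolute-agreement form penalizes systematic differences between the 2 CIs, not just inconsistent rank ordering of students.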
Reliability coefficients (ICCs) for scores for items on the third draft of the PT CPI ranged from .21 to .76. The lowest reliability coefficient was found for scores for 2 items (25 and 26) measuring overall performance. No ICC could be calculated for item 17 due to a low response rate for that item (Tab. 7). Interrater reliability coefficients for scores of items on the PTA CPI ranged from -.12 to .78. For the PTA CPI, the items with the highest reliability were items 21 and 22, which were designed to measure overall performance (Tab. 8). The ICCs for interrater reliability were .87 for the PT CPI total score and .77 for the PTA CPI total score.
Results of the Field Studies
One hundred eighty-one PTA students and 319 PT students participated in the field studies. Slightly more than 30% of both groups were male. The mean age of the PTA students was 29 years (SD=7.5, range=19–51), and the mean age of the PT students was 26.7 years (SD=5.5, range=20–47) (Tab. 6). The mean age of the PT students' CIs was 33.4 years (SD=7.6, range=23–63), with an average of 6.2 years (SD=5.4, range=0–30) of experience as a CI. The PTA students' CIs were similar to the PT students' CIs, with a mean age of 35.3 years (SD=8.3, range=22–64) and an average of 5.8 years (SD=8.3, range=0–30) of experience as a CI. Approximately one quarter of both groups of CIs were male. On average, the months of prior course work and the length of the clinical experience were longer for PT students than for PTA students (Tab. 6). For both the PT and PTA versions of the CPI, the mean item scores were all above 80, although the range of scores was greater than 70 for all items except for item 1 on the PT CPI and items 5 and 21 on the PTA CPI (Tabs. 4 and 5). The proportion of respondents marking an item “Not Observed” varied widely. For the PT CPI, responses ranged from a low of 0% for item 1 (“Practices in a safe manner that minimizes risk to patients, self, and others”) to a high of 63% for item 17 (“Provides consultation to individuals and outside organizations”). The use of “Not Observed” for the PTA CPI was also highly variable; for example, 63% of raters marked item 6 (“Communicates in ways that are congruent with situational needs”) “Not Observed,” and 55% marked item 20 (“Assists the PT in addressing prevention, wellness, and health promotion needs of individuals, groups, and communities”) “Not Observed” (Tabs. 4 and 5).
The PT version of the CPI had a Cronbach alpha of .97, and the PTA version of the CPI had a Cronbach alpha of .96.
Known groups method validity.
Data from 310 PT students and 173 PTA students were used to analyze the relationship between CPI item scores and prior clinical experience. Data from a subset of 68 PT students and 89 PTA students who were on either their first or final clinical experience during the field study were used to perform the known groups analysis. Data from a subset of 31 PT students and 39 PTA students were used to analyze the relationship between CPI item scores and SSI total scores. The actual number of subjects for each analysis varied across CPI items based on the use of the “Not Observed” option. The majority of items on both the PT and PTA versions of the CPI demonstrated differences between students on their initial and final clinical experiences (Tabs. 9 and 10). However, there were some exceptions for the PT CPI. Although the means for the initial and final clinical experience groups were different, items 2 (“Presents self in a professional manner”), 17 (“Provides consultation to individuals and outside organizations”), 18 (“Addresses patient needs for services other than physical therapy”), 20 (“Incorporates an understanding of economic factors in the delivery of physical therapy services”), 21 (“Utilizes support personnel according to legal and ethical guidelines”), and 24 (“Addresses prevention, wellness, and health promotion needs of individuals, groups, and communities”) and global item 25 (“Rate this student's overall performance relative to academic and clinical expectations”) were not significantly different using the conservative criterion of P=.0002.
Pearson correlations between CPI item scores and total days of clinical experience ranged from .12 (P=.0317) to .40 (P=.0001) for the PT version of the CPI and from .12 (P=.1000) to .34 (P=.0001) for the PTA version of the CPI (Tabs. 9 and 10). Item 25 on the PT CPI and item 21 on the PTA CPI (“Rate this student's overall performance relative to academic and clinical expectations”) were not correlated with total days of clinical experience. Several items on the PTA CPI were not correlated with amount of prior clinical experience (Tab. 10). Items not correlated included items 2 (“Conducts self in responsible manner”), 4 (“Adheres to ethical practice standards”), 8 (“Adapts delivery of physical therapy care to reflect respect for and sensitivity to individual differences”), 16 (“Manages resources to achieve goals of the practice setting”), 19 (“Formulates and implements a self-directed plan for career development”), and 20 (“Assists the PT in addressing prevention, wellness, and health promotion needs of individuals, groups, and communities”) and global item 22 (“Rate this student's overall performance relative to entry level”).
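The item-by-item correlations above were computed in SAS on samples whose size varied with the “Not Observed” option. A pairwise-complete Pearson r of that kind can be sketched in Python as follows; the score matrix and day counts here are hypothetical stand-ins, with NaN marking a “Not Observed” rating.

```python
import numpy as np

def item_experience_r(item_scores, days):
    """Pairwise-complete Pearson r between each CPI item column and
    total days of clinical experience. `item_scores` is an
    (students x items) array in which NaN marks a "Not Observed"
    rating, so the effective n varies by item, as in the field study."""
    scores = np.asarray(item_scores, dtype=float)
    days = np.asarray(days, dtype=float)
    rs = []
    for col in scores.T:
        mask = ~np.isnan(col)
        x = col[mask] - col[mask].mean()
        y = days[mask] - days[mask].mean()
        rs.append(float((x * y).sum()
                        / np.sqrt((x ** 2).sum() * (y ** 2).sum())))
    return rs

# Hypothetical data: one item rising linearly with experience, one
# falling, and one linear but with a "Not Observed" gap in row 4.
days = np.arange(10, 110, 10)
items = np.column_stack([2.0 * days + 3, 200.0 - days, days.astype(float)])
items[4, 2] = np.nan
r = item_experience_r(items, days)
```

Dropping the masked rows item by item, rather than listwise, preserves each item's usable sample, which is why the reported n differed across items in Tables 9 and 10.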
Pearson correlations between the SSI total score and PT CPI item scores were consistently very low and failed to reach statistical significance. None of the items on the PTA version of the CPI were correlated with total SSI score (Tab. 10).
Physical therapist and PTA students, CIs, and ACCEs rated their satisfaction with the use of the third drafts of the CPIs on a 7-point Likert scale. Data, in the task force's view, were ordinal; therefore, medians were reported. Respondents were generally satisfied (5 or 6) with most aspects of using the CPIs, although many items produced essentially neutral (4) median responses (Tab. 11). All users consistently rated the CPIs with a satisfaction score of 5 for ability to distinguish pass/fail, comprehensiveness, and identifying the nature of a student problem and a satisfaction score of 4 for identifying weaknesses in academic curriculum.
Physical therapist CIs appeared to have the highest overall level of satisfaction (6) with the CPIs, especially relative to the clarity of instructions for comments and use of the VAS and the value of using “red-flag” criteria. The ACCEs reported the highest level of satisfaction (6) with the CPIs for the value of using “red-flag” criteria as a basis for giving feedback, identifying a student problem, and usefulness of performance criteria examples. The only item responded to by the ACCEs with a median level of satisfaction less than neutral (4) was the use of the CPIs “as a basis for grading.” This item produced a median satisfaction score of 3, representing mild dissatisfaction. Physical therapist and PTA satisfaction ratings were similar (5 or 4) for all items.
Discussion of Field Study Findings
The purpose of the CPIs is to measure quality of behavior, namely clinical performance, as a PT or PTA. The level of item homogeneity of an instrument was measured using the Cronbach alpha. This statistic ranges from 0 to 1, with 1 indicating perfect item homogeneity.21,23 Both the PT and PTA versions of the CPI demonstrated very high levels of item homogeneity (.97 and .96, respectively). This finding strongly suggests that the items in both versions of the CPI measure various aspects of a single behavior.
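The Cronbach alpha values of .97 and .96 reported above follow the standard formula, alpha = k/(k-1) x (1 - sum of item variances / variance of the total score). A minimal sketch with hypothetical scores (not the study's data) is:

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach alpha for an (observations x items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total)."""
    scores = np.asarray(item_scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# When every item is a copy of the same underlying score (perfect
# homogeneity), alpha is exactly 1; heterogeneous items pull it down.
base = np.array([55.0, 68.0, 74.0, 81.0, 90.0])
assert abs(cronbach_alpha(np.column_stack([base] * 5)) - 1) < 1e-12
```

Values near 1, as observed for both CPIs, indicate that item scores rise and fall together, consistent with the interpretation that the items tap a single underlying behavior.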
Reliability coefficients (ICCs) for individual items ranged from .21 to .76 on the PT CPI and from .00 to .78 on the PTA CPI. The reliability of the total score was higher, with ICCs of .87 for the PT CPI and .77 for the PTA CPI. These data support the notion that the easiest way to increase the reliability of a score is to increase the number of items contributing to the score.21,23 As long as the correlation between items is not 1.0, increasing the number of items, in our view, will increase an instrument's ability to be used to distinguish among individuals much more than it will increase measurement error, thereby theoretically resulting in improved reliability. Likewise, the task force believes that this explains why a single-item measure such as “Is this student ready to practice as a physical therapist?” is much less reliable than a 25-item index measuring various aspects of readiness to practice. Scores for most items on the third draft of the PT CPI had better reliability than did scores for comparable items on the second draft version of the PT CPI (Tabs. 1 and 7). This would tend to indicate that the changes the task force made to improve the clarity of the instrument were warranted. The results for the PTA CPI are much less clear. The interrater reliability for the second draft version of the PTA CPI, in our opinion, was unexpectedly good (ICC [2,1] above .75). By comparison, the interrater reliability for the third draft version was much worse, even for items that had not been altered (Tabs. 2 and 8).
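The argument that a composite of many items is more reliable than any single item is the classic Spearman-Brown prophecy formula, a standard psychometric result illustrated here rather than a computation from the study:

```python
def spearman_brown(r_item, k):
    """Spearman-Brown prophecy formula: projected reliability of a
    composite of k parallel items, given single-item reliability r.
    r_k = k*r / (1 + (k - 1)*r)."""
    return k * r_item / (1 + (k - 1) * r_item)

# Even modest single-item reliability compounds quickly: a 25-item
# composite of parallel items with r = .30 projects above .90,
# consistent with total-score ICCs exceeding the item-level ICCs.
assert spearman_brown(0.30, 1) == 0.30
assert spearman_brown(0.30, 25) > 0.90
assert spearman_brown(0.30, 25) > spearman_brown(0.30, 10)
```

The formula assumes parallel items, which the CPIs only approximate, so it explains the direction of the gap between item-level and total-score ICCs rather than predicting the exact values.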
Construct validity reflects the ability of an instrument to measure an abstract concept,21,23 the conceptual (theoretical) basis for using a measurement to make an inferred interpretation.30(p61) In the case of the CPIs, the construct being measured was clinical performance. A number of approaches were used to examine construct validity. The known groups method23 was used to examine construct validity by comparing a group that should have high levels of the construct (students at the end of their final clinical experience) with a group that should have low levels of the construct (students at the end of their first clinical experience). The task force anticipated that these 2 groups would differ on all items on the CPI and would likely differ on some performance criteria more than others. For example, on the one hand, item 2 on the PT CPI (“Presents self in a professional manner”) is an aspect of clinical performance that the task force believed should be mastered early in a student's clinical education.
Thus, the task force expected and found no difference in scores for this criterion between students at the end of the first and final clinical experiences. On the other hand, items 10 through 15 on the PT CPI represent aspects of clinical performance that should continue to improve until the end of the final clinical experience. For these items, the task force found differences ranging from 23.4 to 31.9 between the scores of students at the end of the first and final clinical experiences. For items 17, 18, 20, 21, and 24, there were no differences between students on the first and final clinical experiences. These were the same items that were marked “Not Observed” by a large proportion of the participants in the field studies. Some of these items, such as item 24 (“Addresses prevention, wellness, and health promotion needs of individuals, groups, and communities”), represent what the task force considered aspects of PT practice that are currently demonstrated less frequently but are presumed to become more prevalent in the future. Without formal training in the use of the CPI, the CIs in this study may have had some difficulty rating these items relative to entry-level practice.
Concurrent validity demonstrates construct validity by showing that 2 measures that should be related are highly correlated.23 The task force hypothesized that readiness to practice should be related to the amount of clinical experience students had at the time the CPIs were completed. The task force contends that amount of clinical experience was an important factor, but certainly not the only one, explaining readiness to practice. Therefore, the task force expected a moderate correlation between total days of clinical experience and scores on CPI items. In addition, the task force believed that clinical experience would produce a greater change in some behaviors than in others.
The task force hypothesized that items that changed more in response to experience would correlate more strongly with days of clinical experience than would other items. This pattern emerged through the data analysis. For the PT CPI, all items were correlated with days of clinical experience. Items such as items 7 (“Produces documentation to support the delivery of physical therapy services”) and 14 (“Performs physical therapy interventions in a technically competent manner”) were more strongly correlated with days of clinical experience than were items such as items 3 (“Demonstrates professional behavior during interactions with others”) and 5 (“Adheres to legal practice standards”). A similar pattern was seen for the PTA CPI in that items such as items 11 (“Participates in modifying the plan of care”) and 12 (“Performs physical therapy interventions in a technically competent manner”) were correlated with days of clinical experience. In contrast, items such as item 2 (“Conducts self in a responsible manner”) or item 4 (“Adheres to legal practice standards”) were not correlated with days of clinical experience.
The task force assumed that although social skills are a desirable trait in a PT or PTA, they are not the same as clinical performance as a PT or PTA. An individual could have excellent social skills but, unless educated as a PT or PTA, he or she would not have demonstrable physical therapy clinical skills. Therefore, the task force believed that if the CPIs measured clinical performance as a PT or PTA rather than social skills, there should be little if any correlation between CPI scores and the total score on the SSI. This was true for the PT CPI.
As is often the case with changes in a measurement tool, there can be some discomfort among users. The task force believed the CPIs would be no exception because they differed in many ways from the student clinical performance instruments already in use. As a result, the task force felt it was important that all groups of respondents (students, CIs, and ACCEs) were satisfied with the CPIs, particularly with the clarity of instructions for comments and the use of the VAS. However, both PT and PTA students and PT and PTA CIs indicated on the user-satisfaction survey a neutral (4) median level of satisfaction with the time required to complete the CPIs (Tab. 11). This response at least partially reflects the time demand associated with learning to use a new instrument. The task force believes it also reflects the time constraints confronting most clinicians involved in providing high-quality student clinical education. Additional training and experience with using the CPIs, in our view, should decrease the amount of time required to complete the instruments and could improve satisfaction.
The only response where median satisfaction was less than neutral was for ACCEs using the CPIs “as a basis for grading.” Because assigning grades for clinical experiences is the responsibility of the academic institution, only the ACCEs actually use the CPIs for this purpose. The third drafts of the CPIs used during the field studies contained no guidelines for how to use the CPIs to generate a score or grade. Grading remains a prerogative of the academic program, which must determine what average, total, or pattern of item scores is required to pass. Given the wide variety of curricula, developing guidelines for grading was clearly beyond the scope of the field studies and the role of the task force. In addition, it would infringe on institutional prerogatives for determining grades.
The process used by members of the task force to develop the instruments was complex and multifaceted. The instruments needed to be psychometrically sound, but they also had to be responsive to the needs of users (ie, physical therapy academic faculty, ACCEs, clinical educators, and students). Two examples are provided to illustrate this decision making, the first where members of the task force based their decision on data and the second where the prevailing decision was based on the needs of users.
In this first example, a decision regarding the inclusion of overall performance ratings in the CPIs was determined based on the results of the field study data. The task force believes student evaluation instruments routinely include an item that asks evaluators to rate the student's overall or global performance. Likewise, the CPIs included a global rating in the first 3 drafts. However, based on the field studies, results indicated that global ratings were not as reliable as those items that measured specific levels of performance. Even though users were accustomed to making a global performance rating and personnel at academic programs were familiar with assessing such a rating, the evidence did not support the continued inclusion of this type of rating. As a result, in the fourth (final) versions of the CPIs, the global ratings were excluded.
A second example illustrates how a decision to revise the CPIs was made in response to the feedback from users. The instruments were developed as outcome measures, and inclusion of a comprehensive list of interventions as a checklist for item 14 of the PT CPI and item 12 of the PTA CPI was not practical. However, users were familiar with and voiced their preference for including a list of interventions (PT) or technical skills (PTA). These are found in other instruments and help to guide evaluations of student competence. As a result, an appendix was added to the PT CPI that listed the categories of interventions as described by the Evaluative Criteria for the Accreditation of Education Programs for the Preparation of Physical Therapists,1 and an appendix was added to the PTA CPI that delineated the entry-level technical skills (performance of all or a component of the skill) as found in A Normative Model of Physical Therapist Assistant Education: Version '98.27 Thus, the resultant changes made to the third drafts of the PT and PTA CPIs to create the fourth (final) versions reflected decisions that were both data based and, in our view, responsive to physical therapy academic faculty, ACCEs, clinical educators, students, and researchers who had provided ongoing feedback and comments in support of the development of the evaluation instruments.
Limitations of the Field Studies
The quasi-experimental design of the field studies created a number of problems, especially for the reliability phase of the field studies. The lack of direct contact between the researchers and the subjects meant that the researchers were unable to determine whether the subjects read, understood, or followed the study protocol or directions to complete the CPI. In some ways, however, this may mimic what could occur if the CPIs were purchased and used without training, because again the researchers would not be able to train users of the instruments. However, the task force recommends, as with any instrument, that the final versions of the CPIs be used by CIs only after they have received training. This lack of control was the direct result of a design that was intended to protect the identity of the participants in an effort to minimize response bias due to social desirability and to eliminate the need for informed consent of students and CIs. Social desirability is the tendency of a research subject to respond in the way he or she thinks the researcher would want him or her to respond.21 The risk of social desirability bias increases as contact between the subject and the researcher increases. Contact was limited and anonymity maintained in an effort to ensure that the CIs were free to evaluate students candidly, without having to consider the possibility that the student or CI could be identified and embarrassed by having their scores revealed to members of the task force or APTA staff.
Because there was no direct contact between the researchers and the participants, it was impossible to ensure that the 2 CIs participating in the reliability study had understood and followed the directions for using the CPI, performed their ratings independently, observed the student during the same period of time, or had sufficient observations with which to rate the student.
Likewise, the task force believes that a lack of training probably contributed to what we consider an overuse of the “Not Observed” category. The task force believes that all items on both versions of the CPI can be rated in almost any setting and contends that a lack of understanding of the sample behaviors included under each item may have caused some CIs to use the “Not Observed” category more often than expected or needed. In the pilot studies, when the “Not Observed” category was not available as an option, most items were scored on the VAS. With the introduction of the “Not Observed” category in the third drafts, however, the majority of CIs marked “Not Observed” for at least one item during the field studies. For example, on the PTA CPI, item 20 (“Assists the PT in addressing prevention, wellness, and health promotion needs of individuals, groups, and communities”) was marked “Not Observed” by 55% of raters in the field study, whereas only 29% of raters did not score the VAS for this item during the pilot study. For the PT CPI, item 17 (“Provides consultation to individuals and outside organizations”) was marked “Not Observed” by 63% of raters in the field study, whereas 49% of raters did not score the VAS for this item during the pilot study. As a consequence, it became inappropriate to use principal components analysis to examine construct validity in the field studies, because the data for participants with one or more “Not Observed” items were dropped from the analyses, and the remaining sample was too small for a principal components analysis of a 24-item instrument.
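The sample-size problem described above follows directly from listwise deletion: any student with even one “Not Observed” item is excluded, and with per-item “Not Observed” rates as high as those seen in the field studies, almost no complete cases remain. The sketch below illustrates this with entirely hypothetical data (the item count, student count, and rates are illustrative, not the study's actual figures):

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_items = 100, 24

# Hypothetical VAS ratings (0-100 mm) for a 24-item instrument.
scores = rng.uniform(0, 100, size=(n_students, n_items))

# Suppose each item has some chance of being marked "Not Observed"
# (coded as NaN), with per-item rates in a range like those reported
# in the field studies (roughly 5% to 60%).
not_observed_rate = rng.uniform(0.05, 0.60, size=n_items)
mask = rng.random((n_students, n_items)) < not_observed_rate
scores[mask] = np.nan

# Listwise deletion: principal components analysis needs complete rows,
# so every student with one or more NaN items is dropped.
complete = scores[~np.isnan(scores).any(axis=1)]
print(f"students retained: {len(complete)} of {n_students}")
```

Under these assumed rates, fewer complete cases remain than there are items, so the item correlation matrix cannot support a 24-item principal components analysis.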
Another limitation associated with the “Not Observed” responses may have been the use of A Normative Model of Physical Therapist Professional Education27 and a draft version of A Normative Model of Physical Therapist Assistant Education. These are consensus-based documents that describe preferred expectations for entry-level practice. The performance expectations they describe include knowledge, skills, and behaviors that are demonstrated frequently in current practice as well as those that are less frequently evidenced in practice but presumed to become more prevalent in the future. Items based on expectations not yet common in practice may therefore have been difficult for CIs to observe, so the decision to use these documents reflects a limitation of the study. However, expectations for new graduates entering practice today are also in transition, as reflected by the regular revisions of outcome criteria prescribed by accreditation1,2 and the movement of professional education toward the professional doctoral (DPT) degree, both of which obligate the profession to examine practice to keep pace with the changing expectations of employers, patients or clients, and payers.
The field studies do not represent an examination of the CPIs currently in use (the published forms are the fourth versions). The CPIs were modified in response to the results of the field studies. Examination of psychometric properties of the final versions of the CPIs remains to be done.
Fourth Drafts of the CPIs (Final Versions)
Modifications From the Previous Versions
The fourth (final) versions of the CPIs were developed from July 1997 to September 1997 based on data from the field studies and feedback from physical therapy academic faculty, clinical educators, and students; consultation with the psychometrician; and changes in terminology found in the current versions of APTA documents, including the Guide to Physical Therapist Practice,17 A Normative Model of Physical Therapist Professional Education: Version '97,26 and the first of 3 conferences on developing A Normative Model for Physical Therapist Assistant Education (performance expectations only) (Tabs. 12 and 13).
The final published versions of the instruments included the following: APTA disclaimer regarding the use of the instruments (ie, APTA disclaims any responsibility for controlling the manner in which any clinical education site may, based on the instruments, assess a student's clinical performance or an education institution may use them to determine a grading policy); copyright; table of contents; instructions for completion that included scoring and mechanisms for determining a grade; student, academic program, and clinical site information page; 24 PT and 20 PTA performance criteria; 5 “red-flag” performance criteria (items 1-5); summative comments for “areas of strength,” “areas needing improvement,” and other comments; evaluation signatures; glossary; appendix of 3 examples of completed items for final and intermediate experiences; appendix of tests and measures and interventions (PT) and data collection and technical skills (PTA); appendix of performance criteria matched with Commission on Accreditation in Physical Therapy Education evaluative criteria for PT and PTA programs; historical perspective; and bibliography. Each performance criterion included the following components: performance criterion description; “Not Observed” boxes at midterm and final evaluations; VAS with 2 anchors (“Novice Clinical Performance” and “Entry-Level Performance”); “With Distinction” boxes for performance that exceeds expectations for the clinical experience at midterm and final evaluations; sample behaviors related to a level of performance, modified for those items with high “Not Observed” responses in the field studies to include examples of behaviors that occur in common clinical settings; “Significant Concerns/At-Risk” boxes at midterm and final evaluations; midterm and final comments; and an explanation of qualitative dimensions that the task force believed apply in considering each rating.
These 5 qualitative performance dimensions are quality of care, supervision/guidance required, consistency of performance, complexity of tasks/environment, and efficiency of performance (Figure, Appendix).
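Each criterion is rated by placing a mark on the VAS between the “Novice Clinical Performance” and “Entry-Level Performance” anchors. A minimal sketch of how such marks could be converted to numeric item scores, assuming the common convention of measuring each mark's distance along a 10-cm line from the novice anchor (the line length, item marks, and the mean-percentage summary below are illustrative assumptions, not the instrument's prescribed scoring):

```python
LINE_LENGTH_MM = 100  # assumed 10-cm visual analog scale

# Hypothetical distances of five raters' marks from the "Novice"
# anchor, in millimetres; None stands for a "Not Observed" item.
marks_mm = [88, 92, None, 75, 81]

# Score only the observed items.
observed = [m for m in marks_mm if m is not None]
item_pct = [100 * m / LINE_LENGTH_MM for m in observed]

# One plausible summary: mean percentage of entry-level performance
# across observed items.
mean_pct = sum(item_pct) / len(item_pct)
print(f"{len(observed)} items rated, mean = {mean_pct:.1f}% of entry level")
```

How “Not Observed” items and the total score feed into an academic grade is left to each program, consistent with the APTA disclaimer above.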
In November 1997, the APTA Board of Directors approved the PT CPI for use as a voluntary instrument for assessing student clinical performance and made it available on a for-sale basis as a source of APTA non-dues revenue. The APTA Board of Directors approved the PTA CPI in March 1998 also as a for-sale item. Adoption of the PTA CPI was postponed in an effort to ensure that in the instrument there was congruence in language and performance expectations with A Normative Model for Physical Therapist Assistant Education: Version '98,27 which had just been developed following the completion of 3 consensus-based conferences convened in the fall of 1997. Following a review of the consensus-based document, what the task force considered minor editorial changes in language within several performance expectations were made to the fourth version of the PTA CPI, again in an effort to ensure congruence with A Normative Model of Physical Therapist Assistant Education: Version '98.27
The final published versions of the CPIs reflect input from clinical and academic communities, students, and a consultant psychometrician and analysis of the psychometric properties of 2 draft versions of the instruments. The task force believes this provides evidence to suggest that the CPIs in current use yield reliable and valid measurements of student clinical performance. Data, however, are lacking, and examination of the psychometric properties of the final published versions has not been done. This examination, the task force believes, should occur with ACCEs, CIs, and students who have been consistently trained in the use of the CPIs. The task force believes that psychometric properties to be examined should include reliability, validity, component structure, and users' satisfaction with the final published versions. Further studies of the CPIs might include the development of interpretive guidelines for their use as an evaluation leading to an academic grade; compilation of longitudinal data on the CPIs; and comparison of the final published versions of the CPIs with computer-based versions (currently in development) for reliability, validity, user satisfaction, time required for completion, and ease of grading. In addition, longitudinal study of changes to frequently “Not Observed” performance criteria in relationship to changes occurring in education and practice is needed, as is analysis of the use of the qualitative midterm and final comments and their relationship to the VAS score.
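The interrater reliability of the draft CPIs was summarized with ICC(2,1), the two-way random-effects, single-rater, absolute-agreement intraclass correlation of Shrout and Fleiss. As a minimal sketch of how that statistic could be computed in a future study of the final versions, the function below implements the standard ICC(2,1) formula; the student scores are hypothetical:

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single rater, absolute
    agreement. `ratings` is an (n subjects x k raters) array."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    subj_means = ratings.mean(axis=1)
    rater_means = ratings.mean(axis=0)

    # Two-way ANOVA sums of squares.
    ss_total = ((ratings - grand) ** 2).sum()
    ss_subjects = k * ((subj_means - grand) ** 2).sum()
    ss_raters = n * ((rater_means - grand) ** 2).sum()
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error
        + k * (ms_raters - ms_error) / n
    )

# Hypothetical CPI total scores from 2 CIs rating the same 6 students.
scores = [[70, 72], [55, 60], [88, 86], [64, 69], [91, 93], [50, 52]]
print(f"ICC(2,1) = {icc_2_1(scores):.2f}")
```

Because the two hypothetical raters agree closely, the coefficient is high; with real CI pairs the values reported for the drafts were .87 (PT) and .77 (PTA).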
The PT CPI and the PTA CPI were developed using a multifaceted and sequential process. Data indicate that the first 3 drafts of the CPIs had psychometric properties the task force considered acceptable. The CPIs in use (fourth drafts), however, have not been studied, and users should consider this limitation.
Funding from the American Physical Therapy Association supported the work of the Task Force for the Development of Student Clinical Performance Instruments.
* SAS Institute Inc, PO Box 8000, Cary, NC 27511.
Physical Therapy