Outcome Assessment Following Total Hip Replacement
Abstract
Outcome assessment after hip arthroplasty is relevant both to in-depth research of specific procedures and to monitoring standards of practice. Instruments of outcome assessment should be fast, easy to use, reliable, specific to the question being asked, cost-effective, and applicable. Increasing evidence exists that patient-based outcome measures are more reliable than those based on clinicians’ scores. This article reviews the types of instruments that are available and offers guidance about the outcome measures that are most appropriate for orthopedic surgeons.
Outcome is defined by the Oxford English Dictionary as a “visible or practical result.” Measurement of these visible results is vital to the assessment and setting of standards of care. Fergusson and Howarth1 in 1931 described one of the earliest assessments of hip mobility that allocated more points for flexion and abduction than for adduction and hyperextension in patients with slipped upper femoral epiphysis.
This formed the basis of the hip scoring system that assessed pain, range of movement, and walking ability described by Merle d’Aubigne and Postel2 in 1954 and subsequently modified by Charnley3 in 1972. The Harris Hip Score4 was developed in 1969 and assessed pain, function, range of motion, and absence of deformity. Problems with the variation and reproducibility of available instruments were identified by Andersson5 in 1972. He noted that the nine systems used to evaluate one group of patients fell into five statistically significant groups.
Early attempts at outcome assessment monitored the incidence and severity of complications. These remain an important consideration and will influence a patient’s level of satisfaction. Survivorship analysis has proved a powerful tool in the long-term assessment of arthroplasty, but using revision as the endpoint remains a poor discriminator of outcome in terms of patient satisfaction.
Outcome measures can broadly be divided into those that measure a surgeon’s assessment, which is often vulnerable to bias, and those that measure a patient’s assessment of outcome and satisfaction. It is appropriate to measure both types of outcome and, as they are likely to assess different factors, they should normally be presented as different outcomes.
Requirements of a Good Outcome Instrument
The two essential requirements of any outcome instrument are that it measures what it is designed to measure and that the measurement is made with minimum error.
Validity
Validity assesses whether the instrument measures what it is intended to and, hence, whether it is valid for a particular application in a specific population.
Reliability
Reliability assesses both the test-retest and the interobserver reproducibility, as well as the consistency of a scale. A reliable instrument ensures that measurements are made with minimum error. It is possible for a scale to be reliable without being valid.
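The consistency of a multi-item scale is often summarized with Cronbach's alpha. The source does not name this statistic; it is introduced here only as one common illustration of internal consistency. The following is a minimal sketch, in which the function name and the input layout (one list of item scores per respondent) are assumptions made for the example.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a table of responses.

    item_scores: list of respondents, each a list of k numeric item scores.
    (Hypothetical layout chosen for this sketch.)
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != k for row in item_scores):
        raise ValueError("every respondent must answer every item")

    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the summed scale score across respondents.
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Three respondents whose three items move together perfectly.
responses = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(responses)  # close to 1.0 for perfectly consistent items
```

A high alpha indicates only that the items vary together; it says nothing about whether the scale measures the intended construct, which is why a scale can be reliable without being valid.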
Responsiveness
Responsiveness reflects an instrument’s sensitivity to change and its ability to detect changes when they are present.
A scoring system should be validated as an outcome instrument for a specific setting. It is important that it is applied to a similar population under the same circumstances (eg, a hip score may be valid for an elective total hip replacement [THR] but invalid for a THR performed for a femoral neck fracture). If a scoring system has been validated for a chosen outcome measure, then surgeons should use it, as inventing a new system will diminish the value of the results and make them impossible to compare with those of other studies.
Surgeon-Based Assessment
It has been shown in a number of studies in other fields of medicine and surgery that physicians and patients significantly disagree about health status.
Hip scores may include a number of factors, which may be subjective (eg, pain), objective (eg, range of movement), or radiological. The scores are then often added together to give a final composite score.
The Charnley modification of the Merle d’Aubigne and Postel score and the Harris Hip Score are two of the most widely used hip scores. These scores combine elements of assessment by both the surgeon and the patient. The Charnley modification scores pain, hip movements, and walking each on a scale of 0-6 and does not combine the scores to obtain a total score. The Harris Hip Score awards 0-44 points for pain, 0-47 points for function, 0-5 points for range of motion, and 0-4 points for absence of deformity, giving a maximum of 100 points. Assessment of the functional component is based on the presence of a limp, the use of walking aids, and specified activities.
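The way the Harris Hip Score components combine can be made concrete with a short sketch. The point ranges are those quoted above; the function and dictionary names are hypothetical, and the sketch takes the four domain subtotals as given rather than reproducing the questionnaire items that generate them.

```python
# Maximum points per domain, as quoted in the text.
HARRIS_MAX = {
    "pain": 44,
    "function": 47,
    "range_of_motion": 5,
    "absence_of_deformity": 4,
}


def harris_total(components):
    """Sum the four domain subtotals after checking each lies in its range.

    components: dict with one numeric subtotal per key of HARRIS_MAX
    (hypothetical input format for this sketch).
    """
    total = 0
    for domain, max_points in HARRIS_MAX.items():
        points = components[domain]
        if not 0 <= points <= max_points:
            raise ValueError(f"{domain} must be between 0 and {max_points}")
        total += points
    return total


example = {
    "pain": 40,
    "function": 40,
    "range_of_motion": 5,
    "absence_of_deformity": 4,
}
score = harris_total(example)  # 89 out of a possible 100
```

Pain and function together account for 91 of the 100 available points, so the composite is dominated by those two domains whatever the range of motion or deformity findings.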
In 1990, Johnston et al6 suggested a comprehensive system of reporting results that included pain, function, patient satisfaction, physical examination, and radiological evaluation. This system was intended to unify many of the existing systems and has subsequently been validated and adopted by the American Academy of Orthopaedic Surgeons and by the Société Internationale de Chirurgie Orthopédique et de Traumatologie.
Different hip scores produce different results that are not comparable.7 In addition, authors may obtain different results using the same scores8 and hip scores cannot be used equally for all age groups.
An inherent problem with most scoring systems for the assessment of outcomes is that they are composite scores. They often include both clinical and radiological data together with subjective patient-based and clinician-based data. There is no reason to believe that scores allocated within a criterion are proportional. Therefore, the scores cannot be added together in a meaningful way.
Survivorship Analysis
Survivorship analysis has been a powerful tool in the long-term assessment of replacement arthroplasty and allows comparison among types or series of joint replacements. Survivorship analysis was first used in orthopedics by Dobbs in 1980.9 The methods for calculation of survival analyze the length of time to an endpoint (death, revision of implant, etc) (Figure 1).
The Kaplan-Meier10 method is most commonly used to estimate prosthesis survival and construct survival plots. It provides results that are independent of time intervals, as the survival is estimated at every failure time. Statistically significant differences can be assessed by using the log rank test. However, the log rank test does not allow adjustment for confounding factors. Relative risks for revision can be assessed and adjustments made for differences between compared groups (eg, age, gender, diagnosis, and other confounding factors) by using the Cox multiple regression model.
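The statement that survival is estimated at every failure time can be illustrated with a minimal product-limit sketch. The function name, the input format, and the example data are assumptions made for illustration; an established statistics package would normally be used in practice, and the log rank test and Cox regression are not shown.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times:  follow-up duration for each implant (any consistent unit).
    events: True if the endpoint (eg, revision) was observed at that time,
            False if the observation was censored.
    Returns a list of (time, survival) pairs, one per distinct failure time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        removed = 0
        # Gather every observation sharing this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                failures += 1
            removed += 1
            i += 1
        if failures:
            # Implants censored at time t still count as at risk at t.
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Five hypothetical implants: follow-up in years and whether revised.
years = [2, 3, 3, 5, 8]
revised = [True, False, True, True, False]
curve = kaplan_meier(years, revised)
# Steps occur at years 2, 3, and 5, with survival of roughly 0.8, 0.6, and 0.3.
```

Because the estimate changes only when a failure occurs, the result does not depend on how follow-up is divided into intervals, in contrast to a life-table calculation.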
Revision is a definite and easily reproducible endpoint, but can be influenced by extraneous factors such as a patient’s fitness for surgery and the severity of pain. Other endpoints, such as the presence of severe pain, low functional scores, and radiographic failure, should also be included.
A 95% confidence interval should be given when presenting survival results. These can be presented in tables or on curves. Murray et al11 recommended the inclusion of a “worst case” curve — in which all patients lost to follow-up are considered failures — to provide a statistically accurate statement of survival (Figure 2). In addition, Lettin et al12 have recommended that at least 40 surviving subjects are required to produce reliable results.
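The "worst case" curve amounts to recoding every patient lost to follow-up as a failure at the time of last contact and then recomputing survival. The sketch below compares the final survival estimate under the usual coding and under the worst-case coding. The status labels, function names, and data are hypothetical, and confidence intervals are not computed.

```python
def final_survival(times, failed):
    """Product-limit survival estimate after the last observed time."""
    survival = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, f in zip(times, failed) if u == t and f)
        survival *= 1 - failures / at_risk
    return survival


def usual_coding(statuses):
    """Only revised implants count as failures; all others are censored."""
    return [s == "revised" for s in statuses]


def worst_case_coding(statuses):
    """Patients lost to follow-up are also counted as failures."""
    return [s in ("revised", "lost") for s in statuses]


# Hypothetical series: years of follow-up and status at last contact.
years = [1, 2, 3, 4, 5]
statuses = ["revised", "lost", "in_situ", "lost", "in_situ"]

usual = final_survival(years, usual_coding(statuses))        # about 0.8
worst = final_survival(years, worst_case_coding(statuses))   # about 0.3
```

The gap between the two figures grows with the number of patients lost to follow-up, which is why presenting both gives a more honest picture than the usual curve alone.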
Patient-Based Assessment
Patient-based assessment can be divided into measures relating to patient satisfaction and those pertaining to health-related quality of life. Measures of health-related quality of life may be generic (measuring general health status) or disease specific.
Generic Scales
Generic instruments have multiple dimensions that measure general health status, rather than quality of life, and allow comparison across many conditions and interventions. The SF-36 is the most widely used generic instrument and measures eight dimensions of health status: physical functioning, role limitation due to physical problems, role limitation due to emotional problems, social functioning, mental health, energy/vitality, pain, and general perceptions of health. The SF-36 has been successfully reduced to a 12-item questionnaire (SF-12), which has itself been validated.13,14 The use of the SF-12 is recommended because it significantly reduces respondent burden. However, its use requires a license, and it is not easy to score (scoring is based on a computer algorithm).
Disease-Specific Scales
Disease-specific scales are designed to be sensitive to specific diseases in terms that relate to that disease. Respondents free of the disease are often not capable of completing these instruments, which makes it impossible to provide normative and comparative data.
The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC)15,16 is a self-assessed, disease-specific measure for patients with osteoarthritis of the hip or knee. It measures 24 clinically important, patient-relevant items in three dimensions: pain (five items), stiffness (two items), and physical function (17 items). A reduced function scale using only seven items has recently been shown to be valid, reliable, and responsive to change.17
The Western Ontario and McMaster Universities Osteoarthritis Index measures outcomes important to patients rather than those determined by surgeons, and has become a standard tool for use in clinical trials in hip and knee osteoarthritis.
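The three-dimension structure of this index (five pain items, two stiffness items, and 17 physical function items) lends itself to simple subscale sums. The sketch below makes two assumptions that the text does not state: each item is answered on a five-point scale coded 0 to 4, and the 24 responses arrive in the order pain, stiffness, physical function. The names are hypothetical, and the official user's guide should be consulted for actual scoring.

```python
# Number of items per dimension, as given in the text.
WOMAC_ITEMS = {"pain": 5, "stiffness": 2, "physical_function": 17}

# Assumption for this sketch: five-point responses coded 0 to 4.
MAX_ITEM_SCORE = 4


def womac_subscales(responses):
    """Sum the 24 item responses into three subscale scores.

    responses: list of 24 integers, assumed to be ordered as pain items,
    then stiffness items, then physical function items.
    """
    expected = sum(WOMAC_ITEMS.values())
    if len(responses) != expected:
        raise ValueError(f"expected {expected} responses")
    if any(not 0 <= r <= MAX_ITEM_SCORE for r in responses):
        raise ValueError(f"each response must be between 0 and {MAX_ITEM_SCORE}")

    scores = {}
    start = 0
    for dimension, n_items in WOMAC_ITEMS.items():
        scores[dimension] = sum(responses[start:start + n_items])
        start += n_items
    return scores


worst_possible = womac_subscales([MAX_ITEM_SCORE] * 24)
# Under the 0 to 4 assumption: pain 20, stiffness 8, physical function 68.
```

Keeping the three subscales separate, rather than collapsing them into one total, preserves the distinction between pain, stiffness, and function.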
The Health Assessment Questionnaire, also known as the Stanford Health Questionnaire,18 was originally designed for patients with rheumatoid arthritis with questions regarding 20 tasks in eight functional categories. This questionnaire has been translated into many languages.
Other disease-specific scales include the Arthritis Impact Measurement Scale19 and the McMaster Health Index Questionnaire.20 These scales assess physical, social, and emotional well-being. Many of these instruments have been altered over time, and studies often use different versions without making the differences clear, which makes comparison across studies difficult.
Site-Specific Instruments
The Oxford Hip Score21 is a 12-item questionnaire for patients undergoing THR. It was developed from patient interviews and validated against the SF-36 and the Health Assessment Questionnaire. Although there is a lack of literature in which the Oxford Hip Score has been used as a measurement of outcome and thus its use as a primary endpoint is potentially limited, it is a short, practical, valid, and reliable questionnaire that is sensitive to clinically important changes. It is also easily scored in a clinical setting.
Discussion
Orthopedic surgeons are increasingly being asked to evaluate the outcomes of their practice. Increasing patient awareness and expectations, evidence-based health care, and fiscal considerations are likely to mandate the use of outcome measures in the future.
The introduction and evaluation of a new implant requires clinical evaluation by the surgeon-developer, use of a surrogate outcome measure (eg, assessment of micromotion with radiostereometric analysis), and a multicenter trial. Ideally, the trial should be a randomized controlled trial and the new treatment should be compared with a “validated” alternative. Such a trial is generally agreed to be one of the most powerful methods of establishing optimal management.
In this type of in-depth evaluation, outcome instruments should include a generic health status instrument (eg, SF-12), a disease-specific instrument (eg, the Western Ontario and McMaster Universities Osteoarthritis Index), a site-specific, patient-based assessment (eg, the Oxford Hip Score), and a hip scoring system (eg, Harris Hip Score). A radiological assessment should also be included. Such an extensive set of measures would be inappropriate for ongoing surveillance of an average orthopedic surgeon’s practice.
An outcome measure should have proven validity and reliability, should be inexpensive, and should be easy to administer. Also, it should be user-friendly, not time-consuming, and easily understood by the clinician and the patient. Ideally, scores should be acquired directly without the need for calculation.
Patient-perceived health status and health-related quality of life are now generally accepted as the most important outcomes — with the exception of mortality — of surgical intervention. It has been shown that although patients experience significant improvement in pain and function after THR, the changes in the psychosocial function are much smaller. The principal advantage of generic scores is that they provide an idea of the impact that any deficit in the locomotor system has on a patient. Generic scores also allow comparison of disabilities produced by impairments in other body systems.
However, the weakness of a generic instrument is that it is less sensitive to subtle changes in regional function and therefore provides a poor discriminator in studies of musculoskeletal outcome. It would seem sensible to use a generic score and a disease- and site-specific score: the former to allow a comparative measure of holistic impact, the latter to achieve musculoskeletal sensitivity.
There is little evidence that the longer and more detailed instruments are more sensitive to important changes, whereas their length usually makes them less acceptable to patients. Levels of compliance with completion of questionnaires have suggested that patients find shorter and simpler questions about satisfaction more acceptable. Simple questions about satisfaction with outcome may therefore provide an ideal compromise between the more cumbersome details included in health status measures and the need for high response rates to avoid bias in evaluation of outcomes. This type of survey should prove adequate for monitoring standards of practice.
References
- Fergusson AB, Howarth MB. Slipping of the upper femoral epiphysis. JAMA. 1931; 97:1867-1872.
- Merle d’Aubigne R, Postel M. Functional results of hip arthroplasty with acrylic prosthesis. J Bone Joint Surg Am. 1954; 36:451-475.
- Charnley J. Long term results of low friction arthroplasty of the hip performed as a primary intervention. J Bone Joint Surg Br. 1972; 54:61-76.
- Harris WH. Traumatic arthritis of the hip after dislocation and acetabular fractures: treatment by mold arthroplasty. J Bone Joint Surg Am. 1969; 51:737-755.
- Andersson SG. Hip assessment: a comparison of nine different methods. J Bone Joint Surg Br. 1972; 54:621-625.
- Johnston RC, Fitzgerald RH, Harris WH, et al. Clinical and radiographic evaluation of total hip replacements. J Bone Joint Surg Am. 1990; 72:161-168.
- Callaghan JJ, Dysart SH, Savory CF, Hopkinson WI. Assessing the results of hip replacement. A comparison of five different rating systems. J Bone Joint Surg Br. 1990; 72:1008-1009.
- Thomas DI, Bannister GC. Exchange arthroplasty best for infected total hip replacement. Hip International. 1991; 1:17-20.
- Dobbs HS. Survivorship of total hip replacements. J Bone Joint Surg Br. 1980; 62:168-173.
- Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958; 53:457-481.
- Murray DW, Carr AJ, Bulstrode C. Survival analysis of joint replacements. J Bone Joint Surg Br. 1993; 75:697-704.
- Lettin AWF, Ware HS, Morris RW. Survivorship analysis and confidence intervals. An assessment with reference to the Stanmore total knee replacement. J Bone Joint Surg Br. 1991; 73:729-731.
- Ware JE, Kosinski M, Keller SD. SF-12: an even shorter health survey. Med Outcomes Trust Bull. 1996; 4:2.
- Jenkinson C, Layte R. Development and testing of the UK SF-12. J Health Serv Res Policy. 1997; 2:14-18.
- Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt L. Validation study of the WOMAC: a health status instrument for measuring clinically-important patient-relevant outcomes following total hip or knee arthroplasty in osteoarthritis. J Orthopaedic Rheumatology. 1988; 1:95-108.
- Bellamy N. WOMAC Osteoarthritis Index. A user’s Guide. London, Ontario, Canada: University of Western Ontario; 2000.
- Whitehouse SL, Lingard EA, Katz JN, Learmonth ID. Development and testing of a reduced WOMAC function scale. J Bone Joint Surg Br. 2003; 85:706-711.
- Fries JF, Spitz PW, Young DY. The dimensions of health outcomes: the health assessment questionnaire, disability and pain scales. J Rheumatol. 1982; 9:789-793.
- Meenan RF, Gertman PM, Mason JH. Measuring health status in arthritis: the arthritis impact measurement scales. Arthritis Rheum. 1980; 23:146-152.
- Chambers LW, MacDonald LA, Tugwell P, et al. The McMaster Health Index Questionnaire as a measure of quality of life for patients with rheumatoid arthritis. J Rheumatol. 1982; 9:780-784.
- Dawson J, Fitzpatrick R, Carr A, Murray D. Oxford Hip Score: questionnaire on the perceptions of patients about total hip replacement. J Bone Joint Surg Br. 1996; 78:185-190.
Authors
From the University of Bristol, Department of Orthopaedic Surgery, Bristol Royal Infirmary, Bristol, United Kingdom.