November 01, 2006
4 min read

Benchmarking in orthopedic surgery does not recognize diverse groups

Subspecialties and small groups present some thorny problems for across-the-board comparisons.


Douglas W. Jackson, MD

There are many ways to compare orthopedic surgeons and the work we do. While others are making different comparisons of our practices and outcomes, we need to understand what these comparisons are and how they are used.

In this month’s interview, I asked Dr. James Hamilton to share with us how benchmark comparisons are used in orthopedics and some of the considerations in establishing meaningful data. Dr. Hamilton is the Rex L. Diveley Professor and Chair of the Department of Orthopaedic Surgery at the University of Missouri-Kansas City.

Douglas W. Jackson, MD: What are “benchmarks” and why are they developed?

James Hamilton, MD: A “benchmark” is another name for the average, or mean, of similar data supplied by multiple sources. It is usually applied to production or quality control data and is used as a guide to what should be expected as a reasonable target for similar situations. For instance, if 10 lightbulb factories produced lightbulbs at a rate of 25, 28, 38, 40, 42, 52, 55, 67, 75 and 78 lightbulbs an hour, the average production rate would be 50 lightbulbs per hour. This then would be the “benchmark” that a manager might use to assess if his factory is operating at least on average with other light-bulb factories.

Benchmarks are used to compare your situation to the average of a larger group. The data might indicate lower productivity, which would then lead to an analysis of the cause of the variation. The assumption is that correcting that cause would allow your performance to approach that of facilities similar to yours.

Jackson: What are some of the problems with the methodology used to establish benchmarks?

Hamilton: For benchmarks to be useful, they have to be able to compare a large number of groups. You can’t develop a meaningful benchmark using only two data sets. Large variations in a small number of data sets result in significant skewing of the average. The larger the number of data sets compared, the more reliable the result. Usually 20 to 25 data sets are needed to produce a reliable, meaningful benchmark.
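
A quick sketch of that sensitivity, with made-up numbers: a single unusual data set swings a two-factory benchmark far more than a 20-factory one.

    # Made-up illustration of small-sample skew.
    small = [50, 90]                # two data sets, one unusually high
    large = [50] * 19 + [90]        # 20 data sets, same unusual value
    print(sum(small) / len(small))  # 70.0: the outlier dominates
    print(sum(large) / len(large))  # 52.0: close to the typical value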

Additionally, the data must come from similar situations. In the light bulb factory example, analysis might show that the factories producing mercury lights were the ones producing 25 and 28 bulbs an hour; halogen light bulb factories were the ones producing 38, 40 and 42 bulbs an hour; regular incandescent light bulb factories produced 52, 55 and 67 bulbs an hour; and fluorescent light bulb factories produced 75 and 78 bulbs an hour.

This would suggest that separate benchmarks need to be established for each type of light bulb factory (ie, mercury, 26.5; halogen, 40; incandescent, 58; fluorescent, 76.5). Using the benchmark of 50 bulbs an hour from the total group would make the mercury factories appear to be poor producers and the fluorescent factories appear to be outstanding producers when, in fact, all of them were essentially average compared with factories more closely resembling their own.
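
One way to sketch that grouping in Python, using the rates from the example (each per-type benchmark is just the mean of its own group):

    # Per-type benchmarks: average each group separately rather than pooling.
    factories = {
        "mercury": [25, 28],
        "halogen": [38, 40, 42],
        "incandescent": [52, 55, 67],
        "fluorescent": [75, 78],
    }
    for bulb_type, rates in factories.items():
        print(bulb_type, sum(rates) / len(rates))
    # mercury 26.5, halogen 40.0, incandescent 58.0, fluorescent 76.5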

Jackson: What is the current situation for benchmarks in orthopedic surgery?

Hamilton: Several sources have compiled benchmarks for orthopedic surgery in the past, but the data was drawn from all types of subspecialists.

The resulting benchmarks were misleading. For instance, orthopedic oncologists do not produce the same number of relative value units (RVUs) as total joint surgeons, yet both were combined in the same benchmark for “orthopedic surgery.”

Also, in the past, data was not available on 20 orthopedic oncologists to develop a reliable separate benchmark for that group. Last year, the AAOS entered into an agreement with University Health Consortium – Faculty Practice Solutions Center (UHC-FPSC), which annually collects data from 71 academic institutions with more than 1,200 academic orthopedic surgeons. Working with the AAOS, UHC-FPSC was able to identify the subspecialty areas of this group and develop benchmarks for 10 orthopedic subspecialties (general, joint, spine, pediatrics, oncology, hand, shoulder/elbow, foot/ankle, trauma and sports medicine) using 2004 data.

These benchmarks demonstrated a significant difference in average RVU production by subspecialty.

A large number of data sets were available for each subspecialty, so reliable benchmarks could be developed for academic surgeons. Because this data was developed using the academic model, it should not be assumed to represent the private practice sector, just as mercury light bulb factories should not be compared to fluorescent light bulb factories. However, a similar variance in productivity would be expected in the private practice sector.

A system has not yet been identified to gather the required data to establish reliable subspecialty benchmarks for the private practice sector. That would be the next logical step.

Jackson: How could benchmarks be used to improve an orthopedic practice?

“For benchmarks to be useful, they have to be able to compare a large number of groups. You can’t develop a meaningful benchmark using only two data sets.”
— James Hamilton, MD

Hamilton: Benchmarks are a tool for the practice manager to use. If they are to be beneficial, the practice first has to develop a tracking system to collect the data that the benchmark measures. It is critical that the manager fully understand exactly what the benchmark measures.

For instance, in the academic benchmarks I discussed, no credit is given for research or teaching, both of which are significant aspects of an academic practice and factors that alter the RVU production the benchmarks measure.

The simple act of establishing a reliable data collection system improves the practice.

Having collected the data, the manager would then compare it to the benchmarks and ask, “How do I expect the data from my practice to compare?” For instance, I would not expect a sports medicine doctor working in a city/county hospital with one OR available per day to produce at the same level as a sports medicine doctor practicing in a sports medicine institute with four ORs to use each day.

If the level did not match the expectation, then an analysis of why it did not should follow. Such an analysis might reveal a billing or coding problem in the processing of the charges.
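
As a hypothetical sketch of that workflow, assume a practice tracks annual RVUs and has a subspecialty benchmark in hand (the benchmark figure and the 20% tolerance below are illustrative, not UHC-FPSC data):

    # Flag production that falls well below the subspecialty benchmark,
    # prompting an analysis of billing, coding or capacity issues.
    SPORTS_MEDICINE_BENCHMARK = 8000.0  # hypothetical annual RVUs

    def needs_analysis(practice_rvus, benchmark, tolerance=0.20):
        """True when production is more than `tolerance` below the benchmark."""
        return practice_rvus < benchmark * (1 - tolerance)

    print(needs_analysis(5500, SPORTS_MEDICINE_BENCHMARK))  # True: investigate
    print(needs_analysis(7800, SPORTS_MEDICINE_BENCHMARK))  # False: within range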

Essentially, benchmarks should be used as a way to determine a reasonable level of expectation based on analysis of data from similar institutions. They should not be regarded as an absolute target.

Circumstances could easily shift reasonable goals higher or lower. But the work of comparing an office to the benchmarks involves an analysis that can uncover previously unrecognized problems and thus provides a great opportunity for improvement.

For more information
  • See the American Academy of Orthopaedic Surgeons’ Web site, www.aaos.org