The Cardiac Surgical Register could be bad for your nerves:
A health warning for the worried well
Tom Treasure
The Society's returns for cardiac surgical mortality are likely to be
scrutinised as never before, and yet the provision of 95% confidence
limits around some percentages in this year's Cardiac Surgery Register
may cause inappropriate alarm and despondency. I will first explain what
these confidence limits mean. I will then suggest how a surgeon might
use the present register to check his or her results.
First, what are the 95% confidence limits? Consider a very small data
set, such as 20 cases operated upon with two deaths, giving a mortality
of 10%. If you look up the 95% confidence limits for 2/20 they are
from 1% to 32%. To understand what that range is intended to convey,
imagine a sack with a very large number of marbles with an unknown
proportion of black ones. You are allowed only to reach in for twenty
at a time, and you get out two black ones (10%) amongst your sample.
What would be your confidence in saying that the sack contains
exactly 10% black marbles? Put statistically the best you could say
is that there is a 95% chance that the true proportion of black marbles
lies between 1% and 32%. This is simply because 20 cases is a small
sample on which to base an estimate. There is a truth within the sack
but the estimate based on your handful of twenty is unreliable and the
percentage is unstable from one dip to another. The more marbles you
took, the more confident you would be of your estimate, and the
narrower would be the confidence limits. To return to surgery, the
point is that with one's own relatively small experience of uncommon
conditions, the variation from year to year, and from surgeon to surgeon, is
apparently great without there necessarily being any real difference
from the national average.
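For readers who would rather compute the limits than look them up, here is a minimal sketch. It assumes the "exact" (Clopper-Pearson) binomial method, which may not be the method behind the published tables; the function names are illustrative only.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) when X is the number of deaths in n cases at risk p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect(f, target):
    """Find p in [0, 1] with f(p) == target, for f increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_limits(deaths, cases, level=0.95):
    """Exact (Clopper-Pearson) confidence limits for deaths/cases (assumed method)."""
    tail = (1 - level) / 2
    # Lower limit: the risk at which seeing this many deaths or more has probability `tail`.
    lower = 0.0 if deaths == 0 else bisect(
        lambda p: 1 - binom_cdf(deaths - 1, cases, p), tail)
    # Upper limit: the risk at which seeing this many deaths or fewer has probability `tail`.
    upper = 1.0 if deaths == cases else bisect(
        lambda p: 1 - binom_cdf(deaths, cases, p), 1 - tail)
    return lower, upper

lower, upper = exact_limits(2, 20)
print(f"2/20: {lower:.1%} to {upper:.1%}")  # roughly 1% to 32%
```

Calling the same function with 20 deaths in 200 cases gives a much narrower range around the same 10%, which is the point about larger handfuls of marbles.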
Looked at in practice, suppose there is a 10% national mortality,
and each of a number of equally skilled surgeons operates on 20
representative patients, with anywhere between 1 and 5 deaths (percentages 5% to 25%).
In each instance the confidence limits include 10%.
But if your sample of 20 had either no deaths at all, or seven or more,
such a discrepancy would occur by chance only once in 20 times.
That is to say its probability (P) is less than 0.05.
Therefore if the confidence limits around your own sample overlap the
national percentage, this is a simple check that your performance is
not significantly different from the group. You are, if you like,
checking whether you are likely to be sampling from the same sack as
everyone else, within which it is the rules of chance which determine
who gets more and who gets fewer black marbles this year.
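The overlap check for 1 to 5 deaths in 20 cases can be sketched as follows. Again this assumes exact (Clopper-Pearson) limits, and the helper names are hypothetical rather than anything taken from the Register.

```python
from math import comb

NATIONAL_RATE = 0.10  # the 10% national mortality supposed in the text
CASES = 20

def prob_at_most(k, n, p):
    """P(X <= k) for a binomial count of deaths."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_increasing(f, target):
    """Bisection for f(p) == target on [0, 1], with f increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < target else (lo, mid)
    return (lo + hi) / 2

def limits_95(deaths, cases):
    """Exact 95% limits; valid here for 0 < deaths < cases."""
    lower = solve_increasing(lambda p: 1 - prob_at_most(deaths - 1, cases, p), 0.025)
    upper = solve_increasing(lambda p: 1 - prob_at_most(deaths, cases, p), 0.975)
    return lower, upper

# For each count of deaths, does the interval include the national rate?
overlaps = {}
for deaths in range(1, 6):
    lower, upper = limits_95(deaths, CASES)
    overlaps[deaths] = lower <= NATIONAL_RATE <= upper
    print(f"{deaths}/{CASES}: {lower:.1%} to {upper:.1%}, includes 10%: {overlaps[deaths]}")
```

Under this assumed method, each of the five intervals spans the national figure, so none of these surgeons can be distinguished from the group on 20 cases alone.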
But what is the point of putting confidence limits around selected,
large national totals? None in my view. The bigger the sample the
more sure the statistician can be of the "true" proportion, to the point
that when you have 22,160 coronary bypass operations, the 95% confidence
limits on the front page are narrow at 2.8% to 3.3%, spanning only half
of one percent from top to bottom. All that says is that it is probable
that if another similar 22,160 patients had operations, the mortality would come
out very close to 3% again. This is not a useful range to use for
self-inspection. If we compare ourselves with the lower confidence
limit of the proportion from such a massive data set, it might well exclude
40% of all surgeons, completely inappropriately. It is the range and
distribution of individual percentages that matters, and whether a
surgeon is in the central bunch of the distribution range, or is a
straggler. Remember, half of us will be "below average" but can the
public handle that concept? Meanwhile, the question is whether you
are an outlier, worryingly underperforming.
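To see why the national interval is so narrow, a rough sketch comparing the width of the limits for 22,160 operations with that for a single surgeon's 200. It assumes a mortality of 3% for both and the simple normal approximation; neither the rate nor the method is taken from the Register itself.

```python
from math import sqrt

ASSUMED_RATE = 0.03  # illustrative round figure, not the Register's exact percentage

def approx_limits(rate, cases, z=1.96):
    """Normal-approximation 95% limits for a proportion (assumed method)."""
    half_width = z * sqrt(rate * (1 - rate) / cases)
    return rate - half_width, rate + half_width

national = approx_limits(ASSUMED_RATE, 22160)
individual = approx_limits(ASSUMED_RATE, 200)

national_width = national[1] - national[0]
individual_width = individual[1] - individual[0]

print(f"22,160 cases: width {national_width:.2%}")
print(f"   200 cases: width {individual_width:.2%}")
```

The width shrinks with the square root of the number of cases, so the national interval is roughly a tenth as wide as an individual's. It describes how well the pooled figure is known, not how far an individual surgeon may reasonably stray from it.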
So how else might we look at the problem? Suppose 100 surgeons pool
their figures of 200 coronary operations each to provide a denominator
of 20,000. With their pooled total of 600 deaths, the mortality is 3%.
The figures for individuals will range perhaps from a spectacularly good
(and lucky) 0% to a worrying 12% with the majority clustering between
2% and 5%. The distribution of these 100 individual performances will
almost certainly be skewed because no one can lose fewer than 0%
(so the distribution is curtailed at one end) with a tail out to the
right, of those who have had a tough year. The 95th centile of the
distribution is easily defined. It is simply the percentage mortality
of the 95th ranking surgeon, thus identifying, beyond that rank, the five
individuals with the highest mortality for that year.
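The ranking described above can be sketched with simulated figures. The data here are invented for illustration: 100 hypothetical surgeons, 200 operations each, all given the same true risk of 3%, so any spread is due to chance alone. Real figures would also reflect case mix and genuine differences.

```python
import random

SURGEONS, CASES, TRUE_RISK = 100, 200, 0.03  # assumptions for the sketch

def simulate_year(rng):
    """Observed mortality for each surgeon when all share the same true risk."""
    return [
        sum(rng.random() < TRUE_RISK for _ in range(CASES)) / CASES
        for _ in range(SURGEONS)
    ]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
rates = simulate_year(rng)

ranked = sorted(rates)     # best (lowest mortality) first
cutoff = ranked[94]        # mortality of the 95th ranking surgeon
worst_five = ranked[95:]   # the five ranked beyond that point

print(f"range: {ranked[0]:.1%} to {ranked[-1]:.1%}")
print(f"95th ranking surgeon: {cutoff:.1%}")
print("five highest:", [f"{r:.1%}" for r in worst_five])
```

Even with identical surgeons, the simulated percentages spread out and five names always fall beyond the cutoff.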
But in its simplest form, this is merely another version of a league
table and has the problem that, of 100 brilliant surgeons, there must be
five occupying the lowest ranks in any given year. It might be a
different five every year, and they might be only a fraction behind
the others. Furthermore, it goes without saying that, without
adjustment for case mix, it would be quite misleading and
counterproductive.
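The instability of the bottom five can be illustrated in the same spirit. This sketch simulates two successive years for 100 hypothetical surgeons with an identical true risk of 3% (an assumption, as are all the names) and counts how many appear in the bottom five both times. Ties in the number of deaths are broken arbitrarily by position in the list.

```python
import random

def bottom_five(rng, surgeons=100, cases=200, risk=0.03):
    """Indices of the five surgeons with the most deaths in one simulated year."""
    deaths = [
        sum(rng.random() < risk for _ in range(cases))
        for _ in range(surgeons)
    ]
    order = sorted(range(surgeons), key=lambda s: deaths[s], reverse=True)
    return set(order[:5])

rng = random.Random(2)  # fixed seed so the sketch is repeatable
year1 = bottom_five(rng)
year2 = bottom_five(rng)
repeated = year1 & year2  # surgeons in the bottom five in both years

print("bottom five, year 1:", sorted(year1))
print("bottom five, year 2:", sorted(year2))
print("in both years:", sorted(repeated))
```

When skill is equal by construction, membership of the bottom five is decided by chance, so little or no overlap between years is to be expected.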
There are more sophisticated ways of studying the distribution and
defining outliers which we are working on, but the use of 95% confidence
limits of the proportion of deaths on the front of the Register,
in my view, needed a "health warning" for "the worried well".