‘When did we start thinking that a single study was enough to prove a scientific hypothesis?’

Richard Stevens, Director of the MSc in EBHC Medical Statistics

There are many things I learnt as a statistics student that I don’t teach as a medical statistics lecturer.  One example is the “Central Limit Theorem”.  EBHC students don’t need to know the formal conditions and mathematical proof I learnt as a maths student.  But it does help to know how to communicate the result in plain English: when your study has a large sample size, you don’t need a perfect normal distribution to get valid confidence intervals and p-values.

However, one thing we do emphasize is the correct interpretation of p-values. Not all students need to know how each p-value is calculated, in an age when computers do the number crunching. But they do need to understand that the p-value for the result of a trial is the chance of getting a result “like this” (example: a difference at least this big between drug and placebo), if the null hypothesis (example: drug has no effect) is true. If the p-value is small (conventionally, less than 5%) we have found what is referred to as a “statistically significant” effect.
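For readers who like to see the definition in action, here is a minimal simulation sketch. It assumes a simplified trial that is not described in this post: two arms of 50 patients, normally distributed outcomes with a known standard deviation of 1, and a drug that truly has no effect. The function names, sample sizes and the z-test are illustrative choices, not a method taken from the course.

```python
import random
from statistics import NormalDist, mean


def two_sided_p_value(drug, placebo, sd=1.0):
    """Two-sided z-test p-value for a difference in means (sd assumed known)."""
    standard_error = sd * (1 / len(drug) + 1 / len(placebo)) ** 0.5
    z = (mean(drug) - mean(placebo)) / standard_error
    # Chance of a difference at least this big, in either direction, under the null.
    return 2 * (1 - NormalDist().cdf(abs(z)))


def share_significant_under_null(n_trials=2000, n_per_arm=50, alpha=0.05, seed=1):
    """Simulate trials where the null is true and count how many reach p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Both arms come from the same distribution: the drug has no effect.
        drug = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        placebo = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        if two_sided_p_value(drug, placebo) < alpha:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    print(f"Share of null trials with p < 0.05: {share_significant_under_null():.3f}")
```

Under these assumptions the printed share should land close to 5%. That is all the 5% threshold promises: how often a “statistically significant” result appears when there is nothing to find.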

We also insist that our students understand this distinction: the p-value is the probability of getting a result like this, assuming the null hypothesis is true; it is not the same thing as the probability that our hypothesis is true, given that we’ve seen a result like this. When students in the medical school look at me as if this is a meaningless distinction, I say: did those two statements sound similar? They are as different as telling you that half of all Welsh people are women and telling you that half of all women are Welsh. Students are usually willing to accept that these two sentences are very different, even if they sound closely related. But I wonder if they understand why we statisticians place such an emphasis on the difference?
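The Welsh example can be checked with a few lines of arithmetic. The sketch below uses made-up round head counts (not census figures) chosen only so that half of the Welsh people are women; the `counts` table and the helper names are hypothetical.

```python
# Hypothetical head counts, chosen only to make the arithmetic easy.
counts = {
    ("Welsh", "woman"): 1_500,
    ("Welsh", "man"): 1_500,
    ("not Welsh", "woman"): 28_500,
    ("not Welsh", "man"): 28_500,
}


def is_welsh(key):
    return key[0] == "Welsh"


def is_woman(key):
    return key[1] == "woman"


def conditional(table, event, given):
    """P(event | given): restrict to the 'given' group, then take the share with 'event'."""
    given_total = sum(n for key, n in table.items() if given(key))
    both_total = sum(n for key, n in table.items() if given(key) and event(key))
    return both_total / given_total


p_woman_given_welsh = conditional(counts, is_woman, is_welsh)
p_welsh_given_woman = conditional(counts, is_welsh, is_woman)

print(f"P(woman | Welsh) = {p_woman_given_welsh:.2f}")  # 0.50
print(f"P(Welsh | woman) = {p_welsh_given_woman:.2f}")  # 0.05
```

The two probabilities share the same numerator (Welsh women) but divide by different groups, which is why they can be so far apart. The p-value divides by “worlds where the null hypothesis is true”; the question we usually care about divides by “studies that produced a result like this”.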

The recent buzz in scientific journals about an alleged “replication crisis” shows how widespread this misunderstanding is. The medical publishing world seems to be very surprised that studies that achieve “statistical significance” (defining significance with a 5% threshold and using 95% confidence intervals) can’t be replicated much of the time. Did we think that because we use a 5% threshold for statistical significance, and 95% confidence intervals, 95% of studies with positive findings should be successfully replicated? Or, to put it another way: did we think that because half of the Welsh people are women, it follows that half of the women will turn out to be Welsh?

I have never seen this explained better than by Professor Alexander Bird of King’s College London. In a talk in Oxford for our EBHC seminar series, he explained with beautiful clarity that we should not expect 95% of studies to replicate, just because we are using 95% confidence intervals and a threshold of 5% for our p-values. He also demonstrated that this is not an issue of statistical power, a measure of whether the sample size is ‘big enough’. Alexander’s talk describes in part what David Colquhoun termed the false discovery rate: that is, the chance that a study is wrong when it reports a ‘statistically significant’ discovery.
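A rough calculation shows why the false discovery rate can sit far above 5%. The sketch below is not a reconstruction of the talk. It assumes, purely for illustration, that 10% of the hypotheses being tested are actually true (the `prior` argument); that figure, the function names, and the simplified definition of “replicates” (an identical, independent repeat study is again significant at the same threshold) are all assumptions.

```python
def false_discovery_rate(prior, power, alpha=0.05):
    """Share of 'significant' findings that come from true null hypotheses.

    prior: assumed fraction of tested hypotheses that are really true
    power: chance that a study of a real effect reaches significance
    alpha: significance threshold (chance a null effect reaches significance)
    """
    false_positives = alpha * (1 - prior)
    true_positives = power * prior
    return false_positives / (false_positives + true_positives)


def replication_rate(prior, power, alpha=0.05):
    """Chance that a significant finding is significant again in an identical repeat."""
    fdr = false_discovery_rate(prior, power, alpha)
    # Real effects replicate with probability `power`, false alarms with probability `alpha`.
    return (1 - fdr) * power + fdr * alpha


if __name__ == "__main__":
    for power in (0.8, 1.0):
        fdr = false_discovery_rate(prior=0.1, power=power)
        rep = replication_rate(prior=0.1, power=power)
        print(f"power={power:.1f}  false discovery rate={fdr:.2f}  replication rate={rep:.2f}")
```

With these illustrative inputs, about 36% of significant findings are false discoveries at 80% power, and roughly half of significant findings would be expected to replicate. Raising power to 100% only brings the false discovery rate down to about 31%, which is consistent with the point that this is not simply a matter of sample size. The answer depends heavily on the assumed prior, which is rarely known in practice.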

Alexander concluded with three possible approaches to tackling the “replication crisis”. My preferred solution is to stop thinking of it as a crisis! When did we start thinking that a single study was enough to prove a scientific hypothesis? When did we forget the importance of confirmatory studies? Confirmatory studies are fundamental to science, for the very reason that replication of the original result is far from guaranteed.

I’m delighted that when Alexander spoke to our staff and students in April, he gave us permission to record his presentation for our audio podcast series.  Firstly, this talk is essential listening for anyone who takes the “replication crisis” seriously.  Secondly, it’s a reminder that good statistical understanding is essential for science.  Finally, this elegant explanation (from a philosopher, not a mathematician) is a perfect demonstration that good statistical understanding is not the exclusive territory of professional statisticians.


1. Prof Alexander Bird speaking to the Evidence-Based Health Care programme in Oxford, April 2018.