Data extraction in meta-analysis
Extracting data for meta-analysis can be very frustrating because authors often don’t report the summary data that you want: that is, the same statistics across studies, and the statistics that the meta-analysis software requires (e.g. mean and standard deviation). There are some great resources to help you convert what’s reported into what you need, but randomised trials are perhaps better served (for example, by the excellent Cochrane Handbook) than other study designs. Other resources exist, but they’re scattered around and, because some methods involve complicated equations, they are not always accessible to everyone who may want to carry out a meta-analysis.
The aim of this resource is to provide a series of useful tips on data extraction, and to shed light on, and raise awareness of, the different methods and equations that are available to convert data into what you need for meta-analysis. I’ll also try to demystify the maths by giving worked examples and only offering the derivation of the equations as an optional extra.
The data that you need to extract for input into meta-analysis software depend on the type of outcome, and will typically be:
Outcome | Data required |
Diagnostic accuracy | Numbers of true positives, true negatives, false positives and false negatives of the diagnostic test |
Dichotomous | Total number of participants and number with the outcome in each group, OR ln(x) and its standard error, where ln is the natural logarithm and x is the odds ratio, relative risk or hazard ratio |
Continuous | Number of participants, mean and standard deviation of the outcome for each group |
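As a small illustration of the "ln(x) and its standard error" entry in the table above, the sketch below converts a reported ratio measure and its confidence interval into the log scale. It assumes the interval is symmetric on the log scale (as a standard Wald-type interval is), so the standard error is the width of the log interval divided by twice the normal quantile. The function name `log_effect_and_se` and the example numbers (odds ratio 2.0, 95% CI 1.25 to 3.2) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def log_effect_and_se(estimate, ci_lower, ci_upper, level=0.95):
    """Return (ln(estimate), SE of ln(estimate)) for a ratio measure.

    Assumption: the reported confidence interval is symmetric on the
    log scale, i.e. ln(estimate) +/- z * SE.
    """
    if not 0 < ci_lower <= estimate <= ci_upper:
        raise ValueError("expected 0 < ci_lower <= estimate <= ci_upper")
    # Normal quantile for the given confidence level (about 1.96 for 95%)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_estimate = math.log(estimate)
    # Width of the interval on the log scale spans 2 * z standard errors
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return log_estimate, se


# Hypothetical reported result: odds ratio 2.0 (95% CI 1.25 to 3.2)
log_or, se = log_effect_and_se(2.0, 1.25, 3.2)
print(f"ln(OR) = {log_or:.3f}, SE = {se:.3f}")  # ln(OR) = 0.693, SE = 0.240
```

The same calculation applies to a relative risk or a hazard ratio, provided the interval was calculated on the log scale; if the paper used a different method, this gives only an approximation.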
This resource is a directory of tips arranged into four groups. Many of these tips have been summarised in a 4-part series of articles about extracting diagnostic accuracy data, extracting categorical risk data, combining and converting groups (relating to continuous data) and summarising good practice guidelines for meta-analysis.
About the author:
Dr Kathy Taylor is a medical statistician in Oxford University's Nuffield Department of Primary Care Health Sciences. Kathy teaches data extraction in Meta-analysis. This is a short course that is also available as part of our MSc in Evidence-Based Health Care, MSc in EBHC Medical Statistics, and MSc in EBHC Systematic Reviews.
Follow updates on this blog and related news, and find out about other examples of statistics being made more broadly accessible, on Twitter: @dataextips
Extracting dichotomous outcomes
D1 – Prognostic studies report measures of risk on different scales
D2 – A beta coefficient is reported
D3 – Calculating a standard error of a beta coefficient
D4 – Pooling categorical risk data
D5 – A worked example of summarising categorical risk data
D6 – Changing the reference category in categorical risk data
D7 – Missing categorical risk data
D8 – Relative risk data are for different definitions of high vs low
D9 – Estimating a hazard ratio from time-to-event data
D10 – Estimating from a Kaplan-Meier curve and follow-up information
Extracting continuous outcomes
C1 – Missing a mean, standard deviation or sample size
C2 – Data are reported for the wrong time point
C3 – Data are reported for the wrong group
C4 – Obtaining summary data using complementary equations
C5 – Missing a mean and only another average is reported
C6 – Missing a SD and another measure of dispersion is reported
C7 – Neither the desired statistic, nor a similar statistic is reported
C8 – Only an effect estimate is reported
C9 – Other approaches to dealing with diverse continuous data