In this blog, Senior Statistician and Epidemiologist Dr Margaret Smith, co-lead of our new accredited short course Medical Statistics for Big Data, describes a recent study on maternal anaemia and congenital heart disease that used the CPRD GOLD database of electronic health records. Students on our new course will acquire the skills to analyse similar studies using electronic health record databases, and much more besides.

About the author: 

Dr Margaret Smith is a Senior Statistician and Epidemiologist and has been with the Nuffield Department of Primary Care Health Sciences since October 2014. She is part of the CPRD (Clinical Practice Research Datalink) team and is joint course lead on the Centre for Evidence-Based Medicine (CEBM)'s new accredited short course Medical Statistics for Big Data.

Why are electronic health record databases so useful for epidemiological research? 

Vast quantities of routinely collected health-related data are stored electronically as part of patient care. Some of these data are available in the form of anonymised databases for health care research.  

One such database is the Clinical Practice Research Datalink (CPRD) GOLD database, which contains anonymised patient records, including patient demographics, health-related measurements, smoking and alcohol histories, diagnoses and test results, from 21 million patients registered at 986 GP practices in the United Kingdom.

The scale and the richness of the data mean that electronic health record databases can be used to do studies that would otherwise be hard to power or that need lengthy periods of follow-up. Electronic health record databases can also be used for development of clinical prediction rules (see our Clinical Prediction Rules short course covering this specific topic).

Maternal anaemia and congenital heart disease in offspring 

We investigated whether maternal anaemia (low haemoglobin) in early pregnancy was associated with congenital heart disease in the offspring. Congenital heart disease affects about one percent of live births and, despite modern surgical techniques, is still a major cause of infant mortality and morbidity.

We used data from CPRD GOLD, linked to data on diagnoses made in hospital (Hospital Episode Statistics) and data on cause of death. We used the CPRD Pregnancy Register to find pregnancy records and the CPRD mother-baby link to find the baby from each pregnancy. 

The study population consisted of mother-baby pairs where the mother had a haemoglobin measurement made in the first 100 days of pregnancy. We did a matched case-control study, which means that for each mother-baby pair where the baby had congenital heart disease (a case), we looked for five other mother-baby pairs (controls) where the baby didn't have congenital heart disease and the pregnancy start date was within six months of the case's.
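As a rough illustration of the matching step, here is a minimal Python sketch. It is not the study's actual code: the function name, the toy identifiers and dates, the 183-day approximation of "six months", and the choice to sample controls at random without reuse are all assumptions made for this example.

```python
import random
from datetime import date, timedelta


def match_controls(cases, pool, ratio=5, window_days=183, seed=0):
    """Pick up to `ratio` unused controls per case with a close pregnancy start date.

    `cases` and `pool` are lists of (pair_id, pregnancy_start_date) tuples.
    The 183-day window and sampling without reuse are assumptions of this sketch.
    """
    rng = random.Random(seed)
    used = set()
    matched = {}
    for case_id, case_start in cases:
        eligible = [
            control_id
            for control_id, start in pool
            if control_id not in used
            and abs((start - case_start).days) <= window_days
        ]
        chosen = rng.sample(eligible, min(ratio, len(eligible)))
        used.update(chosen)
        matched[case_id] = chosen
    return matched


# Hypothetical toy data: two cases and a pool of 60 potential controls.
cases = [("case-1", date(2010, 3, 1)), ("case-2", date(2012, 9, 15))]
pool = [(f"ctrl-{i}", date(2009, 1, 1) + timedelta(days=30 * i)) for i in range(60)]

matched = match_controls(cases, pool)
for case_id, control_ids in matched.items():
    print(case_id, control_ids)
```

Each case and its matched controls form one matched set, which is the grouping a conditional analysis later relies on.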

In the 2,776 mother-baby pairs where the baby had congenital heart disease, 123 mothers (4.4%) had low haemoglobin. In the 13,880 matched mother-baby pairs where the baby didn't have congenital heart disease, only 390 mothers (2.8%) had low haemoglobin.
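These counts are enough to reproduce the percentages and to work out a crude (unadjusted, unmatched) odds ratio. The short Python sketch below does that arithmetic. It ignores the matching and any confounders, so it is only a sanity check and not the analysis reported in the study; the variable names are ours.

```python
# Counts quoted in the text.
cases_exposed, cases_total = 123, 2776
controls_exposed, controls_total = 390, 13880

# Mothers without low haemoglobin in each group.
cases_unexposed = cases_total - cases_exposed
controls_unexposed = controls_total - controls_exposed

pct_cases = 100 * cases_exposed / cases_total
pct_controls = 100 * controls_exposed / controls_total

# Crude odds ratio from the 2x2 table (cross-product ratio).
crude_or = (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)

print(f"Low haemoglobin: {pct_cases:.1f}% of cases vs {pct_controls:.1f}% of controls")
print(f"Crude odds ratio: {crude_or:.2f}")
```

The crude odds ratio comes out at about 1.60, which is why a model that respects the matching and adjusts for confounders is needed before drawing conclusions.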

We used conditional logistic regression, adjusted for potential confounders, to estimate odds ratios for congenital heart disease in babies of mothers with low haemoglobin (< 110 g/L) compared with higher haemoglobin. We used either the missing indicator method or multiple imputation to deal with missing data.
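The missing indicator method keeps every record in the analysis by filling a missing confounder value with a constant and adding a 0/1 flag that marks where the value was missing; both the filled variable and the flag then go into the model. A minimal Python sketch of that data preparation step follows. The helper name, the "bmi" field and the toy records are hypothetical, chosen only for illustration, and are not taken from the study.

```python
def add_missing_indicator(records, field, fill_value=0.0):
    """Return copies of `records` with missing `field` values filled and flagged."""
    completed = []
    for record in records:
        is_missing = record.get(field) is None
        new_record = dict(record)  # copy so the input records are left unchanged
        new_record[field] = fill_value if is_missing else record[field]
        new_record[f"{field}_missing"] = int(is_missing)
        completed.append(new_record)
    return completed


# Hypothetical toy records; "bmi" stands in for any partly recorded confounder.
mothers = [
    {"id": 1, "bmi": 24.1},
    {"id": 2, "bmi": None},
    {"id": 3},
]

flagged = add_missing_indicator(mothers, "bmi")
for row in flagged:
    print(row)
```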

After adjusting for potential confounders, the odds of giving birth to a child diagnosed with congenital heart disease were 47% higher among mothers with low haemoglobin (adjusted OR 1.47, 95% CI 1.18 to 1.83, p < 0.001). However, this preliminary evidence should be interpreted cautiously, as we had considerable amounts of missing data for some of the confounding variables, and other potential confounders weren't available at all in GOLD. The full study is listed in the references below.
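As a quick consistency check on these figures, the standard error of the log odds ratio can be recovered from the confidence interval, and from that a z statistic and p-value. The Python sketch below assumes a Wald-type interval that is symmetric on the log scale, which is the usual form for logistic regression output but is an assumption here; the variable names are ours.

```python
import math

# Values quoted in the text.
or_hat, ci_low, ci_high = 1.47, 1.18, 1.83
z_crit = 1.959964  # standard normal quantile for a two-sided 95% interval

# On the log scale the interval is log(OR) +/- z_crit * SE.
log_or = math.log(or_hat)
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)

# Wald z statistic and two-sided p-value from the normal distribution.
z = log_or / se_log_or
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"SE of log OR: {se_log_or:.3f}")
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.4f}")
```

The recovered p-value is below 0.001, in line with the reported result, and the geometric mean of the interval limits sits close to 1.47, as expected for an interval built on the log scale.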

What does the short course on Medical Statistics for Big Data cover? 

On this new course, students will learn some more advanced statistical methods that are relevant to electronic health record studies. These will include Poisson regression for counts and rates, conditional methods for matched and self-controlled study designs, and approaches to the analysis of interrupted time series. Students will be introduced to causal inference and propensity scores. The course will cover multiple imputation for missing data, including when to apply it and how to analyse imputed data. The course will emphasise practical application of these methods.

How does this course differ from the Big Data Epidemiology course? 

The Big Data Epidemiology course teaches students how to design a study that uses electronic health record data and how to manage the data for it. However, that course does not include any in-depth teaching or practicals on methods of statistical analysis.

How can I apply? 

You can find out more, including how to apply, on the Medical Statistics for Big Data course page.

Who teaches on the course? 

Dr Margaret Smith is Deputy Director of the MSc in EBHC Medical Statistics and has been using electronic health record data for medical research since 2015. Dr Emily McFadden, a senior epidemiologist, and Richard Stevens, a Professor of Medical Statistics, have over 20 years' experience providing expert advice to the MHRA (the United Kingdom equivalent of the FDA) on electronic health record data. Francesca Little is Professor Emeritus at the University of Cape Town and now leads MSc-level teaching modules in Oxford's Evidence-Based Health Care programme.

References:

Nair M, Drakesmith CW, Smith M, Bankhead CR, Sparrow DB. Maternal Anaemia and Congenital Heart Disease in Offspring: A Case-Control Study Using Linked Electronic Health Records in the United Kingdom. BJOG. 2025 Jul;132(8):1139-1146. doi: 10.1111/1471-0528.18150. Epub 2025 Apr 23. PMID: 40264354; PMCID: PMC12137751.