Predicting COVID-19 related death using the OpenSAFELY platform
Williamson EJ., Tazare J., Bhaskaran K., McDonald HI., Walker AJ., Tomlinson L., Wing K., Bacon S., Bates C., Curtis HJ., Forbes H., Minassian C., Morton CE., Nightingale E., Mehrkar A., Evans D., Nicholson BD., Leon D., Inglesby P., MacKenna B., Davies NG., DeVito NJ., Drysdale H., Cockburn J., Hulme W., Morley J., Douglas I., Rentsch CT., Mathur R., Wong A., Schultze A., Croker R., Parry J., Hester F., Harper S., Grieve R., Harrison DA., Steyerberg EW., Eggo RM., Diaz-Ordaz K., Keogh R., Evans SJW., Smeeth L., Goldacre B.
<jats:title>Abstract</jats:title><jats:sec><jats:title>Objectives</jats:title><jats:p>To compare approaches for obtaining relative and absolute estimates of risk of 28-day COVID-19 mortality for adults in the general population of England in the context of changing levels of circulating infection.</jats:p></jats:sec><jats:sec><jats:title>Design</jats:title><jats:p>Three designs were compared: (A) a case-cohort design, which does not explicitly account for the time-changing prevalence of COVID-19 infection; (B) 28-day landmarking, a series of sequential overlapping sub-studies incorporating time-updating proxy measures of the prevalence of infection; and (C) daily landmarking. Regression models were fitted to predict 28-day COVID-19 mortality.</jats:p></jats:sec><jats:sec><jats:title>Setting</jats:title><jats:p>Working on behalf of NHS England, we used clinical data from adult patients from all regions of England held in the TPP SystmOne electronic health record system, linked to Office for National Statistics (ONS) mortality data, using the OpenSAFELY platform.</jats:p></jats:sec><jats:sec><jats:title>Participants</jats:title><jats:p>Eligible participants were adults aged 18 or over, registered at a general practice using TPP software on 1<jats:sup>st</jats:sup> March 2020, with recorded sex, postcode and ethnicity. 11,972,947 individuals were included, and 7,999 participants experienced a COVID-19 related death. The study period lasted 100 days, ending 8<jats:sup>th</jats:sup> June 2020.</jats:p></jats:sec><jats:sec><jats:title>Predictors</jats:title><jats:p>A range of demographic characteristics and comorbidities were used as potential predictors. 
Local infection prevalence was estimated with three proxies: modelled estimates based on local prevalence and other key factors; the rate of COVID-19 related A&E attendances; and the rate of suspected COVID-19 cases in primary care.</jats:p></jats:sec><jats:sec><jats:title>Main outcome measures</jats:title><jats:p>COVID-19 related death.</jats:p></jats:sec><jats:sec><jats:title>Results</jats:title><jats:p>All models discriminated well between patients who did and did not experience COVID-19 related death, with C-statistics ranging from 0.92 to 0.94. Accurate estimates of absolute risk required data on local infection prevalence, with modelled estimates providing the best performance.</jats:p></jats:sec><jats:sec><jats:title>Conclusions</jats:title><jats:p>Reliable estimates of absolute risk need to incorporate changing local prevalence of infection. Simple models can provide very good discrimination and may simplify implementation of risk prediction tools in practice.</jats:p></jats:sec>