Validity and timeliness of cancer diagnosis data collected during a prospective cohort study and reported by the English and Welsh cancer registries: a retrospective, comparative analysis
Jackson A., Virdee PS., Tonner S., Oke JL., Perera R., Riahi K., Luan Y., Hiom S., Kumar H., Nandani H., Kurtzman KN., Huws D., Allan D., Smits S., McPhail S., Parkes EE., Hobbs FDR., Middleton MR., Nicholson BD.
Background: Cancer places a high burden on society and health-care systems. Cancer research requires high-quality data, which are resource-intensive to obtain. Using administrative datasets such as cancer registries could improve the efficiency of cancer studies if the data were valid and timely. We aimed to compare the validity and timeliness of diagnostic cancer data collected on-site during the SYMPLIFY study with those of data obtained from the cancer registries of England and Wales.
Methods: Cancer data were collected from 5461 participants across 44 hospital sites during SYMPLIFY (ISRCTN10226380), a prospective observational study in England and Wales. Linked cancer data were obtained regularly between April, 2022, and September, 2023, from Digital Health and Care Wales (DHCW), the Welsh Cancer Intelligence and Surveillance Unit (WCISU), and the English National Cancer Registration Dataset (NCRD) and Rapid Cancer Registration Dataset (RCRD). The primary objectives were to evaluate the validity of the data in all datasets, assessed as the proportion of completed data fields and concordance with SYMPLIFY site data, and their timeliness, for all cancers diagnosed within 9 months of study enrolment. The data fields investigated were cancer site (International Classification of Diseases, 10th Revision [ICD-10] code); cancer morphology (International Classification of Diseases for Oncology, 3rd Edition [ICD-O-3] morphology (histology) code and broad morphological grouping); overall stage; and TNM classification.
Findings: For data collected between April, 2022, and September, 2023, completeness at the last data cut available for each dataset ranged from 84% to 100% for ICD-O-3 morphology, from 43% to 100% for overall stage, and from 74% to 83% for TNM stage.
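The two validity measures named in the Methods, completeness and concordance, reduce to simple proportions over linked records. The sketch below is a minimal illustration under stated assumptions, not the study's analysis code: it assumes site and registry records are aligned one-to-one by participant, treats None or a blank string as a missing field, counts concordance only over records in which both sources recorded a value, and uses a hypothetical `prefix_len` option to compare codes at a coarser level (for example, the three-character ICD-10 category). The example codes are invented.

```python
from typing import Optional, Sequence, Tuple


def _is_missing(value: Optional[str]) -> bool:
    # Assumption: a field counts as missing if it is None or blank.
    return value is None or value.strip() == ""


def _normalise(value: str, prefix_len: Optional[int]) -> str:
    # Compare codes case-insensitively, ignoring surrounding whitespace;
    # optionally truncate to a prefix (hypothetical coarser comparison level).
    code = value.strip().upper()
    return code if prefix_len is None else code[:prefix_len]


def completeness(values: Sequence[Optional[str]]) -> float:
    """Proportion of records in which the field is populated."""
    if len(values) == 0:
        raise ValueError("completeness is undefined for zero records")
    filled = sum(1 for v in values if not _is_missing(v))
    return filled / len(values)


def concordance(
    site: Sequence[Optional[str]],
    registry: Sequence[Optional[str]],
    prefix_len: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (agreeing, compared) over records where both sources have a value."""
    if len(site) != len(registry):
        raise ValueError("site and registry values must be aligned record by record")
    agreeing = compared = 0
    for a, b in zip(site, registry):
        if _is_missing(a) or _is_missing(b):
            continue  # assumption: pairs with a missing value are excluded
        compared += 1
        if _normalise(a, prefix_len) == _normalise(b, prefix_len):
            agreeing += 1
    return agreeing, compared


if __name__ == "__main__":
    # Invented ICD-10 codes for four linked participants.
    site = ["C18.7", "C34.1", None, "C50.9"]
    registry = ["C18.7", "C34.9", "C61", "C50.9"]
    print(completeness(site), completeness(registry))  # 0.75 1.0
    print(concordance(site, registry))                 # (2, 3)
    print(concordance(site, registry, prefix_len=3))   # (3, 3)
```

Returning the raw counts rather than a percentage keeps the numerator and denominator available for computing a confidence interval afterwards.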
Concordance between SYMPLIFY and NCRD was 96% (95% CI 92–98) for ICD-10, 60% (53–66) for ICD-O-3 morphology, 83% (78–88) for ICD-O-3 broad morphology groupings, 73% (67–78) for overall stage, and 51% (44–59) for TNM stage; between SYMPLIFY and WCISU, it was 89% (95% CI 81–94) for ICD-10, 63% (53–73) for ICD-O-3 morphology, 80% (70–87) for ICD-O-3 broad morphology groupings, 83% (74–90) for overall stage, and 49% (38–61) for TNM stage. Concordance between SYMPLIFY and RCRD was 95% (95% CI 92–98) for ICD-10, 67% (60–74) for ICD-O-3 morphology, 85% (79–90) for ICD-O-3 broad morphology groupings, and 73% (65–80) for overall stage; between SYMPLIFY and DHCW, it was 96% (95% CI 91–99) for ICD-10, 74% (64–83) for ICD-O-3 morphology, 84% (75–91) for ICD-O-3 broad morphology groupings, and 87% (74–95) for overall stage. The SYMPLIFY dataset reached completion at 12 months post-enrolment, in November, 2022, compared with 13 months for NCRD, in December, 2023. RCRD and DHCW reached completion at 13 months and 15 months post-enrolment, in December, 2022, and February, 2023, respectively.
Interpretation: We report similar completeness of data fields, concordance, and timeliness between on-site and centrally collected cancer outcomes data. Our findings suggest that central registry data can help alleviate the resource burden in clinical trials and improve cancer research. Cancer registries might need additional resources to provide data for registry-based trials at scale.
Funding: GRAIL Bio UK.
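Each concordance estimate above is reported as a percentage with a 95% CI. The abstract does not state which interval method was used, so the sketch below assumes a Wilson score interval, one common choice for a binomial proportion; other methods (for example, Clopper-Pearson) give slightly different bounds, and this is not a reproduction of the published figures. The helper `format_pct` is hypothetical and only mimics the "xx% (lo–hi)" style used here; the counts in the example are invented.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def wilson_interval(successes: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Wilson score interval for the binomial proportion successes / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def format_pct(successes: int, n: int) -> str:
    """Hypothetical formatter: 'xx% (lo–hi)' with whole-number percentages."""
    lo, hi = wilson_interval(successes, n)
    return f"{100 * successes / n:.0f}% ({100 * lo:.0f}–{100 * hi:.0f})"


if __name__ == "__main__":
    print(wilson_interval(8, 10))  # approximately (0.490, 0.943)
    print(format_pct(8, 10))       # 80% (49–94)
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves reasonably for proportions near the extremes, such as the ICD-10 concordance estimates in the mid-90s.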