Assessing the feasibility and impact of clinical trial trustworthiness checks via an application to Cochrane Reviews: Stage 2 of the INSPECT-SR project
Wilkinson J., Heal C., Antoniou GA., Flemyng E., Ahnström L., Alteri A., Avenell A., Barker TH., Borg DN., Brown NJL., Buhmann R., Calvache JA., Carlsson R., Carter LA., Cashin AG., Cotterill S., Färnqvist K., Ferraro MC., Grohmann S., Gurrin LC., Hayden JA., Hunter KE., Hyltse N., Jung L., Krishan A., Laporte S., Lasserson TJ., Laursen DRT., Lensen S., Li W., Li T., Liu J., Locher C., Lu Z., Lundh A., Marsden A., Meyerowitz-Katz G., Mol BW., Munn Z., Naudet F., Nunan D., O'Connell NE., Olsson N., Parker L., Patetsini E., Redman B., Rhodes S., Richardson R., Ringsten M., Rogozińska E., Seidler AL., Sheldrick K., Stocking K., Sydenham E., Thomas H., Tsokani S., Vinatier C., Vorland CJ., Wang R., Al Wattar BH., Weber F., Weibel S., van Wely M., Xu C., Bero L., Kirkham JJ.
Background and Objectives: The aim of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews (INSPECT-SR) project is to develop a tool to identify problematic RCTs in systematic reviews. In stage 1 of the project, a list of potential trustworthiness checks was created. The checks on this list must be evaluated to determine which should be included in the INSPECT-SR tool. Methods: We attempted to apply 72 trustworthiness checks to randomized controlled trials (RCTs) in 50 Cochrane reviews. For each, we recorded whether the check was passed, failed, or possibly failed, or whether it was not feasible to complete the check. Following application of the checks, we recorded whether we had concerns about the authenticity of each RCT. We repeated each meta-analysis after removing RCTs flagged by each check and again after removing RCTs where we had concerns about authenticity to estimate the impact of trustworthiness assessment. Trustworthiness assessments were compared to Risk of Bias and Grading of Recommendations Assessment, Development and Evaluation (GRADE) assessments in the reviews. Results: Ninety-five RCTs were assessed. Following application of the checks, assessors had some or serious concerns about the authenticity of 25% and 6% of the RCTs, respectively. Removing RCTs with either some or serious concerns resulted in 22% of meta-analyses having no remaining RCTs. However, many checks proved difficult to understand or implement, which may have led to unwarranted skepticism in some instances. Furthermore, we restricted assessment to meta-analyses with no more than five RCTs (54% contained only 1 RCT), which will distort the impact on results. No relationship was identified between trustworthiness assessment and Risk of Bias or GRADE. Conclusion: This study supports the case for routine trustworthiness assessment in systematic reviews, as problematic studies do not appear to be flagged by Risk of Bias assessment.
The study produced evidence on the feasibility and impact of trustworthiness checks. These results will be used, in conjunction with those from a subsequent Delphi process, to determine which checks should be included in the INSPECT-SR tool. Plain Language Summary: Systematic reviews collate evidence from randomized controlled trials (RCTs) to find out whether health interventions are safe and effective. However, it is now recognized that the findings of some RCTs are not genuine, and some of these studies appear to have been fabricated. Various checks for these “problematic” RCTs have been proposed, but it is necessary to evaluate these checks to find out which are useful and which are feasible. We applied a comprehensive list of “trustworthiness checks” to 95 RCTs in 50 systematic reviews to learn more about them and to see how often performing the checks would lead us to classify RCTs as being potentially inauthentic. We found that applying the checks led to concerns about the authenticity of around 1 in 3 RCTs. However, we found that many of the checks were difficult to perform and could have been misinterpreted. This might have led us to be overly skeptical in some cases. The findings from this study will be used, alongside other evidence, to decide which of these checks should be performed routinely to try to identify problematic RCTs, to stop them from being mistaken for genuine studies and potentially being used to inform health care decisions.
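Illustrative sketch: The Methods describe repeating each meta-analysis after removing flagged RCTs. The following minimal Python sketch shows what one such sensitivity analysis might look like. It assumes an inverse-variance fixed-effect model, which is an assumption made here for simplicity and not necessarily the model used in the reviews. The `Trial` record, the function names, and the example values are all hypothetical and are not data from the study.

```python
import math
from typing import List, NamedTuple, Optional, Tuple


class Trial(NamedTuple):
    """Hypothetical record for one RCT in a meta-analysis."""
    label: str
    estimate: float   # effect estimate, e.g. a log odds ratio
    std_error: float  # standard error of the estimate
    flagged: bool     # True if a trustworthiness check raised concerns


def pool_fixed_effect(trials: List[Trial]) -> Optional[Tuple[float, float]]:
    """Inverse-variance fixed-effect pooled estimate and its standard error.

    Returns None when no trials remain, mirroring the case where removing
    flagged RCTs leaves a meta-analysis empty.
    """
    if not trials:
        return None
    weights = [1.0 / t.std_error ** 2 for t in trials]
    total = sum(weights)
    pooled = sum(w * t.estimate for w, t in zip(weights, trials)) / total
    return pooled, math.sqrt(1.0 / total)


def sensitivity_analysis(trials: List[Trial]):
    """Pool all trials, then pool again after dropping flagged trials."""
    kept = [t for t in trials if not t.flagged]
    return pool_fixed_effect(trials), pool_fixed_effect(kept)


# Made-up example values, for illustration only.
example = [
    Trial("Trial A", -0.30, 0.10, False),
    Trial("Trial B", -0.90, 0.20, True),
    Trial("Trial C", -0.20, 0.10, False),
]
original, after_removal = sensitivity_analysis(example)
print("all trials:", original)
print("flagged removed:", after_removal)
```

Comparing the two pooled estimates indicates how much a flagged RCT drives the result. A return value of `None` corresponds to a meta-analysis with no remaining RCTs, which is especially likely when a meta-analysis contains only one RCT.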