Is the ‘best’ really the ‘best’? Chris Banks-Pillar recently completed his MSc in Evidence-Based Health Care focusing on the conundrum of whether the ‘best’ evidence really is the ‘best’.
1 August 2023
Chris Banks-Pillar shares his journey interrogating the integrity of evidence around exercise-based interventions for treating non-specific chronic lower back pain (NSCLBP) for his MSc in Evidence-Based Health Care dissertation.
You’ve run the gauntlet of modules but there’s just one hurdle left: the dissertation project. This is your chance to use your newfound EBM superpowers to produce meaningful data and discussions around an important clinical issue. After bouncing ideas around with my tutor, I decided to focus on the effectiveness of exercise-based interventions used to treat non-specific chronic lower back pain (NSCLBP).
Back pain is bread and butter for physios, so I suppose it’s an obvious choice. Over the years I've seen many interventions and pieces of advice go in and out of favour. Most provide good short-term outcomes but do little to prevent or manage longer-term problems, while others are just a bit crazy. What’s certain is that everything comes back around in time, regardless of how crazy it seems.
Exercise is one of those interventions that has stuck around. I’ve had great success using exercise to manage NSCLBP. Patients, once convinced of the long-term benefits, quickly see the improvements and use exercise to self-manage their symptoms. Ironically, in some patient cohorts, I’ve had more difficulty stopping patients from doing too much too soon.
Exercise-based interventions have recently made the leap from the academic literature and clinical knowledge into the public consciousness via the mainstream media, which now regularly reports on developments in lower back pain management.
The authors mean well but they haven’t really told us anything we don’t already know. What I really wanted to know was whether these recommendations were produced using a rigorous methodological process. Can we trust the recommendations? Does the ‘best’ academic article tell the reader which form of exercise, and in what dose, produces the largest effect size? I decided to use these questions to form the following PICOT for the project:
P: adults, at or older than 18 years of age, with clinically diagnosed non-specific chronic lower back pain (NSCLBP). This is pain of an indeterminable pathoanatomical cause located inferior to the costal margin and above the inferior gluteal folds that lasts for longer than 12 weeks. It can be accompanied either with or without radicular pain (i.e., neurological pain radiating from the lower back).
I: Exercise interventions as monotherapy or part of a combined therapy were included in this review as they are commonly used in clinical practice to treat NSCLBP.
C: Any non-pharmacological comparators were included in this review. These included activity or educational-based interventions, and true controls involving no treatment.
O (primary): Validated Patient-Reported Outcome Measures (PROMs) such as the Visual Analogue Scale (VAS), Pain Numerical Rating (PNR), and the McGill Pain Questionnaire (MPQ).
O (secondary): functional status or disability such as the Oswestry Disability Index (ODI), Roland Morris Disability Questionnaire (RMDQ), Short Form-36 (v2) Health Survey (SF-36), Quebec Back Pain Disability Scale (QBPDS), and the Multidimensional Pain Inventory (MPI).
T: This review only included systematic reviews with Network Meta-Analysis (NMA). The effect estimates produced by NMA are usually considered more precise and allow interventions to be ranked in order of effectiveness.
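For readers new to NMA, the ranking power comes from combining direct and indirect evidence across a network of trials. The simplest building block is the Bucher adjusted indirect comparison, sketched below in Python. The function name and all numbers are illustrative inventions, not figures from the dissertation.

```python
import math

def bucher_indirect(d_ab, se_ab, d_cb, se_cb):
    """Bucher adjusted indirect comparison of A vs C via common comparator B.

    d_ab, d_cb: direct effect estimates (e.g. mean differences in pain score)
    of A vs B and C vs B. The indirect A-vs-C estimate is their difference,
    and the variances add, so the indirect CI is wider than either direct one.
    """
    d_ac = d_ab - d_cb
    se_ac = math.sqrt(se_ab ** 2 + se_cb ** 2)
    ci = (d_ac - 1.96 * se_ac, d_ac + 1.96 * se_ac)
    return d_ac, se_ac, ci

# Hypothetical numbers: Pilates vs control -10 (SE 2), general exercise
# vs control -6 (SE 1.5), giving an indirect Pilates-vs-exercise estimate.
d, se, ci = bucher_indirect(-10.0, 2.0, -6.0, 1.5)
print(d, se, ci)  # -4.0, SE 2.5, 95% CI (-8.9, 0.9)
```

A full NMA pools direct and indirect evidence across the whole network simultaneously, which is where the extra precision claimed for NMA estimates comes from.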
I used the following set of objectives to answer my research question:
Assess the reporting and methodological quality of the existing systematic reviews with network meta-analyses.
Select the highest-quality SRNMA and assess the quality of the underlying primary evidence base for an exemplar exercise intervention.
Identify the focus for further research in this field.
I then used the following methods to achieve those objectives:
I applied The Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA-NMA) Statement, ‘A MeaSurement Tool to Assess systematic Reviews 2’ (AMSTAR 2) and The International Society for Pharmacoeconomics and Outcomes Research (ISPOR) NMA appraisal tool to determine which review had the highest reporting and methodological quality.
I identified the primary studies of the exemplar intervention at the most effective outcome and time point included in the SRNMA with the highest overall methodological quality. From these I selected a majority sample of the studies that the chosen review considered to be of the highest quality. I applied the RoB 1 tool to these studies to assess the authors’ judgements and identify whether they had been applied consistently, in accordance with the RoB 1 guidance.
I extracted the data from these studies to check for accuracy before conducting new pairwise meta-analysis for each outcome.
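A pairwise random-effects meta-analysis of this kind can be sketched in a few lines. Below is a minimal DerSimonian-Laird implementation with invented trial data; it is not the dissertation’s actual analysis, which would typically be run in dedicated software such as RevMan or R’s metafor package.

```python
import math

def random_effects_ma(effects, ses):
    """DerSimonian-Laird random-effects pooled estimate, 95% CI and I-squared."""
    w = [1 / se ** 2 for se in ses]  # inverse-variance (fixed-effect) weights
    pooled_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures observed between-study variability
    q = sum(wi * (yi - pooled_fe) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance estimate
    # Re-weight incorporating tau2, then pool
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled), i2

# Hypothetical per-trial mean differences in pain score with standard errors
effects = [-2.0, -3.1, -1.4, -2.6]
ses = [0.8, 1.1, 0.9, 1.0]
print(random_effects_ma(effects, ses))
```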
I assessed for small study effects using:
Funnel Plot asymmetry
Egger’s regression test
The Risk of Bias in Meta-analysis (RoBMA) package to assess the presence of publication bias.
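Of these checks, Egger’s test is the easiest to show concretely: it regresses each trial’s standardized effect on its precision, and an intercept far from zero signals funnel-plot asymmetry. The sketch below uses invented data, and the function name is illustrative only.

```python
import math

def egger_test(effects, ses):
    """Egger's regression: standardized effect on precision.

    The intercept estimates funnel-plot asymmetry; an intercept far from
    zero (relative to its standard error) suggests small-study effects
    and possible publication bias.
    """
    z = [y / s for y, s in zip(effects, ses)]  # standard normal deviates
    x = [1 / s for s in ses]                   # precisions
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
    intercept = zbar - slope * xbar
    resid = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]
    se_int = math.sqrt(sum(r ** 2 for r in resid) / (n - 2)
                       * (1 / n + xbar ** 2 / sxx))
    return intercept, se_int

# Invented data where smaller (higher-SE) trials show larger effects,
# the classic asymmetry pattern:
b0, se0 = egger_test([-2.0, -3.5, -1.2, -4.8], [0.5, 0.9, 0.4, 1.6])
print(b0, se0)
```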
Finally, I assessed the applicability of the intervention in the trials where there was RoB agreement between the chosen review and this review using the CERT tool.
Three SRNMAs were included in the review (337 RCTs, 33,660 participants, 24 interventions). One achieved a ‘moderate overall confidence’ AMSTAR 2 rating but had two partial critical weaknesses. The same review reported the most complete PRISMA-NMA (29 of 32) and ISPOR (9 of 15) items. The remaining SRNMAs achieved ‘critically low overall confidence’ AMSTAR 2 ratings. These tools highlighted the poor standard of reporting across all the reviews, especially in the methods and results sections. Perhaps more importantly, none of the articles justified why they had conducted a meta-analysis given the significant degree of heterogeneity. My confidence in their results was low at best.
The ‘best’ of these reviews reported that Pilates was the most clinically effective intervention at reducing pain (10 RCTs, 887 participants) in the short term (6-12 weeks). Two of the 10 included RCTs did not fully meet the SRNMA inclusion criteria, so they were excluded from later reanalysis. Data-extraction errors were found in six of the remaining eight RCTs, and two RCTs had both data-extraction errors and a co-intervention bias. A corrected pairwise MA of the RCTs without data errors and co-interventions showed a markedly reduced effect estimate (-2.36, 95% CI [-2.88, 1.84], p=0.34, I²=8%) that was nine times smaller than the uncorrected MA (-21.48, 95% CI [-35.11, -7.84], p<0.00001, I²=92%), and significant differences were observed for the other re-analyses. Funnel plot asymmetry and Egger’s regression test showed no evidence of publication bias. Conversely, the Risk of Bias in Meta-analysis (RoBMA) package inferred, with a high posterior probability (PP=0.997) and extreme evidence (IBF=334.568), that publication bias does exist.
Let’s cut to the chase: even if we believe the clinical effectiveness of the reported result and set aside the question of publication bias, can we implement the intervention? In short, no. The CERT assessment found that the two highest-quality trials included in the ‘best’ review completely reported only 2 to 3 of the 15 checklist items. Neither trial reported the basic details of how to deliver the intervention in terms of sets, reps, duration or intensity. Without this information the clinician may as well just ask the patient to move more and sit less. In fact, the authors of the best review did just that. They concluded that ‘people with chronic low back pain should be encouraged to perform the exercise that they enjoy promoting adherence’. Great, all that work for a recommendation we already knew!
Multi-national review teams are in a race to find the best exercise intervention for NSCLBP. These teams are using complex statistical methods commonly used in pharmacological trials to find the most effective intervention. But we must remember that SRNMAs are only as good as the underpinning data and the rigour of the review methods used to appraise that data. If these foundations are flawed, then so are the conclusions.
Many of the issues found in my dissertation project could be mitigated if trialists were made aware of the integrity of these foundations. Without strong foundations, the validity of their results is compromised: data quality is over-estimated and effect estimates don’t hold together when scrutinised.
Fortunately, I have the skills to interrogate the data and find the cracks in the foundations. Many key decision-makers don’t have the time or the skills to dig into the data. Some might be impressed by the veneer of high-quality academia the literature appears to have and reasonably conclude that an investment in this intervention will yield the reported result.
When decisions are made on such large scales - such as those needed to make meaningful change to prevalent health problems - the evidence informing these decisions needs to be held to a high standard. My project showed that these standards are paid lip service but not currently adhered to.
However, perhaps there’s an important point here. If we insist on elucidating which form of exercise best treats NSCLBP, reviewers and readers need to be able to spot flawed data more easily. That calls for critical appraisal tools that can be applied easily, quickly and consistently. Some of the tools I used took over an hour to apply, and by the end I genuinely questioned whether I knew enough to apply them.
Alternatively, it may be worth stepping back and asking whether we are posing the right question. Our resources should be focused on finding out what prevents the public from exercising more, and whether people then take up that exercise and integrate it into their daily lives. What’s for sure is that just because we’ve found the ‘best’ intervention doesn’t mean anyone will do it.
So, what’s next? I started my DPhil in January, which has taken up the mantle of my dissertation. The DPhil aims to investigate how we can optimise the uptake of activity-based interventions in social prescribing. Social prescribing has been around under various guises since the early 80s, but it’s only more recently that it has really gathered momentum, as clinicians and patients look for a more holistic approach to treating highly complex patients.
In March I attended the National Social Prescribing conference in London. The atmosphere and enthusiasm were great (any conference that holds a mass dance session during the lunch break gets my vote) but I came away with many questions and concerns. We know that exercise is good for the patient and the downstream services, but we shouldn’t get too carried away. Many of the conversations about the effects of exercise barely mentioned the word ‘evidence’, let alone the quality of that evidence. Exercise is great on many levels, but if it's to be implemented effectively it needs to be backed up with cold, hard facts.
Addressing these concerns will mean better longer-term outcomes for patients. Ensuring that what we implement in practice is tested by the most rigorous EBM processes will encourage researchers to be transparent about their methods and reassure clinicians and patients of the intervention’s effectiveness. If we fail to be brutally honest about the outcomes and how we’ve achieved them, we could be doing exercise a disservice.