In assessing non-inferiority trials, the issues of trial design, such as randomisation, blinding and follow-up, are considered in the same way as they are in trials looking for superiority. However, there are additional considerations. It can be difficult to judge whether the statistical equivalence boundary has been appropriately set. There is scope for pharmaceutical companies to set the equivalence boundary too wide, making it easy to claim equivalence when it may not exist.
Traps
Showing that two drugs are equivalent does not show that either is effective. Both could be ineffective or even harmful. The evidence for the old drug must therefore be considered when relying on an equivalence trial as evidence for the efficacy of a new drug. Suppose drug A is superior to placebo and drug B is shown to be non-inferior to drug A (and becomes the drug of choice because it is cheaper and easier to administer). If drug C is later shown to be non-inferior to drug B, can we be certain drug C is superior to placebo? This problem has been called 'biocreep' and could lead to the acceptance of progressively worse treatments if non-inferiority is blindly accepted. It can be avoided by selecting the most effective drug in the class as the control for non-inferiority trials, even if this is not the drug in most common use.
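The arithmetic behind biocreep can be sketched with a few lines of Python. This is an illustration only: the starting relative risk (0.50), the margin (1.4) and the function name `worst_case_chain` are hypothetical, and the sketch assumes that relative risks from successive trials simply multiply and that each new drug sits exactly at the non-inferiority boundary.

```python
def worst_case_chain(rr_first_vs_placebo, margin, n_successors):
    """Worst-case relative risk versus placebo for each drug in a chain.

    Assumptions (for illustration only): each successor is exactly at the
    non-inferiority margin relative to its predecessor, and relative risks
    from separate trials can be multiplied together.
    """
    chain = [rr_first_vs_placebo]
    for _ in range(n_successors):
        # The next drug may be up to `margin` times worse than the previous one.
        chain.append(chain[-1] * margin)
    return chain


# Hypothetical numbers: drug A halves the event rate, margin of 1.4 per trial.
for name, rr in zip("ABC", worst_case_chain(0.50, 1.4, 2)):
    print(f"Drug {name}: worst-case relative risk vs placebo = {rr:.2f}")
```

Under these assumed numbers, the worst case for drug C is a relative risk of about 0.98 versus placebo, so almost all of drug A's benefit could have been lost while every trial still met its boundary.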
Can the data from a failed superiority trial be used to demonstrate non-inferiority (Fig. 2 – Trial A, B, C)? Can the data from a non-inferiority trial that goes particularly well be used to demonstrate superiority (Fig. 2 – Trial D)? These are controversial questions. However, there is a view that if the non-inferiority boundary is selected a priori, a failed superiority trial can be taken as evidence of non-inferiority, although the test for statistical significance should be adjusted for multiple comparisons.
Example 1
The RE-LY trial set out to demonstrate non-inferiority of dabigatran versus warfarin for preventing stroke in patients with atrial fibrillation.1
Choice of boundary
The non-inferiority boundary was chosen as a relative risk of 1.46 for stroke or systemic embolism. This boundary was derived on statistical grounds from a meta-analysis of trials of warfarin versus placebo and chosen as 50% of the proven benefit of warfarin. Although this may have satisfied the statisticians, it is clearly not acceptable to clinicians that a new drug could allow 46% more strokes and still be regarded as non-inferior. As it turned out, the relative risk with the 110 mg dose of dabigatran was 0.91 (95% confidence interval 0.74–1.11). The upper limit of the confidence interval, an 11% increase in strokes, is probably acceptable to clinicians and patients.
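To make the percentages in this paragraph explicit, the following minimal sketch converts each quoted relative risk into a percentage change in strokes compared with warfarin. The helper name `percent_change_in_risk` is hypothetical; the three relative risks are the ones quoted above.

```python
def percent_change_in_risk(relative_risk):
    """Convert a relative risk into a percentage change versus the control drug."""
    return (relative_risk - 1.0) * 100.0


# Relative risks quoted in the text for the 110 mg dose of dabigatran.
figures = {
    "non-inferiority boundary": 1.46,
    "point estimate": 0.91,
    "upper 95% confidence limit": 1.11,
}

for label, rr in figures.items():
    print(f"{label}: relative risk {rr:.2f} = {percent_change_in_risk(rr):+.0f}% strokes")
```

A positive value means more strokes than with warfarin and a negative value means fewer, so the boundary corresponds to +46%, the point estimate to -9% and the upper confidence limit to +11%.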
Analysis
An intention-to-treat analysis was performed. As 99.9% of the patients were followed up, loss to follow-up did not introduce bias. The proportions discontinuing treatment were 14.5% for the low dose and 15.5% for the high dose of dabigatran and 10.2% for warfarin, which may have biased the relative risk towards 1.0. This could have given a spurious non-inferiority result if the point estimate had been a relative risk greater than 1.0, but would not have had this effect on a point estimate less than 1.0. A per-protocol analysis was not done.
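The direction of this bias can be illustrated with a deliberately simple model. It is a sketch under stated assumptions, not a reanalysis of the trial: it assumes that patients who stop the new drug have the same event risk as the control arm, it ignores discontinuation of warfarin, and the 'true' relative risks of 1.30 and 0.80 and the function name `diluted_relative_risk` are hypothetical.

```python
def diluted_relative_risk(true_rr, fraction_discontinuing):
    """Intention-to-treat relative risk under a simple dilution model.

    Assumption: patients who stop the new drug have the control-arm risk
    (relative risk 1.0); everyone else has the true relative risk.
    Discontinuation in the control arm is ignored.
    """
    if not 0.0 <= fraction_discontinuing <= 1.0:
        raise ValueError("fraction_discontinuing must be between 0 and 1")
    return (1.0 - fraction_discontinuing) * true_rr + fraction_discontinuing * 1.0


# Hypothetical true effects, with about 15% discontinuing the new drug.
for true_rr in (1.30, 0.80):
    itt_rr = diluted_relative_risk(true_rr, 0.15)
    print(f"true relative risk {true_rr:.2f} -> intention-to-treat {itt_rr:.3f}")
```

In both cases the intention-to-treat estimate moves towards 1.0. For a drug that is truly worse, this makes it look closer to the control and so easier to pass a non-inferiority boundary. For a drug that is truly better, it only understates the benefit.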
The trial set out to demonstrate non-inferiority, but ended up showing superiority of the 150 mg dose over warfarin with a relative risk of 0.66 (95% confidence interval 0.53–0.82). The intention-to-treat analysis is therefore appropriate for a claim of superiority (see Fig. 3). If the trial had claimed non-inferiority by showing that the 95% confidence interval for the relative risk of stroke extended just short of the boundary (for example to 1.45), it should not have been accepted. To me, the possibility that the new drug could lead to a 45% increased risk of stroke would be unacceptable.
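The way a confidence interval is read against a pre-specified boundary can be sketched as a small decision rule. This is one common reading and an assumption of this sketch, not the trial's own analysis plan. The function name `classify` and the category labels are hypothetical, and a relative risk below 1.0 is taken to favour the new drug.

```python
def classify(ci_lower, ci_upper, margin):
    """Classify a trial from the 95% confidence interval of the relative risk.

    Relative risk is new drug versus control, so values below 1.0 favour
    the new drug. `margin` is the pre-specified non-inferiority boundary.
    """
    if margin <= 1.0:
        raise ValueError("margin must be greater than 1.0")
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if ci_upper < 1.0:
        return "superior"        # whole interval favours the new drug
    if ci_upper < margin:
        return "non-inferior"    # interval stays inside the boundary
    if ci_lower > margin:
        return "inferior"        # whole interval lies beyond the boundary
    return "inconclusive"        # interval crosses the boundary


MARGIN = 1.46  # boundary used in RE-LY, as quoted in the text
print("110 mg:", classify(0.74, 1.11, MARGIN))
print("150 mg:", classify(0.53, 0.82, MARGIN))
```

A rule like this only tests the statistical boundary. An interval reaching 1.45 would also be labelled non-inferior here, which is exactly why the clinical acceptability of the boundary has to be judged separately.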
Examples of equivalence and non-inferiority trials
Example 2
A study set out to show that once-daily dosing with mesalazine granules was as good as three-times-daily dosing at inducing remission during first episodes of ulcerative colitis. The rate of non-remission at eight weeks was 24.3% in the three-times-daily group, but only 20.9% in the once-daily group. The relative risk was 0.86 (95% confidence interval 0.59–1.25). The non-inferiority boundary was set at a relative risk of 1.6. As the upper limit of the confidence interval is well below this boundary, non-inferiority is accepted (see Fig. 3).2
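The relative risk quoted here follows directly from the two non-remission rates, as the short sketch below shows. The function name `relative_risk` is hypothetical. The confidence interval cannot be recomputed without the group sizes, which are not given in the text, so the reported upper limit of 1.25 is used as is.

```python
def relative_risk(risk_new, risk_control):
    """Relative risk of the outcome with the new regimen versus the control."""
    if risk_control <= 0.0:
        raise ValueError("risk_control must be greater than 0")
    return risk_new / risk_control


# Non-remission at eight weeks: once daily (new) vs three times daily (control).
rr = relative_risk(0.209, 0.243)
print(f"relative risk of non-remission = {rr:.2f}")

# Reported upper 95% confidence limit and pre-specified boundary from the text.
ci_upper, margin = 1.25, 1.6
print("non-inferiority shown" if ci_upper < margin else "non-inferiority not shown")
```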
Example 3
The Captopril Prevention Project compared the efficacy of captopril with that of older antihypertensives in the prevention of stroke, myocardial infarction and cardiovascular death. The authors presented both intention-to-treat and per-protocol analyses, showing somewhat worse outcomes for captopril. The adjusted relative risk was 1.12 (95% confidence interval 0.94–1.32). The authors claimed equivalence, but did not pre-specify an equivalence boundary. Patients may not view the possible 32% increase in serious outcomes as equivalent (see Fig. 3).3
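Why the missing boundary matters can be seen by testing the reported upper confidence limit against a range of candidate boundaries. The candidate values below are hypothetical and chosen only for illustration, as is the function name `non_inferior`. The upper limit of 1.32 is the one reported above.

```python
def non_inferior(ci_upper, margin):
    """True if the upper confidence limit lies inside the boundary."""
    return ci_upper < margin


CI_UPPER = 1.32  # upper 95% confidence limit reported for captopril

# Hypothetical boundaries that could have been chosen after seeing the data.
for margin in (1.10, 1.20, 1.30, 1.40, 1.50):
    verdict = "non-inferior" if non_inferior(CI_UPPER, margin) else "not shown"
    print(f"boundary {margin:.2f}: {verdict}")
```

With the same data the claim fails for boundaries of 1.30 or less and succeeds for 1.40 or more. Without a boundary fixed in advance, the conclusion depends on whichever value is chosen afterwards.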