
How to Interpret External Validity of a Randomized Controlled Trial

by Michael Turlik, DPM1

The Foot and Ankle Online Journal 2 (5): 5

The fourth and final article discussing the interpretation of randomized controlled trials for podiatric physicians involves the critical analysis of external validity. External validity is defined, examples from the foot and ankle literature are reviewed, and a critical analysis is presented for the randomized controlled trial involving the use of magnetic insoles in the treatment of symptomatic diabetic neuropathy. This is part of an ongoing series of articles about evidence-based medicine to help podiatric physicians develop the knowledge, skills and values to be successful in a changing practice environment.

Key words: Evidence-based medicine, external validity, randomized controlled trial

This is an Open Access article distributed under the terms of the Creative Commons Attribution License.  It permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. ©The Foot and Ankle Online Journal (www.faoj.org)

Accepted: April, 2009
Published: May, 2009

ISSN 1941-6806
doi: 10.3827/faoj.2009.0205.0005


When evaluating a randomized controlled trial (RCT), the first consideration is internal validity. Did the authors take appropriate measures to minimize bias? If the reader is satisfied that the authors planned, implemented and reported methods which would minimize bias, the reader would then evaluate the results of the trial. If the trial results revealed a clinically important difference between treatment arms, the final question the reader will have to answer is: can I, and should I, apply the results of this study to my practice? This is a matter of judgment rather than statistical analysis. It is often referred to as the relevance or external validity of the trial. This is the fourth and final article in this series of articles for podiatric physicians designed to help them understand, interpret and implement RCTs in their practices.

Types of Randomized Controlled Trials

Explanatory trials are characterized by strict inclusion criteria and highly homogeneous study groups; they usually present the results only as per protocol and often use placebos as the comparator. Explanatory trials answer the question: does this intervention work? This relates to the efficacy of the intervention. Pragmatic trials have increased variability of the participants, use an active comparator (gold standard) and evaluate the results using both per protocol and intention to treat analysis. Pragmatic trials evaluate the effectiveness of an intervention: does this treatment work in real life? These are two ends of a continuum; the more pragmatic the trial is, the better its external validity.

External validity or relevance involves the interpretation of the trial relative to the reader's practice environment. Issues to be resolved by the podiatric physician when evaluating a trial for external validity are: trial participants, location of the study, intervention, outcomes and harms described. In general, RCTs are better suited to answering the question "Will this treatment work?" than the question "Is it worth it?"

Trial Participants

The first step in evaluating the trial participants is to review the inclusion and exclusion criteria listed by the authors in the methods section of the article. For example, in a recent study [1] the authors looked at the use of duct tape in the treatment of pedal verruca in children 4-12 years of age. Can the results of this study be generalized to adults? In another recent study [2] of midportion Achilles tendon pain, the diagnosis was confirmed by ultrasound in all participants prior to inclusion. Should the results be generalized to patients with insertional Achilles symptoms, and should all patients be confirmed by ultrasound prior to beginning treatment as described in the study?

Are the participants in the trial similar with respect to age, gender, comorbidities and severity of disorder? It is easy to rationalize that the patient(s) in the podiatric physician's practice differ from those in the trial regarding some demographic aspect. “My patients are slightly older/younger, have more/fewer comorbidities, take more/fewer medications, speak/don’t speak English, wear the same/different shoes, have the same/different occupation.” Would my patient have been included in the trial? A good rule of thumb is that if the answer to this question is yes, the results probably are relevant to your patient(s).

Location of the Study

In this classic study [3] the authors evaluated a clinical maneuver to detect infected bone in hospitalized patients with diabetic foot ulcers. Can the results be generalized to an office setting where the ulcers may be less severe than those of the participants in the study? Another study [4] looked at pulse versus continuous terbinafine for onychomycosis. The setting was a Veterans Affairs (VA) hospital in Minneapolis. Should the results be generalized to an upper income suburban private practice?

A common concern regarding study location is whether the results of a trial done in a tertiary care center by specialists are relevant to primary care settings.

Another concern is whether differences in the healthcare delivery systems of other countries limit the relevance of a trial to podiatric physicians practicing in the United States. The question the podiatric physician needs to answer is: was the setting of this study similar enough to my practice so that the results can be generalized?

Intervention

Is the intervention proposed in the study feasible for me? Am I technically proficient to perform the procedure? Is the device available to me? How much does the intervention cost and who will pay for it? These are all important considerations before deciding if the intervention is relevant to the individual podiatric physician's practice. The results of a randomized controlled trial may demonstrate a clinical benefit; however, the intervention may not be relevant to the podiatric physician's practice.

The methods section of the paper should describe the intervention and comparator in enough detail to allow the reader to make an informed judgment. In a study [5] of foot orthoses to treat plantar fasciitis, the reader needs to determine if the method of casting and fabrication of the foot orthoses described is similar enough to his/her practice to allow for generalization of the results. A different study [6] evaluated the difference between surgery, foot orthoses and watchful waiting for symptomatic hallux valgus. In addition to the same question regarding foot orthoses, the reader has to decide if the procedure performed is similar enough to the typical surgical procedure performed for hallux valgus in his/her practice. In an earlier article of this series [7] it was shown that Graftskin was a more effective treatment for diabetic foot ulcers when compared with moistened saline gauze dressings. [8] The estimated cost for a single treatment with Graftskin varies from $1000 – $1200. [9] Despite the high cost of Graftskin, a recent economic analysis demonstrated that for venous stasis ulcers Graftskin was more effective and less costly than treatment with Unna boots. [10]

Outcomes

Is the outcome meaningful to my patients? Patient centered outcomes are the hallmark of evidence-based medicine. Validated patient centered outcomes are the most relevant to clinical practice. The authors should describe the primary outcome(s) in the methods section and reference validation efforts.

In a recent study [11] the authors looked at the use of alendronate in acute Charcot arthropathy. The authors measured changes in the following to evaluate the success of the intervention: serum collagen COOH-terminal telopeptide of type 1 collagen (ICTP), osteocalcin, testosterone, estradiol, thyroid hormones, parathyroid hormone, follicle-stimulating hormone, luteinizing hormone, IGF-1, calcitriol, urinary hydroxyproline and serum alkaline phosphatase. These are examples of surrogate outcomes. An important question to ask regarding surrogate outcomes is: is there a strong, independent, consistent association between the surrogate outcome and a patient-important outcome? [12]

In a trial comparing two different surgical procedures [13] for the correction of hallux abducto valgus deformity the authors used multiple outcomes. They consisted of: the American Orthopedic Foot and Ankle Society clinical rating scale for deformities of the hallux, EuroQol, visual analogue scale (VAS), and radiographic changes. Some of the outcomes are patient oriented (VAS, EuroQol) and some are physician centered (range of motion, radiographic findings); some are validated and others are not. Physician centered outcomes and surrogate outcomes are not as relevant to clinical practice as patient centered outcomes. Composite outcomes may be important; however, they need to be cautiously reviewed to ensure the results are not misleading. [14]

The question the podiatric physician needs to answer is: is the outcome used in this study patient centered and validated? If the answer to this question is yes, then the podiatric physician should feel comfortable in generalizing the results to his/her patient(s).

Harms

Were the benefits from the intervention in the trial worth the potential harms that occurred? RCTs present a less biased estimate of conceivable harmful effects than other study designs due to balancing of known and unknown prognostic factors. However, randomized controlled trials are unlikely to enroll enough participants to demonstrate rare and serious adverse effects which may be evident in the larger population. In addition, unlike the reporting of efficacy in randomized controlled trials, the reporting of harms is thought to be inadequate and incomplete. [15] Larger observational studies are usually better suited than RCTs to assessing harms from therapeutic interventions. In a study evaluating pharmaceutical versus non-pharmaceutical trials in rheumatoid arthritis, the authors found that pharmaceutical trials reported data on harms more often than non-pharmaceutical trials. [16] Fewer than half of the non-pharmaceutical trials reported any harms.
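The point about sample size and rare harms can be made concrete with a little arithmetic. The following is a minimal sketch, assuming each participant independently has the same chance of experiencing the adverse effect; the function names and the example rates are hypothetical and are not taken from any of the trials discussed here.

```python
def prob_at_least_one_event(true_rate: float, n_participants: int) -> float:
    """Chance that at least one adverse event is seen among n participants,
    assuming independent participants who share the same event rate."""
    return 1.0 - (1.0 - true_rate) ** n_participants


def upper_bound_when_zero_events(n_participants: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on the event rate when no events were
    observed (same independence assumption as above)."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_participants)


# Hypothetical example: a harm that truly affects 1 in 1000 patients,
# studied in a trial arm of 100 participants.
chance_seen = prob_at_least_one_event(0.001, 100)

# Hypothetical example: no events observed among 100 participants.
rate_not_excluded = upper_bound_when_zero_events(100)

print(f"Chance of seeing the harm at least once: {chance_seen:.1%}")
print(f"Event rate still compatible with zero events: up to {rate_not_excluded:.1%}")
```

Under these assumptions a trial arm of 100 participants would usually miss a 1 in 1000 harm entirely, and observing no events at all still leaves an event rate of roughly 3% unexcluded.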

Although it is not always reported, the authors should describe in the methods section how information on harms was collected and reported. Ideally, in the results section the authors should report all of the adverse outcomes which occurred, the frequency with which they occurred and, if important, the time at which they occurred. Consider the number of participants who withdrew from the trial due to adverse effects, and the completeness of the reporting of the adverse effects, when evaluating a paper for harms.

In an article evaluating duct tape for pedal verruca [1] in children referenced earlier the authors briefly described in the methods section a passive questionnaire used to elicit information about adverse effects. Adverse effects are described both in the narrative section and a table in the results section. Three participants withdrew from the duct tape arm of the trial and no participants withdrew from the placebo arm of the trial due to adverse effects. Participants using the intervention demonstrated a larger percentage of localized adverse skin effects which were described as mild.

In another article referenced earlier [13] comparing two different surgical procedures for hallux abducto valgus deformity, the authors do not discuss the evaluation of harms in the methods section. However, in the results section the authors discuss complications occurring during the study, which are limited to one surgical revision due to recurrence one year postoperatively. One participant developed an asymptomatic nonunion which resolved during the one-year follow-up period. Apparently none of the 100 participants developed a postoperative infection.

Was there a difference in harms between treatment arms? Was the difference statistically/clinically significant? Were there any serious adverse consequences noted between groups? Although RCTs are not ideal for investigating harms, any information on adverse effects presented in the paper should be used by the podiatric physician in deciding if a clinically significant result should be implemented in his or her practice. The information on harms should also be communicated to patients prior to beginning the intervention.

Use of magnetic insoles for diabetic peripheral neuropathy

The initial article of this series [17] identified an RCT [18] which would assist us in evaluating the usefulness of magnetic insoles in the treatment of symptomatic diabetic neuropathy. The following two articles [7,19] critically evaluated the internal validity and results of the identified article. Using the information presented earlier in this article we will critically evaluate the external validity of the RCT evaluating magnetic insoles for diabetic neuropathy.

Trial Participants

The authors clearly explained the inclusion and exclusion criteria necessary for the participants to enter the trial. The entry criteria appear to be rather broad, increasing the external validity of the trial.

The only concern would be whether or not magnetic insoles would be indicated as a first-line treatment since the participants were required to have symptoms which were constant and present over six months and which were refractory to various medications. [18]

Location of the Study

Three hundred and seventy five participants were recruited from 48 sites in 27 states. Participants were recruited from various specialties including neurology, podiatry, diabetic clinics and other private practices. This multicenter study appears to have high external validity with regard to the location of the study.

Intervention

The authors provided a thorough technical description of the magnetic insole used in the study. The magnetic insoles used in the study are readily available over the Internet for less than $100 per pair. Whether and how these magnetic insoles differ from other commercially available magnetic insoles was not addressed in the study.

Outcomes

In the methods section the authors describe the use of two different instruments to measure four different primary outcomes. An 11 point visual analogue pain scale (0-10) was used to measure numbness and tingling/burning pain. A quality of life (QOL) instrument was used to measure foot pain and sleep interruption. Visual analogue scales and quality of life instruments are considered patient centered. In the methods section no references were cited regarding validation of the instruments; furthermore, the quality of life instrument was not identified.

Harms

Under the methods section the authors stated that adverse effects would be monitored and reported. The adverse effects the authors were looking for included ulceration, abrasion, allergic reaction or infection. The authors reported in the results section no complications from either trial arm. However, in the flowchart of the trial reference was made to six participants in the intervention arm and four participants in the placebo arm experiencing complications.

Summary

The authors described in sufficient detail methods which would minimize bias, with the exception of intention to treat analysis. The results of the study comparing magnetic insoles and placebo were not clearly presented. Using effect sizes calculated from the original data, a small clinically significant effect was obtained with one of the four primary outcomes measured. The external validity of the study seems high with regard to trial participants, location of the study and the intervention. The reader will need to decide whether the cost of the intervention is worth the questionable efficacy of the intervention.

References

1. de Haen M, Spigt MG, van Uden CJT, van Neer P, Feron FJM, Knottnerus A: Efficacy of duct tape vs placebo in the treatment of verruca vulgaris (Warts) in primary school children. Arch Pediatr Adolesc Med 160: 1121 – 1125, 2006.
2. Petersen W, Welp R, Rosenbaum D: Chronic Achilles tendinopathy: A prospective randomized study comparing the therapeutic effect of eccentric training, the AirHeel brace, and a combination of both. Am J Sports Med 35: 1659 – 1667, 2007.
3. Grayson ML, Gibbons GW, Baloh K, Levin E, Karchmer AW: Probing to bone in infected pedal ulcers: a clinical sign of underlying osteomyelitis in diabetic patients. JAMA 273 (9): 721 – 723, 1995.
4. Warshaw E, Fett D, Bloomfield H, Grill J, Nelson D, Quintero V, Carver S, Zielke G, Lederle F: Pulse versus continuous terbinafine for onychomycosis: A randomized, double-blind, controlled trial. Journal of the American Academy of Dermatology 53 (4): 578 – 584, 2002.
5. Landorf K, Keenan A-M, Rushworth RL: Effectiveness of foot orthoses to treat plantar fasciitis. Arch Intern Med 166: 1305 – 1310, 2006.
6. Torkki M, Malmivaara A, Seitsalo S, Hoikka V, Laippala P, Paavolainen P: Surgery vs. orthosis vs watchful waiting for hallux valgus: A randomized controlled trial. JAMA 285:2474 – 2480, 2001.
7. Turlik M. How to interpret the results of a randomized controlled trial. Foot and Ankle Online Journal. 2: 4, 2009.
8. Veves A, Falanga V, Armstrong DG, Sabolinski ML, Apligraf Diabetic Foot Ulcer Study: Graftskin, a human skin equivalent, is effective in the management of noninfected neuropathic diabetic foot ulcers. Diabetes Care 24: 290 – 295, 2001.
9. Hanft J, Williams A, Kyramarios C, Temar K: Are tissue replacements cost effective? Podiatry Today 16: 2003.
10. Schonfeld WH, Villa KF, Fastenau JM, Mazonson PD, Falanga V: An economic assessment of Apligraf® (Graftskin) for the treatment of hard-to-heal venous leg ulcers. Wound Repair and Regeneration 8: 251 – 257, 2001.
11. Pitocco D, Ruotolo V, Caputo S, Mancini L, Collina CM, Manto A, Caradonna P, Ghirlanda G: Six-month treatment with alendronate in acute Charcot neuroarthropathy. Diabetes Care 28: 1214 – 1215, 2005.
12. Guyatt G, Drummond R, Meade M, Cook D. Users’ guides to the medical literature. New York: McGraw-Hill Medical, 367 – 374, 2008.
13. Saro C, Andrén B, Wildemyr Z, Felländer-Tsai L: Outcome after distal metatarsal osteotomy for hallux valgus: A prospective randomized controlled trial of two methods. Foot and Ankle International 28: 778 – 787, 2007.
14. Montori VM, Busse JW, Miralda GP, Ferreira I, Guyatt GH: How should clinicians interpret results reflecting the effect of an intervention on composite end points: should I dump this lump? EBM 10: 162, 2005.
15. Ioannidis JP, Evans SJ, Gotzsche PC, O’Neill RT, Altman DG, Schulz K, CONSORT Group: Better reporting of harms in randomized trials: An extension of the CONSORT statement. Annals of internal medicine 141: 781 – 788, 2004.
16. Ethgen M, Boutron I, Baron G, Ethgen M: Reporting of harm in randomized controlled trials of nonpharmacologic treatment for rheumatic disease. Ann Intern Med 143: 20 – 25, 2005.
17. Turlik M. Introduction to evidence-based medicine. Foot and Ankle Online Journal. 2: 4, 2009.
18. Weintraub MI, Wolfe GI, Barohn RA, Cole SP, Parry GJ, Hayat G, Cohen JA, Page JC, Bromberg MB, Schwartz SL, Magnetic Research Group: Static magnetic field therapy for symptomatic diabetic neuropathy: a randomised, double-blind, placebo-controlled trial. Arch Phys Med Rehabil 84: 736 – 746, 2003.
19. Turlik M. Evaluating the internal validity of a randomized controlled trial. Foot and Ankle Online Journal. 2: 5, 2009.


Address correspondence to: Michael Turlik, DPM
Email: mat@evidencebasedpodiatricmedicine.com

1 Private practice, Macedonia, Ohio.

© The Foot and Ankle Online Journal, 2009

How to Interpret the Results of a Randomized Controlled Trial

by Michael Turlik, DPM1

The Foot and Ankle Online Journal 2 (4): 4

This is the third in a series of articles explaining basic concepts of evidence-based medicine to podiatric physicians. This article will explain how to evaluate the results section of a randomized controlled trial. This will be illustrated using actual examples from the foot and ankle literature to include an article which was referenced in the first article of this series.

Key words: Evidence-based medicine, randomized controlled trial

This is an Open Access article distributed under the terms of the Creative Commons Attribution License.  It permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. ©The Foot & Ankle Journal (www.faoj.org)

Accepted: March, 2009
Published: April, 2009

ISSN 1941-6806
doi: 10.3827/faoj.2009.0204.0004


The preceding article in the series dealt with the evaluation of internal validity of a randomized controlled trial (RCT). [1] The first step in critically analyzing a randomized controlled trial is to assess its internal validity. Once you feel comfortable that the authors have taken sufficient measures to minimize bias, you will need to determine what the results of the study are. The authors should present the results of the study so that the reader can easily determine whether the intervention is effective, independent of the authors' interpretations. This is the third in a series of articles introducing podiatric physicians to evidence-based medicine (EBM).

Primary Outcome

The authors should specify a single primary outcome which is patient oriented and was chosen a priori to determine effectiveness/efficacy of the intervention. It has been reported that trial outcomes may be selectively reported post hoc [2]; this is considered outcome reporting bias. One of the purposes of trial registration is to prevent post hoc changes in the primary outcome [3] and minimize outcome reporting bias.

Hard outcomes such as ulcer closure or amputation are less common in the podiatric literature than soft outcomes such as functional improvement or pain relief. Hard outcomes either occur or do not occur and are considered binary or dichotomous data. Soft outcomes usually are treated as continuous measures even though they may be categorical. When measuring soft outcomes the authors should provide a reference which validates the instrument used and state the clinically important difference.

In one review of orthopedic publications, the authors found that 31% of the publications modified a validated outcome instrument and then used the instrument without re-validating it. [4] Patient reported outcomes which are not validated may produce data which are biased (measurement bias). The primary outcome should be used in the calculation of the sample size to ensure adequate power to detect a clinically important difference. Articles which report multiple primary outcomes should be viewed with caution because of the increased incidence of type I errors due to multiple hypothesis testing [5] and lack of power. Surrogate outcomes [6], physician centered outcomes [7] and composite outcomes [8] should be considered with care. The practice of EBM stresses patient centered outcomes.
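The increase in type I errors with multiple primary outcomes can be illustrated numerically. This is a minimal sketch, assuming each outcome is tested independently at the same significance level; the function name is hypothetical, and correlated outcomes (which are common in practice) would give a somewhat different figure.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive result across n independent
    tests when the intervention truly has no effect on any outcome."""
    return 1.0 - (1.0 - alpha) ** n_tests


# One primary outcome versus four primary outcomes, each tested at 0.05.
for n_outcomes in (1, 4):
    rate = familywise_error_rate(0.05, n_outcomes)
    print(f"{n_outcomes} outcome(s): chance of a spurious finding = {rate:.1%}")
```

With four primary outcomes the chance of at least one spurious "significant" result rises from 5% to about 18.5% under these assumptions.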

Secondary outcomes should be reported in the same manner as the primary outcome. In a positive trial they help to explain the effect of the intervention used. If the primary outcome is not statistically and clinically significant, then the secondary outcomes are best utilized to develop further research questions. Subgroup analysis should be viewed with caution due to type I errors and should be considered hypothesis generating rather than hypothesis confirming. [9]

Magnitude and precision of treatment effect

The purpose of reviewing the results section of an RCT is to determine if the difference between the intervention and comparator is due to chance and, if it is statistically significant, whether the result is clinically important. Large studies may produce statistically significant results which are so small as to be clinically insignificant. The authors should report for any outcome a magnitude of effect with 95% confidence limits. [10] In the case of binary outcomes, results should be presented as a 2 x 2 table, risks or (less desirable) odds. Using this information the reader can calculate the number needed to treat (NNT) with its 95% confidence interval (CI) if it is not reported. For continuous measures the authors should report the difference between the intervention and comparator as a point estimate with 95% confidence intervals. These data are best presented in a table rather than text. When the authors present data other than as described above, the results will be difficult to understand and place in a clinical context. Commonly a chi-squared test will be used to assess statistical significance for binary outcomes. For continuous measures common significance tests are the t-test, ANOVA or ANCOVA. [11] If unusual statistical tests are employed the authors should provide appropriate references for review.

Evaluation of soft outcome

Landorf [12], in assessing the effectiveness of three different types of foot orthotic devices in the treatment of mechanically induced heel pain, chose the Foot Health Status Questionnaire for the primary outcome. The Foot Health Status Questionnaire is a validated instrument which is patient oriented and measures both pain and function. The outcome was selected a priori and analyzed using ANCOVA. The results of the primary outcome were reported in a table of the article, a portion of which is summarized (Table 1).

Table 1 Comparison of foot orthotics.

The largest difference in pain at three months (Table 1) was shown between the prefabricated and sham devices; the smallest difference was shown between the prefabricated and custom devices. During the calculation of the sample size a 15 point improvement in pain was used. If we accept this as the clinically significant improvement the authors were looking for, none of the point estimates were clinically significant. In addition, since the lower end of the 95% confidence interval was below zero for all of the comparisons, none of the results were statistically significant either. A question which should always be asked in a negative trial is: was a clinically significant outcome potentially possible if more participants had been enrolled in the study? This is evaluated by looking at the upper end of the 95% confidence interval to see if it is at or above a clinically significant level, in this case 15 points. If it is, this indicates that the difference might have become apparent had more participants been enrolled in the study. Since the upper end of the confidence interval for the prefabricated versus custom foot orthotic device comparison is below 15 points, we can say with confidence that there is no clinically important difference between prefabricated and custom foot orthotic devices in the treatment of mechanically induced heel pain based upon the results of this study.
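The reasoning used above to read a point estimate and its 95% confidence interval against a clinically important difference can be written down as a short checklist. The sketch below assumes that positive differences favor the intervention; the function name and the two sets of numbers are hypothetical illustrations, not the values reported in Table 1.

```python
def interpret_difference(point: float, lower: float, upper: float,
                         important_difference: float) -> dict:
    """Classify a between-group difference given its 95% confidence limits.
    Assumes positive values favor the intervention."""
    significant = lower > 0
    return {
        "statistically_significant": significant,
        "clinically_important": point >= important_difference,
        # In a non-significant trial, an upper limit at or above the threshold
        # means an important benefit has not been excluded.
        "important_benefit_still_possible":
            (not significant) and upper >= important_difference,
        # An upper limit below the threshold excludes an important benefit.
        "important_benefit_excluded": upper < important_difference,
    }


# Hypothetical comparisons using a 15 point clinically important difference.
hypothetical_a = interpret_difference(8.0, -1.0, 17.0, 15.0)
hypothetical_b = interpret_difference(1.0, -8.0, 10.0, 15.0)
print(hypothetical_a)
print(hypothetical_b)
```

In the first hypothetical comparison the result is neither statistically nor clinically significant, yet a larger trial might still show an important benefit. In the second, the upper limit falls below 15 points, so an important benefit is effectively excluded.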

Evaluation of hard outcome

Veves [13] evaluated the use of Graftskin in the healing of diabetic non-infected neuropathic foot ulcers. The primary outcome was complete wound closure at 12 weeks.

The authors reported in the text that 63 of the 112 patients receiving Graftskin healed by 12 weeks, whereas 36 of the 96 control patients healed by 12 weeks. They related that this was a statistically significant result (p = 0.0042). Using the data provided in the article a 2 x 2 table could be constructed (Table 2).

Table 2 Comparison of ulcer resolution.

The event counted is failure of the ulcer to heal by 12 weeks. Control Event Rate (CER) = 62.5%, Experimental Event Rate (EER) = 43.75%, CER - EER = Absolute Risk Reduction (ARR) = 18.75%, Number Needed to Treat (NNT) = 1/ARR = 6 (95% CI 3-19)

Using an online calculator [14] and the information from the article, the Number Needed to Treat (NNT) with 95% confidence intervals can be calculated. NNT is defined as the number of patients you need to treat to prevent one bad outcome. [15] NNT allows the results of the study to be placed in a clinician friendly metric which can be evaluated easily. The lower the number, the more effective the treatment. The number needed to treat is the inverse of the absolute risk reduction (ARR). The absolute risk reduction equals the control event rate (CER) minus the experimental event rate (EER). Based upon the results of the study (Table 2), 6 people will need to be treated with Graftskin for 12 weeks in order to prevent one additional ulcer from failing to heal. The reader will need to decide if the costs and adverse effects justify the use of this product when compared to the control (external validity). Understand that this is only an estimate and that the number needed to treat may be as high as 19.
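For readers who prefer to check the arithmetic themselves, the calculation can be reproduced in a few lines. This is a minimal sketch that takes failure to heal as the event and assumes a simple normal approximation (Wald interval) for the confidence limits of the ARR; the online calculator cited above may use a different method and rounding convention, and the function name is hypothetical.

```python
import math


def nnt_with_ci(control_events: int, control_total: int,
                experimental_events: int, experimental_total: int,
                z: float = 1.96) -> dict:
    """Number needed to treat from a 2 x 2 table, with an approximate
    95% confidence interval based on a Wald interval for the ARR."""
    cer = control_events / control_total
    eer = experimental_events / experimental_total
    arr = cer - eer
    se = math.sqrt(cer * (1 - cer) / control_total
                   + eer * (1 - eer) / experimental_total)
    arr_low, arr_high = arr - z * se, arr + z * se
    # The NNT interval is only meaningful when the ARR interval excludes zero.
    nnt_ci = (1 / arr_high, 1 / arr_low) if arr_low > 0 else None
    return {"cer": cer, "eer": eer, "arr": arr, "nnt": 1 / arr, "nnt_ci": nnt_ci}


# Ulcers NOT healed at 12 weeks: 96 - 36 = 60 controls, 112 - 63 = 49 Graftskin.
result = nnt_with_ci(60, 96, 49, 112)
print(f"CER = {result['cer']:.2%}, EER = {result['eer']:.2%}, ARR = {result['arr']:.2%}")
print(f"NNT = {math.ceil(result['nnt'])} (unrounded {result['nnt']:.2f})")
print(f"Approximate 95% CI for NNT: {result['nnt_ci'][0]:.1f} to {result['nnt_ci'][1]:.1f}")
```

Under this approximation the unrounded interval runs from a little over 3 to a little under 19, which agrees with the reported NNT of 6 (3-19) to within rounding.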

Magnets in the Treatment of Diabetic Neuropathy

In the first article of this series [16] an RCT [17] was found which evaluated the usefulness of static magnets for painful diabetic neuropathy. The second article [1] in the series evaluated the internal validity of this RCT. Using the information previously presented in this article, the results of the article evaluating static magnets for the relief of painful diabetic neuropathy are summarized:

Primary outcome measure

The authors described four different primary outcome measures using two different outcome instruments: foot pain and sleep (Quality of Life [QOL]), and burning pain and numbness and tingling (11 point Visual Analogue Scale [VAS]), which were analyzed once per month for four months. It wasn’t clear if they provided references related to validation of the outcome instruments. The authors describe a fairly complex method of statistical analysis of the extensive amount of data collected. The sample size calculation utilized a 17% difference between treatment arms. It was unclear which of the primary outcomes was used for the 17% difference. Therefore, it is difficult to tell which of the outcome measures were adequately powered. It does not appear the trial was registered.

Magnitude and precision of treatment effect

The design of the study was a randomized controlled trial using static magnetic insoles compared with a placebo. The objective of the study as stated by the authors was “to determine if constant wearing of multipolar, static magnetic (450G) shoe insoles can reduce neuropathic pain and quality of life (QOL) scores in symptomatic diabetic peripheral neuropathy (DPN)”. [17]

The authors’ conclusion is “the present study provides convincing data confirming that the constant wearing of static, permanent, magnetic insoles produces statistically significant reduction of neuropathic pain”.

The authors presented their results for the primary outcomes in a table format. For each month and for each outcome measured they presented a mean and a standard deviation.

In the accompanying text the authors related that changes from baseline to various months were statistically significant for both magnets and control with all of the four primary outcomes at some point in time.

The authors did not present point estimates with 95% confidence intervals for the comparison between magnets and placebo at any time period. It isn’t clear from the article, nor do the authors clearly state, that static magnetic insoles are more effective than placebo for the reduction of neuropathic pain seen in diabetic neuropathy. Are magnets more effective than placebo in the treatment of diabetic neuropathy?

Typically, at the conclusion of the trial the authors would have used a statistical test on the original data to generate a p value and a point estimate with 95% confidence interval for a comparison between groups for the primary outcome. Since these data are not available to the reader, the information in the article can be used to determine an effect size. Using the number of participants, means and standard deviations reported by the authors for the various outcome measures at four months, an effect size between magnets and placebo can be calculated. Calculation of effect sizes is commonly seen in meta-analysis to determine treatment efficacy. There are different methods available to calculate an effect size; a common method is the effect size described by Cohen. [18] An effect size less than 0.2 is not consistent with a clinically/statistically significant effect. Greater than 0.2 is a small effect, greater than 0.5 is a moderate effect, and greater than 0.8 is a large effect.

Although all of the measurements at four months (Table 3) for the various primary outcomes favor magnets, the only outcome at four months which shows a small clinically/statistically significant effect favoring magnetic insoles is numbness and tingling. The effect sizes for the other three outcome measures are all less than 0.2 and are therefore neither clinically nor statistically significant.

Table 3 Comparison using effect size.

*Effect Size (Cohen) = (experimental group mean - control group mean) / standard deviation of control group.
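The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's worksheet: the means, standard deviation and group sizes below are hypothetical, and the function names are invented for this example. The effect size follows the footnote formula (dividing by the control group's standard deviation, a variant some texts call Glass's delta). The confidence interval uses the large-sample standard error usually quoted for Cohen's d, so it should be read as a rough approximation only.

```python
import math


def effect_size(mean_exp, mean_ctl, sd_ctl):
    """Standardized difference, dividing by the control group's SD (footnote formula)."""
    return (mean_exp - mean_ctl) / sd_ctl


def approx_ci(d, n_exp, n_ctl, z=1.96):
    """Rough 95% CI for a standardized mean difference.

    Assumption: uses the common large-sample standard error quoted for
    Cohen's d; it is only an approximation for the control-SD variant.
    """
    se = math.sqrt((n_exp + n_ctl) / (n_exp * n_ctl) + d ** 2 / (2 * (n_exp + n_ctl)))
    return d - z * se, d + z * se


def label(d):
    """Map the magnitude of an effect size to the thresholds given in the text."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


# Hypothetical four-month pain scores (lower is better); not the trial's data.
d = effect_size(mean_exp=4.0, mean_ctl=5.0, sd_ctl=2.5)
lower, upper = approx_ci(d, n_exp=100, n_ctl=100)
print(f"effect size {d:.2f} ({label(d)}), approx. 95% CI {lower:.2f} to {upper:.2f}")
```

Because lower scores mean less pain in this example, a negative value favors the experimental group; the label is based on the magnitude only.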

The upper limit of all of the primary outcomes is compatible with a small significant effect. Would this be apparent if more patients were enrolled? Was this study underpowered for any or all of the primary outcomes? Would the results of the study be different if the data were analyzed using an intention to treat analysis rather than a per protocol analysis? In conclusion, based upon the data presented in the article, it is unlikely that magnetic insoles have any effect in relieving symptomatic diabetic peripheral neuropathy.

References

1. Turlik M. How to Evaluate the Internal Validity of a Randomized Controlled Trial. The Foot and Ankle Journal, 2 (3) 5, 2009.
2. Chan A, et al. Evaluating the Internal Validity of a Randomized Controlled Trials. JAMA 291: 2457 – 2465, 2004.
3. Trial Registration. http://clinicaltrials.gov/ accessed 1/23/09.
4. Poolman RW, Struijs, Krips R, Sierevelt IN, Marti RK, Farrokhyar F, Zlowodzki M, Bhandari M. Reporting of Outcomes in Orthopaedic Randomized Trials: Does Blinding of Outcome Assessors Matter? J Bone Joint Surg 89A: 550, 2007.
5. Austin P. Testing multiple statistical hypotheses resulted in spurious associations: a study of astrological signs and health. Journal of Clinical Epidemiology 59: 964, 2006
6. D’Agostino R. Debate: The slippery slope of surrogate outcomes. Curr Control Trials Cardiovasc Med 1: 76 – 78, 2000.
7. Martin R. Current Concepts Review: Foot and Ankle Outcome Instruments. Foot & Ankle International 27: 383 – 390, 2006
8. Montori V. How should clinicians interpret results reflecting the effect of an intervention on composite end points: should I dump this lump? EBM 10: 162, 2005.
9. Wang R, Lagakos SW, Ware JH, Hunter DJ, Drazen JM. Statistics in Medicine — Reporting of Subgroup Analyses in Clinical Trials. NEJM 357: 2189 – 2194, 2007.
10. Consort Statement. http://www.consort-statement.org/ accessed 1/23/09
11. Vickers AJ, Altman DG. Analysing controlled trials with baseline and follow up measurements. BMJ 323: 1123 – 1124, 2001.
12. Landorf KB, Keenan A-M, Herbert A. Effectiveness of Foot Orthoses to Treat Plantar Fasciitis. Archives of Internal Medicine 166: 1305 – 1310, 2006.
13. Veves A, Armstrong DG, Sabolinski ML, and the Apligraf Diabetic Foot Ulcer Study. Graftskin, a Human Skin Equivalent, Is Effective in the Management of Noninfected Neuropathic Diabetic Foot Ulcers. Diabetes Care 24: 290 – 295, 2001.
14. Number needed to treat calculator http://www.graphpad.com/quickcalcs/NNT2.cfm accessed 1/23/09
15. McAlister F. The “number needed to treat” turns 20 — and continues to be used and misused. CMAJ 179: 549 – 553, 2008.
16. Turlik M. Introduction to Evidence-based Medicine. The Foot and Ankle Journal, 2 (2), 4, 2009.
17. Weintraub M, Wolfe GI, Barohn RA. Static Magnetic Field Therapy for Symptomatic Diabetic Neuropathy: A Randomized, Double-Blind, Placebo-Controlled Trial. Arch Phys Med Rehabil 84: 727, 2003.
18. Effect Size calculator http://davidmlane.com/hyperstat/effect_size.html accessed 1/23/09.


Address correspondence to: Michael Turlik, DPM
Email: mat@evidencebasedpodiatricmedicine.com

Private practice, Macedonia, Ohio.

© The Foot and Ankle Online Journal, 2009

Evaluating the Internal Validity of a Randomized Controlled Trial

by Michael Turlik, DPM1

The Foot and Ankle Online Journal 2 (3): 5

This paper discusses the important elements to look for when evaluating a randomized controlled trial for internal validity. At the end of the paper the article analyzes a randomized controlled trial found in the first article of this series. This is the second in a series of articles introducing practicing podiatric physicians to evidence-based medicine.

Key words: EBM, evidence-based medicine, randomized controlled trial

This is an Open Access article distributed under the terms of the Creative Commons Attribution License. It permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. ©The Foot and Ankle Online Journal (www.faoj.org)

Accepted: February, 2009
Published: March, 2009

ISSN 1941-6806
doi: 10.3827/faoj.2009.0203.0005


Is this treatment effective? The best study to answer this question is a randomized controlled trial (RCT). Once a RCT is found to answer the therapeutic question of interest the reader must decide whether or not the authors took all of the important steps to minimize bias. The strength of the inference that we can take from the trial is based upon how well the authors have planned, executed and reported safeguards to minimize bias in their study. The authors could reach an invalid conclusion to the question either because of bias or random error. The evaluation of the safeguards implemented by the authors is referred to as the internal validity of a RCT. Methodological quality is a continuum not a dichotomy and is often in the eye of the beholder.

Even if the authors have described in sufficient detail all of the methods which we would commonly associate with minimizing bias, the results of the study may still be affected by random error (chance), which is why we can never be sure exactly where the truth may lie. This is the second in a series of articles about evidence-based medicine (EBM) for practicing podiatrists. This article will help podiatric physicians evaluate the internal validity of a RCT.

Bias

Returning to the initial article of this series, [1] you will recall that we had found two articles [2,3] to help us in evaluating whether the use of insoles with magnets was an effective treatment for reducing pain from diabetic neuropathy. In the initial article of this series a reference was made to a PhD who advocated the use of magnetic insoles for the treatment of symptomatic diabetic neuropathy. In addition, she had developed and was marketing a new type of magnetic insole for this purpose.

The person presenting this information could be said to be biased toward the use of magnetic insoles, specifically her design, in the treatment of symptomatic diabetic neuropathy. Bias can be defined as an opinion or feeling that strongly favors one side in an argument or one item in a group or series; predisposition, prejudice.

However, when we consider bias in a RCT, we specifically refer to non-random systematic errors in the design or conduct of the study. Bias is described as any process at any stage of the study which produces results which deviate systematically from the truth. Bias in RCTs is usually not intentional but it is pervasive and insidious. There are many specific types of bias associated with a RCT, including selection, measurement, and analysis bias. The result of bias in a clinical study is to overestimate the treatment effect; as a result, the intervention may appear to work when it really doesn’t. Methods in clinical trials which are used to minimize bias include the following: randomization, concealment allocation, blinding, and intention to treat analysis.

Randomization

Case reports and case series, while popular in the podiatric literature, cannot be used to demonstrate treatment efficacy/effectiveness. These study designs lack a control group and as a result are best used to generate rather than test hypotheses of clinical efficacy/effectiveness. Case-control and cohort studies are observational studies which utilize a control group; however, the question always is: Are the two groups similar enough? Is the treatment effect seen in these studies due to the intervention or the difference in prognostic factors between the groups? It is well accepted that non-randomized controlled trials exhibit a greater treatment effect than randomized controlled trials. [4] The only method currently available to provide two groups which are similar for known and unknown prognostic factors is randomization.

Selection bias is the term used to explain the preferential allocation of participants with similar prognostic factors to the same arm of the clinical trial. Random allocation is a study method which minimizes selection bias.

Random allocation can be defined as a process by which all participants have the same chance of being assigned to either treatment arm. One of the first things to evaluate when reading the methods section of a RCT is whether the authors of the study generated a randomized sequence consistent with the preceding definition. The following methods are not consistent with random allocation: date of birth, hospital chart number, alternate selection, date of entry, or days of the week. Unlikely methods of random allocation in clinical trials are flipping a coin, rolling dice, or choosing colored balls out of a bag. Typically, random allocation in clinical trials will be achieved by the use of a random number table or a computer-generated series of numbers. Another question which should be addressed in the methods section by the authors is who generated the randomization sequence. It is best when an independent third party not associated with the study generates the sequence. In a study of RCTs published in popular podiatric medical journals only two of the nine studies described a random allocation process. [5]
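As an illustration of a computer-generated allocation sequence, the following Python sketch shows two common approaches: simple randomization, where each participant independently has the same chance of either arm, and permuted blocks, which keep the arms balanced in size. The function names, arm labels and block size are assumptions made for this example and do not describe the procedure of any particular trial.

```python
import random


def simple_allocation(n_participants, arms=("treatment", "control"), seed=None):
    """Each participant has the same chance of being assigned to either arm."""
    rng = random.Random(seed)
    return [rng.choice(arms) for _ in range(n_participants)]


def permuted_block_allocation(n_blocks, block_size=4, arms=("treatment", "control"), seed=None):
    """Shuffle small balanced blocks so the arm sizes stay equal after every block."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within the block
        sequence.extend(block)
    return sequence


# Hypothetical use: an independent third party would generate and hold this list.
print(simple_allocation(8, seed=2009))
print(permuted_block_allocation(2, block_size=4, seed=2009))
```

A fixed seed is used here only so the example is repeatable; in practice the sequence would be generated once by someone not involved in enrolling participants and then kept concealed.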

Concealment allocation

After the randomization sequence is generated the list may be given to the investigator responsible for enrolling participants in the study. This is referred to as unconcealed participant allocation. The investigator may steer participants to certain treatment arms based upon prognostic factors either consciously or unconsciously. Concealment allocation can be defined as the process by which the physician is blinded to the randomized sequence which was generated. The person who enrolls participants in the trial should not be the same person who generates the allocation sequence.

In RCTs where concealment allocation has not been utilized there is an overestimation of treatment effect compared to trials which conceal the allocation sequence. The increase in treatment effect may be 20 to 30%. [6] The average bias associated with lack of adequate concealment allocation was less for outcomes which were objectively evaluated (death, ulcer closure) than for those which were subjectively evaluated (pain, patient reported outcomes).

The description the authors used for concealment allocation is usually found in the methods section of the paper. For example, a common description might be “…a neutral third party has generated a series of sequentially numbered opaque sealed envelopes (SNOSE) containing the randomization sequence for each participant to be opened at the time the investigator enrolls the participant in the study.” As a result the investigator is blinded to the treatment arm to which the participant will be enrolled.

It is very common today to have a centralized allocation process: when the investigator enrolls a participant in the study, he or she will call an off-site location to determine the group to which the participant is assigned. A similar system is also used by computer over the internet. Sometimes the allocation sequence may be kept in the pharmacy, which will be contacted by the investigator prior to enrolling the participant. Adequate concealment allocation helps to limit selection bias. In a study of RCTs published in common podiatric medical journals none of the nine studies described a process for concealment allocation. [5]

Were patients in the treatment and control group similar with respect to known prognostic factors? Problems with the randomization allocation sequence and the concealment allocation process may result in an imbalance in baseline prognostic factors. If both of these steps have been followed and there still remains an imbalance in some important prognostic factor it will be assumed that this is due to chance rather than bias. The larger the study the less likely this is to occur.

Blinding

Blinding in a clinical trial can be defined as withholding information about treatment allocation from those who could potentially be influenced by this information. Unblinded studies exhibit an increased treatment effect compared to blinded studies. [7] In the methods section the authors should describe in some detail who was blinded, how they were blinded, and the success of blinding.

Who was blinded? Certainly participants and investigators can be blinded. Less commonly recognized is that data collectors and analysts should be blinded. Participants should be blinded because they may use other effective interventions, may report symptoms differently, or may drop out if they perceive that they have received a placebo therapy. Investigators should be blinded because they may prescribe effective co-interventions or influence follow-up or patient reporting. Data collectors and analysts should be blinded because they may exhibit differential encouragement during performance testing, variable recording of outcomes, or differential timing and frequency of outcome measurements.

How was blinding achieved? In the case of medication the placebo should be the same size, shape, color and taste as the therapeutic intervention. When using a gold standard a double dummy process should be employed. When using a sham procedure both instruments should look alike, sound alike, and have the same lights and duration. Separate waiting rooms may be necessary for each treatment arm to prevent interactions between groups. Sometimes the therapeutic intervention under investigation precludes the investigators and/or participants from being blinded; however, it is difficult to understand why data collectors and analysts cannot be blinded.

There is no universal agreement upon how to assess blinding [8] or even whether it should be assessed. Oftentimes study authors will ask investigators and participants to guess at their treatment allocation and report the results. Some would suggest looking for bias-generating consequences, such as contamination and co-interventions, instead.

Measurement bias is defined as inaccurate measurement due to either inaccuracy of the measurement instrument or bias based upon the study expectations of participants and investigators. Blinding will help to limit measurement bias. In a study of RCTs published in popular podiatric journals only two of the nine trials described a process for blinding. [5]

Intention to Treat Analysis

Intention to treat (ITT) analysis can be defined as the strategy for the analysis of randomized controlled trials which ensures that all participants are compared in the groups to which they were originally randomly assigned. Although this sounds simple, it is difficult to understand and often confused in the literature. In general, all patients who were randomized at the beginning of the trial must be accounted for during the analysis. Certainly there are some exceptions; [9] however, failure to account for all the participants at the conclusion of the trial will result in analysis bias, overestimating the treatment effect. ITT preserves the prognostic balance in the treatment arms achieved by randomization and increases generalizability.

During the course of the study participants may elect not to participate after they were randomized or may change treatment arms for various reasons. In ITT analysis they should always be analyzed in the group to which they were originally allocated, regardless of the treatment received. In addition, during the course of the study participants may be lost to follow-up.

It is well established that participants who drop out of a study have a different prognosis than those who remain. [10] There are many different methods to infer or impute lost study results, for example, last value carried forward or worst-case scenario. Unfortunately these are only estimates for missing data. ITT prevents conscious or unconscious attempts to influence study results by excluding participants.
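The two imputation methods named above can be sketched briefly in Python. This is only an illustration under stated assumptions: the monthly scores are hypothetical, missed visits are represented by None, and the function names are invented for this example.

```python
def last_value_carried_forward(scores):
    """Replace each missing visit (None) with the most recent observed value.

    Visits missing before any observation are left as None.
    """
    filled = []
    last_seen = None
    for score in scores:
        if score is not None:
            last_seen = score
        filled.append(last_seen)
    return filled


def worst_case(scores, worst_value):
    """Replace each missing visit (None) with the worst possible score."""
    return [worst_value if score is None else score for score in scores]


# Hypothetical 0-10 pain scores for one participant who dropped out after month 2.
monthly_scores = [7, 6, None, None]
print(last_value_carried_forward(monthly_scores))
print(worst_case(monthly_scores, worst_value=10))
```

Both approaches fill the gaps with estimates rather than observations, which is why the choice of method should be stated and justified by the authors.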

Results in a RCT may be presented in two different manners. A per protocol analysis includes only patients who have successfully completed the trial. An ITT analysis accounts for all participants. Per protocol analysis answers the question: what will happen if my patients all comply with the treatment intervention (explanatory)? Intention to treat analysis answers the question: what will happen in real life using this treatment intervention (pragmatic)? If there is a difference between the per protocol analysis and the ITT analysis, the loss to follow-up has been large and the inference from the study is reduced. Intention to treat analysis is a more pragmatic and more conservative estimate of treatment effect and minimizes type I error.
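The contrast between the two analyses can be made concrete with a small Python sketch. All of the data are hypothetical, and the names are invented for this example. One loud assumption: participants who dropped out are counted as not improved, which is only one of several possible ways to handle their missing outcomes.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    assigned: str    # arm given by randomization
    completed: bool  # finished the trial according to the protocol
    improved: bool   # outcome; dropouts are assumed not improved in this sketch


def response_rate(participants, arm, per_protocol=False):
    """Proportion improved in one arm.

    per_protocol=False: everyone randomized to the arm is analyzed (ITT).
    per_protocol=True:  only participants who completed the trial are analyzed.
    """
    group = [
        p for p in participants
        if p.assigned == arm and (p.completed or not per_protocol)
    ]
    return sum(p.improved for p in group) / len(group)


# Hypothetical trial: 10 randomized per arm, with more dropouts in the treatment arm.
participants = (
    [Participant("treatment", True, True)] * 5
    + [Participant("treatment", True, False)] * 1
    + [Participant("treatment", False, False)] * 4
    + [Participant("control", True, True)] * 4
    + [Participant("control", True, False)] * 5
    + [Participant("control", False, False)] * 1
)

for per_protocol in (True, False):
    name = "per protocol" if per_protocol else "intention to treat"
    treated = response_rate(participants, "treatment", per_protocol)
    control = response_rate(participants, "control", per_protocol)
    print(f"{name}: treatment {treated:.2f}, control {control:.2f}, difference {treated - control:.2f}")
```

In this made-up example the per protocol difference is much larger than the ITT difference, which illustrates how excluding dropouts can inflate the apparent treatment effect.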

When reviewing the methods section of a RCT to evaluate intention to treat analysis, the reader should look for how intention to treat was performed. It is unacceptable for the authors simply to state that the data were analyzed on an intention to treat principle without explanation. In a study of RCTs published in common podiatric journals four of the nine papers reported that the data were analyzed on an intention to treat basis. [5]

Magnets in the treatment of diabetic neuropathy

In the first article of this series [1], a RCT [3] was identified which evaluated the usefulness of magnetic insoles in reducing the symptoms of painful diabetic neuropathy. Using the information presented earlier the internal validity of this article was critically analyzed.

Randomization: The authors in the methods section describe an equal random allocation procedure utilizing a computer. It is unclear who actually generated the sequence and how it was accomplished. In addition, randomization was stratified by center and gender.

Concealment allocation: In the methods section the authors report that neither the participants nor the investigators were aware of the treatment allocation. However, they did not elaborate as to the method. It may be that a centralized allocation process utilizing the computer was used.

Baseline comparison: Although the active treatment group and the sham group appear to be similar with regards to baseline characteristics, the groups appear to be dissimilar when baseline outcome measurements are compared. The baseline outcome measures for the sham group appear to be worse than the intervention group.

Blinding: The authors state that the sham and active magnetic insoles were identical with regards to the appearance, consistency and weight. In the event the insole did not fit the shoe and needed to be trimmed the authors described a process by which an uninvolved third party would make the adjustments. In addition, all data was submitted blindly to an uninvolved third-party for analysis. In the results section the authors report their efforts on assessing the effectiveness of the methods used in blinding the investigators and participants. There was no indication that any contamination or co-interventions were detected. However, in the methods section there was no indication that the authors were attempting to measure contamination or co-interventions.

Intention to treat analysis: The results of the study were not analyzed on an ITT basis. Furthermore, the authors state that participants with incomplete data were excluded from analysis. In the results section the authors discussed dropouts and their decision not to analyze the data on an intention to treat basis. The authors chose to use four different primary outcomes, each with different numbers of participants.

Summary of internal validity: Based upon the methods and results section of the paper it is clear that the authors attempted and succeeded in blinding participants, investigators and data analysts. It is less clear as to the method of random allocation and concealment allocation. The authors made no attempt to analyze the data utilizing the intention to treat principle.

References

1. Turlik M: Introduction To Evidence-based Medicine. Foot and Ankle Journal 2: 4, 2009.
2. Pittler M, Brown EM, Ernst E: Static magnets for reducing pain: systematic review and meta-analysis of randomized trials. CMAJ 177: 736 – 742, 2007.
3. Weintraub MI, Wolfe GI, Barohn RA, Cole SP, Parry GJ, Hayat G, Cohen JA, Page JC, Bromberg MB, Schwartz SL, Magnetic Research Group: Static magnetic field therapy for symptomatic diabetic neuropathy: a randomised, double-blind, placebo-controlled trial. Arch Phys Med Rehabil 84 (5): 736 – 746, 2003.
4. Moore A, McQuay H: Bandolier’s Little Book of Making Sense of the Medical Evidence. Oxford University Press, Oxford, England, 2006.
5. Turlik M., Kushner D, Stock D: Assessing the Validity of Published Randomized Controlled Trials in Podiatric Medical Journals. JAPMA 93 (5): 392 – 398, 2003.
6. Wood L: Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ 336: 601 – 605, 2008.
7. Poolman R: Reporting of Outcomes in Orthopaedic Randomized Trials: Does Blinding of Outcome Assessors Matter? J Bone Joint Surg 89A: 550 – 558, 2007.
8. Hróbjartsson A: Blinded trials taken to the test: an analysis of randomized clinical trials that report tests for the success of blinding. International Journal of Epidemiology 36 (3): 654 – 663, 2007.
9. Fergusson D, Aaron SD, Guyatt G, Hébert P: Post-randomisation exclusions: the intention to treat principle and excluding patients from analysis. BMJ 325: 652 – 654, 2002.
10. Altman D: Clinical trials. In: Practical statistics for medical research. London: Chapman & Hall, 1991.



1 Private practice, Macedonia, Ohio.