Critical Evaluation of Diagnostic Review Article

by Michael Turlik, DPM

The Foot and Ankle Online Journal 2 (12): 5

This is the third in a series of articles for podiatric physicians discussing the evaluation of a diagnostic article. The author contrasts and compares the critical evaluation of a diagnostic review article to a therapeutic review article utilizing two recent diagnostic publications from the foot and ankle literature.

Key words: Evidence-based medicine.

Accepted: November, 2009
Published: December, 2009

ISSN 1941-6806
doi: 10.3827/faoj.2009.0212.0005


The treatment effect of a therapeutic intervention found in a randomized controlled trial (RCT) is estimated more precisely when multiple studies can be combined quantitatively. In the same way, the estimate of the diagnostic accuracy of an index test is strengthened when multiple studies can be pooled into a single diagnostic measurement. If a diagnostic meta-analysis is composed of cross-sectional studies of an independent, masked comparison with a reference standard among an appropriate population of consecutive patients, the meta-analysis can be considered level I evidence. As with therapeutic interventions, this is accomplished through a systematic review/meta-analysis. When evaluating a systematic review/meta-analysis of diagnostic studies, many of the same principles apply to the design, conduct and reporting of the study as discussed in an earlier article [1] describing the critical analysis of a systematic review/meta-analysis of therapeutic interventions. The steps in performing a systematic review/meta-analysis for diagnostic tests are the same (Table 1).

Table 1 Steps in systematic review/meta-analysis.

Although therapeutic and diagnostic meta-analyses share many features, they differ in several important ways. The method used to determine trial quality differs greatly between the two types of studies. In addition, the methods by which heterogeneity is assessed and results are statistically pooled also differ significantly. These differences will be the basis for this instructional monograph on how to critically evaluate a systematic review/meta-analysis of diagnostic studies. Two diagnostic systematic review/meta-analysis studies [2,3] will be compared and contrasted to illustrate the principles described here. Both articles evaluate the use of diagnostic imaging techniques in the evaluation of osteomyelitis of the foot. As discussed in an earlier article [4] in this series, the evaluation of infected diabetic foot ulcers often requires additional diagnostic studies to evaluate for the presence or absence of osteomyelitis. It is important for the podiatric physician to accept that the results of a systematic review/meta-analysis are a function of the validity of the methods used in the study and the quality of the primary studies included.

Formulation of the foreground question

The basics of question development have been covered elsewhere. [1] The foreground question for a diagnostic meta-analysis should define the population/disease of interest, the index test, the reference standard, and the outcome of interest.

Both articles [2,3] used as examples in this paper propose a foreground question to initiate the meta-analysis. In addition, Dinh [3] assesses clinical examination as well as imaging modalities in the evaluation of foot osteomyelitis. Dinh [3] also more narrowly defines the population studied to those with infected diabetic foot ulcers.

Inclusion/exclusion criteria

An explanation of inclusion and exclusion criteria is provided elsewhere. [1] Inclusion criteria for diagnostic studies should specify the index test, reference standard, population studied, prevalence and the outcome data to be abstracted.

Kapoor [2] clearly stated that the index test was magnetic resonance imaging (MRI), the comparators were bone scanning and radiographs, the population included all adults suspected of having osteomyelitis, and articles needed to contain sufficient information to construct a 2 x 2 diagnostic table. Dinh [3] described the inclusion criteria as all diagnostic and clinical studies evaluating diabetic foot ulcers for osteomyelitis. Both studies used bone biopsy as the reference standard. Neither described a particular setting for the included studies, nor were exclusion criteria specified. However, Dinh [3] would supply exclusion criteria if requested by the reader.

Comprehensive, systematic search and selection of primary studies

The basics of a systematic searching strategy have been covered earlier. [1] Standardized search terms are not as well-defined for diagnostic studies as they are for therapeutic studies. Special search strategies have been described for articles which evaluate diagnostic accuracy. [5]

Kapoor [2] described a comprehensive search strategy, with additional details available upon request from the author. Dinh’s [3] description of the search strategy seemed less comprehensive than Kapoor’s. [2]

Article acquisition

The process by which the articles are selected from the search for review and data abstraction have been discussed earlier in the series. [1]

Kapoor [2] clearly described a process by which two authors reviewed each study for inclusion, with a third author refereeing ties. Dinh’s [3] explanation of article acquisition after the search was less clear than Kapoor’s. [2]

Data abstraction

This has been covered in an earlier publication regarding the meta-analysis of therapeutic interventions. [1] One of the most important requirements is the ability to generate a 2 x 2 table for each primary study from the data abstracted.

Since diagnostic studies may be incompletely reported the authors of the meta-analysis may need to directly contact the authors of the primary study for additional information. [6]

Kapoor’s [2] study described in detail the process by which the data were abstracted by two reviewers using a standardized form from the Cochrane Collaboration. Resolution of disagreements between the independent reviewers was not described. Masked conditions were not used, nor were interrater agreements reported. Dinh’s [3] description of the data abstraction process was less complete and consisted of a description of a standardized form which was used to abstract the data.

Critical appraisal of studies selected for quality

Diagnostic studies are highly variable with regard to quality and completeness of reporting. [6] The quality of published primary diagnostic accuracy studies is, in general, less mature than that of studies of therapeutic interventions. Evaluating the quality of a diagnostic study differs greatly from evaluating studies of therapeutic interventions. While no commonly accepted standardized method exists, the second article of this series [4] describes the critical evaluation of a diagnostic study; the results are summarized in Table 2.

Table 2 Evaluating internal validity of a diagnostic study.

Unlike studies of therapeutic interventions, primary diagnostic studies often report incomplete information, making them difficult to include in the meta-analysis. If additional information cannot be obtained from the author, the primary study results may not be included.

Kapoor [2] referenced the Cochrane Methods Group checklist on systematic reviews of screening and diagnostic tests as the instrument used to assess study quality. Dinh [3] described and referenced an instrument to evaluate study quality.

Heterogeneity
Searching for heterogeneity

The choice of the data analytic method used to combine study results is a function of the degree of heterogeneity found. As with a therapeutic study, heterogeneity can be assessed both graphically and statistically. A forest plot of study outcomes with 95% confidence intervals can be constructed to visually assess for heterogeneity. [7] A forest plot can be generated for any measure of diagnostic accuracy, including sensitivity, specificity, likelihood ratios, area under the curve derived from an ROC (receiver operating characteristic) plot, or the diagnostic odds ratio.

ROC plots [8] are used in studies of diagnostic accuracy to demonstrate the pattern of sensitivities and specificities observed when the performance of a continuous test is evaluated at several different diagnostic thresholds. The overall performance of a diagnostic test can be evaluated by the shape of the receiver operating characteristic curve. The curve is constructed by plotting sensitivity against one minus specificity. (Fig. 1) The closer the curve passes to the upper left-hand corner of the plot, the more accurate the test. The area under the curve can be quantified as a measure of test accuracy. To be of any use the area under the curve must be > 0.5; the closer it approaches 1, the more accurate the test.

Figure 1 ROC plot.
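As an illustration, the sketch below builds the points of an ROC plot and its trapezoidal area under the curve from hypothetical test values in diseased and healthy patients. It is a minimal Python example under assumed data, not code or data from either reviewed study.

```python
# Sketch: constructing an ROC plot and its AUC from a continuous test,
# assuming hypothetical scores where higher values suggest disease.

def roc_points(scores_diseased, scores_healthy):
    """Return (1 - specificity, sensitivity) pairs over all thresholds."""
    thresholds = sorted(set(scores_diseased) | set(scores_healthy), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        sens = sum(s >= t for s in scores_diseased) / len(scores_diseased)
        spec = sum(s < t for s in scores_healthy) / len(scores_healthy)
        points.append((1 - spec, sens))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve; > 0.5 is useful, 1.0 is perfect."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += (x2 - x1) * (y1 + y2) / 2
    return area

diseased = [6.2, 7.9, 5.5, 8.8, 7.1]   # hypothetical test values
healthy = [3.1, 4.8, 2.9, 5.7, 4.0]
print(round(auc(roc_points(diseased, healthy)), 2))  # 0.96 for these values
```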

The diagnostic odds ratio (DOR) is a summary measure of test performance. [9] Unlike other measures of diagnostic accuracy it can be expressed as a single number rather than a pair of numbers. The DOR ranges from zero to infinity; the higher the number, the better the test performance. The DOR is derived by dividing the positive likelihood ratio by the negative likelihood ratio. It is a common measure of accuracy in meta-analyses of diagnostic studies and is thought to be reasonably constant regardless of the diagnostic threshold. However, it is difficult to apply directly to clinical practice.
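A minimal sketch of the DOR calculation, assuming hypothetical counts from a single 2 x 2 table:

```python
# Sketch: the diagnostic odds ratio from a 2 x 2 table. The counts are
# hypothetical, not data from the reviewed studies.

def diagnostic_odds_ratio(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
    return lr_pos / lr_neg                     # equivalently (tp * tn) / (fp * fn)

print(diagnostic_odds_ratio(tp=40, fp=10, fn=8, tn=42))  # 21.0
```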

Statistical tests to quantify heterogeneity in diagnostic studies have low statistical power. The most robust test for quantifying heterogeneity in diagnostic studies is the Q-statistic. [10] As discussed in an earlier paper, Cochran’s Q is the traditional test for heterogeneity. It begins with the null hypothesis that the magnitude of the effect is the same across the entire study population and generates a probability based upon the chi-squared distribution. Because the test is underpowered, p > 0.1 is taken to indicate a lack of heterogeneity.
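A sketch of Cochran’s Q applied to hypothetical log diagnostic odds ratios with inverse-variance weights; the weighting scheme is a standard convention assumed here, as neither reviewed paper details its computation:

```python
# Sketch: Cochran's Q for heterogeneity across studies, applied to
# hypothetical ln(DOR) estimates with inverse-variance weights.

from scipy.stats import chi2

def cochrans_q(effects, variances):
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    p = chi2.sf(q, df)  # underpowered test: p > 0.10 suggests no heterogeneity
    return q, p

log_dors = [3.2, 2.7, 3.9, 1.8]        # hypothetical ln(DOR) per study
variances = [0.25, 0.30, 0.40, 0.20]   # hypothetical variances
print(cochrans_q(log_dors, variances))
```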

Kapoor [2] in the comment section of the paper discussed the many flaws of the primary studies found in the meta-analysis. Because of small subset numbers the effect of design flaws in the primary studies could not be explored completely. Neither graphical methods nor statistical methods were used to explore for heterogeneity.

Dinh [3] used Cochran’s Q to evaluate heterogeneity of the primary studies. The results were discussed and presented in table format. Graphical methods were not utilized to search for heterogeneity.

Causes of heterogeneity in diagnostic studies

Differences in study results can be explained by any combination of the following: quality of the study, study location, method of selection of the patients studied, variations in the study population, and the reference standard used in the study. As in therapeutic meta-analysis, [10] a pre-planned meta-regression can be used to explore causes of heterogeneity found in diagnostic studies. Subgroup analysis is another method to explore heterogeneity. Caution should be used in interpreting such results if the analyses were not planned in advance; they should be used to generate rather than confirm hypotheses.

A special consideration when evaluating heterogeneity in a diagnostic meta-analysis is the effect of the diagnostic threshold [11] used in the primary studies. Primary studies may use different cutoff points to determine positive and negative results and, as a result, may introduce heterogeneity into the meta-analysis. Changing the diagnostic threshold to increase sensitivity decreases test specificity. The threshold may be explicit, as in the case of a continuous measure, or implicit, as in the case of a dichotomous measure.

For example, in the probe to bone test for evaluating osteomyelitis in infected diabetic ulcers, the investigators may use the same explicit threshold; however, they may differ in what they regard as the boundary between normal and abnormal when performing the test.

An explicit threshold for continuous measures may be derived from an ROC plot. Spearman’s correlation coefficient can be calculated between the sensitivity and specificity of all diagnostic studies included in the meta-analysis to determine whether heterogeneity is due to differing diagnostic thresholds. A strong negative correlation suggests that heterogeneity is at least partly due to a threshold effect, since raising a threshold trades sensitivity for specificity.
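A sketch of this check, assuming hypothetical paired sensitivities and specificities from five primary studies:

```python
# Sketch: screening for a threshold effect by correlating paired
# sensitivities and specificities across primary studies (hypothetical data).

from scipy.stats import spearmanr

sensitivities = [0.95, 0.90, 0.82, 0.75, 0.70]
specificities = [0.60, 0.68, 0.77, 0.85, 0.90]

rho, p = spearmanr(sensitivities, specificities)
# A strongly negative rho (here -1.0) suggests the studies operated at
# different diagnostic thresholds, trading sensitivity against specificity.
print(rho, p)
```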

Kapoor [2] found that studies which did not use bone histology to exclude disease tended to have better test performance. In addition, studies published in 1988 or earlier demonstrated lower test performance. Neither study used meta-regression or threshold analysis. It was unclear whether Dinh [3] explored the causes of heterogeneity of the primary studies.

Pooling of data

Methods for combining data in a diagnostic meta-analysis are not as refined as those for therapeutic studies, and there is no consensus on the best method to pool data from primary diagnostic studies. [10] Random and fixed effects models as described earlier for therapeutic interventions [11] can be, and have been, used in meta-analyses of diagnostic studies. [12] Direct combination of sensitivities and specificities, predictive values, likelihood ratios and diagnostic odds ratios has been reported. If threshold heterogeneity is detected, directly combining measures of diagnostic accuracy should be avoided.

Unlike therapeutic interventions, each diagnostic study provides a paired estimate of diagnostic accuracy (sensitivity and specificity). An alternative method of combining this type of data is to generate a summary ROC plot. [12,13] If significant threshold heterogeneity is detected, a summary ROC plot is the better meta-analytic model. Unlike a traditional ROC plot, the summary ROC plot uses the sensitivity and specificity (diagnostic odds ratio) obtained from each primary study as a data point in constructing the curve. A summary curve is obtained by fitting a regression model to these paired values from each individual study.
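One widely used regression model of this kind is the Moses-Littenberg approach. Neither reviewed paper specifies its exact model, so the sketch below is illustrative only, with hypothetical study data:

```python
# Sketch of a Moses-Littenberg summary ROC curve. Each study contributes
# one (sensitivity, specificity) pair; the values below are hypothetical.

import math
import numpy as np

def logit(p):
    return math.log(p / (1 - p))

sens = [0.92, 0.85, 0.88, 0.79]
spec = [0.70, 0.82, 0.75, 0.88]

# D is the log diagnostic odds ratio; S captures the threshold dimension.
D = [logit(se) - logit(1 - sp) for se, sp in zip(sens, spec)]
S = [logit(se) + logit(1 - sp) for se, sp in zip(sens, spec)]

b, a = np.polyfit(S, D, 1)  # fit the line D = a + b*S

def summary_sensitivity(fpr):
    """Expected sensitivity at a given false-positive rate on the summary curve."""
    s = logit(fpr)
    # From D = a + b*S with D = logit(sens) - s and S = logit(sens) + s:
    logit_sens = (a + s * (1 + b)) / (1 - b)
    return 1 / (1 + math.exp(-logit_sens))

print(round(summary_sensitivity(0.20), 2))
```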

Using the summary ROC plot, a cutoff point can be determined and a global summary measure, the Q* statistic, can be derived. The Q* statistic can be used to compare the accuracy of different diagnostic tests.

Kapoor [2] used a summary ROC analysis as the method by which to pool the data from the primary studies. In addition, 13 different subsets were evaluated using this method to attempt to explain the heterogeneity of the primary studies.

Dinh [3] pooled diagnostic odds ratios, sensitivities and specificities using a random effects model. In addition, a summary ROC analysis was performed, and the area under the curve and the Q* statistic were determined as measures of test performance.

Results

The results of the study should be presented as point estimates with 95% confidence intervals. Disease prevalence should be reported as a measure of central tendency with ranges. Ideally there should be a comparison of diagnostic accuracy between well done and poorly done primary studies. Finally, the authors should discuss the costs of the tests evaluated and recommended.

Kapoor [2] and Dinh [3] concluded that MRI was superior in evaluating foot osteomyelitis compared to pedal radiographs, technetium-99m phosphate bone scans and white blood cell scans. Kapoor [2] reported diagnostic odds ratios as point estimates with 95% confidence intervals. Dinh [3] reported diagnostic odds ratios but only included 95% confidence intervals with reports of sensitivity and specificity. The overall diagnostic odds ratio for MRI in Kapoor’s [2] study was 42.1; Dinh [3] reported an overall diagnostic odds ratio for MRI of 24.36. In addition, Dinh [3] reported a summary measure of accuracy (Q*), as well as pooled sensitivities and specificities, which demonstrated the superiority of MRI in the diagnostic assessment of osteomyelitis in diabetic pedal ulcerations. The average prevalence of osteomyelitis in Kapoor’s [2] study was 50% (range 32% – 89%).

The prevalence of osteomyelitis reported in Dinh’s [3] study ranged from 12% – 86%. Kapoor’s [2] study was most likely composed of hospitalized patients rather than outpatients. [4] Kapoor [2] found that studies prior to 1998 reported lower diagnostic performance, likely due to poorer study designs. In addition, Kapoor [2] found that studies which did not use bone histology as a reference standard had higher diagnostic test performance. Both authors discussed the costs of MRI relative to the other imaging studies.

References

1. Turlik M: Evaluation of a review article. The Foot and Ankle Online Journal. 2: 2009.
2. Kapoor A, Page S, LaValley M, Gale D: Magnetic resonance imaging for diagnosing foot osteomyelitis. Archives Internal Medicine 167: 125 – 132, 2007.
3. Dinh M, Abad C, Safdar N: Diagnostic accuracy of the physical examination and imaging tests for osteomyelitis underlining diabetic foot ulcers: Meta-analysis. Clinical Infectious Diseases 47: 519 – 527, 2008.
4. Turlik M: How to evaluate an article on diagnosis for validity. The Foot and Ankle Online Journal 2: 2009.
5. Haynes R, Wilczynski N: Optimal search strategies for retrieving scientifically strong studies of diagnosis from MEDLINE: analytic survey. BMJ 328: 1040 – 1045, 2004.
6. Halligan S: Systematic reviews and meta-analysis of diagnostic tests. Clinical Radiology 60: 977 – 979, 2005.
7. Leeflang M, Deeks J, Gatsonis C, Bossuyt P: Systematic reviews of diagnostic test accuracy. Ann Intern Med 149: 889 – 897, 2008.
8. ROC curve http://www.anaesthetist.com/mnm/stats/roc/Findex.htm Accessed 05/08/2009.
9. Glas A, Lijmer J, Prins M, Bonsel G, Bossuyt P: The diagnostic odds ratio: a single indicator of test performance. J Clin Epid 56: 1129 – 1135, 2003.
10. Honest H, Khan K: Reporting of measures of accuracy in systematic reviews of diagnostic literature. BMC Health Services Research 2: 4, 2004.
11. Turlik M. Evaluating the results of a systematic review/meta-analysis. The Foot and Ankle Online Journal. 2: 2009.
12. Devillé W, Buntinx F, Bouter L, Montori V, de Vet H, van der Windt D, Bezemer D: Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Medical Research Methodology 2: 9, 2002.
13. Gatsonis C, Paliwal P: Meta-analysis of diagnostic and screening test accuracy evaluations: Methodologic primer. AJR 187: 271 – 281, 2006.


Address correspondence to: Michael Turlik, DPM
Email: mat@evidencebasedpodiatricmedicine.com

Private practice, Macedonia, Ohio.

© The Foot and Ankle Online Journal, 2009

How to Evaluate an Article on Diagnosis for Validity

by Michael Turlik, DPM

The Foot and Ankle Online Journal 2 (11): 6

This is the second of four articles discussing the evaluation of diagnostic studies for podiatric physicians. This article deals with the evaluation of internal and external validity of a diagnostic study. Diagnostic articles from the foot/ankle literature will be used to illustrate the concepts of critical analysis.

Key words: Evidence-based medicine.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License. It permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. ©The Foot and Ankle Online Journal (www.faoj.org)

Accepted: August, 2009
Published: September, 2009

ISSN 1941-6806
doi: 10.3827/faoj.2009.0211.0006


The first article of this series [1] discussed two different methods of how clinicians solve diagnostic problems. When clinicians use probabilistic diagnostic reasoning to solve clinical problems they often require diagnostic tests to help refine and revise the diagnostic hypotheses they generate. The use of likelihood ratios (LR) for diagnostic test results was discussed and it was advanced that LRs were based upon the results of published diagnostic studies. The strength of the inference from these studies depended upon the validity of the methods used to obtain the information about the diagnostic test. Different study designs may produce different results for diagnostic accuracy. The purpose of this article is to provide instruction regarding the critical analysis of diagnostic studies. Whether one can believe the results of a study is determined by the methods used to carry it out. This is the second of four articles explaining how to evaluate diagnostic foot/ankle studies.

Internal validity

The CONSORT statement is a set of published guidelines for the reporting of randomized controlled trials in the medical literature. [2] Its diagnostic counterpart is the STARD statement*. [3] Most of the major medical journals with high impact factors have adopted these guidelines as requirements for publication. In the first article of this series [1] it was stated that the best study design for a diagnostic test is a cross-sectional study of an independent, masked comparison with a reference standard among an appropriate population of consecutive patients. If properly planned and reported, this type of study can be considered level I evidence for studies of diagnostic accuracy. The reader is referred to reference 4 for a more complete explanation of the various study designs and levels of evidence for diagnostic studies. This paper will confine itself to critically evaluating studies which seek to confirm a diagnosis rather than studies of screening tests. Articles from the foot/ankle literature will be compared and contrasted to illustrate the principles discussed in this article.

*STAndards for the Reporting of Diagnostic Accuracy studies.

Was there an appropriate consecutive population of patients enrolled in the study?

It is unacceptable to enroll only patients with severe symptoms and signs of the target disease and compare them to disease-free, healthy asymptomatic volunteers. Studies [5,6] which compare obviously diseased individuals with normal individuals do not provide any useful diagnostic information. Large overestimations of diagnostic accuracy have been shown when studies enroll only patients with advanced symptoms compared to normal controls. [7,8] The sensitivity and specificity of a test depend on the characteristics of the population studied. There needs to be a spectrum of patients similar to the patients the test would be used on in normal clinical practice. The spectrum should range from patients exhibiting mild signs and symptoms early in the target disease to patients with late, severe disease, and should include conditions commonly confused with the target disease. Failure to enroll an appropriate population of patients results in spectrum bias, with an overestimation of diagnostic effect. When evaluating diagnostic studies for spectrum bias, readers should look for a description of the study population that includes the definition of the target population, study location, age distribution, sex distribution, and a summary of presenting clinical symptoms and/or disease stage.

Ideally, patients will be enrolled in the study prospectively and consecutively based upon clinical suspicion of having the target disease. It is important for the reader to review the method by which patients entered the study, because alternate study designs may produce spectrum bias affecting the results. One study [8] showed that the accuracy of the index test was lower in studies that selected patients on the basis of whether they had been referred for the index test rather than on clinical symptoms. Failure to enroll patients consecutively (selection bias) and retrospective designs [9] are associated with an overestimation of diagnostic accuracy. [8]

In the first article of this series [1] it was proposed that post-test probabilities and predictive values vary with target disease prevalence, whereas sensitivities, specificities and likelihood ratios do not. [10] LRs are, however, affected by disease spectrum: as long as the disease spectrum is the same, prevalence will not affect the LR, sensitivity or specificity, but studies with a different spectrum of disease will produce different LRs. In general, infected diabetic ulcers are less common and less severe, with more subtle changes, when seen in a podiatric physician’s office setting than when referred to a tertiary care center. Evaluating the spectrum of patients making up the study is a key and critical step in judging the validity of diagnostic studies.

The two studies used in this article to illustrate the critical analysis of diagnostic tests both evaluate the usefulness of the probe to bone test for osteomyelitis in infected diabetic ulcerations of the feet. [11,12] Both studies evaluated infected diabetic foot ulcers for osteomyelitis by probing the wound with a metal probe to detect bone. A positive probe to bone test is thought to aid in the diagnosis of osteomyelitis.

Grayson, et al., [11] prospectively evaluated infected diabetic foot ulcers in a tertiary care hospital setting. They describe their patient population as having severe limb-threatening infections. Demographic information was included not in a table but in the narrative of the results section. It wasn’t clear whether the patients were consecutively enrolled. Lavery, et al., [12] in a later study prospectively evaluated infected diabetic foot ulcers in an outpatient setting, drawing patients from two large primary care practices. The authors provided a table in the results section clearly describing the demographic features of the patients enrolled. Patients were enrolled in the study based upon clinical findings during an office visit. It wasn’t clear whether the patients were consecutively enrolled. It should be clear that the spectrum of patients in the two studies is different. Neither study enrolled only patients with severe symptoms and signs of the target disease compared to disease-free, healthy asymptomatic volunteers.

Did the study investigators compare the index test against an appropriate reference standard?

Ideally, a consecutive series of patients with clinical findings of the target disease is subjected to the index test and then verified by the reference standard. The reference standard is sometimes referred to as the gold standard. Ideally a reference/gold standard should be able to distinguish all normal people from people with the target disease; such a test would be 100% sensitive and 100% specific, with no false-positive or false-negative results. It is unlikely that any test will perform this well. Some common examples of appropriate reference standards are biopsy, autopsy, and long-term follow-up without treatment. While the choice of a reference standard may not be perfect, it is essential that the reader have confidence in the reference standard used to identify the target disorder. It is important that the results of the index test not be part of the decision to perform the reference standard. In addition, there should be a close temporal relationship between the performance of the index test and the reference standard. Failure to perform the tests in an appropriate time sequence may allow an intervention in the care of the patient which would alter the results of the reference standard and therefore bias the results.

Grayson, et al., [11] chose bone biopsy with histologic examination as the reference standard. Not all patients had a bone biopsy; in some cases the reference standard included radiographic changes, clinical identification of bone by a surgeon during surgical debridement, or resolution of the ulceration without recurrence after a short course of antibiotics. The authors report that the results of the index test influenced the decision to biopsy the bone. The interval between the evaluation of the ulcer and biopsy of the bone averaged 13.9 days (range 5 – 42). Lavery, et al., [12] also used bone biopsy as the reference standard; however, the authors relied on culture results rather than histologic examination. The authors clearly described that the decision to biopsy the bone was not influenced by the results of the index test. The time difference between the performance of the index test and the reference standard was not clearly stated. In addition, patients who did not receive a bone biopsy were followed for resolution of the ulcer without complications.

Was the selected reference standard used for all of the patients receiving the index test?

It is essential that all patients who receive the index test also receive the same reference standard. It has been shown [7,8] that using different reference standards for positive and negative index test results produces an overestimation of diagnostic test effectiveness. The terms verification bias and workup bias describe the use of different reference standards.

Grayson, et al., [11] enrolled 75 patients with 76 ulcers. Bone was biopsied in 53 patients, 46 of whom demonstrated histological changes associated with osteomyelitis. Four patients were not biopsied, but osteomyelitis was diagnosed clinically by means of radiographs and/or clinical evaluation of the bone by the surgeon during debridement. Nineteen cases were excluded from a diagnosis of osteomyelitis on the basis of clinical follow-up. Lavery, et al., [12] biopsied 30 of the 247 patients enrolled in the study. It appears the majority of patients were followed forward in time until the wound healed or they required surgical therapy.

Were the investigators evaluating the index test and the reference standard blinded to the results?

Knowledge of the index test result can bias the interpretation of the reference standard; this is termed review bias. The index test and reference standard should be evaluated independently. Although intuitively one would expect lack of blinding to affect the results, this has not been shown to alter diagnostic results in two large studies. [7,8] The authors of these studies suggest that the effect would be greater if the reference standard were subjective.

Grayson, et al., [11] clearly stated that the pathologist evaluating the bone biopsy was not aware of the results of the probe to bone test. It is not clear whether the physicians interpreting the radiographs, debriding the wounds and following the patients were blinded to the results of the probe to bone test. Lavery, et al., [12] did not state in the methods section whether anyone was blinded in the study. The podiatrists who performed the probe to bone test followed the patients who were not biopsied to evaluate their clinical status.

Was there a sample size calculation?

While it is common for randomized controlled trials to report sample size calculations, similar explanations are usually not found in diagnostic studies. [13]

Failure to calculate sample size estimates before beginning the study may result in studies too small to provide precise estimates of sensitivity and specificity. A method for performing sample size calculations for diagnostic studies has been advanced by Carley, et al. [14]
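A sketch of the standard normal-approximation calculation of this kind, which the nomograms of Carley, et al. [14] are designed to simplify; the target values below are illustrative assumptions:

```python
# Sketch: total sample size so that the 95% CI half-widths for sensitivity
# and specificity stay within a target precision. Inputs are illustrative.

import math

def diagnostic_sample_size(sens, spec, prevalence, precision, z=1.96):
    """Total patients needed so each 95% CI half-width is <= precision."""
    n_sens = z**2 * sens * (1 - sens) / precision**2
    n_spec = z**2 * spec * (1 - spec) / precision**2
    total_for_sens = n_sens / prevalence          # diseased patients drive sensitivity
    total_for_spec = n_spec / (1 - prevalence)    # non-diseased drive specificity
    return math.ceil(max(total_for_sens, total_for_spec))

# e.g., expected sensitivity 0.90, specificity 0.85, prevalence 0.20,
# desired 95% CI half-width of 0.10:
print(diagnostic_sample_size(0.90, 0.85, 0.20, 0.10))  # 173 patients
```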

Neither probe to bone study [11,12] described a sample size calculation for sensitivity and specificity. Although neither article reported results in terms of likelihood ratios, point estimates with 95% confidence intervals for LRs can be calculated from the information in the articles [15] (Table 1). LRs can be categorized as having a minimal to a large effect on pretest probabilities [16] (Table 2). A sketch of the calculation follows Table 2.

Table 1 Likelihood ratios with 95% confidence intervals.

Table 2 Interpretation of likelihood ratios.
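The sketch below computes likelihood ratio point estimates and 95% confidence intervals from a 2 x 2 table using the standard log-method standard errors; this is the kind of calculation an online calculator such as the one in reference 15 performs. The counts are hypothetical.

```python
# Sketch: likelihood ratios with 95% confidence intervals from a 2 x 2
# table, via standard errors on the log scale. Counts are hypothetical.

import math

def lr_with_ci(tp, fp, fn, tn, z=1.96):
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    # Standard errors of the log likelihood ratios
    se_pos = math.sqrt(1/tp - 1/(tp + fn) + 1/fp - 1/(fp + tn))
    se_neg = math.sqrt(1/fn - 1/(tp + fn) + 1/tn - 1/(fp + tn))
    ci = lambda lr, se: (lr * math.exp(-z * se), lr * math.exp(z * se))
    return (lr_pos, ci(lr_pos, se_pos)), (lr_neg, ci(lr_neg, se_neg))

print(lr_with_ci(tp=33, fp=4, fn=13, tn=26))
```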

The point estimate for the +LR calculated from Grayson, et al.’s study [11] is 4.29, indicating a small effect. However, the 95% confidence interval indicates the true effect lies between 1.7 and 11, or from a minimal effect to a large, conclusive effect. The width of the 95% confidence interval reveals a lack of precision with this measurement, likely due to the number of patients enrolled in the study. Similarly, in Lavery, et al.’s study [12] the point estimate for the -LR is 0.15, indicating a moderate effect. However, the 95% confidence interval indicates the true value could lie anywhere between 0.06, a large, conclusive effect, and 0.37, consistent with a small effect. Again, this indicates that too few patients were enrolled in the study to allow for an accurate estimate of the diagnostic effect.

External validity

Did the authors provide information regarding evaluating and limiting variability of the index test and reference standard?

The authors of a study should describe the index test clearly enough to allow replication by the reader. Failure to do this has been shown to introduce bias into the study. [7] Intraobserver and interobserver variability occur when interpreting diagnostic tests: experts may not agree on the interpretation of radiographs, and instruments may need to be calibrated. Differences in test interpretation may bias the results of the study in a systematic manner. The authors should describe the efforts made to standardize the evaluation of the index test, which may include statistical measures of agreement among experts.

After reviewing the description of the index test in the article, the reader will need to decide whether or not the test will be reproducible in his or her practice setting. Implementing the index test may require additional costs and training in the performance and interpretation of the test.

The authors of the probe to bone studies [11,12] described the index test in sufficient detail. Neither provided any information regarding statistical measures of agreement among experts. It does not seem that any special training, cost or education would be necessary to implement the index test in an average podiatric physician’s office.

Are the patients in the study similar enough to my patients?

The LR of an index test will vary with the spectrum of the patients enrolled in the study. [17] Podiatric physicians will need to consider whether their practice setting is similar to that of the article and whether the patient under consideration would have been included in the study.

Clearly the spectrum of disease differs between the two studies evaluating the probe to bone test for osteomyelitis. [11,12] Grayson’s study [11] was performed on patients with severe limb-threatening infections at a tertiary care hospital. In contrast, Lavery, et al.’s study [12] was performed on a less severe population of diabetic patients in an outpatient setting. Since the majority of podiatric physicians will encounter infected diabetic ulcers in an outpatient setting rather than a tertiary care institution, the results of Lavery, et al.’s study are more relevant for the majority of podiatric physicians.

Will the results of the study change my diagnostic approach?

In order to use likelihood ratios derived from a diagnostic study, the podiatric physician should have some idea of the pretest probability of the target disorder in their patient population. This can be accomplished by reviewing the published literature on the subject, using the pretest probabilities in the diagnostic study, or drawing on personal experience and clinical judgment. If the podiatric physician cannot estimate the pretest probability of the target disease, it is unlikely the results of the study will provide any meaningful information.

Podiatric physicians are likely to have some process by which they arrive at a diagnosis for common conditions. This may include elements of the history and physical examination in conjunction with diagnostic studies. How should the index test be incorporated into the podiatric physician’s diagnostic process? Should the index test serve as a replacement for an existing test or an addition to the normal diagnostic process? Will the results of the diagnostic test move the probability of disease across the test/treatment threshold? If the answer is no, then the podiatric physician needs to recognize that additional testing will be necessary in order to pass the test/treatment threshold. (Fig. 1)

Figure 1 Probability of diagnosis.

Grayson’s study [11] demonstrated a prevalence of 66%, which was associated with a +LR of 4.4 and a -LR of 0.4. (Table 1) The posttest probability with a +LR was 90% and the posttest probability with a -LR was 44%. (Table 3) This should be interpreted to mean that a positive test is above the treatment threshold, while a negative result lies within an intermediate range, indicating further testing is necessary. (Fig. 1) To evaluate the precision of these results we need to consider the 95% confidence intervals about the LR point estimates. The values of the 95% confidence interval for the -LR remain within the intermediate range, indicating further diagnostic studies are necessary; however, the lower boundary of the posttest probability for a +LR is only 77%.

Table 3 Posttest probabilities with 95% confidence intervals for positive and negative likelihood ratios. [14]

The reader will need to decide if a worst-case scenario for a positive test (77% posttest probability) is above the treatment threshold.

Lavery, et al.’s study [12] demonstrated a prevalence of 12%, which was associated with a +LR of 9.4 and a -LR of 0.15. (Table 1) The posttest probability with a -LR was 2%, while the posttest probability with a +LR was 56%. (Table 3) A positive result falls within the intermediate range of posttest probabilities, indicating that further diagnostic studies are necessary, and a negative result falls below the test threshold. (Fig. 1) When the 95% confidence intervals about the posttest probabilities are considered, a positive test still remains within the intermediate range, indicating further diagnostic studies are necessary, and the interval for a negative test still remains below the test threshold, indicating further diagnostic studies are unnecessary. The reader should conclude the following: a negative probe to bone test in a severe infected diabetic ulcer at a tertiary care center requires additional testing; a positive probe to bone test in a less severe infected diabetic ulcer in an outpatient primary care setting requires additional testing; a negative probe to bone test in an infected diabetic ulcer in a primary care outpatient setting does not require additional testing; and a positive probe to bone test in a severely infected diabetic foot ulcer at a tertiary care center may not be definitive for osteomyelitis.

Multiple diagnostic tests may be utilized in two different ways: [18] in parallel or in series. Parallel testing is usually performed in an emergency room or hospital setting when rapid assessment of disease processes is necessary. Typically there will be an increase in sensitivity but a decrease in specificity, resulting in more false positives. This is most useful when the podiatric physician is faced with multiple diagnostic tests of low sensitivity. Serial testing is the process whereby diagnostic tests are ordered one at a time, each dependent upon the results of the previous test. Typically this process is used in outpatient settings where there is no urgent need to make a diagnosis. It is useful when the diagnostic studies are expensive or risky.

With serial testing the overall specificity increases but the overall sensitivity decreases. This is useful when the podiatric physician is faced with multiple tests of low specificity. With each additional piece of information obtained from the clinical examination or from diagnostic studies, the probability of the target disease changes: the posttest odds of the first test become the pretest odds of the second test, as sketched below. The problem with using sequential tests in the diagnostic workup is that if the tests are related, the second test may not provide any further information about the target disorder. For example, multiple imaging tests are often used in the evaluation of osteomyelitis in an infected diabetic ulcer: radiographs, bone scanning and magnetic resonance imaging (MRI). It is likely that these tests are not independent of each other, and the diagnostic information obtained may overlap. Additional information is only gained when the diagnostic tests do not measure the same thing and are independent of each other. For example, the probe to bone test and MRI do not appear to be dependent. Multiple regression analysis can be used to evaluate different combinations of tests to learn about the predictive value of sequential testing. Clinical prediction rules are used to determine the combination of diagnostic studies and their relevance to the target disorder. The most cited clinical prediction rule in the foot and ankle literature is the Ottawa Ankle Rules. [19]
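Assuming independence, which the discussion above notes is often violated for related imaging tests, sensitivity and specificity combine as in the sketch below; the accuracy values are illustrative, not taken from the probe to bone studies.

```python
# Sketch: how sensitivity and specificity combine for two *independent*
# tests used in parallel (positive if either is positive) versus in
# series (positive only if both are positive).

def parallel(sens1, spec1, sens2, spec2):
    sens = 1 - (1 - sens1) * (1 - sens2)  # misses only if both tests miss
    spec = spec1 * spec2                  # negative only if both are negative
    return sens, spec

def serial(sens1, spec1, sens2, spec2):
    sens = sens1 * sens2                  # must be positive on both tests
    spec = 1 - (1 - spec1) * (1 - spec2)  # a false positive needs both wrong
    return sens, spec

# e.g., two tests with illustrative accuracy values:
print(parallel(0.87, 0.91, 0.90, 0.79))  # sensitivity rises, specificity falls
print(serial(0.87, 0.91, 0.90, 0.79))    # specificity rises, sensitivity falls
```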

Summary

When evaluating a paper on diagnosis it is important for the reader to determine whether the authors used methods which minimize bias and whether the results are generalizable to their practice setting.

The spectrum of disease in Grayson, et al.’s study is not as relevant to practicing podiatric physicians as Lavery, et al.’s. Although both were prospective studies, neither clearly described consecutive enrollment of patients. Both studies used bone biopsy as the reference test; however, it was not applied to all patients in either study. Lavery, et al., clearly indicated that the decision to perform the reference standard was not based upon the results of the index test, whereas Grayson, et al., indicated that the results of the index test were used in the decision to perform the reference standard. Grayson provided information regarding the time between the performance of the index test and the reference standard; Lavery did not. Grayson, et al., clearly stated that the pathologist reviewing the bone biopsy was blinded to the clinical information obtained. It was not clear whether blinding occurred in any other aspect of either study. Neither study described a sample size calculation.

Both studies described the index test well enough to allow replication in practice. The test should be applicable in any podiatric physician’s practice without excessive training or cost. Neither study reported a statistical analysis of agreement between investigators regarding the test. The results of Grayson, et al.’s study are most generalizable to a tertiary care hospital setting, where a negative test requires further diagnostic testing to rule out osteomyelitis. A positive test supports the diagnosis of osteomyelitis; however, the result is not definitive.

Lavery, et al.’s study is most generalizable to a podiatric physician’s outpatient practice. It demonstrates that a positive test does not confirm osteomyelitis and that additional testing is necessary. A negative probe to bone test in Lavery, et al.’s study is definitive for ruling out osteomyelitis in a diabetic ulcer.

References

1. Turlik M: Introduction to diagnostic reasoning. Foot and Ankle Online Journal, 2009.
2. CONSORT statement. http://www.consort-statement.org/ Accessed 3/15/09.
3. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Moher D, Rennie D, de Vet HC, Lijmer JG: The STARD Statement for Reporting Studies of Diagnostic Accuracy: Explanation and elaboration. Ann Intern Med. 138 (1): W1 – 12, 2003.
4. Diagnostic levels of evidence http://www.cebm.net/index.aspx?o=1025 Accessed 3/15/09.
5. Gregg J, Silberstein M, Schneider T, Marks P: Sonographic and MRI evaluation of the plantar plate: A prospective study. Eur Radiol. 16 (12): 2661 – 2669, 2006.
6. Sabir N, Demirlenk S, Yagci B, Karabulut N, Cubukcu S: Clinical utility of sonography in diagnosing plantar fasciitis. J Ultrasound Medicine 24 (8): 1041 – 1048, 2005.
7. Lijmer J, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, Bossuyt PM: Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 282 (11): 1061 – 1066, 1999.
8. Rutjes AW, Reitsma J, Di NM, Smidt N, van Rijn JC, Bossuyt PM: Evidence of bias and variation in diagnostic accuracy studies. CMAJ 174: 469 – 476, 2006.
9. Brown R, Rosenberg ZS, Thornhill BA: The C sign: more specific for flatfoot deformity than subtalar coalition. Skeletal radiology 30: 84 – 87, 2001.
10. Deeks J, Altman D: Diagnostic tests 4: likelihood ratios. BMJ 329: 168 – 169, 2004.
11. Grayson M, Gibbons G, Balogh KE, Levin E, Karchmer A: Probing to bone in infected pedal ulcers. A clinical sign of underlying osteomyelitis in diabetic patients. JAMA 273: 721 – 723, 1995.
12. Lavery L, Armstrong D, Peters E, Lipsky B: Probe-to-bone test for diagnosing diabetic foot osteomyelitis. Diabetes Care 30: 270 – 274, 2007.
13. Rutten F, Moons K, Hoes A: Commentary: Improving the quality and clinical relevance of diagnostic studies BMJ 332: 1129, 2006.
14. Carley S, Dosman S, Jones S, Harrison M: Simple nomograms to calculate sample size in diagnostic studies. Emerg Med J 22: 180 – 181, 2005.
15. Likelihood Ratio Calculator http://araw.mede.uic.edu/cgi-alansz/testcalc.pl Accessed 3/8/09.
16. Interpretation of likelihood ratios. http://www.poems.msu.edu/InfoMastery/Diagnosis/likelihood_ratios.htm Accessed 3/21/2009.
17. Furukawa T, Strauss S, Bucher HC, Guyatt G: Diagnostic tests. In: Guyatt G, Drummond R, Meade MO (eds). Users’ Guides to the Medical Literature. New York: McGraw-Hill, 419 – 438, 2008.
18. Fletcher R, Fletcher S: Clinical Epidemiology the Essentials. 4th ed. Philadelphia Pennsylvania: Lippincott Williams and Wilkins, 2005.
19. Stiell IG, Greenberg GH, McKnight RD, Nair RC, McDowell I, Worthington JR: A study to develop clinical decision rules for the use of radiography in acute ankle injuries. Ann Emerg Med 21 (4): 384 – 390, 1992.


Address correspondence to: Michael Turlik, DPM
Email: mat@evidencebasedpodiatricmedicine.com

Private practice, Macedonia, Ohio.

© The Foot and Ankle Online Journal, 2009

Introduction to Diagnostic Reasoning

by Michael Turlik, DPM

The Foot and Ankle Online Journal 2 (10): 5

This is the first of four articles written for podiatric physicians to help them understand and apply the results of diagnostic studies to their practice. This article deals with how clinicians arrive at a diagnosis and how to interpret results from a diagnostic trial. An article from the foot and ankle literature will be used to illustrate the concepts discussed in the publication.

Key words: Evidence-based medicine, industry sponsored trials.

Accepted: August, 2009
Published: October, 2009

ISSN 1941-6806
doi: 10.3827/faoj.2009.0210.0005


The method of arriving at a diagnosis can be a simple or a very complex process, depending upon the clinician’s knowledge and experience, the clinical presentation of the diagnostic problem, the prevalence of the disease, and the diagnostic studies employed. Podiatric physicians always encounter some degree of uncertainty in practice, whether about the true effect of a therapeutic intervention or the diagnosis of a patient’s condition. After collecting the information needed to make a diagnosis there is usually some information threshold after which additional information becomes irrelevant and treatment begins. (Fig. 1) There are two basic ways by which clinicians arrive at a diagnosis: pattern recognition/categorization, or probabilistic diagnostic reasoning (the hypothetico-deductive approach). [1,2]

Figure 1 Threshold model of decision making.*  

*Reproduced with permission from Center for Evidence-Based Medicine, http://www.cebm.net/index.aspx?o=1043

Pattern recognition approach or Categorization

This approach is used by experts making a common diagnosis in their field of expertise. The use of this method varies widely among clinicians and is based upon the thoroughness of their knowledge base and their experience. When using this approach podiatric physicians are able to quickly evaluate the clinical scenario, place it in some familiar combination of signs and symptoms, and rapidly make the diagnosis. This type of diagnostic reasoning does not involve the generation of multiple hypotheses which are tested, and it is unlikely that experts use the same reasoning process as novice clinicians. For example, consider Mr. Jones, a 52 year-old obese white male who presents to the office complaining of a two-week history of heel pain which began insidiously and is localized to the plantar medial aspect of his heel. He relates that the pain is worse after periods of inactivity, on arising from a sitting position, or when first bearing weight on the heel after sleeping. Physical examination reveals no redness, no edema, no deformity, but tenderness to palpation of the plantar medial heel. Even the inexperienced podiatrist would be able to make the diagnosis of mechanically induced heel pain given this scenario. This diagnosis does not require any diagnostic studies for most podiatric physicians and is thought to be clinical in nature. [3] The pretest probability of mechanically induced heel pain in this scenario is very high, likely greater than 90%. In addition, the podiatrist will recognize that the chance of a bone tumor of the calcaneus producing this clinical picture is close to 0%. The percentage of patients with the disease in a specified population at a point in time is defined as the prevalence, or pretest probability. Pretest probabilities which are extremely low or extremely high usually will not benefit from further diagnostic testing. (Fig. 1)

Probabilistic Diagnostic Reasoning or Hypothetico-Deductive

When clinicians face an atypical presentation of a common condition or something more challenging for their specialty, clinicians will switch from pattern recognition to probabilistic diagnostic reasoning.

As a result of the clinical encounter the clinician will generate a short list of diagnostic hypotheses with an estimate of the probability of each possibility. This list will guide subsequent efforts in data collection. The pretest probability for this type of diagnostic inquiry usually lies in an intermediate range rather than at the extremes. (Fig. 1) Therefore, diagnostic studies may be very helpful in distinguishing between the different hypotheses, restructuring and reprioritizing the diagnostic possibilities as further information is obtained. For example, a 50-year-old neuropathic diabetic male presents with a one-week history of progressive redness, swelling and pain about a recurrent plantar ulcer successfully treated with oral antibiotics and local wound care in the past. Physical examination reveals a mildly obese, afebrile male with palpable pulses and a lack of protective sensation bilaterally. The ulcer under the first metatarsophalangeal joint (MPJ) measures 1.5 cm in diameter and exhibits a red base. There is minimal drainage on the dressing, without odor. The important question which needs to be answered in this scenario is: does this patient have osteomyelitis? The pretest probability (prevalence) in this case varies from 20 – 66% depending upon the study location referenced. [4] Higher pretest probabilities are seen in tertiary care hospital settings, lower in outpatient primary care settings. The range of pretest probabilities in this case differs from the earlier example of mechanically induced heel pain because it is in the intermediate category, indicating that some further diagnostic test(s) is (are) necessary. (Fig. 1)

The test and treatment thresholds (Fig. 1) are not static but dynamic. They vary with the invasiveness and cost of the test, the consequences of misdiagnosis of the disease process, and the efficacy and expense of the treatment. (Table 1) For example, in the case of the diabetic patient with a pedal ulcer referenced above, the test threshold would be lower for using a metal probe to evaluate the ulcer for osteomyelitis than for performing a bone biopsy. Since mechanically induced heel pain is a benign, self-limited condition which responds to nonsurgical care, its treatment threshold would be lower than the treatment threshold for osteomyelitis in a diabetic patient. A comprehensive explanation of how to calculate test/treatment thresholds using decision tree analysis is provided for the interested reader. [5]

Table 1 Variations in test/treatment threshold. [2]

The information gained from the results of a diagnostic study changes the pretest probability; the revised estimate of prevalence is termed the posttest probability. The magnitude of the change is a function of the strength of the diagnostic intervention acting on the pretest probability. The posttest probability can be either higher or lower than the pretest probability depending upon the results of the diagnostic study used. The strength of the diagnostic intervention may be presented in many ways; the most clinically useful is the likelihood ratio.

Assessing the Performance of a Diagnostic Test

The question that podiatric physicians must answer after ordering a diagnostic test is: based upon the results of this test, how probable is it that my patient has a diagnosis of ___? To answer this question it is necessary to construct a 2 x 2 table (Table 2) from a study of the intervention to determine the strength of the test. Some measures of probability derived from a 2 x 2 table are the following (a worked sketch follows Table 2):

Sensitivity: the proportion of the patients with the disease who test positive.
TP / (TP + FN)

Specificity: the proportion of the patients without the disease who test negative.
TN / (TN+FP)

Positive Predictive Value: proportion of patients with a positive test who have the disease
TP / (TP+FP)

Negative Predictive Value: proportion of patients with negative test who do not have the disease
TN / (TN+FN)

Positive Likelihood Ratio: how much the odds of the disease increase when a test is positive.
sensitivity / (1 − specificity)

Negative Likelihood Ratio: how much the odds of the disease decrease when a test is negative.
(1 − sensitivity) / specificity

Table 2 2 x 2 diagnostic table.
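A worked sketch of these definitions, with hypothetical counts:

```python
# Sketch: the measures above computed from a 2 x 2 table. TP, FP, FN and
# TN are hypothetical counts for illustration only.

def test_measures(tp, fp, fn, tn):
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "positive predictive value": tp / (tp + fp),
        "negative predictive value": tn / (tn + fn),
        "positive likelihood ratio": sens / (1 - spec),
        "negative likelihood ratio": (1 - sens) / spec,
    }

for name, value in test_measures(tp=38, fp=12, fn=6, tn=64).items():
    print(f"{name}: {value:.2f}")
```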

The higher the sensitivity of a test, the better its ability to detect disease, owing to a low false negative rate. Diagnostic tests with a high sensitivity (95 – 99%) are used when there is an important price for missing a serious, treatable disease. Highly sensitive tests are usually used early in the workup of the disease and, if positive, are followed up with a test which has a high specificity. If a test with a high sensitivity is negative, the podiatric physician can be comfortable ruling out the disease process. The mnemonic SnNout refers to a diagnostic test with a high sensitivity: a Sensitive test, when Negative, rules out disease.

Diagnostic tests which have a high specificity are used to identify those patients who do not have the condition of interest. A highly specific test will rarely label people as having the disease when they do not. These tests are most useful to confirm a diagnosis which has been suggested by a highly sensitive test. Highly specific tests are particularly useful when false positive results can harm the patient physically, psychologically, or fiscally. A positive result is very helpful to the podiatric physician in confirming the disease process. The mnemonic SpPin refers to a diagnostic test with a high specificity: a Specific test, when Positive, rules in disease.

It is not possible for a test to be both highly specific and highly sensitive when dealing with data collected over a range of values. When the test is measured over a continuum of values, changing the artificial cutoff point changes the sensitivity and specificity: sensitivity can only be increased at the expense of specificity. Sensitivity and specificity are not, by themselves, clinically useful measures, as they do not answer the question of the probability of having or not having the disease under evaluation. [6]

Predictive values are another measure of test efficiency which can be derived from a 2 x 2 table [7] and can be used to gain information regarding the probability of disease in patients. As a test’s sensitivity increases, so does its negative predictive value; as a test’s specificity increases, its positive predictive value increases. Unlike sensitivity and specificity, predictive values are influenced by disease prevalence, and they vary with prevalence in a nonlinear manner. [8] Therefore, predictive values derived in an outpatient primary care setting will be misleading when applied to a tertiary care setting, since the prevalence is usually different. This is a major limitation for the podiatric physician using predictive values in clinical practice. To be clinically useful they should be employed in a practice setting as similar as possible to the one in which they were derived.
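A short sketch of this nonlinear dependence via Bayes’ theorem, using illustrative test characteristics and the outpatient and tertiary care prevalences mentioned in this series:

```python
# Sketch: why predictive values shift nonlinearly with prevalence. Bayes'
# theorem gives the PPV from a fixed sensitivity and specificity; the test
# characteristics below are illustrative assumptions.

def ppv(sens, spec, prevalence):
    tp = sens * prevalence                 # expected true positive fraction
    fp = (1 - spec) * (1 - prevalence)     # expected false positive fraction
    return tp / (tp + fp)

# The same test (sensitivity 0.90, specificity 0.85) in two settings:
print(round(ppv(0.90, 0.85, 0.12), 2))  # outpatient prevalence 12% -> ~0.45
print(round(ppv(0.90, 0.85, 0.66), 2))  # tertiary care prevalence 66% -> ~0.92
```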

A third method of determining test efficiency from a 2 x 2 table is to generate likelihood ratios. [6] Likelihood ratios are not apt to be influenced by disease prevalence, provided the disease spectrum remains the same across different prevalences. [9] Likelihood ratios are expressed in terms of odds rather than proportions, whereas sensitivity, specificity, and predictive values are expressed as proportions. Likelihood ratios are the preferred method of expressing test efficiency in evidence-based medicine publications. A likelihood ratio combines the sensitivity and specificity of a diagnostic study, which allows one to determine intuitively how much the pretest probability will change based upon a positive or negative test result: pretest odds x likelihood ratio = posttest odds. Because the multiplication operates on odds, likelihood ratios cannot be applied directly to pretest probabilities.

Since likelihood ratios are expressed as odds rather than proportions, probabilities must be converted to odds before a likelihood ratio is applied, and the resulting posttest odds must be converted back to a probability. This can be done using simple mathematical conversions, internet calculators, or a nomogram.
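A minimal sketch of that conversion is shown below; the helper functions implement the standard textbook probability-to-odds arithmetic, not the internals of any particular calculator.

def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(odds):
    return odds / (1 + odds)

def posttest_probability(pretest_prob, likelihood_ratio):
    # pretest odds x LR = posttest odds, then convert back to a probability
    return odds_to_prob(prob_to_odds(pretest_prob) * likelihood_ratio)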

How best to estimate the prevalence of disease? Clinical observations and experience are often inaccurate. A better estimate arises from reviewing the medical literature on the subject and/or evaluating large computerized databases. Pretest probability is not a constant; it varies with the clinical environment. Prevalence increases as patients pass through a referral filter from a primary care source to a tertiary care facility.

In order to correctly utilize a diagnostic study, a podiatric physician will need to estimate the prevalence of the disease in his or her patient population, the likelihood ratio of the test employed, and the rigor of the study used to determine the test’s accuracy. In a recent systematic review of electrodiagnostic techniques currently in use to evaluate tarsal tunnel syndrome (TTS), [10] the authors concluded that, due to the poor quality of the studies, the reported sensitivities and specificities could not be combined into a summary statistic. In addition, the prevalence of TTS could not be determined. The authors’ conclusions limit the usefulness of electrodiagnostic studies in the evaluation of TTS.

Diabetes and Pedal Osteomyelitis

A recent article [11] appraises the published literature concerning the various diagnostic options for evaluating infected diabetic foot ulcers for the presence of osteomyelitis. The gold standard in each study was bone biopsy. A summary of the authors’ findings, limited to the higher quality studies, is presented in Table 3. The highest likelihood ratios are found for ulcer area > 2 cm² and erythrocyte sedimentation rate (ESR) > 70 mm/hr. Unfortunately, these tests also have very large 95% confidence intervals, which indicates that the results of these studies are not very precise. Tests with narrower 95% confidence intervals are magnetic resonance imaging (MRI), probe to bone, and abnormal radiograph. Based upon its cost and adverse effect profile, the probe to bone test should be the first test undertaken by the podiatric physician when evaluating an infected diabetic pedal ulcer for the presence of osteomyelitis.

Table 3 Likelihood ratios for studies used to evaluate diabetic osteomyelitis. [11] (*Confidence Intervals)

The likelihood ratio for the probe to bone test cited in Butalia’s review [11] is a composite of three different studies, one of which is Lavery’s study. Lavery and colleagues evaluated the accuracy of the probe to bone test for osteomyelitis in patients with diabetic foot ulcers. [4] They expressed their results in terms of sensitivity, specificity, and positive and negative predictive values; they did not report likelihood ratios.

In the results section the authors report information which can be used to construct a 2 x 2 table (Table 4). Using an online diagnostic calculator, [12] likelihood ratios can be calculated: the value for a positive test is 9.4 and for a negative test 0.14. Likelihood ratios greater than one increase the chance of the disease being present; likelihood ratios less than one decrease it. Likelihood ratios > 10 or < 0.1 generate large, conclusive changes in probability, while likelihood ratios between 5 – 10 and 0.1 – 0.2 are associated with moderate changes. The likelihood ratios calculated from Lavery’s study [4] are therefore associated with moderate to large changes in diagnostic probability. The pretest probability (prevalence) in Lavery’s study [4] is 12%.

Table 4 Results of probe to bone test.
*modified from Diabetes Care 30: 270, 2007

Using an online calculator [13] or a likelihood ratio nomogram (Fig. 2), the posttest probability can be calculated to be 56.4% for a positive test and 1.87% for a negative test. A negative test should fall below the test threshold, effectively ruling out the condition. (Fig. 1) A positive test in this scenario still remains in the intermediate range for this prevalence and indicates that further testing is necessary. If the prevalence were higher, for example 60%, which is the prevalence seen in some studies in tertiary care centers, [14] the posttest probability would be 17.4% for a negative result and 93.4% for a positive result.
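The calculation behind these figures can be reproduced in a few lines using the probability/odds conversion described earlier. The prevalences and likelihood ratios below are those reported in the text; small discrepancies from the quoted posttest probabilities reflect rounding.

# Prevalences 12% and 60%, LR+ 9.4, LR- 0.14, as reported above.
def posttest(pretest, lr):
    odds = pretest / (1 - pretest) * lr   # pretest odds x LR
    return odds / (1 + odds)              # back to a probability

for prevalence in (0.12, 0.60):
    print(f"pretest {prevalence:.0%}: "
          f"positive -> {posttest(prevalence, 9.4):.1%}, "
          f"negative -> {posttest(prevalence, 0.14):.1%}")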

Figure 2 Likelihood Ratio Nomogram.*
*reproduced with permission from Center for Evidence-Based Medicine
http://www.cebm.net/index.aspx?o=1043

These results indicate that, at the higher prevalence, further testing after a positive test is likely unnecessary, while a negative test may fall within the intermediate range and require further testing, the opposite of the results obtained using the prevalence from Lavery’s study.

The above example demonstrates the use of likelihood ratios for diagnostic studies evaluating a dichotomous outcome. Likelihood ratios can also be used with continuous test results as interval likelihood ratios. [15]

How believable is the likelihood ratio derived from a study?

The quality of the evidence derived from a diagnostic study is a function of the study’s ability to minimize bias. [16] The best study design for diagnostic tests (Level 1) is an independent, masked comparison with a reference standard among an appropriate population of consecutive patients. Just as with randomized controlled trials, diagnostic studies are separated into different levels of evidence (Table 5), with the less rigorous (more biased) studies overestimating test effectiveness. [17] The largest overestimation occurs in studies which include non-representative patients or which apply different reference standards to positive and negative test results. The smallest overestimation occurs when blinding is not adhered to during the study. The following article in this series will discuss how to critically appraise a diagnostic study for validity.

Table 5 Levels of evidence for diagnostic studies. [7]

References

1. Elstein A, Schwartz A: Clinical problem solving and diagnostic decision making: a selective review of the cognitive research literature. In: Knottnerus JA (Ed). The Evidence Base of Clinical Diagnosis. London, England: BMJ Books, 179 – 195, 2002.
2. Richardson WS, Wilson M: The process of diagnosis. In: Guyatt G, Bhandari M, Tornetta P, Schemitsch EH, Sprint Study Group: Users guides to the medical literature. New York, New York: McGraw-Hill, 399 – 406, 2008.
3. Cole C, Seto C, Gazewood J: Plantar fasciitis: Evidence-based review of diagnosis and therapy. Am Fam Physician 72: 2237 – 2242, 2005.
4. Lavery L, Armstrong DG, Peters EJG, Lipsky BA: Probe-to-Bone Test for Diagnosing Diabetic Foot Osteomyelitis. Diabetes Care 30: 270 – 274, 2007.
5. Pauker S, Kassirer J: The threshold approach to clinical decision making. NEJM 302: 1190 – 1116, 1980.
6. Deeks J, Altman D: Diagnostic tests 4: likelihood ratios. BMJ 329: 168 – 169, 2004.
7. Altman D, Bland JM: Statistics notes: Diagnostic tests 2: predictive values. BMJ 309: 102, 1994.
8. Predictive values. http://www.poems.msu.edu/InfoMastery/Diagnosis/PredictiveValues.htm. Accessed 09/09/2009.
9. Montori V, Wyer P, Newman T, Keitz S, Guyatt G: Tips for learners of evidence-based medicine: 5. The effect of spectrum of disease on the performance of diagnostic tests. CMAJ 173: 385 – 390, 2005.
10. Patel A, Gaines K, Malamut R, Park T, Del Toro D, Holland N: Usefulness of electrodiagnostic techniques in the evaluation of suspected tarsal tunnel syndrome: An evidence-based review. Muscle and Nerve 32: 236 – 240, 2005.
11. Butalia S, Palda V, Sargeant R, Detsky A, Mourad O: Does this patient with diabetes have osteomyelitis of the lower extremity? JAMA 299: 806 – 813, 2008.
12. Likelihood Ratio Calculator http://araw.mede.uic.edu/cgi-alansz/testcalc.pl Accessed 3/8/2009.
13. Post-test probability of disease calculator. http://homepage.mac.com/aaolmos/Posttest/posttest.html Accessed 3/9/2009.
14. Grayson ML, Gibbons GW, Balogh K, Levin E, Karchmer AW: Probing to bone in infected pedal ulcers. A clinical sign of underlying osteomyelitis in diabetic patients. JAMA 273: 721 – 723, 1995.
15. Mayer D: Essential Evidence-based Medicine. Cambridge, England: Cambridge University press, 233 – 236, 2004.
16. Moore A, McQuay H: Systematic reviews of diagnostic tests. In: Bandolier’s Little Book of Making Sense of the Medical Evidence. London, England: Oxford University press, 236 – 242, 2006.
17. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JHP, Bossuyt PMM: Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 282: 1061 – 1066, 1999.


Address correspondence to: Michael Turlik, DPM
Email: mat@evidencebasedpodiatricmedicine.com

Private practice, Macedonia, Ohio.

© The Foot and Ankle Online Journal, 2009

Evaluation of Clinical Practice Guidelines

by Michael Turlik, DPM1

The Foot and Ankle Online Journal 2 (9): 5

Clinical practice guidelines are defined and their use is explained. Two published guidelines dealing with heel pain are evaluated for validity using a common readily available validated instrument which can be accessed on the internet.

Key words: Evidence-based medicine, industry sponsored trials.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License.  It permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. ©The Foot and Ankle Online Journal (www.faoj.org)

Accepted: August, 2009
Published: September, 2009

ISSN 1941-6806
doi: 10.3827/faoj.2009.0209.0005


Clinical practice guidelines (CPGs) are systematically developed documents published to provide specific recommendations to standardize the diagnosis and treatment of common clinical disorders for clinicians, patients and healthcare administrators. They are derived from the best available evidence and current best practice. Authors of CPGs gather, appraise and combine evidence much as systematic reviews do; however, unlike systematic reviews, CPGs provide actionable recommendations at a clinical level. It is hoped that publishing guidelines decreases ineffective care with a corresponding increase in effective care. The desired outcomes of CPGs are increased consistency, higher quality care and more predictable healthcare processes. CPGs are developed by various methods, by diverse stakeholders, for different purposes. The process of developing a CPG should be transparent and minimize bias. The recommendations made for patient care should be clear and understandable, and each recommendation should reference the source of the information used to generate it.

The information cited by the guideline developers for each recommendation should be appraised for its quality and graded for its strength. Primary studies used to make the recommendations in a CPG may be valid; however, the strength of a recommendation will be graded lower if the studies demonstrate a small, imprecise effect or carry substantial risks and costs. Using the information provided by the authors of the guideline regarding study quality and grade, the podiatric physician will be able to judge how much confidence to place in the recommendations. CPGs are not flawless, nor are they a substitute for clinical judgment. CPGs should not be considered mandatory statements, since not all recommendations may be linked to high quality evidence. In addition, not all podiatric physicians may function in the same environment, or with the same patients, as those used to develop the CPG. A CPG may be valid but not relevant to a specific clinician or patient.

If CPGs are used by podiatric physicians, they should understand how to critically analyze a guideline for validity, interpret the results and determine its generalizability to their unique situation. Systematic errors in the development of a CPG can distort the recommendations away from the truth.

What can go wrong? Incorrect search strategies can result in the loss of important papers; inadequate appraisal and synthesis of the papers found can result in incorrect recommendations; and confusing wording, format or structure can lead to misunderstanding. Finally, it is important for the podiatric physician to determine who paid for the study, whether there was any conflict of interest involving the authors, and what was done about it. The purpose of this paper is to provide instruction for podiatric physicians in evaluating CPGs. Two different CPGs dealing with a common podiatric complaint, heel pain, [1,2] will be compared and contrasted for validity and relevance.

Evaluating clinical practice guidelines

There are several published instruments in use to evaluate clinical practice guidelines. The Conference on Guideline Standardization (COGS) [3] developed and published an 18 item instrument to evaluate the validity of clinical practice guidelines. The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) working group [4] was begun in 2000 to develop a common, sensible and transparent approach to grading the quality of evidence and strength of recommendations used in clinical practice guidelines. The AGREE instrument [5] was developed in 2001 by an international group of researchers and policymakers. The AGREE instrument will be used in this article to critically analyze the clinical practice guidelines referenced earlier.

The AGREE website [5] provides the instrument to evaluate a CPG and a training manual for the instrument. The AGREE instrument consists of 23 separate items grouped into six quality domains (Table 1) which measure both the internal and external validity of CPGs. It is a validated, generic instrument which can be used to evaluate new, existing and revised CPGs. The AGREE instrument evaluates the process, not the content, of the CPG.

Table 1 AGREE quality domains.

AGREE Evaluation

Scope and purpose

This domain consists of three separate questions (Q) which evaluate the overall aim of the guideline, the specific clinical questions, and the target patient population. Answers to these questions allow the podiatric physician to determine whether the CPG is relevant to his or her practice setting (generalizability).

Q1 The overall objective (s) of the guideline is (are) specifically described.

The American College of Foot and Ankle Surgeons (ACFAS) heel pain guideline [1] does not explicitly state an overall objective. The American Physical Therapy Association (APTA) guideline [2] is one of a series of guidelines produced by APTA. General purposes for the series of guidelines are explicitly stated.

Q2 The clinical question (s) covered by the guideline is (are) specifically described.

The development of foreground questions utilizing the PICO (Patient/Population/Problem, Intervention/Exposure, Comparison, Outcome) technique has been described elsewhere. [6] The ACFAS heel pain guideline [1] does not explicitly state the clinical question(s) to be covered. The APTA guideline [2] specifically describes two different tasks which it hopes to accomplish.

Q3 The patients to whom the guideline is meant to apply are specifically described.

Neither the ACFAS heel pain guideline [1] nor the APTA guideline [2] explicitly states the patients to whom the guideline is meant to apply.

Stakeholder Involvement

This domain is composed of a series of questions which focus on the extent to which the guideline represents the views of its intended users. The answers to these questions will help the podiatric physician in determining the CPG’s relevance to his or her clinical practice.

Q4 The guideline development group includes individuals from all relevant groups.

The guideline development group should be diverse and include various stakeholders: end users, policy makers and consumers. [7] It is of interest to the podiatric physician whether the guideline development group includes a podiatrist.

The ACFAS heel pain guideline [1] was developed by podiatrists with membership in the ACFAS. No other groups were involved. The APTA guideline [2] was developed principally by physical therapists, some with advanced degrees, as well as an orthopedic surgeon specializing in foot and ankle care.

Q5 The patients’ views and preferences have been sought.

It is important in evidence-based medicine to include the values and concerns of patients. [8]

There is no evidence that either the ACFAS heel pain guideline [1] or the APTA guideline [2] included patients’ views and preferences in developing the CPG.

Q6 The target users of the guideline are clearly defined.

From a podiatric physician’s viewpoint it is important to consider whether the target users of the guideline specifically include podiatric physicians.

The ACFAS heel pain guideline [1] does not define target users of the guideline. The authors of the APTA guideline [2] define the target users as orthopedic physical therapy clinicians, students, residents, academic instructors, clinical instructors, fellows and interns. Podiatric physicians are not mentioned in the APTA guideline as authors, reviewers or intended recipients.

Q7 The guideline has been piloted among targeted users.

The ACFAS heel pain guideline [1] has not been piloted among targeted users. The APTA guideline [2] authors provide a detailed and comprehensive explanation of the review process. The guideline was reviewed by multiple varied healthcare practitioners for feedback prior to being finalized.

Rigor of development

This domain relates to the process used to gather and synthesize the evidence, and to the methods used to formulate the recommendations and update them. The answers to these questions will help the podiatric physician in determining the internal validity of the CPG.

Q8 Systematic methods were used to search for the evidence.

An earlier publication has covered this topic in some detail. [9] The ACFAS heel pain guideline [1] provided no information regarding the search strategy in the development of the CPG. The authors of the APTA guideline [2] discussed why a systematic search could not be utilized in the development of the CPG.

Q9 The criteria for selecting the evidence are clearly described.

The ACFAS heel pain guideline [1] does not provide any criteria for selecting the evidence used in the development of the CPG. The APTA guideline, [2] in its methods section, describes the criteria which were used to select the evidence used in the CPG.

Q15 The recommendations are specific and unambiguous.

The ACFAS heel pain guideline [1] does make recommendations, but they are neither specific nor unambiguous. In contrast, the APTA guideline [2] makes specific and unambiguous recommendations.

Q16 The different options for management of the condition are clearly presented.

Both the ACFAS heel pain guideline [1] and the APTA guideline [2] describe different options for management in their respective CPGs.

Q17 The key recommendations are easily identifiable.

The ACFAS heel pain guideline [1] does not make the key recommendations easily identifiable; the APTA guideline [2] does.

Q18 The guideline is supported with tools for application.

The entire guideline document is usually large and cumbersome; if a more condensed, easily accessed version is not produced for clinicians and patients, it is unlikely that the guideline will be utilized effectively. [13]

The ACFAS heel pain guideline [1] does not provide any tools for application. The APTA guideline [2] provides a single page at the beginning of the publication listing the recommendations with the grade and strength of the evidence. This allows the recommendations to be easily used by interested parties.

Application

The questions in this domain pertain to the likely organizational, behavioral and cost implications of applying the CPG.

Q19 The potential organizational barriers in applying the recommendations have been discussed.

Organizational barriers may limit the usefulness and application of the CPG. A recent article provides an in-depth discussion regarding this topic. [14]

Neither the ACFAS heel pain guideline [1] nor the APTA guideline [2] discussed the potential organizational barriers in applying the CPG.

Q20 The possible cost implications of applying the recommendations have been considered.

Given the rapidly changing dynamics of health care policy in the United States, it would be shortsighted not to consider the cost implications of recommendations if data were available. [15]

Neither the ACFAS heel pain guideline [1] nor the APTA guideline [2] discussed the cost implications of applying the recommendations.

Q21 The guideline presents key review criteria for monitoring and/or audit purposes.

The ACFAS heel pain guideline [1] does not provide any information concerning criteria for monitoring and/or audit purposes. The APTA guideline [2] recommends the use of validated self-reported instruments to monitor response to treatment and gives several examples.

Editorial independence

Q22 The guideline is editorially independent from the funding body.

The ACFAS heel pain guideline [1] does not discuss who funded the study, but it is explicit in the document that it was authored by a committee of the ACFAS. It is not clear whether the development of the guideline was editorially independent of the ACFAS. The APTA guideline [2] is authored by members of the orthopedic section of the APTA. It is not clear who funded the guideline or whether the authors were editorially independent of the APTA.

Q23 Conflicts of interest of guideline development members have been recorded.

It has been shown quite clearly that industry sponsored studies are likely to report pro-industry results. [16] It is important for guideline developers to inform users how conflicts of interest were dealt with when found. [17] The most common source of bias in CPGs is thought to be financial. [18] In a survey of physician authors of CPGs, 87% had some form of interaction with the pharmaceutical industry. [19]

Neither the ACFAS heel pain guideline [1] nor the APTA guideline [2] discussed a conflict of interest process.

Response scale

Each of the 23 items of the AGREE instrument is individually evaluated on a four-point scale. [5] The scale measures the extent to which the item has been fulfilled; the higher the number, the more fully the AGREE criteria have been met by the authors of the guideline. Comparing the two guidelines (Table 2), the guideline produced by the ACFAS scored lower on the AGREE instrument than the APTA guideline.
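As a rough illustration of how such item ratings roll up into a domain result, the sketch below standardizes a domain score against its possible range, assuming the instrument’s 1 (strongly disagree) to 4 (strongly agree) rating convention and a standardization of (obtained minus minimum) over (maximum minus minimum), as used by AGREE-style instruments; the item ratings are invented.

def domain_score(item_ratings, lo=1, hi=4):
    """Standardized domain score from one appraiser's item ratings."""
    obtained = sum(item_ratings)
    min_possible = lo * len(item_ratings)
    max_possible = hi * len(item_ratings)
    return (obtained - min_possible) / (max_possible - min_possible)

# One appraiser rating a three-item domain such as "scope and purpose":
print(f"{domain_score([2, 1, 1]):.0%}")   # weak adherence (11%)
print(f"{domain_score([4, 4, 3]):.0%}")   # strong adherence (89%)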

Table 2 Results of comparison ACFAS / APTA guidelines using the AGREE instrument.

In a review of CPGs published by specialty societies, the authors found that 88% did not report information regarding the search strategy and 82% did not report recommendations specifically linked to the quality and grade of the evidence used. [20] This is consistent with the results for the CPG produced by the ACFAS. Neither guideline scored well in the domains of applicability and editorial independence. This is consistent with other reviews of CPGs [21,22] which found that the applicability and editorial independence domains were rated lowest using the AGREE instrument.

Conclusion

Older clinical practice guidelines are characterized by narrative reviews and expert opinion without explicit evaluation of the best evidence available. [23] Based upon the results of the AGREE instrument, the ACFAS clinical practice guideline follows this older, expert-based format. The APTA clinical practice guideline follows a more contemporary approach to the development of clinical practice guidelines, characterized by its adherence to evidence-based principles. The APTA guideline contains clear, explicit, actionable recommendations which are linked to evidence that has been evaluated for grade and strength. However, the APTA guideline does not provide comprehensive recommendations for medical treatment, and no recommendations for surgical treatment of heel pain, thus limiting its relevance to practicing podiatric physicians.

References

1. Thomas J, Christensen J, Kravitz S, Mendicino R, Schuberth J, Vanore J, Scott Weil L, Zlotoff H, Couture S: Clinical practice guideline heel pain panel: The diagnosis and treatment of heel Pain. JFAS 40: 329 – 340, 2001.
2. McPoil T, Martin R, Cornwall M, Wukich D, Irrgang J, Godges J: Heel pain – Plantar fasciitis: Clinical practice guidelines linked to the international classification of function, disability, and health from the orthopaedic section of the American Physical Therapy Association. J Orthop Sports Phys Ther 38: 629 – 648, 2008.
3. COGS http://gem.med.yale.edu/cogs/ Accessed 7/15/2009
4. GRADE http://www.gradeworkinggroup.org/ Accessed 7/15/09.
5. AGREE http://www.agreecollaboration.org/ Accessed 7/15/2009.
6. Turlik M: Introduction to evidence-based medicine. The Foot and Ankle Online Journal 2 (2): 4, 2009.
7. Fretheim A, Schünemann H, Oxman A: Improving the use of research evidence in guideline development: Group composition and consultation process. Health Research Policy and Systems 4:15, 2006.
8. Schünemann H, Fretheim A, Oxman A: Improving the use of research evidence in guideline development: Integrating values and consumer involvement. Health research policy and systems 4: 22, 2006.
9. Turlik M: Evaluation of a review article. The Foot and Ankle Online Journal 2 (6): 5, 2009.
10. Making group decisions and reaching consensus. http://www.nice.org.uk/niceMedia/pdf/GDM_Chapter9.pdf Accessed 7/20/2009.
11. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, Schünemann HJ, GRADE Working Group: GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 336: 924 – 926, 2008.
12. Shekelle PG, Ortiz E, Rhodes S, Morton SC, Eccles MP, Grimshaw JM, Woolf SH: Validity of the agency for healthcare research and quality clinical practice guidelines how quickly do guidelines become outdated? JAMA 286: 1461 – 1467, 2001.
13. Trevena L, Davey H, Barratt A, Butow P, Caldwell P: A systematic review on communicating with patients about evidence. J Evaluation Clinical Practice. 12: 13 – 23, 2006.
14. Shiffman R, Dixon J, Brandt C, Essaihi A, Hsiao A, Michel G, O’Connell R: The GuideLine Implementability Appraisal (GLIA): development of an instrument to identify obstacles to guideline implementation. BMC Medical Informatics and Decision Making 5: 23, 2005.
15. Edejer T: Improving the use of research evidence in guideline development: Incorporating considerations of cost-effectiveness, affordability and resource implications. Health Research Policy and Systems 4: 23, 2006.
16. Turlik M: Special considerations when reviewing industry sponsored studies. The Foot and Ankle Online Journal 2: 2009.
17. Boyd E, Bero L: Improving the use of research evidence in guideline development: Managing conflicts of interests. Health Research Policy and Systems 4: 16, 2006.
18. Detsky AS: Sources of bias for authors of clinical practice guidelines. CMAJ 175 (9): 1033, 2006.
19. Choudhry NK, Stelfox HT, Detsky AS: Relationships between authors of clinical practice guidelines and the pharmaceutical industry. JAMA 287: 612 – 617, 2002.
20. Grilli R, Magrini N, Penna A, Mura G, Liberati A: Practice guidelines developed by specialty societies: the need for a critical appraisal. Lancet 355: 103 – 106, 2000.
21. Cates J, Young D, Bowerman D, Porter R: An independent AGREE evaluation of the occupational medicine practice guidelines. The Spine J 6: 72 – 77, 2006.
22. Hurdowar A, Graham I, Bayley M, Harrison M, Wood-Dauphinee S, Bhogal S: Quality of stroke rehabilitation clinical practice guidelines. J Evaluation Clinical Practice 13: 657 – 664, 2007.
23. Poolman R, Verheyen C, Kerkhoffs G, Bhandari M, Schünemann H: From evidence to action: Understanding clinical practice guidelines. Acta Orthopaedica 80: 113 – 118, 2009.


Address correspondence to: Michael Turlik, DPM
Email: mat@evidencebasedpodiatricmedicine.com

1 Private practice, Macedonia, Ohio.

© The Foot and Ankle Online Journal, 2009

Evaluating the results of a Systematic Review/Meta-Analysis

by Michael Turlik, DPM1

The Foot and Ankle Online Journal 2 (7): 5

This is the second of two articles discussing the evaluation of systematic reviews for podiatric physicians. This article will focus on publication bias, heterogeneity, meta-analytic models and sensitivity analysis. A recent article related to plantar foot pain will be critically evaluated using the principles discussed in the paper.

Key words: Evidence-based medicine, review article, meta-analysis.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License.  It permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. ©The Foot and Ankle Online Journal (www.faoj.org)

Accepted: June, 2009
Published: July, 2009

ISSN 1941-6806
doi: 10.3827/faoj.2009.0207.0005


In the event that the primary studies selected for a systematic review are so dissimilar (heterogeneous) that it is ill-suited to combine their treatment effects, the systematic review will end with a table describing all of the articles abstracted. The table should contain each individual reference with the abstracted information, including the results of the study as well as the quality evaluation of the article performed by the authors of the systematic review. In this case the results of the systematic review are qualitative rather than quantitative (meta-analysis). The evaluation of individual randomized controlled trials has been covered earlier in this series. [1,2,3] The authors should explain in the narrative results section why the studies could not be combined into a pooled estimate of effect (meta-analysis).

Meta-Analysis

The results of a systematic review are a function of the quantity and quality of studies found during the review. The conclusion of a systematic review may be that, after reviewing the published studies, the clinical question cannot be answered and that a larger, or more rigorous, study is needed to answer the clinical question. [4,5]

This article is the second and final article explaining systematic reviews/meta-analyses. The first article evaluated the internal validity of a systematic review. [6] The purpose of this article is to explain the results section of a meta-analysis using a recent meta-analysis of extracorporeal shockwave therapy (ESWT) for mechanically induced heel pain [7] as a guide.

A meta-analysis uses statistical techniques to combine data from various studies into a weighted, pooled estimate of effect. Meta-analysis overcomes the small sample sizes of primary studies to achieve a more precise treatment effect. In addition, meta-analysis is thought to increase power and settle controversies arising from primary studies. A meta-analysis should not be performed when the studies are of poor quality, when serious publication bias is detected, or when the study results are too diverse.

Publication Bias

Reporting bias can be defined as an author’s inclination not to publish either an entire study or portions of a study based upon the magnitude, direction or statistical significance of the results. [8] One type of reporting bias is publication bias, which refers to an entire study remaining unpublished.

Systematic reviews which fail to search for and find unpublished studies reporting negative results may overestimate the treatment effect.

Small trials with negative results are unlikely to be published, and when they are, they may appear in less prominent journals.

Large studies which report positive results may receive a disproportionate amount of attention and may actually be published more than once. This is the opposite of publication bias. It is therefore important for the authors performing a meta-analysis to eliminate duplicate publications; otherwise the treatment effect will be overestimated.

A common method to search for publication bias is to construct a funnel plot (Figs. 1, 2). A funnel plot for the evaluation of publication bias is a scatter diagram of the randomized controlled trials found by the systematic review, in which the treatment effect of the intervention appears along the X axis while the trial size appears along the Y axis.

Figure 1  Hypothetical funnel plot which does not show publication bias.

Figure 2  Hypothetical funnel plot which does show publication bias.

The precision with which a treatment effect is estimated from a clinical trial increases with increasing sample size and event rate. Smaller studies show a large variation in treatment effect and sit at the bottom of the funnel plot. When no publication bias is present the graphical representation reveals an inverted funnel (Fig 1).

When publication bias is present, it will typically be noticed that smaller studies which do not favor the intervention are missing, usually from the lower right-hand side of the plot, resulting in an asymmetrical presentation (Fig 2). It is difficult to evaluate publication bias using a funnel plot if the meta-analysis is composed of a small number of trials with small sample sizes. [9] The reader is referred to the following references for a more complete explanation of the subject matter. [10,11]
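For readers who wish to construct their own funnel plot, the following sketch plots synthetic trial data in the orientation described above (treatment effect on the X axis, trial size on the Y axis); the data are invented solely to illustrate the layout.

# Drawing a funnel plot from per-trial effect sizes and sample sizes.
import matplotlib.pyplot as plt

effects = [0.10, 0.25, 0.30, 0.35, 0.42, 0.55, 0.31, 0.33]  # treatment effects
sizes   = [40,   55,   80,   120,  60,   45,   300,  500]   # trial sizes

plt.scatter(effects, sizes)
plt.axvline(0.32, linestyle="--")   # approximate pooled effect
plt.xlabel("Treatment effect")
plt.ylabel("Trial size")
plt.title("Funnel plot (symmetry suggests little publication bias)")
plt.show()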

Returning to our article evaluating ESWT for mechanically induced heel pain, [7] in the methods section the authors state that they will use a funnel plot to evaluate for publication bias. No funnel plot can be found among the figures in the results section. At the end of the article the authors discuss their findings regarding publication bias in the narrative of the study. The authors could not rule out the existence of small, unpublished studies showing no statistically significant benefit. As a result, the treatment effect found may overestimate the actual treatment effect.

Heterogeneity

It is common to expect some variability between studies. However, if the variability between studies is significant, the inference of the meta-analysis is weakened, and it may no longer make sense to pool the results from the various studies into a single effect size.

There are two types of heterogeneity, clinical and statistical. [12] Are the patient populations, interventions, outcome instruments and methods similar from study to study (clinical heterogeneity)?

Are the results similar from study to study (statistical heterogeneity)? Large clinical heterogeneity improves generalizability; however, it may produce large differences in results, which weakens any inference drawn from the study.

Clinical heterogeneity is best evaluated qualitatively. It is a clinical judgment based upon the reader’s understanding of the disease process. The reader needs to ask the following question: is it likely, given the patient populations, the outcomes used, the interventions evaluated and the methodology of the studies, that the results would be similar between studies? If the answer to this question is no, then a meta-analysis does not make sense. If the answer is yes, the authors should proceed to evaluate statistical heterogeneity.

Statistical heterogeneity can be evaluated both qualitatively and quantitatively. Qualitative evaluation involves constructing a forest plot of the point estimates and corresponding 95% confidence intervals of the various primary studies selected for pooling (Fig 3). Are the point estimates from the various primary trials similar from study to study, and do the 95% confidence intervals about the point estimates overlap? If the answer is yes, there is no significant heterogeneity and a pooled treatment estimate makes sense. For example, in the forest plot from the ESWT study [7] (Fig 3), although the point estimates do not all favor the intervention, they are fairly close to each other, and there appears to be overlap of the 95% confidence intervals for all of the studies. The conclusion one should reach is that there is no significant heterogeneity in this systematic review, and therefore one should proceed to pool the data. In contrast, when the point estimates are not grouped together and the 95% confidence intervals do not overlap, significant heterogeneity exists and the data should not be pooled.

Figure 3  Results from the ESWT study [7] (presented in a forest plot).

Statistical heterogeneity can also be evaluated by statistical tests. [13] The two common tests are Cochran’s Q and the I² statistic. Cochran’s Q is the traditional test for heterogeneity. It begins with the null hypothesis that the magnitude of the effect is the same across the entire study population and generates a probability based upon the Chi squared distribution. Because the test is underpowered, p > 0.1 is taken to indicate a lack of heterogeneity. I² is a more recent statistical test for heterogeneity. [14] The closer I² is to zero, the more likely it is that any difference in variability is due to chance. An I² of less than 0.25 is considered mild, between 0.25 and 0.5 moderate, and greater than 0.5 a large degree of heterogeneity.
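Both statistics are straightforward to compute from per-study effect estimates and their variances under inverse-variance weighting, as the sketch below shows; the study values are invented for illustration.

def q_and_i2(effects, variances):
    """Cochran's Q and I-squared from study effects and variances."""
    w = [1 / v for v in variances]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2

effects   = [0.30, 0.25, 0.40, 0.35]    # illustrative study effects
variances = [0.010, 0.020, 0.015, 0.012]
q, i2 = q_and_i2(effects, variances)
print(f"Q = {q:.2f} on {len(effects) - 1} df, I2 = {i2:.0%}")

For these invented values Q falls below its degrees of freedom, so I² is truncated to zero, the situation in which pooling is appropriate.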

The options for systematic reviews which demonstrate significant heterogeneity are the following: do not perform a meta-analysis; perform a meta-analysis using a random effects model; or explore and explain the heterogeneity of the study [15] using sensitivity analysis / meta-regression.

The authors of the ESWT study [7] present the clinical characteristics of the primary studies in the results section in narrative and table format. In addition, they present the point estimates and 95% confidence intervals of the primary studies in a forest plot, together with the results of Cochran’s Q and I² (Fig 3). Their conclusion is that there was no significant heterogeneity present and that pooling of the data was therefore appropriate.

Meta-Analytic Models

The two different models used to combine data in a meta-analysis are the random effects and fixed effect models. [8] Both involve calculating a weighted average from the results of the primary studies; the larger the study, the more impact it has on the combined treatment effect. The fixed effect model assumes the data between studies are roughly the same and that any differences are due to random error. There are different fixed effect tests which can be used depending upon the type of data and the precision of the studies included. The random effects model is used when heterogeneity is encountered in the primary studies and offers a more conservative estimate; the main method is the DerSimonian-Laird approach. The random effects model gives less weight to larger studies and generates larger confidence intervals about the effect size. The estimates of effect should be similar between the fixed effect and random effects models if the studies do not show heterogeneity; if there is significant heterogeneity the results will differ, sometimes greatly. If the meta-analysis combines different types of outcomes, the results may be reported as an effect size. An effect size less than 0.2 indicates no effect; greater than 0.2, a small effect; greater than 0.5, a moderate effect; and greater than 0.8, a large effect.
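The following sketch contrasts the two models on invented data: an inverse-variance fixed effect pool and a DerSimonian-Laird random effects pool with its between-study variance estimate. It is a minimal illustration, not a substitute for dedicated meta-analysis software.

import math

def fixed_effect(effects, variances):
    """Inverse-variance fixed effect pooled estimate and its variance."""
    w = [1 / v for v in variances]
    est = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    return est, 1 / sum(w)

def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random effects pooled estimate and its variance."""
    w = [1 / v for v in variances]
    fixed, _ = fixed_effect(effects, variances)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)   # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    est = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    return est, 1 / sum(w_star)

effects   = [0.30, 0.10, 0.55, 0.35]    # illustrative per-study effects
variances = [0.02, 0.03, 0.025, 0.015]  # illustrative per-study variances
for name, (est, var) in (("fixed ", fixed_effect(effects, variances)),
                         ("random", dersimonian_laird(effects, variances))):
    half = 1.96 * math.sqrt(var)
    print(f"{name}: {est:.2f} (95% CI {est - half:.2f} to {est + half:.2f})")

Note how the random effects interval is at least as wide as the fixed effect interval whenever the between-study variance is greater than zero, reflecting the more conservative estimate described above.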

The results of the meta-analysis should be presented as a summary point estimate with 95% confidence intervals. The authors of the meta-analysis should place the results in a clinical perspective and determine if the results are clinically significant.

The authors of the ESWT study [7] chose a fixed effect model to pool the data from the primary studies and presented their findings in the results section using figures (Fig 3) and text. The pooled estimate of 10 cm VAS scores for morning pain at 12 weeks, with 95% confidence intervals, is reported. The authors conclude that the pooled estimate, although statistically significant in favor of ESWT, is not clinically significant.

Sensitivity Analysis

Sensitivity analysis is often carried out in meta-analyses to evaluate potential sources of bias: for example, do the results of the meta-analysis vary with trial quality, trial size, type of intervention, patient characteristics, outcome, or any other variable, usually determined a priori? As with any other type of subgroup analysis, precautions should be taken when interpreting the results. [8]

The authors of the ESWT study [7] performed a sensitivity analysis comparing the results as a function of study quality. When only the trials judged to be of higher quality were included in the meta-analysis, the results failed to reveal a statistically significant effect. This is consistent with the concept that trials which lack methodological rigor overestimate the treatment effect of interventions. The authors conclude that the meta-analysis does not support the use of ESWT in the treatment of mechanically induced heel pain.

References

1. Turlik M: Evaluating the internal validity of a randomized controlled trial. Foot and Ankle Online Journal. 2 (3): 5, 2009.
2. Turlik M: How to interpret the results of a randomized controlled trial. Foot and Ankle Online Journal. 2 (4): 4, 2009.
3. Turlik M: How to evaluate the external validity of a randomized controlled trial. The Foot and Ankle Online Journal 2 (5): 5, 2009.
4. Edwards J: Debridement of diabetic foot ulcers. Cochrane Reviews. http://www.cochrane.org/reviews/en/ab003556.html. Accessed 2/23/09.
5. Valk G, Kriegsman DMW, Assendelft WJJ: Patient education for preventing diabetic foot ulceration. Cochrane Reviews. http://www.cochrane.org/reviews/en/ab001488.html. Accessed 2/23/09.
6. Turlik M: Evaluation of a review article. The Foot and Ankle Online Journal 2 (6): 5, 2009.
7. Thomson CE, Crawford F, Murray GD: The effectiveness of extra corporeal shock wave therapy for plantar pain: a systematic review and meta-analysis. BMC Musculoskeletal Disorders 6: 19, 2005.
8. Guyatt G, Drummond R, Meade M, Cook D: Users’ guides to the medical literature. New York, McGraw-Hill Medical, 2008.
9. Egger M, Davey Smith G: Bias in meta-analysis detected by a simple, graphical test. BMJ 315: 629 – 634, 1997.
10. Sterne JAC, Egger M, Davey Smith G: Systematic reviews in health care: Investigating and dealing with publication and other biases in meta-analysis. BMJ 323: 101 – 105, 2001.
11. Ioannidis JPA, Trikalinos TA: The appropriateness of asymmetry tests for publication bias in meta-analyses: a large survey. CMAJ 176 (8): 1091 – 1096, 2007.
12. Hatala R, Keitz S, Wyer P, Guyatt G: Tips for teachers of evidence-based medicine: 4. Assessing heterogeneity of primary studies in systematic reviews and whether to combine their results. CMAJ 172: 661 – 665, 2005.
13. Fletcher J: What is heterogeneity and is it important? BMJ: 334: 94 – 96, 2007.
14. Higgins JPT, Thompson SG, Deeks JJ, Altman DG: Measuring inconsistency in meta-analyses. BMJ 327:557 – 560, 2003.
15. Ioannidis J, Patsopoulos NA, Rothstein HR: Reasons or excuses for avoiding meta-analysis in forest plots. BMJ 336: 1413 – 1415, 2008.


Address correspondence to: Michael Turlik, DPM
Email: mat@evidencebasedpodiatricmedicine.com

1 Private practice, Macedonia, Ohio.

© The Foot and Ankle Online Journal, 2009

Evaluation of a Review Article

by Michael Turlik, DPM1

The Foot and Ankle Online Journal 2 (6): 5

Different types of reviews are described and it is suggested that systematic reviews/meta-analysis are the best type of review to use to determine treatment efficacy. How to evaluate a systematic review is explained using as an example an article from the foot and ankle literature. This is another article in the ongoing evidence-based medicine series produced for The Foot and Ankle Online Journal.

Key words: Evidence-based medicine, review article, meta-analysis

Accepted: May, 2009
Published: June, 2009

ISSN 1941-6806
doi: 10.3827/faoj.2009.0206.0005

Is this treatment effective? The best way to answer this question is to review level I evidence for therapeutic decisions. [1] Previous articles in this series dealt with an introduction to evidence-based medicine [2] and the critical analysis of randomized controlled trials. [3,4,5] Randomized controlled trials can be considered level I evidence for therapeutic decision-making. Depending on the sample size and the event rate, randomized controlled trials may lack precision. Therefore, combining several randomized controlled trials on the same subject into a review article with a pooled estimate of effect, if done so that bias is minimized, results in a more precise estimate of treatment effect.

A systematic review is a research article which uses explicit searching, evaluation and reporting criteria to minimize bias. The methods section of the article should clearly explain the criteria used to conduct the study. For therapeutic interventions, if the articles used in the systematic review are limited to randomized controlled trials, this type of study can be considered level I evidence. [1]

Narrative reviews and textbooks are usually expert-based rather than evidence-based. They are conducted using background questions rather than foreground questions, and trials may be selectively reported to match the author’s background and views. Typically these reviews do not report the method by which they were compiled; as a result, the conclusions of the review may be biased, and other researchers cannot replicate the results. Narrative reviews and textbooks lack transparency.

When critically analyzing a systematic review, the general structure of the process is similar to that for a randomized controlled trial. [1] Does the study minimize the likelihood of bias (internal validity)? What are the results of the review? Can and should the results be applied to clinical practice (external validity)?

Internal Validity

The planning and design of a systematic review follows well-defined criteria. (Table 1)

Table 1 Elements of a Systematic Review.

This allows for limiting bias and provides for transparency of the review process. A specialized type of systematic review which quantitatively pools results from randomized controlled trials is a meta-analysis. While every meta-analysis begins with a systematic review, not every systematic review is a meta-analysis: sometimes the studies are so different (heterogeneity) that the results cannot be pooled quantitatively. The purpose of this article is to provide information to allow podiatric physicians to critically analyze a systematic review. This will be accomplished by critically analyzing a meta-analysis of extracorporeal shock wave therapy (ESWT) for mechanically induced heel pain. [6] The following article in this series will specifically address the meta-analysis section of the ESWT paper.

Clinical Question

The first step in critically analyzing a systematic review is to evaluate the clinical question which the authors develop to guide the review process. The clinical question the authors formulate should be a focused, answerable, foreground question which utilizes the P (patient) I (intervention) C (comparison) O (outcome) method (PICO).

The question posed by the authors of our review article is: “Our aim was to determine if ESWT is effective in the treatment of patients with plantar heel pain when compared with a control group”. [6] This appears to be a focused, answerable, foreground question. Using the PICO method, the patient, intervention and outcome are clearly defined; however, the comparator is not. Was the comparator a placebo or a gold standard?

Inclusion and Exclusion Criteria

Which articles should be included? Criteria which are too broad will increase the chance of heterogeneity. Which studies should be excluded? Criteria which are too restrictive may result in the loss of important studies. There are many different criteria to consider when performing a systematic review. (Table 2)

Table 2 Variables for inclusion and exclusionary criteria.

When reviewing the criteria for inclusion and exclusion, the reader needs to consider the following question: would you expect the results to be similar across the range of patients included, the intervention studied, and the different outcomes measured? If the answer is no, then the study criteria are probably too broad. The authors should specify the inclusion and exclusion criteria for the systematic review with great clarity.

The authors of the ESWT study [6] specified inclusion criteria for the type of study, the characteristics of the participants, the outcome measure and the type of comparator used. The exclusion criteria consisted of heel pain caused by conditions other than mechanical. The other variables in Table 2 were not addressed by the authors in the methods section. It is likely that the results would be similar across the range of patients specified, the intervention selected and the outcomes used in the study.

Literature search

The search strategy of a systematic review should be comprehensive, detailed and exhaustive. The literature search should not be confined to a MEDLINE review alone. (Table 3)

Table 3 Potential sources of information for systematic reviews.

In the methods section the authors should describe in sufficient detail the different databases which were queried and provide the search string used. Limiting the search to published articles, and to articles published only in English, overestimates the treatment effect. An incomplete search results in retrieval bias.

It is common for journals to selectively publish trials which demonstrate positive results. It is also more common for positive trials to be published when they have been sponsored by a pharmaceutical or device manufacturer with a financial stake in the outcome. [7] Failure to publish trials, or portions of trials, with negative results leads to reporting bias. [8,9] The failure to publish an entire study because of the results obtained is referred to as publication bias; when results and outcomes are selectively published by the authors, this is referred to as selective outcome reporting bias. The result of reporting bias is an exaggerated treatment effect in the systematic review/meta-analysis.

The authors of the ESWT study describe in the methods section several different databases which were searched for original as well as pre-appraised literature. Dissertations and the reference lists of retrieved articles were searched as well (unpublished studies). The search was not limited to articles published in the English language. The search string was provided by the authors, and a reference was provided which includes more details regarding the search strategy.

Article acquisition

Typically a single author will review the title and abstract of all references obtained from the search and determine whether each article meets the inclusion and exclusion criteria defined earlier in the study.

A complete copy of each article found to meet the predetermined criteria based upon the title and abstract is then obtained for data abstraction. This information is usually presented in the methods section.

The authors of the ESWT study did not clearly explain the article acquisition process.

Data abstraction

This refers to the process by which the data from the relevant articles are transferred for analysis in the systematic review. The process of data abstraction should be clearly defined in the methods section of the paper. Questions to consider when evaluating a systematic review for data abstraction are listed in Table 4.

Table 4  Data abstraction questions.

It is important that more than one person participate in the data abstraction process to limit random and systematic errors.

The authors of the ESWT study described the data abstraction process in detail in the methods section. Two reviewers independently abstracted each of the randomized controlled trials obtained from the search. The authors clearly describe the data which were to be abstracted from each article. The resolution of disagreements was explained, and the process of contacting authors for additional information was described. Other aspects of Table 4 were not clearly reported by the authors.

Study Quality

An important part of the systematic review is to assess the quality of the studies selected for the review. Peer review and subsequent journal publication do not guarantee the quality of a published trial. The quality of a systematic review is only as good as the studies used! Less rigorous studies overestimate the treatment effect.

There is no universal agreement as to the instrument which should be used to assess study quality. There are several important criteria to look for when evaluating the quality of a randomized controlled trial. (Table 5) It is important for the authors to describe and reference in the methods section the instrument used, whether and how it was modified, and whether it was validated.

Table 5  Study quality criteria.

The authors of the ESWT study described and referenced their study quality instrument in the methods section. It included all of the elements in Table 5.

Summary

The authors of the ESWT study have described in sufficient detail the methods used to limit bias. It is therefore reasonable to conclude that the inferences drawn from the study are likely to be valid.

References

1. Levels of Evidence. [Online], Accessed 2/17/2009.
2. Turlik M: Introduction to evidence-based medicine. The Foot and Ankle Online Journal 2 (2): 4, 2009.
3. Turlik M: Evaluating the internal validity of a randomized controlled trial. The Foot and Ankle Online Journal 2 (5): 5, 2009.
4. Turlik M: How to interpret the results of a randomized controlled trial. The Foot and Ankle Online Journal 2 (4): 4, 2009.
5. Turlik M: How to interpret the external validity of a randomized controlled trial. The Foot and Ankle Online Journal 2 (5): 5, 2009.
6. Thompson C, Crawford F, Murray GD: The effectiveness of extra corporeal shock wave therapy for plantar heel pain: a systematic review and meta-analysis. BMC Musculoskeletal Disorders 6: 19, 2005.
7. Rising K, Bacchetti P, Bero L: Reporting bias in drug trials submitted to the Food and Drug Administration: A review of publication and presentation. PLoS Med 5: e217, 2008.
8. Guyatt G, Drummond R, Meade M, Cook D (editors): Users’ Guides to the Medical Literature. McGraw-Hill Medical, New York, 2008.
9. Hasenboehler E, Choudhry IK, Newman JT, Smith WR, Ziran BH, Stahel PF: Bias towards publishing positive results in orthopedic and general surgery: a patient safety issue? Patient Safety in Surgery 1: 4, 2007.


Address correspondence to: Michael Turlik, DPM
Email: mat@evidencebasedpodiatricmedicine.com

Private practice, Macedonia, Ohio.

© The Foot and Ankle Online Journal, 2009

How to Interpret External Validity of a Randomized Controlled Trial

by Michael Turlik, DPM1

The Foot and Ankle Online Journal 2 (5): 5

The fourth and final article discussing the interpretation of randomized controlled trials for podiatric physicians involves the critical analysis of external validity. External validity is defined, examples from the foot and ankle literature are reviewed, and a critical analysis is explained for the randomized controlled trial involving the use of magnetic insoles in the treatment of symptomatic diabetic neuropathy. This is part of an ongoing series of articles about evidence-based medicine to assist podiatric physicians in developing the knowledge, skills and values to be successful in a changing practice environment.

Key words: Evidence-based medicine, external validity, randomized controlled trial

This is an Open Access article distributed under the terms of the Creative Commons Attribution License.  It permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. ©The Foot and Ankle Online Journal (www.faoj.org)

Accepted: April, 2009
Published: May, 2009

ISSN 1941-6806
doi: 10.3827/faoj.2009.0205.0005


When evaluating a randomized controlled trial (RCT) the first consideration is internal validity: did the authors take appropriate measures to minimize bias? If the reader is satisfied that the authors planned, implemented and reported methods which would minimize bias, the reader would then evaluate the results of the trial. If the trial results revealed a clinically important difference between treatment arms, the final question the reader must answer is: can I, and should I, apply the results of this study to my practice? This is a matter of judgment rather than statistical analysis and is often referred to as the relevance or external validity of the trial. This is the fourth and final article in this series designed to help podiatric physicians understand, interpret and implement RCTs in their practices.

Types of Randomized Controlled Trials

Explanatory trials are characterized by strict inclusion criteria and highly homogeneous study groups; they usually present results only per protocol and often use placebos as the comparator. Explanatory trials answer the question, does this intervention work, which relates to the efficacy of the intervention. Pragmatic trials allow increased variability of the participants, use an active comparator (gold standard) and evaluate results by both per protocol and intention to treat analysis. Pragmatic trials evaluate the effectiveness of an intervention: does this treatment work in real life? These are two ends of a continuum; the more pragmatic the trial, the better its external validity.

External validity, or relevance, involves the interpretation of the trial relative to the reader’s practice environment. Issues to be resolved by the podiatric physician when evaluating a trial for external validity are: trial participants, location of the study, intervention, outcomes and harms described. In general, RCTs are better suited to answering the question, will this treatment work, rather than, is it worth it?

Trial Participants

The first step in evaluating the trial participants is to review the inclusion and exclusion criteria listed by the authors in the methods section of the article. For example, in a recent study [1] the authors looked at the use of duct tape in the treatment of pedal verruca in children 4-12 years of age. Can the results of this study be generalized to adults? In another recent study [2] of midportion Achilles tendon pain, all participants had the diagnosis confirmed by ultrasound prior to inclusion. Should the results be generalized to patients with insertional Achilles symptoms, and should all patients be confirmed by ultrasound prior to beginning treatment as described in the study?

Are the participants in the trial similar with respect to age, gender, comorbidities and severity of the disorder? It is easy to rationalize that the patient(s) in the podiatric physician’s practice differ from those in the trial in some demographic respect: “My patients are slightly older/younger, have more/fewer comorbidities, take more/fewer medications, speak/don’t speak English, wear the same/different shoes, have the same/different occupation.” Ask instead: would my patient have been included in the trial? A good rule of thumb is that if the answer is yes, the results probably are relevant to your patient(s).

Location of the Study

In a classic study [3] the authors evaluated a clinical maneuver to detect infected bone in hospitalized patients with diabetic foot ulcers. Can the results be generalized to an office setting where the ulcers may be less severe than those of the participants in the study? Another study [4] looked at pulse versus continuous terbinafine for onychomycosis. The setting was a Veterans Affairs (VA) hospital in Minneapolis. Should the results be generalized to an upper income suburban private practice?

A common concern regarding study location is: are the results of a trial performed in a tertiary care center by specialists relevant to primary care settings?

Another concern is: are differences in the healthcare delivery systems of other countries relevant to podiatric physicians practicing in the United States? The question the podiatric physician needs to answer is: was the setting of this study similar enough to my practice that the results can be generalized?

Intervention

Is the intervention proposed in the study feasible for me? Am I technically proficient enough to perform the procedure? Is the device available to me? How much does the intervention cost, and who will pay for it? All of these are important considerations before deciding whether the intervention is relevant to the individual podiatric physician’s practice. The results of a randomized controlled trial may demonstrate a clinical benefit; however, the intervention may not be relevant to the podiatric physician’s practice.

The methods section of the paper should describe the intervention and comparator in enough detail to allow the reader to make an informed judgment. In a study [5] of foot orthoses to treat plantar fasciitis, the reader needs to determine if the method of casting and fabrication of the foot orthoses described is similar enough to his/her practice to allow generalization of the results. A different study [6] evaluated the difference between surgery, foot orthoses and watchful waiting for symptomatic hallux valgus. In addition to the question again regarding foot orthoses, the reader has to decide if the procedure performed is similar enough to the typical surgical procedure performed for hallux valgus in his/her practice. In an earlier article of this series [7] it was shown that Graftskin was a more effective treatment for diabetic foot ulcers when compared with moistened saline gauze dressings. [8] The estimated cost for a single treatment with Graftskin varies from $1000 – $1200. [9] Despite the high cost of Graftskin, a recent economic analysis demonstrated that for venous stasis ulcers Graftskin was more effective and less costly than treatment with Unna boots. [10]

Outcomes

Is the outcome meaningful to my patients? Patient centered outcomes are the hallmark of evidence-based medicine, and validated patient centered outcomes are the most relevant to clinical practice. The authors should describe the primary outcome(s) in the methods section and reference validation efforts.

In a recent study [11] the authors looked at the use of alendronate in acute Charcot arthropathy. The authors measured changes in the following to evaluate the success of the intervention: serum COOH-terminal telopeptide of type 1 collagen (ICTP), osteocalcin, testosterone, estradiol, thyroid hormones, parathyroid hormone, follicle-stimulating hormone, luteinizing hormone, IGF-1, calcitriol, urinary hydroxyproline and serum alkaline phosphatase. These are examples of surrogate outcomes. An important question to ask regarding surrogate outcomes is: is there a strong, independent, consistent association between the surrogate outcome and a patient-important outcome? [12]

In a trial comparing two different surgical procedures [13] for the correction of hallux abducto valgus deformity the authors used multiple outcomes: the American Orthopedic Foot and Ankle Society clinical rating scale for deformities of the hallux, the EuroQol, a visual analogue scale, and radiographic changes. Some of the outcomes are patient centered (VAS, EuroQol), some are physician centered (range of motion, radiographic findings); some are validated, others are not. Physician centered outcomes and surrogate outcomes are not as relevant to clinical practice as patient centered outcomes. Composite outcomes may be important; however, they need to be reviewed cautiously to ensure the results are not misleading. [14]

The question the podiatric physician needs to answer is: is the outcome used in this study patient centered and validated? If the answer is yes, the podiatric physician should feel comfortable generalizing the results to his/her patient(s).

Harms

Were the benefits from the intervention in the trial worth the potential harms that occurred? RCTs present a less biased estimate of harmful effects than other study designs because randomization balances known and unknown prognostic factors. However, randomized controlled trials are unlikely to enroll enough participants to demonstrate rare and serious adverse effects which may become evident in the larger population. In addition, unlike the reporting of efficacy in randomized controlled trials, the reporting of harms is thought to be inadequate and incomplete. [15] Larger observational studies are usually better suited to assessing harms from therapeutic interventions than RCTs. In a study comparing pharmaceutical with non-pharmaceutical trials in rheumatoid arthritis, the authors found that pharmaceutical trials reported data about adverse effects more often than non-pharmaceutical trials. [16] Fewer than half of the non-pharmaceutical trials reported any harms.
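One way to make this limitation concrete is the well known “rule of three”: if no events of a given type are observed among n participants, the upper limit of the 95% confidence interval for the true event rate is approximately 3/n. The rule is not discussed by the authors cited above; the short Python sketch below is included here only to illustrate why modest-sized RCTs cannot rule out rare harms.

```python
# The "rule of three": with zero observed events in n participants,
# the 95% upper confidence bound on the true event rate is ~3/n.
# Illustrative only; not derived from any trial discussed above.
for n in (50, 300, 3000):
    print(f"0 events in {n} patients: true rate may still be up to ~{3 / n:.2%}")
```

Even a trial with 300 participants and no observed complications, for example, cannot exclude a true complication rate of roughly 1%.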

Although not always reported in the methods section, the authors should describe the method by which information on harms was collected and reported. Ideally, in the results section the authors should report all of the adverse outcomes which occurred, the frequency with which they occurred and, if important, the time at which they occurred. Consider the number of participants who withdrew from the trial due to adverse effects, and the completeness of the reporting of the adverse effects, when evaluating a paper for harms.

In the article referenced earlier evaluating duct tape for pedal verruca in children [1], the authors briefly described in the methods section a passive questionnaire used to elicit information about adverse effects. Adverse effects are described both in the narrative and in a table in the results section. Three participants withdrew from the duct tape arm of the trial due to adverse effects and no participants withdrew from the placebo arm. Participants using the intervention demonstrated a larger percentage of localized adverse skin effects, which were described as mild.

In the other article referenced earlier [13], comparing two different surgical procedures for hallux abducto valgus deformity, the authors do not discuss the evaluation of harms in the methods section. However, in the results the authors discuss complications occurring during the study, which were limited to one surgical revision due to recurrence one year postoperatively. One participant developed an asymptomatic nonunion which resolved during the one-year follow-up period. Apparently none of the 100 participants developed a postoperative infection.

Was there a difference in harms between treatment arms? Was the difference statistically/clinically significant? Were there any serious adverse consequences noted between groups? Although RCTs are not ideal for investigating harms, any information on adverse effects presented in the paper should be used by the podiatric physician in deciding whether a clinically significant result should be implemented in his or her practice. The information on harms should also be communicated to patients prior to beginning the intervention.

Use of magnetic insoles for diabetic peripheral neuropathy

The initial article of this series [17] identified an RCT [18] which would assist us in evaluating the usefulness of magnetic insoles in the treatment of symptomatic diabetic neuropathy. The following two articles [7,19] critically evaluated the internal validity and results of the identified article. Using the information presented earlier in this article, we will critically evaluate the external validity of the RCT evaluating magnetic insoles for diabetic neuropathy.

Trial Participants

The authors clearly explained the inclusion and exclusion criteria for entry into the trial. The entry criteria appear to be rather broad, which increases the external validity of the trial.

The only concern is whether magnetic insoles would be indicated as a first-line treatment, since the participants were required to have symptoms which were constant, present for more than six months, and refractory to various medications. [18]

Location of the Study

Three hundred seventy-five participants were recruited from 48 sites in 27 states. Participants were recruited from various specialties, including neurology, podiatry, diabetic clinics and other private practices. This multicenter study appears to have high external validity with regard to the location of the study.

Intervention

The authors provided a thorough technical description of the magnetic insole used in the study. The magnetic insoles used in the study are readily available over the Internet for less than $100 per pair. Whether and how these magnetic insoles differ from other commercially available magnetic insoles was not addressed in the study.

Outcomes

In the methods section the authors describe the use of two different instruments to measure four different primary outcomes. An 11 point visual analogue pain scale (0-10) was used to measure numbness and tingling/burning pain. A quality of life (QOL) instrument was used to measure foot pain and sleep interruption. Visual analogue scales and quality of life instruments are considered patient centered. However, no references were cited in the methods section regarding validation of the instruments; furthermore, the quality of life instrument was not identified.

Harms

In the methods section the authors stated that adverse effects would be monitored and reported. The adverse effects the authors were looking for included ulceration, abrasion, allergic reaction and infection. The authors reported in the results section no complications from either trial arm. However, in the flowchart of the trial, reference was made to six participants in the intervention arm and four participants in the placebo arm experiencing complications.

Summary

The authors described in sufficient detail methods which would minimize bias, with the exception of intention to treat analysis. The results comparing magnetic insoles and placebo were not clearly presented. Using effect sizes calculated from the original data, a small clinically significant effect was obtained with one of the four primary outcomes measured. The external validity of the study seems high with regard to trial participants, location of the study and the intervention. The reader will need to decide whether the cost of the intervention is worth its questionable efficacy.

References

1. de Haen M, Spigt MG, van Uden CJT, van Neer P, Feron FJM, Knottnerus A: Efficacy of duct tape vs placebo in the treatment of verruca vulgaris (warts) in primary school children. Arch Pediatr Adolesc Med 160: 1121 – 1125, 2006.
2. Petersen W, Welp R, Rosenbaum D: Chronic Achilles tendinopathy: A prospective randomized study comparing the therapeutic effect of eccentric training, the AirHeel brace, and a combination of both. Am J Sports Med 35: 1659 – 1667, 2007.
3. Grayson ML, Gibbons GW, Balogh K, Levin E, Karchmer AW: Probing to bone in infected pedal ulcers: a clinical sign of underlying osteomyelitis in diabetic patients. JAMA 273 (9): 721 – 723, 1995.
4. Warshaw E, Fett D, Bloomfield H, Grill J, Nelson D, Quintero V, Carver S, Zielke G, Lederle F: Pulse versus continuous terbinafine for onychomycosis: A randomized, double-blind, controlled trial. Journal of the American Academy of Dermatology 53 (4): 578 – 584, 2002.
5. Landorf K, Keenan A-M, Rushworth RL: Effectiveness of foot orthoses to treat plantar fasciitis. Arch Intern Med 166: 1305 – 1310, 2006.
6. Torkki M, Malmivaara A, Seitsalo S, Hoikka V, Laippala P, Paavolainen P: Surgery vs orthosis vs watchful waiting for hallux valgus: A randomized controlled trial. JAMA 285: 2474 – 2480, 2001.
7. Turlik M: How to interpret the results of a randomized controlled trial. The Foot and Ankle Online Journal 2 (4): 4, 2009.
8. Veves A, Falanga V, Armstrong DG, Sabolinski ML, Apligraf Diabetic Foot Ulcer Study: Graftskin, a human skin equivalent, is effective in the management of noninfected neuropathic diabetic foot ulcers. Diabetes Care 24: 290 – 295, 2001.
9. Hanft J, Williams A, Kyramarios C, Temar K: Are tissue replacements cost effective? Podiatry Today 16, 2003.
10. Schonfeld WH, Villa KF, Fastenau JM, Mazonson PD, Falanga V: An economic assessment of Apligraf® (Graftskin) for the treatment of hard-to-heal venous leg ulcers. Wound Repair and Regeneration 8: 251 – 257, 2001.
11. Pitocco D, Ruotolo V, Caputo S, Mancini L, Collina CM, Manto A, Caradonna P, Ghirlanda G: Six-month treatment with alendronate in acute Charcot neuroarthropathy. Diabetes Care 28: 1214 – 1215, 2005.
12. Guyatt G, Drummond R, Meade M, Cook D: Users’ Guides to the Medical Literature. McGraw-Hill Medical, New York, 367 – 374, 2008.
13. Saro C, Andrén B, Wildemyr Z, Felländer-Tsai L: Outcome after distal metatarsal osteotomy for hallux valgus: A prospective randomized controlled trial of two methods. Foot and Ankle International 28: 778 – 787, 2007.
14. Montori VM, Busse JW, Miralda GP, Ferreira I, Guyatt GH: How should clinicians interpret results reflecting the effect of an intervention on composite end points: should I dump this lump? EBM 10: 162, 2005.
15. Ioannidis JP, Evans SJ, Gotzsche PC, O’Neill RT, Altman DG, Schulz K, CONSORT Group: Better reporting of harms in randomized trials: An extension of the CONSORT statement. Ann Intern Med 141: 781 – 788, 2004.
16. Ethgen M, Boutron I, Baron G: Reporting of harm in randomized controlled trials of nonpharmacologic treatment for rheumatic disease. Ann Intern Med 143: 20 – 25, 2005.
17. Turlik M: Introduction to evidence-based medicine. The Foot and Ankle Online Journal 2 (2): 4, 2009.
18. Weintraub MI, Wolfe GI, Barohn RA, Cole SP, Parry GJ, Hayat G, Cohen JA, Page JC, Bromberg MB, Schwartz SL, Magnetic Research Group: Static magnetic field therapy for symptomatic diabetic neuropathy: a randomised, double-blind, placebo-controlled trial. Arch Phys Med Rehabil 84: 736 – 746, 2003.
19. Turlik M: Evaluating the internal validity of a randomized controlled trial. The Foot and Ankle Online Journal 2 (5): 5, 2009.


Address correspondence to: Michael Turlik, DPM
Email: mat@evidencebasedpodiatricmedicine.com

1 Private practice, Macedonia, Ohio.

© The Foot and Ankle Online Journal, 2009

How to Interpret the Results of a Randomized Controlled Trial

by Michael Turlik, DPM1

The Foot and Ankle Online Journal 2 (4): 4

This is the third in a series of articles explaining basic concepts of evidence-based medicine to podiatric physicians. This article will explain how to evaluate the results section of a randomized controlled trial, illustrated using actual examples from the foot and ankle literature, including an article referenced in the first article of this series.

Key words: Evidence-based medicine, randomized controlled trial

This is an Open Access article distributed under the terms of the Creative Commons Attribution License.  It permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. ©The Foot & Ankle Journal (www.faoj.org)

Accepted: March, 2009
Published: April, 2009

ISSN 1941-6806
doi: 10.3827/faoj.2009.0204.0004


The preceding article in this series dealt with the evaluation of the internal validity of a randomized controlled trial (RCT). [1] The first step in critically analyzing a randomized controlled trial is to assess its internal validity. Once you feel comfortable that the authors have taken sufficient measures to minimize bias, you will need to determine what the results of the study are. The authors should present the results so that the reader can easily determine whether the intervention is effective, independent of the authors’ interpretations. This is the third in a series of articles introducing podiatric physicians to evidence-based medicine (EBM).

Primary Outcome

The authors should specify a single primary outcome which is patient oriented and was chosen a priori to determine the effectiveness/efficacy of the intervention. It has been reported that trial outcomes may be selectively reported post hoc [2]; this is considered outcome reporting bias. One of the purposes of trial registration is to prevent post hoc changes in the primary outcome [3] and thereby minimize outcome reporting bias.

Hard outcomes such as ulcer closure or amputation are less common in the podiatric literature than soft outcomes such as functional improvement or pain relief. Hard outcomes either occur or do not occur and are treated as binary or dichotomous data. Soft outcomes usually are treated as continuous measures even though they may be categorical. When measuring soft outcomes the authors should provide a reference which validates the instrument used and should specify a clinically important difference.

In one review of orthopedic publications, the authors found that 31% of the publications modified a validated outcome instrument and then used the instrument without re-validating it. [4] Patient reported outcomes which are not validated may produce biased data (measurement bias). The primary outcome should be used in the calculation of the sample size to ensure adequate power to detect a clinically important difference. Articles which report multiple primary outcomes should be viewed with caution because of the increased incidence of type I errors due to multiple hypothesis testing [5] and a lack of power. Surrogate outcomes [6], physician centered outcomes [7] and composite outcomes [8] should be considered with care; the practice of EBM stresses patient centered outcomes.
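Because the sample size calculation hinges on the primary outcome, it can be useful to see the arithmetic. The following is a minimal Python sketch using the standard normal approximation for a two-arm trial with a continuous primary outcome; the standard deviation and clinically important difference shown are illustrative assumptions, not values from any trial discussed here.

```python
# Minimal sketch: per-arm sample size for a two-arm trial with a
# continuous primary outcome (normal approximation). All inputs are
# illustrative assumptions.
from math import ceil
from scipy.stats import norm

alpha = 0.05   # two-sided type I error rate
power = 0.80   # 1 - beta (accepting a 20% type II error rate)
sigma = 20.0   # assumed standard deviation of the outcome
delta = 15.0   # smallest clinically important difference

z_a = norm.ppf(1 - alpha / 2)   # ~1.96
z_b = norm.ppf(power)           # ~0.84

n_per_arm = ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)
print(f"Approximately {n_per_arm} participants needed per arm")
```

Note that halving the clinically important difference roughly quadruples the required sample size, which is why underpowered trials of soft outcomes are so common.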

Secondary outcomes should be reported in the same manner as the primary outcome. In a positive trial, they help to explain the effect of the intervention. If the primary outcome is not statistically and clinically significant, the secondary outcomes are best used to develop further research questions. Subgroup analyses should be viewed with caution due to type I errors and should be considered hypothesis generating rather than hypothesis confirming. [9]

Magnitude and precision of treatment effect

The purpose of reviewing the results section of a RCT is to determine whether the difference between the intervention and comparator is due to chance and, if statistically significant, whether the result is clinically important. Large studies may produce statistically significant results which are so small as to be clinically insignificant. The authors should report for any outcome a magnitude of effect with 95% confidence limits. [10] In the case of binary outcomes, results should be presented as a 2 x 2 table, risks or (less desirably) odds. Using this information the reader can calculate the number needed to treat (NNT) with 95% CI if it is not reported. For continuous measures the authors should report the difference between the intervention and comparator as a point estimate with 95% confidence intervals. These data are best presented in a table rather than in text. When the authors present data in other forms it is difficult to understand and place in a clinical context. Commonly a chi-squared test is used to assess statistical significance for binary outcomes. For continuous measures common significance tests are the t-test, ANOVA or ANCOVA. [11] If unusual statistical tests are employed the authors should provide appropriate references for review.
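As a concrete illustration of a point estimate with 95% confidence interval for a continuous outcome, the following Python sketch compares two hypothetical treatment arms; the scores are invented for illustration and do not come from any study cited here.

```python
# Minimal sketch: point estimate and 95% CI for the difference in
# means between two treatment arms. The scores are hypothetical.
import numpy as np
from scipy import stats

treatment = np.array([22.0, 30.0, 18.0, 27.0, 25.0, 31.0, 20.0, 24.0])
control = np.array([15.0, 21.0, 12.0, 19.0, 17.0, 23.0, 14.0, 18.0])

n1, n2 = len(treatment), len(control)
diff = treatment.mean() - control.mean()        # point estimate

# Pooled-variance standard error and t-based 95% CI
sp2 = ((n1 - 1) * treatment.var(ddof=1)
       + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
t_crit = stats.t.ppf(0.975, n1 + n2 - 2)

t_stat, p_value = stats.ttest_ind(treatment, control)  # two-sample t-test
print(f"Difference = {diff:.1f}, "
      f"95% CI ({diff - t_crit * se:.1f}, {diff + t_crit * se:.1f}), "
      f"p = {p_value:.3f}")
```

The confidence interval, unlike the p value alone, shows the reader both the direction and the plausible size of the effect.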

Evaluation of soft outcome

Landorf [12], in assessing the effectiveness of three different types of foot orthotic devices in the treatment of mechanically induced heel pain, chose the Foot Health Status Questionnaire as the primary outcome. The Foot Health Status Questionnaire is a validated instrument which is patient oriented and measures both pain and function. The outcome was selected a priori and analyzed using ANCOVA. The results for the primary outcome were reported in a table in the article, a portion of which is summarized here (Table 1).

Table 1 Comparison of foot orthotics.

The largest difference in pain at three months (Table 1) was between the prefabricated and sham devices; the smallest difference was between the prefabricated and custom devices. The sample size calculation used a 15 point improvement in pain. If we accept this as the clinically significant improvement the authors were looking for, none of the point estimates was clinically significant. In addition, since the lower end of the 95% confidence interval was below zero for all of the comparisons, none of the results was statistically significant either. A question which should always be asked of a negative trial is: might a clinically significant outcome have emerged if more participants had been enrolled? This is evaluated by looking at whether the upper end of the 95% confidence interval reaches a clinically significant level, in this case 15 points. If it does, a difference might have become apparent with more participants. Since the upper end of the interval for prefabricated versus custom foot orthotic devices is below 15 points, we can say with confidence that, based upon the results of this study, there is no difference between prefabricated and custom foot orthotic devices in the treatment of mechanically induced heel pain.
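The reasoning in the preceding paragraph can be reduced to a simple decision rule. The Python function below is a hypothetical sketch rather than anything from the article, and the example numbers are placeholders rather than Landorf’s actual data; the MCID of 15 points mirrors the threshold discussed above.

```python
# Minimal sketch: reading a trial result against a minimal clinically
# important difference (MCID). Example numbers are placeholders.
def interpret(point: float, lo: float, hi: float, mcid: float) -> str:
    if lo > 0 and point >= mcid:
        return "statistically and clinically significant"
    if lo > 0:
        return "statistically significant but below the MCID"
    if hi < mcid:
        return "negative, and a larger trial is unlikely to change that"
    return "negative but possibly underpowered; upper CI limit reaches the MCID"

print(interpret(point=8.7, lo=-1.1, hi=18.5, mcid=15.0))
# -> "negative but possibly underpowered; upper CI limit reaches the MCID"
```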

Evaluation of hard outcome

Veves [13] evaluated the use of Graftskin in the healing of diabetic non-infected neuropathic foot ulcers. The primary outcome was complete wound closure at 12 weeks.

The authors reported in the text that 63 of the 112 patients receiving Graftskin healed by 12 weeks, whereas 36 of the 96 control patients healed by 12 weeks. They related that this was a statistically significant result (p = 0.0042). Using the data provided in the article a 2 x 2 table can be constructed (Table 2).

Table 2 Comparison of ulcer resolution.

Taking the event to be failure of the ulcer to heal by 12 weeks: Control Event Rate (CER) = 62.5%, Experimental Event Rate (EER) = 43.75%, Absolute Risk Reduction (ARR) = CER – EER = 18.75%, Number Needed to Treat (NNT) = 1/ARR ≈ 6 (95% CI 3 – 19).

Using an online calculator [14] and the information from the article, the Number Needed to Treat (NNT) with 95% confidence intervals can be calculated. The NNT is defined as the number of patients you need to treat to prevent one bad outcome. [15] The NNT places the results of the study in a clinician friendly metric which can be evaluated easily: the lower the number, the more effective the treatment. The number needed to treat is the inverse of the absolute risk reduction (ARR), and the absolute risk reduction equals the control event rate (CER) minus the experimental event rate (EER). Based upon the results of the study (Table 2), 6 people need to be treated with Graftskin for 12 weeks to prevent one additional ulcer from failing to heal. The reader will need to decide if the costs and adverse effects justify the use of this product when compared to the control (external validity). Understand that this is only an estimate and that the number needed to treat may be as high as 19.
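The calculation above can be reproduced directly from the counts implied by the article (the event being failure of the ulcer to heal by 12 weeks). The Python sketch below uses a simple Wald interval for the ARR and inverts it; the online calculator cited may use a slightly different method, but the result matches the 3 – 19 interval reported above.

```python
# Minimal sketch: NNT with 95% CI from a 2 x 2 table. Counts follow
# the Graftskin trial as summarized above (event = failure to heal).
from math import ceil, sqrt

events_ctrl, n_ctrl = 96 - 36, 96     # 60 control ulcers failed to heal
events_trt, n_trt = 112 - 63, 112     # 49 Graftskin ulcers failed to heal

cer = events_ctrl / n_ctrl            # 0.625
eer = events_trt / n_trt              # 0.4375
arr = cer - eer                       # 0.1875

se = sqrt(cer * (1 - cer) / n_ctrl + eer * (1 - eer) / n_trt)
lo, hi = arr - 1.96 * se, arr + 1.96 * se   # Wald 95% CI for the ARR

print(f"NNT = {ceil(1 / arr)}")                         # 6
print(f"95% CI for NNT: {1 / hi:.0f} to {1 / lo:.0f}")  # 3 to 19
```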

Magnets in the Treatment of Diabetic Neuropathy

In the first article of this series [16] an RCT [17] was found which evaluated the usefulness of static magnets for painful diabetic neuropathy. The second article [1] in the series evaluated the internal validity of this RCT. Using the information previously presented in this article, the results of the article evaluating static magnets for the relief of painful diabetic neuropathy are summarized below.

Primary outcome measure

The authors described four different primary outcome measures using two different outcome instruments: foot pain and sleep interruption (a Quality of Life [QOL] instrument), and burning pain and numbness and tingling (an 11 point Visual Analogue Scale [VAS]), analyzed once per month for four months. It was not clear whether references were provided regarding validation of the outcome instruments. The authors describe a fairly complex method of statistical analysis of the extensive amount of data collected. The sample size calculation used a 17% difference between treatment arms, but it was unclear which of the primary outcomes the 17% difference referred to. Therefore, it is difficult to tell which of the outcome measures were adequately powered. It does not appear that the trial was registered.

Magnitude and precision of treatment effect

The design of the study was a randomized controlled trial comparing static magnetic insoles with a placebo. The objective of the study as stated by the authors was “to determine if constant wearing of multipolar, static magnetic (450G) shoe insoles can reduce neuropathic pain and quality of life (QOL) scores in symptomatic diabetic peripheral neuropathy (DPN)”. [17]

The authors’ conclusion is that “the present study provides convincing data confirming that the constant wearing of static, permanent, magnetic insoles produces statistically significant reduction of neuropathic pain”.

The authors presented their results for the primary outcomes in a table format. For each month and for each outcome measured they presented a mean and a standard deviation.

In the accompanying text the authors related that the changes from baseline to various months were statistically significant for both magnets and control, for all four primary outcomes, at some point in time.

The authors did not present point estimates with 95% confidence intervals for any time period between magnets and placebo. It is not clear, nor do the authors clearly state, that static magnetic insoles are more effective than placebo for the reduction of the neuropathic pain seen in diabetic neuropathy. Are magnets more effective than placebo in the treatment of diabetic neuropathy?

Typically, at the conclusion of the trial, the authors would have used a statistical test on the original data to generate a p value and a point estimate with 95% confidence interval for the between-group comparison of the primary outcome. Since this is not available to the reader, the information in the article can be used to determine an effect size. Using the number of participants, means and standard deviations reported by the authors for the various outcome measures at four months, an effect size between magnets and placebo can be calculated. Calculation of effect sizes is commonly seen in meta-analysis to determine treatment efficacy. There are different methods available to calculate an effect size; a common approach is the one described by Cohen. [18] An effect size less than 0.2 is not consistent with a clinically/statistically significant effect; greater than 0.2 is a small effect, greater than 0.5 a moderate effect, and greater than 0.8 a large effect.

Although all of the measurements at four months (Table 3) for the various primary outcomes favor magnets, the only outcome which shows a small clinically/statistically significant effect favoring magnetic insoles is numbness and tingling. The effect sizes for the other three outcome measures are all less than 0.2 and are therefore neither clinically nor statistically significant.

Table 3 Comparison using effect size.

*Effect Size (Cohen) = (experimental group mean − control group mean) / standard deviation of control group.
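A Python sketch of the footnote’s calculation is shown below. Note that dividing by the control group’s standard deviation is, strictly speaking, Glass’s delta rather than Cohen’s d, although the article credits Cohen; the sketch follows the footnote as written. The means and standard deviation are placeholders, since the trial’s actual values are not reproduced here.

```python
# Minimal sketch of the Table 3 footnote: effect size =
# (experimental mean - control mean) / control-group SD.
# Placeholder numbers; not the trial's actual data.
def effect_size(mean_exp: float, mean_ctrl: float, sd_ctrl: float) -> float:
    return (mean_exp - mean_ctrl) / sd_ctrl

def label(d: float) -> str:
    d = abs(d)
    if d < 0.2:
        return "neither clinically nor statistically significant"
    if d < 0.5:
        return "small effect"
    if d < 0.8:
        return "moderate effect"
    return "large effect"

d = effect_size(mean_exp=3.1, mean_ctrl=3.8, sd_ctrl=2.5)
print(f"Effect size = {d:.2f} ({label(d)})")  # -0.28: a small effect
```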

The upper limit for all of the primary outcomes is compatible with a small significant effect. Would this be apparent if more patients were enrolled? Was this study underpowered for any or all of the primary outcomes? Would the results of the study be different if the data were analyzed by intention to treat rather than per protocol? In conclusion, based upon the data presented in the article it is unlikely that magnetic insoles have any effect in relieving symptomatic diabetic peripheral neuropathy.

References

1. Turlik M: How to evaluate the internal validity of a randomized controlled trial. The Foot and Ankle Journal 2 (3): 5, 2009.
2. Chan AW, et al: Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA 291: 2457 – 2465, 2004.
3. Trial Registration. http://clinicaltrials.gov/ Accessed 1/23/09.
4. Poolman RW, Struijs PA, Krips R, Sierevelt IN, Marti RK, Farrokhyar F, Zlowodzki M, Bhandari M: Reporting of outcomes in orthopaedic randomized trials: Does blinding of outcome assessors matter? J Bone Joint Surg 89A: 550, 2007.
5. Austin P: Testing multiple statistical hypotheses resulted in spurious associations: a study of astrological signs and health. Journal of Clinical Epidemiology 59: 964, 2006.
6. D’Agostino R: Debate: The slippery slope of surrogate outcomes. Curr Control Trials Cardiovasc Med 1: 76 – 78, 2000.
7. Martin R: Current concepts review: Foot and ankle outcome instruments. Foot & Ankle International 27: 383 – 390, 2006.
8. Montori VM: How should clinicians interpret results reflecting the effect of an intervention on composite end points: should I dump this lump? EBM 10: 162, 2005.
9. Wang R, Lagakos SW, Ware JH, Hunter DJ, Drazen JM: Statistics in medicine — Reporting of subgroup analyses in clinical trials. NEJM 357: 2189 – 2194, 2007.
10. CONSORT Statement. http://www.consort-statement.org/ Accessed 1/23/09.
11. Vickers AJ, Altman DG: Analysing controlled trials with baseline and follow up measurements. BMJ 323: 1123 – 1124, 2001.
12. Landorf KB, Keenan A-M, Herbert A: Effectiveness of foot orthoses to treat plantar fasciitis. Arch Intern Med 166: 1305 – 1310, 2006.
13. Veves A, Armstrong DG, Sabolinski ML, Apligraf Diabetic Foot Ulcer Study: Graftskin, a human skin equivalent, is effective in the management of noninfected neuropathic diabetic foot ulcers. Diabetes Care 24: 290 – 295, 2001.
14. Number needed to treat calculator. http://www.graphpad.com/quickcalcs/NNT2.cfm Accessed 1/23/09.
15. McAlister F: The “number needed to treat” turns 20 — and continues to be used and misused. CMAJ 179: 549 – 553, 2008.
16. Turlik M: Introduction to evidence-based medicine. The Foot and Ankle Journal 2 (2): 4, 2009.
17. Weintraub MI, Wolfe GI, Barohn RA: Static magnetic field therapy for symptomatic diabetic neuropathy: A randomized, double-blind, placebo-controlled trial. Arch Phys Med Rehabil 84: 727, 2003.
18. Effect size calculator. http://davidmlane.com/hyperstat/effect_size.html Accessed 1/23/09.


Address correspondence to: Michael Turlik, DPM
Email: mat@evidencebasedpodiatricmedicine.com

Private practice, Macedonia, Ohio.

© The Foot and Ankle Online Journal, 2009

Introduction to Evidence Based Medicine

by Michael Turlik, DPM1

The Foot & Ankle Journal 2 (2): 4

This paper provides an introduction to evidence-based medicine for physicians. Evidence-based medicine is defined, a historical perspective is presented and the levels of evidence are explained. Using a clinical scenario a foreground question is developed and a MEDLINE search is performed to locate the best evidence pertaining to the foreground question. A series of articles will be presented in future issues of this journal critically evaluating selected publications identified during the MEDLINE search.

Key words: EBM, evidence-based medicine, foreground question, PICO method.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License.  It permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. ©The Foot & Ankle Journal (www.faoj.org)

Accepted: January, 2009
Published: February, 2009

ISSN 1941-6806
doi: 10.3827/faoj.2009.0202.0004

Introduction

I was fortunate enough to recently attend an alternative medicine seminar. One of the speakers was a PhD advocating the use of magnets in the treatment of various medical conditions. In an articulate and professional manner, she explained the pathophysiology of the disease with reference to electromagnetism and provided a logical explanation of why magnets should work. Specifically she advocated using magnetic insoles for the symptomatic relief of diabetic neuropathy. She presented a convincing series of cases involving patients with diabetic neuropathy who responded to the use of magnetic insoles. Over 90% of the patients related symptomatic relief of their painful disability.

In addition, she stated that she has patented a new type of magnet which is superior to older versions of therapeutic magnets. This new “polarizing” magnet was on sale in the lobby at the conclusion of the talk for those practitioners interested in dispensing it to their patients. The talk met with universal acceptance by the practitioners in attendance, many of whom remarked that they had been using “polarizing” magnets in their practices for years with “good results”. It is estimated that the general public spends more than $1 billion a year purchasing magnets for the treatment of disease. [1] Despite the speaker’s professionalism and the audience’s response, I remained somewhat skeptical about the use of magnets in the treatment of diabetic neuropathy.

This skepticism is based on a number of questions and issues. For example, how effective are the various treatment plans which we use daily for our patients? Even more confusing is the use of diagnostic studies in the workup of our patients: when are they indicated, and how well do they help us in making a diagnosis? Answers to these and other healthcare questions require information. Information regarding patient care comes from various sources, including anecdotal information from experts, textbooks, internet searches, continuing medical education, published articles, and drug and device manufacturing representatives. How to place this information, or evidence, in a clinical context is a large part of what evidence-based medicine is about.

Evidence-based medicine

Evidence-based medicine can be defined as the conscientious, explicit, judicious use of the current best evidence, combined with clinical expertise and patient values, in making decisions about the care of individual patients. [2] It is a structured approach to literature evaluation, leading to decisions which are based upon probability. The true effect of an intervention is never known for certain; we can only estimate where the truth may lie. In other words, the better the study, the closer we get to the truth.

Utilizing evidence-based medicine in clinical practice requires a paradigm shift from unsystematic, anecdotal observations, common sense, and pathophysiological rationales to the critical evaluation of medical information. To accomplish this, evidence-based medicine requires the development of a new set of skills: foreground question generation, efficient literature searching to retrieve the best evidence, and the application of formal rules of evidence in evaluating study methods to ascertain the validity of results.

Once valid and relevant clinical studies have been obtained, evidence-based medicine then requires the communication of the benefits, harms, costs, and inconveniences to patients in order to make decisions for individual patients based upon their perspectives, beliefs, expectations and values. (Fig. 1)

Figure 1 Step development in evidence based medicine.  (*PICO = Patient/Population/Problem, Intervention/Exposure, Comparison and Outcome)

History of evidence-based medicine

Prior to the 1950s, health care decisions were based primarily on anecdotal information, pathophysiology and the expert opinions of leaders in the profession. Randomized controlled trials were introduced into medicine near the end of the 1950s. During the 1980s a Canadian physician, Dr. David Sackett, began to develop a system for the critical analysis of the medical literature; his innovations proved to be a seminal event in the development of evidence-based medicine. Dr. Archie Cochrane, a British physician, had long advocated the rigorous evaluation of medical interventions and the use of systematic reviews; the Cochrane Collaboration, founded in the early 1990s, is his legacy. The actual term evidence-based medicine is credited to Dr. Gordon Guyatt of McMaster University, whose pioneering work appeared in the Journal of the American Medical Association as the “Users’ Guides” series of publications in the 1990s.

Levels of evidence for therapeutic interventions [3]

Evidence-based medicine seeks to prioritize information in a hierarchy of evidence by study design, from the most biased to the least biased. For example, the use of magnetic insoles in the treatment of diabetic neuropathy would be classified as a therapeutic intervention; therefore the strongest study designs would be randomized controlled trials and systematic reviews/meta-analyses of them. These would be considered level I evidence, whereas unsubstantiated anecdotal information would be considered level V: the most biased and therefore the least useful. Evidence levels vary with the type of question to be evaluated; therapy, prognosis and diagnosis questions each have a different best study design.

Level V

Unsystematic anecdotal information is considered the least valid form of evidence regarding therapeutic interventions. Anecdotal information is best used to develop hypotheses which are then tested using more rigorous study designs. Although anecdotal information is extremely popular in the medical profession, it is highly biased and impossible to verify; therefore, it is of limited usefulness in determining treatment efficacy. Sources of anecdotal information include experts in the field, colleagues, and drug and device manufacturers. Other examples of level V evidence include cadaver studies and animal studies.

Level IV

Case reports and case series are extremely popular in the medical literature. [4] They are easy to report and inexpensive. They are usually retrospective but may be prospective. Again, case reports and case series are best used to develop hypotheses rather than test them: they lack a control group and suffer from selection bias, no matter how many subjects were enrolled or how long the study was carried forward.

Case reports often overestimate treatment effects. For example, Mullen, et al., in a series of cases over eight years, reported good results using cimetidine for the treatment of plantar warts in children. However, when this treatment was evaluated in two randomized controlled trials, cimetidine was shown to be ineffective. [5] If case series are sufficiently large they may be very useful in documenting adverse effects of treatments which have been shown to be effective.

Level III

Case-control studies are a type of observational study in which the outcome has already occurred. These are retrospective studies using a comparator, and they are best used to evaluate rare conditions. The results of these studies are usually expressed as odds ratios. It is difficult to find a good control group for these studies; therefore, selection bias is always a concern. Recall bias is another common concern regarding case-control studies. These types of studies are not very common in the podiatric literature. [4]

Level II

Cohort studies are another type of observational comparative study; they begin with an exposure rather than an outcome. Usually these studies are prospective, but they may be retrospective. The prospective design tends to minimize recall bias. These studies are expensive, time-consuming and require careful attention to the selection of the control group; regardless of the care taken, selection bias is always a concern. Results typically are expressed as a relative risk rather than an odds ratio, as sketched below. These types of studies are not common in the podiatric literature. [4] Cohort studies should not be confused with the term cohort: a cohort is simply a group of patients, whereas cohort studies are a type of observational study using a comparator.
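The distinction between the relative risk of a cohort study and the odds ratio of a case-control study is easy to confuse. The short Python sketch below uses hypothetical counts to show how the two measures diverge when the outcome is common.

```python
# Minimal sketch: relative risk vs odds ratio from a 2 x 2 table of
# exposure and outcome. Counts are hypothetical.
exposed_events, exposed_n = 30, 100
unexposed_events, unexposed_n = 15, 100

risk_exp = exposed_events / exposed_n                             # 0.30
risk_unexp = unexposed_events / unexposed_n                       # 0.15
rr = risk_exp / risk_unexp                                        # 2.00

odds_exp = exposed_events / (exposed_n - exposed_events)          # 30/70
odds_unexp = unexposed_events / (unexposed_n - unexposed_events)  # 15/85
or_ = odds_exp / odds_unexp                                       # ~2.43

print(f"Relative risk = {rr:.2f}, odds ratio = {or_:.2f}")
```

The odds ratio here (2.43) overstates the relative risk (2.00); the two only converge when the outcome is rare.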

Level I

Randomized controlled trials, unlike the preceding study designs, are true controlled experiments. Two or more groups of subjects receive different interventions, are followed forward in time, and at some point are compared using an outcome. This study design minimizes selection bias; however, although the concept is simple, these trials are difficult to perform correctly and incur considerable expense. Although not common in the podiatric literature [4,6], there has been an increase in the number of these clinical trials in medicine over the last 10 years. Although extremely common for non-surgical treatments, only recently have they gained recognition and acceptance in the surgical subspecialties. [7,8]

Systematic reviews are a special type of review article which can be considered level I evidence when limited to randomized controlled trials. Systematic reviews, unlike textbooks or narrative reviews, require careful planning and methods which minimize bias and random error. As in any well performed study, the methods are transparent and allow other researchers to replicate the results and reach similar conclusions. A meta-analysis is a specialized type of systematic review which pools data for a quantitative rather than a qualitative result; it, too, can be considered level I evidence if the included studies are randomized controlled trials.

Even though randomized controlled trials are considered level I evidence, the believability of the results varies with the rigor with which the study was performed. In a meta-analysis by Thompson evaluating the use of extracorporeal shock wave therapy for plantar heel pain, when the results of the two poorest quality randomized controlled trials were removed, the result of the meta-analysis was no longer statistically significant. [9] Lesser levels of evidence, and poorly designed and executed studies, overestimate treatment effects.

Clinical Scenario: Magnets and diabetic neuropathy

Following the steps of evidence-based medicine (Fig. 1), a foreground question [10] could be: “In diabetic patients with symptomatic distal sensory neuropathy, are magnetic insoles effective in reducing painful sensations?” When searching for the best evidence the podiatric physician has a choice between pre-appraised research and original research.

Pre-appraised research is the wave of the future, since busy clinicians have little time and may lack the skills necessary to critically evaluate the medical literature. The Cochrane collaboration [11] is an example of a pre-appraised, searchable database; there are many others available. The pre-appraised sites usually require a subscription and mostly pertain to primary care medicine.

MEDLINE is the classic searchable medical database for original research. Using the foreground question developed regarding magnets and diabetic neuropathy, a literature search was performed in MEDLINE, limiting the results to randomized controlled trials and meta-analyses in an attempt to locate potential sources of level I evidence. Using the keywords magnets, pain, diabetic neuropathy and feet, combined with the Boolean operator AND, various results were returned (Table 1). One of the results was a meta-analysis of randomized controlled trials which evaluated the use of magnets in reducing pain. [1]

Table 1  Results of MEDLINE search
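For readers who prefer to script such searches, the sketch below reproduces the keyword search programmatically. It assumes the Biopython package and its Entrez utilities (the original search used the MEDLINE interface directly), and the placeholder e-mail address must be replaced; the query string and the randomized-controlled-trial publication-type filter mirror the search described above, though the hit counts will differ from Table 1 as the database has grown.

```python
# Minimal sketch: a MEDLINE/PubMed keyword search via Biopython's
# Entrez utilities (an assumption; not the tool used in the article).
from Bio import Entrez

Entrez.email = "you@example.com"  # NCBI requires a contact address

query = ("magnets AND pain AND diabetic neuropathy "
         "AND randomized controlled trial[Publication Type]")
handle = Entrez.esearch(db="pubmed", term=query, retmax=20)
record = Entrez.read(handle)
handle.close()

print(f"Hits: {record['Count']}")
print(record["IdList"])  # PubMed IDs of matching citations
```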

Reviewing the references from the meta-analysis, a randomized controlled trial evaluating the use of magnetic insoles for the treatment of diabetic neuropathy [12] was found. It should be noted that level I evidence may not always be found; relying on lesser levels of evidence weakens the inferences that can be drawn regarding the therapeutic option.

Once potential sources of level I evidence are located, the physician needs to answer the following three questions to critically appraise the articles found:

1. Have the author(s) reasonably attempted to limit bias? (Internal validity)

2. What are the results?

3. Will the results of the study help me in caring for my patient? (External validity)

Future articles in this series will provide the answers to these questions using the references identified [1,12] in the MEDLINE search for magnets and diabetic neuropathy.

References

1. Pittler MH, Brown EM: Static magnets for reducing pain: systematic review and meta-analysis of randomized trials. CMAJ 177 (7): 736 – 742, 2007.
2. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS: Evidence based medicine: what it is and what it isn’t. BMJ 312 (7023): 71 – 72, 1996.
3. Levels of Evidence. [Online-accessed 1/10/2009].
4. Turlik M, Kushner D: Levels of evidence of articles in podiatric medical journals. JAPMA 90 (6): 300 – 302, 2000.
5. Mullen BR, Guiliana JV, Nesheiwat F: Cimetidine as a first-line therapy for pedal verruca. JAPMA 95 (3): 229 – 234, 2005.
6. Turlik M, Kushner D, Stock D: Assessing the validity of published randomized controlled trials in podiatric medical journals. JAPMA 93 (5): 392 – 398, 2003.
7. Obremskey WT, Pappas N, Attallah-Wasif E, Tornetta P, Bhandari M: Level of evidence in orthopedic journals. J Bone Joint Surg 87A (12): 2632 – 2638, 2005.
8. Maier V: What the surgeon of tomorrow needs to know about evidence-based surgery. Arch Surg 141: 317 – 323, 2006.
9. Thompson C, Crawford F, Murray GD: The effectiveness of extra corporeal shock wave therapy for plantar heel pain: a systematic review and meta-analysis. BMC Musculoskeletal Disorders 6 (19), 2005.
10. Weinfeild JM, Finkelstein K: How to answer your clinical questions more efficiently. [Online-accessed 1/10/09].
11. The Cochrane collaboration. [Online-accessed 1/10/09].
12. Weintraub MI, Wolfe GI, Barohn RA, Cole SP, Parry GJ, Hayat G, Cohen JA, Page JC, Bromberg MB, Schwartz SL, Magnetic Research Group: Static magnetic field therapy for symptomatic diabetic neuropathy: a randomised, double-blind, placebo-controlled trial. Arch Phys Med Rehabil 84: 736 – 746, 2003.


Address correspondence to: Michael Turlik, DPM
Email: mat@evidencebasedpodiatricmedicine.com

1Private practice, Macedonia, Ohio.

© The Foot & Ankle Journal, 2009