In the last installment of this series, we introduced the concept of critical appraisal of the statistical methods used in a paper. The statistical analysis in a study is often the final barrier between the study’s results and application of those results to patient care, so making sure that the findings have been properly evaluated is of obvious importance.
We have previously discussed P values and confidence intervals—two of the most common statistical outcomes upon which clinical decisions are based. In this segment, we will discuss several specific issues that can help a reader decide how much faith to place in a study’s results.
Test Assumptions
Statistical tests generally require that a variety of assumptions be satisfied for the test procedure to be valid. These assumptions vary from test to test, and unfortunately most computer packages do not ask users whether they want to examine these assumptions more closely. This is one of the dangers of “black box” analysis, in which researchers with little statistical training run their data through a statistical package without fully understanding how the output is generated.
Many statistical tests are based on the theory of the bell curve, or normal distribution. These tests require a large enough sample size, usually at least 30 subjects per group and sometimes many more, for this theory to hold. In addition, the data should not be skewed excessively. For example, consider a study comparing two treatments for mild pain in which scores on a continuous 0-10 visual analog scale are expected to fall between 0 and 2. Because of the asymmetry of the data, an underlying bell curve isn’t likely to make much sense. Therefore, a two-sample t-test may not be appropriate for this study even with two large samples.
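As a minimal sketch of how a reader or researcher might check this assumption, the following Python code uses hypothetical, simulated pain-score data (not data from any real study) to inspect skewness and normality, and falls back on a rank-based Mann-Whitney U test when the t-test’s normality assumption looks doubtful:

```python
import numpy as np
from scipy import stats

# Hypothetical 0-10 VAS pain scores, clustered near 0-2 (illustration only).
rng = np.random.default_rng(0)
treatment_a = rng.exponential(scale=0.8, size=40).clip(0, 10)  # right-skewed scores
treatment_b = rng.exponential(scale=1.1, size=40).clip(0, 10)

# Inspect skewness and test normality before trusting a t-test.
for name, scores in [("A", treatment_a), ("B", treatment_b)]:
    w, p = stats.shapiro(scores)
    print(f"Group {name}: skewness={stats.skew(scores):.2f}, Shapiro-Wilk p={p:.3f}")

# If the data are clearly skewed, a rank-based test avoids the normality assumption.
t_stat, t_p = stats.ttest_ind(treatment_a, treatment_b)
u_stat, u_p = stats.mannwhitneyu(treatment_a, treatment_b, alternative="two-sided")
print(f"Two-sample t-test p={t_p:.3f}; Mann-Whitney U p={u_p:.3f}")
```

The point of the sketch is not the specific test chosen but the habit of examining the data’s shape before accepting a package’s default output.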
Another commonly violated assumption is that the two groups being compared are independent. The simplest violation occurs when the same subjects are measured before and after a procedure. A two-sample statistical test is not appropriate here because the two groups are actually the same subjects, and therefore clearly not independent. In this case, a paired analysis is required. The issue of independence becomes more complicated when we consider tests of multiple variables that may be related to one another, or studies of effects over time. In these instances, additional expertise in selecting the correct analysis approach is usually needed.
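A brief sketch, again using hypothetical simulated before-and-after measurements on the same subjects, shows how the paired and unpaired analyses differ in practice:

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements on the same 35 subjects (illustration only).
rng = np.random.default_rng(1)
before = rng.normal(loc=6.0, scale=1.5, size=35)
after = before - rng.normal(loc=1.0, scale=1.0, size=35)  # same subjects, re-measured

# Inappropriate here: a two-sample test treats the measurements as independent groups.
_, p_unpaired = stats.ttest_ind(before, after)

# Appropriate: a paired test uses each subject as his or her own control.
_, p_paired = stats.ttest_rel(before, after)

print(f"Unpaired p={p_unpaired:.4f}, paired p={p_paired:.4f}")
```

Because the paired analysis removes between-subject variability, the two approaches can lead to quite different conclusions from the same data.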
The best way to ensure that these assumptions, and the many others required for valid statistical testing, are met is to plan the analyses with the help of a trained statistician. If this is not an option, it is incumbent upon researchers to learn about these assumptions and evaluate their studies to make sure the appropriate methods are applied.
Negative Study Results
A more straightforward issue concerns interpretation of negative study results. Most clinicians are familiar with statistical power: A small study may yield a negative finding either because that is the correct result or because there is not enough power to discern a difference between the groups being tested. Often, the width of the confidence interval provides insight into this problem. If the confidence interval includes a difference that would be clinically meaningful, a negative study should be viewed skeptically. In such cases, a larger study or a meta-analysis may be needed to better address the question. If, on the other hand, the confidence interval suggests that no clinically relevant result is likely, the negative study finding becomes more compelling.
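The following sketch illustrates this reasoning with hypothetical data and an assumed minimal clinically important difference of 1.5 points; it computes a conventional pooled-variance confidence interval for the difference in means and asks whether that interval still admits a clinically meaningful effect:

```python
import numpy as np
from scipy import stats

# Hypothetical small-study data (illustration only); small groups mean low power.
rng = np.random.default_rng(2)
group_a = rng.normal(loc=5.0, scale=2.5, size=15)
group_b = rng.normal(loc=6.0, scale=2.5, size=15)

# 95% confidence interval for the difference in means (pooled-variance t interval).
n_a, n_b = len(group_a), len(group_b)
diff = group_b.mean() - group_a.mean()
pooled_var = ((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)
se = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_crit = stats.t.ppf(0.975, n_a + n_b - 2)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

meaningful = 1.5  # assumed minimal clinically important difference (hypothetical)
print(f"Difference = {diff:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f})")
if ci_low < -meaningful or ci_high > meaningful:
    print("The CI still admits a clinically meaningful difference; the negative result is inconclusive.")
else:
    print("The CI excludes clinically meaningful differences; the negative result is more convincing.")
```

The same logic applies whatever effect measure a study reports: a “negative” result is only reassuring when the confidence interval rules out effects large enough to matter to patients.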