Effect Size and Clinical Significance
MCAT trap: conflating statistical significance (the p-value) with effect magnitude or clinical importance. A p-value is the probability of observing data at least as extreme as the study's if the null hypothesis were true; it says nothing about the magnitude of the effect, so a tiny p-value can accompany a trivially small effect size.
The MCAT uses effect size to test whether you conflate statistical significance with clinical importance, and the dominant misconception is treating p < 0.05 as proof that an effect matters. A study with 50,000 participants can make a Cohen's d of 0.05 reach p < 0.0001, even though an effect that small is clinically meaningless. Effect size measures the magnitude of a difference or association, and it does not grow just because more people were studied. The MCAT specifically pairs impressive p-values with tiny effect sizes to see if you notice the distinction. Statistical significance tells you whether an effect is unlikely to be due to chance alone; effect size tells you how large it is, which is what you need to judge whether it matters.
The two effect size measures that appear on the MCAT are Cohen's d and the correlation coefficient r. Cohen's d is the standardized mean difference between two groups: you divide the raw difference in means by the pooled standard deviation to get a unitless number you can compare across studies. By convention, d ≈ 0.2 is small, d ≈ 0.5 is medium, and d ≈ 0.8 is large. For r, the benchmarks are lower: 0.1 small, 0.3 medium, 0.5 large (judged on the absolute value, since r can be negative). A passage might give you these benchmarks explicitly or expect you to know them, and you need to use them to judge whether a finding is 'clinically significant', meaning it would actually change what a clinician does.
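The pooled-standard-deviation calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not something the exam asks you to code: the function name `cohens_d` and the blood pressure readings are hypothetical, invented only to show the arithmetic.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    # Pooled SD: weight each group's variance by its degrees of freedom
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical systolic blood pressure readings in mmHg (made-up data)
treated = [118, 122, 125, 119, 121]
control = [124, 128, 131, 125, 127]

d = cohens_d(treated, control)
print(f"Cohen's d = {d:.2f}")
```

The sign of d only reflects which group was subtracted from which; the size of the effect is judged on its absolute value.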
What makes this tricky is that a small p-value feels important. If p = 0.0001, it seems like something big happened. But with 50,000 participants, a Cohen's d of only 0.05 can reach p < 0.0001, and a difference that small is clinically meaningless. The MCAT loves this exact scenario. You'll see a passage present an impressive p-value, and the right answer will require you to notice that the effect size is tiny or that the absolute difference between groups has no real-world consequence. Train yourself to always ask: statistically significant, yes, but is the effect actually large?
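To see why sample size drives the p-value while the effect size stays fixed, here is a rough sketch using a large-sample z approximation. It assumes two equal groups, so the 50,000-participant study is treated as 25,000 per group; that split and the helper name `approx_two_sided_p` are assumptions made for illustration, not details from a specific passage.

```python
import math


def approx_two_sided_p(d, n_per_group):
    """Approximate two-sided p-value for a standardized mean difference d."""
    # Large-sample approximation: the standard error of d is about sqrt(2 / n)
    z = abs(d) * math.sqrt(n_per_group / 2)
    # Two-sided normal tail area: 2 * (1 - Phi(z)) equals erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2))


# Same tiny effect (d = 0.05), very different sample sizes
p_large_study = approx_two_sided_p(0.05, 25_000)  # 50,000 participants total
p_small_study = approx_two_sided_p(0.05, 50)      # 100 participants total

print(f"large study: p = {p_large_study:.2e}")
print(f"small study: p = {p_small_study:.2f}")
```

Under these assumptions the large study lands far below 0.0001 while the small study is nowhere near 0.05, even though d is identical in both. The p-value changed because the sample size changed, not because the effect got bigger.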
What the exam tests
- Know that effect size (Cohen's d, r) measures the magnitude of a difference or association and is calculated independently of sample size — it doesn't go up just because you recruited more participants.
- Understand why statistical significance and clinical significance are different things: a very large sample size gives high statistical power to detect even trivially small differences as 'significant,' so a low p-value alone doesn't tell you the finding matters in practice.
- In a passage that presents both p-values and effect sizes (or absolute differences), be able to identify whether a statistically significant result is also clinically meaningful — or whether it's a case of a large study finding a real but trivial effect.
- Apply standard benchmarks to interpret a reported Cohen's d or r value: for Cohen's d, small ≈ 0.2, medium ≈ 0.5, large ≈ 0.8; for r, small ≈ 0.1, medium ≈ 0.3, large ≈ 0.5.
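As a study aid, the benchmarks in the last bullet can be written as a small lookup. Treat this as a sketch: the conventions are approximate ("≈"), so the hard cutoffs, the "negligible" label for values below the small benchmark, and the function name `label_effect` are all assumptions made here for illustration.

```python
def label_effect(value, measure="d"):
    """Label an effect size using the conventional small/medium/large benchmarks."""
    # Benchmarks from the text: Cohen's d uses 0.2 / 0.5 / 0.8, r uses 0.1 / 0.3 / 0.5
    benchmarks = {"d": (0.2, 0.5, 0.8), "r": (0.1, 0.3, 0.5)}
    small, medium, large = benchmarks[measure]
    size = abs(value)  # direction does not affect magnitude
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "negligible"  # assumed label for effects below the "small" benchmark


print(label_effect(0.05))       # the d = 0.05 example from the text
print(label_effect(0.3, "r"))
print(label_effect(-0.9))
```

A passage that reports d = 0.05 with p < 0.0001 falls in the lowest category here, which is the signal that the result is statistically significant but unlikely to be clinically meaningful.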