Common misconceptions

Common mistake
Wrong: A smaller p-value means the effect is larger or more clinically important.
Right: The p-value is the probability of observing data at least as extreme as yours if the null hypothesis were true; it says nothing about the magnitude of the effect, and a tiny p-value can accompany a trivially small effect size.
The p-value only tells you how surprising your data would be if the null hypothesis were true. It is a measure of evidence against the null, not a measure of how big the effect is. A p-value of 0.0001 and a p-value of 0.04 can both accompany a Cohen's d of 0.2 (a small effect); the difference comes from sample size, and therefore statistical power. Always check the effect size separately from the p-value to assess importance.
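To make the point concrete, here is a minimal sketch that holds Cohen's d fixed at 0.2 and changes only the sample size. It assumes two equal-sized groups and uses a normal (z) approximation rather than an exact t-test; the function name and the two sample sizes are illustrative choices, not values from a real study.

```python
import math

def approx_two_sided_p(d, n_per_group):
    """Approximate two-sided p-value for a two-group comparison.

    Assumes equal group sizes and a normal (z) approximation,
    where the test statistic is z = d * sqrt(n_per_group / 2).
    """
    z = abs(d) * math.sqrt(n_per_group / 2)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))

# Same small effect (d = 0.2), two hypothetical sample sizes.
p_smaller_study = approx_two_sided_p(0.2, 211)
p_larger_study = approx_two_sided_p(0.2, 757)
print(f"n = 211 per group: p = {p_smaller_study:.4f}")
print(f"n = 757 per group: p = {p_larger_study:.5f}")
```

Under these assumptions the smaller study lands near p = 0.04 and the larger one near p = 0.0001, even though the effect size is identical in both.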
Common mistake
Wrong: A statistically significant result from a very large study must reflect a meaningful real-world difference.
Right: Large sample sizes give high power to detect even trivially small differences as statistically significant, so effect size must be evaluated separately from the p-value.
Statistical power increases with sample size, which means large studies can reliably detect effects that are real but tiny. A drug that lowers blood pressure by 1 mmHg on average might produce p < 0.001 in a trial of 100,000 patients, yet a 1 mmHg reduction is unlikely to matter clinically. The p-value is strong evidence that the effect is not zero; the effect size tells you whether it's worth caring about.
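The blood pressure example can be sketched numerically. The 1 mmHg reduction comes from the example above; the 15 mmHg standard deviation, the equal split into two arms, and the normal approximation are assumptions made only for illustration.

```python
import math

MEAN_REDUCTION_MMHG = 1.0   # from the example above
ASSUMED_SD_MMHG = 15.0      # hypothetical value, not from the text

def trial_summary(total_patients):
    """Return (Cohen's d, approximate two-sided p) for two equal arms."""
    n_per_arm = total_patients / 2
    d = MEAN_REDUCTION_MMHG / ASSUMED_SD_MMHG
    # Normal approximation: z = d * sqrt(n_per_arm / 2).
    z = d * math.sqrt(n_per_arm / 2)
    p = math.erfc(z / math.sqrt(2))
    return d, p

# The effect size never changes; only the p-value does.
for total in (100, 1_000, 10_000, 100_000):
    d, p = trial_summary(total)
    print(f"N = {total:>7}: d = {d:.3f}, p = {p:.3g}")
```

With these assumed numbers, d stays well below the "small" benchmark of 0.2 at every sample size, while the p-value drops from clearly non-significant to far below 0.001 as the trial grows.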
Common mistake
Wrong: Cohen's d changes as sample size increases, just like the p-value does.
Right: Cohen's d is the difference between two means divided by the pooled standard deviation; it is a standardized measure that does not grow or shrink systematically with sample size.
Cohen's d is computed as the difference between two group means divided by the pooled standard deviation — neither of those quantities is directly determined by how many people you measured. Sample size affects the precision of your estimate of d and the p-value of your test, but the d value itself reflects the actual separation between groups relative to their variability, not how large your sample was. This is exactly what makes effect sizes useful for comparing findings across studies with different sample sizes.
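A short sketch of the calculation, using the standard pooled standard deviation formula. The simulated data are an assumption for illustration: both groups are drawn from normal distributions with a true standardized difference of 0.5, and the seed and helper names are arbitrary.

```python
import math
import random
import statistics

def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

rng = random.Random(0)  # fixed seed so the illustration is repeatable

def simulated_d(n_per_group):
    """Estimate d from simulated groups whose true d is 0.5."""
    treated = [rng.gauss(0.5, 1.0) for _ in range(n_per_group)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    return cohens_d(treated, control)

for n in (50, 500, 5000):
    print(f"n = {n:>4} per group: d = {simulated_d(n):.2f}")
```

The estimates should scatter around 0.5 at every sample size, with less scatter as n grows. Larger samples make the estimate more precise; they do not make it bigger.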
Common mistake
Gap: Unaware of standard benchmarks for interpreting Cohen's d and r as small, medium, or large effects
By convention, Cohen's d ≈ 0.2 is small, ≈ 0.5 is medium, and ≈ 0.8 is large; the corresponding benchmarks for the correlation coefficient r are lower (0.1, 0.3, 0.5).
Jacob Cohen established standard benchmarks so researchers could interpret effect sizes consistently: for Cohen's d, values near 0.2 are considered small, near 0.5 medium, and near 0.8 large. For the correlation coefficient r, the thresholds are lower: 0.1 small, 0.3 medium, 0.5 large. Knowing these lets you look at a reported effect size in a passage and immediately judge whether the finding has practical weight, rather than just noting that p < 0.05.
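As a rough aid, the benchmarks can be turned into a small lookup. The sketch below labels a value by whichever benchmark it is nearest to, which is one possible reading of "near 0.2" and "near 0.5"; that rule and the function name are assumptions for illustration, not part of Cohen's conventions.

```python
# Cohen's conventional benchmarks for d and r.
BENCHMARKS = {
    "d": {"small": 0.2, "medium": 0.5, "large": 0.8},
    "r": {"small": 0.1, "medium": 0.3, "large": 0.5},
}

def nearest_benchmark(value, measure):
    """Return the label of the benchmark closest to |value|.

    The nearest-benchmark rule is an illustrative assumption.
    """
    magnitude = abs(value)  # the sign gives direction, not size
    labels = BENCHMARKS[measure]
    return min(labels, key=lambda label: abs(labels[label] - magnitude))

print(nearest_benchmark(0.45, "d"))   # closest to the 0.5 benchmark
print(nearest_benchmark(-0.12, "r"))  # closest to the 0.1 benchmark
```

One limitation of this rule is that a value far below the smallest benchmark, such as d = 0.02, is still labeled "small", so treat the labels as a starting point and not a verdict.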

What the exam tests

  1. Know that effect size (Cohen's d, r) measures the magnitude of a difference or association and is calculated independently of sample size — it doesn't go up just because you recruited more participants.
  2. Understand why statistical significance and clinical significance are different things: a very large sample size gives high statistical power to detect even trivially small differences as 'significant,' so a low p-value alone doesn't tell you the finding matters in practice.
  3. In a passage that presents both p-values and effect sizes (or absolute differences), be able to identify whether a statistically significant result is also clinically meaningful — or whether it's a case of a large study finding a real but trivial effect.
  4. Apply standard benchmarks to interpret a reported Cohen's d or r value: for Cohen's d, small ≈ 0.2, medium ≈ 0.5, large ≈ 0.8; for r, small ≈ 0.1, medium ≈ 0.3, large ≈ 0.5.

Can you avoid these mistakes?

A pharmaceutical company runs a trial with 80,000 participants comparing a new antidepressant to placebo. The result is p < 0.0001, and the reported Cohen's d is 0.15. What should you conclude about the clinical significance of this finding, and why?
Two studies examine the same intervention. Study A (n = 50) finds p = 0.08 and Cohen's d = 0.6. Study B (n = 10,000) finds p = 0.001 and Cohen's d = 0.15. Which study provides stronger evidence that the intervention has a meaningful effect, and what is the key concept?
A passage reports a correlation of r = 0.28 between hours of sleep and exam performance, with p = 0.03. Is this a small, medium, or large effect? Is the finding statistically significant? Are those two judgments the same thing?
True or false: If you double the sample size of a study, the Cohen's d for the same underlying population difference will approximately double as well. Explain your reasoning.
