Common misconceptions

Common mistake
Wrong: A Type II error is rejecting a true null hypothesis (false positive).
Right: A Type I error is rejecting a true null hypothesis (false positive, rate = α); a Type II error is failing to reject a false null hypothesis (false negative, rate = β).
Type I and Type II are easy to swap under pressure, so anchor them this way. Type I = you went too far: you rejected the null when you shouldn't have (a false alarm). Type II = you didn't go far enough: you failed to reject the null when you should have (a missed detection). The fable of the boy who cried wolf gives a handy mnemonic. When the villagers believe his cry and there is no wolf, that is a Type I error (false positive). When they ignore his cry and the wolf is really there, that is a Type II error (false negative). The rate of Type I errors is controlled directly by your significance threshold α; the rate of Type II errors is β.
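The claim that α directly sets the Type I error rate can be checked by simulation. The sketch below assumes a two-sided one-sample z-test of H0: μ = 0 with a known population standard deviation, a simplifying choice made only for illustration. Because every simulated sample is drawn with the null actually true, every rejection is a Type I error, so the long-run rejection rate should land near α. `simulate_type_i_rate` is a hypothetical helper, not part of any library.

```python
import random
from statistics import NormalDist, fmean

def simulate_type_i_rate(alpha=0.05, n=30, sigma=1.0, trials=10_000, seed=1):
    """Estimate how often a two-sided one-sample z-test rejects H0: mu = 0
    when H0 is actually true (hypothetical helper for illustration)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # Every sample comes from a population whose mean really is 0 (H0 true).
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = fmean(sample) / (sigma / n ** 0.5)  # z statistic with known sigma
        if abs(z) > z_crit:
            rejections += 1  # rejecting a true null: a Type I error
    return rejections / trials

print(f"alpha = 0.05 -> Type I rate ~ {simulate_type_i_rate(alpha=0.05):.3f}")
print(f"alpha = 0.01 -> Type I rate ~ {simulate_type_i_rate(alpha=0.01):.3f}")
```

The estimates come out close to the chosen α; the exact digits depend on the seed and the number of trials.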
Common mistake
Wrong: Lowering the significance threshold (α) reduces both Type I and Type II error rates.
Right: Lowering α reduces the Type I error rate but increases the Type II error rate (and decreases power), creating a tradeoff between the two error types.
Lowering α (say, from 0.05 to 0.01) raises the bar for what counts as statistically significant, which directly reduces your false positive rate. At the same time, it makes real effects harder to detect, increasing β and decreasing power. Think of it as a security system: making the alarm harder to trigger reduces false alarms (Type I) but lets more real threats slip through undetected (Type II). There is no free lunch: with sample size, effect size, and variance held fixed, reducing one error type with this lever always costs you on the other.
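To put numbers on this tradeoff, here is a minimal sketch assuming a two-sided one-sample z-test of H0: μ = 0 with a known population standard deviation (real studies more often use t-tests, so treat the figures as illustrative). `z_test_power` is a hypothetical helper: it returns the probability of rejecting H0 when the true mean equals `effect`, which is power = 1 − β.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test of H0: mu = 0 when the true
    mean is `effect` and sigma is known (hypothetical helper)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # stricter alpha -> larger cutoff
    shift = abs(effect) * n ** 0.5 / sigma       # how far the true effect moves the z statistic
    # Probability the statistic lands in either rejection region when H1 is true.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for alpha in (0.05, 0.01):
    power = z_test_power(effect=0.5, sigma=1.0, n=30, alpha=alpha)
    print(f"alpha={alpha}: power={power:.3f}, beta={1 - power:.3f}")
```

With a true mean of 0.5, σ = 1, and n = 30, power falls from about 0.78 at α = 0.05 to about 0.56 at α = 0.01, so β roughly doubles (from about 0.22 to about 0.44) even though nothing else about the study changed.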
Common mistake
Wrong: Increasing sample size increases power by reducing the Type I error rate.
Right: Increasing sample size increases power by reducing the Type II error rate (β), not by changing α.
Increasing sample size improves power specifically because it reduces β (the Type II error rate): a larger sample shrinks the standard error of your estimate, so a real effect is less likely to be lost in sampling noise. It does not change α, which the researcher sets before data collection as the significance threshold. Confusing the two reveals a misunderstanding of what α is: a decision rule you define in advance, not a quantity that gets recalculated as your sample grows.
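The sketch below makes the distinction concrete, again assuming a two-sided one-sample z-test with known σ. The hypothetical helper `rejection_probability` gives P(reject H0) for a given true mean: with a true mean of 0 (null true) that probability is the Type I rate, and with a nonzero true mean it is power. Notice that the critical value is computed from α alone; n never enters it.

```python
from statistics import NormalDist

def rejection_probability(true_mean, sigma, n, alpha=0.05):
    """P(reject H0: mu = 0) for a two-sided one-sample z-test with known sigma
    (hypothetical helper for illustration)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # fixed by alpha alone; n never enters
    shift = abs(true_mean) * n ** 0.5 / sigma   # grows with sqrt(n) as the standard error shrinks
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for n in (10, 30, 100):
    type_i_rate = rejection_probability(0.0, sigma=1.0, n=n)  # H0 true: equals alpha
    power = rejection_probability(0.3, sigma=1.0, n=n)        # H0 false: 1 - beta
    print(f"n={n:>3}: Type I rate={type_i_rate:.3f}, power={power:.3f}, beta={1 - power:.3f}")
```

The Type I rate stays at 0.050 for every n, while power climbs from roughly 0.16 at n = 10 to roughly 0.85 at n = 100: the extra data lowers β, not α.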
Common mistake
Wrong: Power is determined solely by sample size.
Right: Power is determined by sample size, effect size, significance level (α), and population variance; at any fixed sample size, a larger effect size increases power.
Sample size is one lever, but power depends on four factors: sample size (n), effect size (the magnitude of the true difference), significance level (α), and population variance. A large, obvious effect is easier to detect even with a modest sample, which is why a highly effective treatment can reach significance in a relatively small trial. Conversely, a small effect buried in noisy data (high variance) requires a large sample to achieve adequate power. On the MCAT, if a passage describes a study that 'failed to reach significance,' consider whether the study was underpowered because of small n, small effect size, or high variance, not just small n.
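Here is a rough sketch of the four levers, assuming the same two-sided one-sample z-test with known σ and a hypothetical `z_test_power` helper. Starting from one baseline design, it changes one factor at a time and reports the resulting power; the baseline numbers are arbitrary choices for illustration.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test of H0: mu = 0 (hypothetical helper)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(effect) * n ** 0.5 / sigma  # effect and sigma enter only as a ratio
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

baseline = dict(effect=0.3, sigma=1.0, n=30, alpha=0.05)
# Each scenario overrides exactly one factor of the baseline design.
scenarios = {
    "baseline": {},
    "larger n (60)": {"n": 60},
    "larger effect (0.6)": {"effect": 0.6},
    "higher alpha (0.10)": {"alpha": 0.10},
    "lower variance (sigma 0.5)": {"sigma": 0.5},
}
for label, change in scenarios.items():
    power = z_test_power(**{**baseline, **change})
    print(f"{label:<28} power={power:.3f}")
```

Every single change raises power above the baseline (raising α does so at the cost of more Type I errors). In this model, effect size and σ appear only as their ratio, so doubling the effect and halving σ give identical power; this is one reason effect sizes are often reported in standardized form, such as Cohen's d (mean difference divided by standard deviation).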

What the exam tests

  1. Know the precise definitions: Type I error = false positive = rejecting a true null hypothesis, with rate equal to α; Type II error = false negative = failing to reject a false null hypothesis, with rate equal to β.
  2. Understand the power formula (Power = 1 − β) and be able to explain how increasing sample size, increasing effect size, raising α, or decreasing population variance each increase statistical power.
  3. Given a passage describing a study outcome or design change, identify whether the researcher committed a Type I or Type II error, and predict how a specific design modification would shift each error rate.
  4. Apply the relationship between sample size, effect size, and power qualitatively — for example, recognize that a small study detecting a subtle effect is underpowered and prone to Type II errors, or use a power table to identify adequate sample sizes.
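Power = 1 − β and the "power table" idea can be sketched together. Assuming a two-sided one-sample z-test with known σ, the hypothetical helpers below tabulate power for a few sample sizes and then search for the smallest n that reaches a target power. The 80% target used here is a common convention, not a fixed rule.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test of H0: mu = 0 (hypothetical helper)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(effect) * n ** 0.5 / sigma
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

def min_sample_size(effect, sigma, target_power=0.80, alpha=0.05, n_max=10_000):
    """Smallest n whose power reaches target_power (simple linear search)."""
    for n in range(2, n_max + 1):
        if z_test_power(effect, sigma, n, alpha) >= target_power:
            return n
    return None  # target not reachable within n_max

# A small power table for a true mean of 0.5 with sigma = 1.
for n in (10, 20, 30, 40, 50):
    print(f"n={n:>2}: power={z_test_power(0.5, 1.0, n):.3f}")
print("minimum n for 80% power:", min_sample_size(0.5, 1.0))
```

For this design the search returns n = 32, which matches the usual approximation n ≈ ((z₁₋α/₂ + z₁₋β) σ / δ)² ≈ ((1.96 + 0.84) / 0.5)² ≈ 31.4, rounded up.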

Can you avoid these mistakes?

A clinical trial tests a new drug versus placebo. The trial concludes there is no significant difference, but the drug actually does work. What type of error occurred, and what single design change would most directly reduce the likelihood of this error recurring?
A researcher wants to reduce the false positive rate in her experiment, so she lowers her significance threshold from α = 0.05 to α = 0.01. What happens to her Type II error rate and statistical power as a result?
Two studies test the same hypothesis. Study A has n = 30 and a large effect size. Study B has n = 300 and a small effect size. Which study is likely to have higher power, and why can't you answer definitively without more information?
A passage describes a study that found a statistically significant effect of a supplement on blood pressure (p = 0.03, α = 0.05), but the null hypothesis is actually true (the supplement does nothing). Name the error type, state its rate, and explain what it means in practical terms for this finding.
