Question 1

Swaps the definitions of Type I and Type II errors

Accepted Answer

Type I and Type II are easy to swap under pressure, so anchor them this way. Type I = you went too far: you rejected the null hypothesis when it was actually true (a false alarm). Type II = you didn't go far enough: you failed to reject the null when it was actually false (a missed detection). A helpful mnemonic: Type I = 'crying wolf' when there is no wolf (false positive); Type II = staying silent when the wolf really is there (false negative). The Type I error rate is set directly by your significance threshold α; the Type II error rate is called β, and power is 1 − β.
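
To see both error types concretely, here is a minimal simulation sketch. It assumes a two-sided one-sample z-test with known σ, a null mean of 0, and illustrative values (n = 25, σ = 1, a true mean of 0.5 for the "real effect" case). None of these numbers come from the explanation above, and `rejects_null` and `error_rates` are hypothetical helper names.

```python
import random
from statistics import NormalDist, fmean


def rejects_null(sample, mu0, sigma, alpha):
    """Two-sided one-sample z-test with known sigma.

    Returns True when H0 (population mean == mu0) is rejected.
    """
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > z_crit


def error_rates(true_mean_alt=0.5, n=25, sigma=1.0, alpha=0.05,
                trials=10_000, seed=0):
    """Estimate (Type I rate, Type II rate) by repeated sampling.

    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)

    def draw(mu):
        return [rng.gauss(mu, sigma) for _ in range(n)]

    # Type I: H0 is true (mean really is 0) but the test rejects it.
    false_alarms = sum(
        rejects_null(draw(0.0), 0.0, sigma, alpha) for _ in range(trials)
    )
    # Type II: H0 is false (mean is true_mean_alt) but the test fails to reject.
    misses = sum(
        not rejects_null(draw(true_mean_alt), 0.0, sigma, alpha)
        for _ in range(trials)
    )
    return false_alarms / trials, misses / trials


type1, type2 = error_rates()
print(f"Type I rate  (false alarms): {type1:.3f}")
print(f"Type II rate (misses):       {type2:.3f}")
```

Under these assumptions the estimated Type I rate should land close to the chosen α, while the Type II rate depends on how large the true effect is relative to the noise.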

Question 2

Believes reducing α simultaneously reduces both Type I and Type II errors

Accepted Answer

Lowering α (say, from 0.05 to 0.01) raises the bar for what counts as statistically significant, which directly reduces your false positive rate. But it simultaneously makes it harder to detect real effects, increasing β and decreasing power. Think of it as a security system: making it harder to trigger reduces false alarms (Type I), but also means more real threats slip through undetected (Type II). With sample size and effect size held fixed, there is no free lunch: reducing one error type with this lever always costs you on the other.
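
The tradeoff can be checked numerically. The sketch below assumes a two-sided one-sample z-test with known σ and uses made-up values (a true difference of 0.5, σ = 1, n = 25); `power_two_sided_z` is a hypothetical helper, not a library function.

```python
from statistics import NormalDist


def power_two_sided_z(delta, sigma, n, alpha):
    """Power of a two-sided one-sample z-test with known sigma.

    delta is the true difference between the population mean and the
    null value; all inputs here are illustrative assumptions.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # bar rises as alpha falls
    shift = delta * n ** 0.5 / sigma            # true effect in standard errors
    # Probability that the z statistic lands beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for alpha in (0.05, 0.01):
    power = power_two_sided_z(delta=0.5, sigma=1.0, n=25, alpha=alpha)
    print(f"alpha={alpha:.2f}  power={power:.3f}  beta={1 - power:.3f}")
```

With everything else held fixed, the α = 0.01 row should show lower power and higher β than the α = 0.05 row.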

Question 3

Attributes the power gain from larger samples to a reduction in Type I rather than Type II error

Accepted Answer

Increasing sample size improves power specifically because it reduces β (the Type II error rate) — it makes your test more sensitive to real effects by reducing sampling variability. It does not change α, which is set by the researcher before data collection as the significance threshold. Confusing these suggests a misunderstanding of what α actually is: α is a decision rule you define in advance, not a quantity that gets recalculated as your sample grows.
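
A short sketch makes the distinction visible: α is a constant chosen up front, and only β moves as n grows. It assumes a two-sided one-sample z-test with known σ and illustrative values (true difference 0.5, σ = 1); `type2_rate` is a hypothetical helper.

```python
from statistics import NormalDist

ALPHA = 0.05  # chosen before data collection; never recomputed as n changes


def type2_rate(n, delta=0.5, sigma=1.0, alpha=ALPHA):
    """Beta for a two-sided one-sample z-test with known sigma.

    delta and sigma defaults are illustrative assumptions.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    standard_error = sigma / n ** 0.5   # sampling variability shrinks as n grows
    shift = delta / standard_error      # true effect in standard errors
    power = std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)
    return 1 - power


betas = {n: type2_rate(n) for n in (10, 25, 50, 100)}
for n, beta in betas.items():
    print(f"n={n:>3}  alpha={ALPHA}  beta={beta:.3f}  power={1 - beta:.3f}")
```

Each row prints the same α, while β falls (and power rises) as the sample gets larger.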

Question 4

Ignores effect size, α, and variance as determinants of statistical power

Accepted Answer

Sample size is one lever, but power depends on four factors: sample size (n), effect size (the magnitude of the true difference), significance level (α), and population variance. Larger n, a larger effect, a larger α, and lower variance each raise power. A large, obvious effect is easier to detect even with a modest sample, which is why trials of highly effective treatments can reach significance with relatively few participants. Conversely, a small effect buried in noisy data (high variance) requires a large sample to have adequate power. On the MCAT, if a passage describes a study that 'failed to reach significance,' consider whether the study was underpowered due to small n, small effect size, or high variance, not just small n.
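
One way to see all four factors at once is to ask how large n must be to reach a target power as the other three change. The sketch below assumes a two-sided one-sample z-test with known σ and a conventional 80% power target; the scenario values and the helper names `power` and `smallest_n` are illustrative assumptions, not figures from any real study.

```python
from statistics import NormalDist


def power(n, effect, sigma, alpha):
    """Power of a two-sided one-sample z-test with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sigma  # true effect in standard errors
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def smallest_n(effect, sigma, alpha=0.05, target=0.80, n_max=100_000):
    """Smallest sample size whose power reaches the target (linear search)."""
    for n in range(1, n_max + 1):
        if power(n, effect, sigma, alpha) >= target:
            return n
    raise ValueError("target power not reached within n_max")


# Change one factor at a time relative to the baseline.
scenarios = {
    "baseline":       dict(effect=0.50, sigma=1.0, alpha=0.05),
    "smaller effect": dict(effect=0.25, sigma=1.0, alpha=0.05),
    "noisier data":   dict(effect=0.50, sigma=2.0, alpha=0.05),
    "stricter alpha": dict(effect=0.50, sigma=1.0, alpha=0.01),
}
for name, params in scenarios.items():
    print(f"{name:<15} needs n >= {smallest_n(**params)}")
```

Halving the effect and doubling σ demand the same larger sample here, because in this test what matters is the ratio of effect to noise.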

Type I and Type II Errors; Statistical Power
