Hypothesis Testing, Power, and Confidence Intervals
USMLE Step 1 trap: defining the p-value as the probability that the null hypothesis is true. The p-value is a conditional probability: the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true.
Hypothesis testing, power, and confidence intervals are the statistical backbone of clinical research interpretation — and USMLE Step 1 tests them heavily because they show up in every research design question. You need to understand what null and alternative hypotheses are, how errors get made when testing them, and how to interpret the resulting p-values and confidence intervals. The exam hits this from multiple angles: pure definition recall (what is a Type II error?), application in passage context (a study reports a p-value of 0.08 with alpha set at 0.05 — what's the conclusion?), and interpretation of specific numbers (a 95% CI for RR is 0.85–1.20 — is this significant?).
The tricky part isn't memorizing the definitions — it's that several of these concepts have counterintuitive relationships that students consistently get backwards. The most dangerous misconception is about the p-value: students write that it's 'the probability the null is true,' but that's completely wrong. It's a conditional probability — the probability of seeing your data (or more extreme data) if the null were true. That distinction matters on vignettes that ask you to interpret what a p-value actually tells you. Similarly, students swap Type I and Type II errors under pressure, and they misapply the confidence interval null-crossing rule to ratio measures (checking for zero instead of one).
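To make the conditional nature of the p-value concrete, here is a minimal Python sketch using an exact binomial calculation. The scenario (20 patients, 15 responders, a null response rate of 50%) and the function name `upper_tail_p_value` are made up for illustration and do not come from any real study.

```python
from math import comb


def upper_tail_p_value(n: int, k: int, p0: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p0).

    This is the chance of a result at least as extreme as k successes,
    computed ASSUMING the null response rate p0 is true.
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: 15 of 20 patients respond; the null says the rate is 0.5.
p = upper_tail_p_value(n=20, k=15, p0=0.5)
print(f"one-sided p-value = {p:.4f}")  # about 0.0207
```

Notice that the null rate `p0` is an input to the calculation. The result therefore cannot be the probability that the null is true; it only says how surprising the data would be if the null were true.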
The tradeoffs among alpha, beta, power, and sample size are where USMLE Step 1 really separates students who understand the mechanics from those who memorized a table. Power goes up when n increases, effect size increases, variance decreases, or alpha increases. A larger n or a smaller variance shrinks the standard error, and a larger effect moves the true value farther from the null, so in each case the test statistic is more likely to clear the rejection threshold. The alpha relationship surprises people: raising alpha (loosening your threshold to reject the null) makes it easier to reject, which increases power but also inflates false positives. Understanding why these relationships exist, not just what direction they go, is what lets you answer novel vignettes you've never seen before.
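If you want to see these directions rather than memorize them, the sketch below computes power for one simplified case: a two-sided, one-sample z-test with known standard deviation. That test, the function name `z_test_power`, and every number are assumptions chosen for illustration, not a general power calculator.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z-test (known sigma, normal data).

    effect is the true difference from the null value. All inputs are
    illustrative assumptions, not values from a real study.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # rejection threshold
    shift = effect * sqrt(n) / sigma            # true effect in standard-error units
    # Probability that the test statistic lands in either rejection tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


baseline = z_test_power(effect=0.5, sigma=1.0, n=30, alpha=0.05)
print(f"baseline               : {baseline:.3f}")
print(f"larger n (60)          : {z_test_power(0.5, 1.0, 60, 0.05):.3f}")
print(f"larger effect (0.8)    : {z_test_power(0.8, 1.0, 30, 0.05):.3f}")
print(f"lower variability (0.7): {z_test_power(0.5, 0.7, 30, 0.05):.3f}")
print(f"higher alpha (0.10)    : {z_test_power(0.5, 1.0, 30, 0.10):.3f}")
```

Each of the four changes should print a power above the baseline. Beta (the Type II error rate) is 1 minus each value, so it moves in the opposite direction.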
What the exam tests
- Given a study scenario, correctly identify what the null hypothesis states (no difference/no association) and what the alternative hypothesis states (a difference or association exists).
- Distinguish between Type I error (false positive: rejecting a true null hypothesis, probability = alpha) and Type II error (false negative: failing to reject a false null hypothesis, probability = beta) — and not swap them under pressure.
- State the correct definition of a p-value: the probability of obtaining results at least as extreme as observed, assuming the null hypothesis is true — not the probability that the null hypothesis is true.
- Define power (1 − beta) and identify which study design changes increase it: larger sample size, larger effect size, lower variability, or higher alpha.
- Explain the directional tradeoffs between alpha, beta, sample size, effect size, and power — for example, what happens to power and Type II error when you increase sample size.
- Correctly interpret a 95% confidence interval as a frequentist coverage statement: if the study were repeated many times, 95% of such intervals would contain the true population parameter.
- Apply the correct null-crossing rule to determine statistical significance from a CI: for ratio measures (RR, OR, HR), check whether the CI crosses 1; for difference measures (ARR, mean difference), check whether it crosses 0.
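The two decision rules in this list (compare p to alpha, and check whether the CI contains the null value) can be sketched in a few lines of Python. The function names and the measure labels are hypothetical helpers for illustration, and rejecting when p is strictly below alpha is one common convention rather than a universal rule.

```python
RATIO_MEASURES = {"RR", "OR", "HR"}               # null value is 1
DIFFERENCE_MEASURES = {"ARR", "mean difference"}  # null value is 0


def null_value(measure: str) -> float:
    """Return the 'no effect' value for a given effect measure."""
    if measure in RATIO_MEASURES:
        return 1.0
    if measure in DIFFERENCE_MEASURES:
        return 0.0
    raise ValueError(f"unknown measure: {measure!r}")


def ci_is_significant(low: float, high: float, measure: str) -> bool:
    """True when the confidence interval excludes the null value."""
    null = null_value(measure)
    return not (low <= null <= high)


def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """True when the p-value is below the chosen alpha."""
    return p_value < alpha


print(ci_is_significant(0.85, 1.20, "RR"))   # False: the interval contains 1
print(ci_is_significant(0.60, 0.90, "RR"))   # True: the interval excludes 1
print(ci_is_significant(-0.02, 0.10, "ARR")) # False: the interval contains 0
print(reject_null(0.08, alpha=0.05))         # False: fail to reject the null
```

The first and last calls use the numbers from the introduction: an RR interval of 0.85 to 1.20 contains 1, and a p-value of 0.08 is above an alpha of 0.05, so neither result is statistically significant.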