Case-Control Studies

USMLE Step 1 trap: calculating a relative risk instead of an odds ratio from case-control data. Because disease prevalence in the sample is set by the investigator, true disease incidence is unknown, so only the odds ratio can be calculated from case-control data.

Case-control studies are one of the highest-yield observational designs on USMLE Step 1, and they're also one of the most consistently misapplied. The core structure: you start with people who already have the disease (cases) and people who don't (controls), then look backward to compare how often each group was exposed to a risk factor. That backward directionality — outcome first, then exposure — is what defines the design and explains everything downstream, including why you can only calculate an odds ratio, not a relative risk.

The exam tests this concept from multiple angles. Straightforward recall questions ask you to identify the design from a vignette. Application questions give you a 2x2 table and ask which statistic to compute. Passage-based questions describe a scenario — say, a researcher studying a rare cancer or a disease with a 20-year latency — and ask you to choose the most appropriate study design. Each angle requires a different layer of understanding, and students who only memorize 'case-control = odds ratio' routinely get the application and scenario questions wrong.

What makes this tricky is that the misconceptions are intuitive. It feels like you should be able to calculate relative risk from any 2x2 table. And it seems like recall bias would affect everyone equally. USMLE Step 1 specifically exploits these intuitions. The fix is understanding the mechanism behind each rule, not just the rule itself.

Common misconceptions

Common mistake

Wrong: Relative risk can be directly calculated from a case-control study.

Right: Because disease prevalence in the sample is set by the investigator, true disease incidence is unknown, so only the odds ratio can be calculated from case-control data.

Relative risk requires knowing disease incidence in the exposed and unexposed populations. In a case-control study, however, the investigator decides how many cases and controls to enroll, so the proportion of diseased people in the sample is artificial rather than a reflection of true population prevalence. Since you can't recover true incidence from this setup, you can't calculate risk in each exposure group. The odds ratio survives because it compares the odds of exposure among cases with the odds of exposure among controls: enrolling twice as many controls doubles both the exposed and the unexposed control counts, leaving their odds unchanged. When the disease is rare, the odds ratio also approximates the relative risk.
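A short Python sketch can make the sampling argument concrete. It invents a hypothetical population (every count below is made up for illustration), enrolls every case, and then samples controls at either 1:1 or 1:4. The helper names `odds_ratio`, `naive_risk_ratio`, and `case_control_table` are hypothetical, and the sketch uses expected counts instead of random draws so it runs deterministically.

```python
# Hypothetical population (numbers invented for illustration):
# 100,000 exposed people with a 0.2% risk, 900,000 unexposed with a 0.05% risk.
EXPOSED_N, EXPOSED_RISK = 100_000, 0.002
UNEXPOSED_N, UNEXPOSED_RISK = 900_000, 0.0005


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


def naive_risk_ratio(a: float, b: float, c: float, d: float) -> float:
    """Treats the 2x2 table as if it came from a cohort (the Step 1 trap)."""
    return (a / (a + b)) / (c / (c + d))


def case_control_table(controls_per_case: int):
    """Enroll every case, then sample controls from the disease-free population.

    Returns expected counts (a, b, c, d) rather than random draws.
    """
    exposed_cases = EXPOSED_N * EXPOSED_RISK
    unexposed_cases = UNEXPOSED_N * UNEXPOSED_RISK
    exposed_free = EXPOSED_N - exposed_cases
    unexposed_free = UNEXPOSED_N - unexposed_cases

    n_controls = controls_per_case * (exposed_cases + unexposed_cases)
    # The investigator's choice: what fraction of disease-free people to sample.
    fraction = n_controls / (exposed_free + unexposed_free)
    return (exposed_cases, exposed_free * fraction,
            unexposed_cases, unexposed_free * fraction)


true_rr = EXPOSED_RISK / UNEXPOSED_RISK
print(f"true relative risk: {true_rr:.2f}")
for k in (1, 4):
    table = case_control_table(k)
    print(f"{k} control(s) per case: OR = {odds_ratio(*table):.2f}, "
          f"naive 'RR' = {naive_risk_ratio(*table):.2f}")
```

In this sketch the odds ratio comes out at about 4.01 for either control ratio, close to the true relative risk of 4 because the disease is rare. The naive "relative risk" shifts (roughly 1.74 at 1:1 versus 2.70 at 1:4) purely because the investigator changed how many controls to enroll.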

Common mistake

Wrong: Case-control studies start with exposed and unexposed groups and follow them forward.

Right: Case-control studies start with diseased (cases) and non-diseased (controls) and look backward to compare past exposures.

Cohort studies go exposure → outcome (forward in time). Case-control studies go outcome → exposure (backward in time). This isn't just a semantic difference — it's what determines the study's feasibility, cost, and which statistics are valid. If a vignette says researchers identified people with a disease and then asked about their past exposures, that's case-control, full stop, regardless of what else the question implies.

Common mistake

Wrong: Recall bias affects cases and controls equally in case-control studies.

Right: Cases are more likely than controls to recall past exposures because their disease motivates them to search for causes, systematically inflating the apparent association.

Cases have a psychological incentive that controls don't: they're sick, so they've often spent time thinking about what caused their illness. This motivates more thorough — and sometimes inflated — recall of past exposures. Controls have no such motivation, so they may underreport the same exposures. The result is a systematic difference in reporting between the two groups, not random noise, which pushes the odds ratio away from 1 even when no true association exists.
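The recall-bias mechanism can be sketched the same way. Assume, hypothetically, 500 cases and 500 controls with an identical true exposure prevalence of 30% (so the true odds ratio is exactly 1), that cases recall a real exposure 95% of the time while controls recall it 70% of the time, and that nobody reports an exposure they never had. The function names are illustrative, not from any library.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


def reported_table(n_cases: int, n_controls: int, true_exposure_prev: float,
                   case_recall: float, control_recall: float):
    """Expected 2x2 counts (a, b, c, d) when exposure is self-reported.

    Hypothetical simplifications: a truly exposed person reports the exposure
    with probability equal to their group's recall, and nobody reports an
    exposure they never had. c and d count everyone who reports no exposure,
    including exposed people who forgot.
    """
    exposed_cases = n_cases * true_exposure_prev * case_recall
    exposed_controls = n_controls * true_exposure_prev * control_recall
    return (exposed_cases, exposed_controls,
            n_cases - exposed_cases, n_controls - exposed_controls)


# Same true exposure prevalence (30%) in both groups, so the true OR is 1.
accurate = reported_table(500, 500, 0.30, case_recall=1.0, control_recall=1.0)
biased = reported_table(500, 500, 0.30, case_recall=0.95, control_recall=0.70)
print(f"OR with accurate recall:     {odds_ratio(*accurate):.2f}")
print(f"OR with differential recall: {odds_ratio(*biased):.2f}")
```

With accurate recall the odds ratio is 1.00; with differential recall it rises to about 1.5, an apparent association created entirely by how the two groups remember their past.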

Common mistake

Gap: Failing to recognize case-control as the design of choice for rare diseases or diseases with long latency.

Right: Case-control studies are the preferred design for rare diseases and long-latency diseases because they start by identifying existing cases rather than waiting for rare events to occur in a prospective cohort.

In a prospective cohort study, you'd need to enroll thousands of people and wait years just to accumulate a handful of cases of a rare disease, which is expensive and often impractical. Case-control studies sidestep this by starting with existing cases, so you can efficiently study rare outcomes or diseases with long latency without decades of follow-up. Any time a Step 1 vignette describes a rare condition or a long latency period, or asks which design requires the fewest participants to study an uncommon outcome, case-control is the answer.
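The efficiency argument is simple arithmetic. The sketch below assumes a hypothetical disease incidence of 1 case per 100,000 person-years and a goal of about 100 cases, and it gives the cohort generous simplifying assumptions (constant incidence, no loss to follow-up). The function names are illustrative.

```python
def cohort_size_needed(target_cases: int, person_years_per_case: int,
                       years_of_follow_up: int) -> int:
    """People a prospective cohort must enroll to *expect* target_cases new cases.

    Simplifying assumptions: constant incidence, no loss to follow-up, and
    every participant contributes the full follow-up period.
    """
    person_years = target_cases * person_years_per_case
    return -(-person_years // years_of_follow_up)  # ceiling division on integers


def case_control_size(target_cases: int, controls_per_case: int) -> int:
    """Participants in a case-control study that enrolls existing cases directly."""
    return target_cases * (1 + controls_per_case)


# Hypothetical rare disease: 1 case per 100,000 person-years; goal of 100 cases.
print("cohort, 10-year follow-up:", cohort_size_needed(100, 100_000, 10))
print("case-control, 3 controls per case:", case_control_size(100, 3))
```

Under these assumptions the cohort would need 1,000,000 people followed for a decade just to expect 100 cases, whereas a 1:3 case-control study reaches the same number of cases with 400 participants. A long latency period only lengthens the follow-up the cohort would need.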

What the exam tests

Identify the directionality of a case-control study: it starts with disease status (cases vs. controls) and looks backward at past exposure — not forward from exposure to outcome.
Explain why relative risk cannot be calculated from case-control data and why the odds ratio is the correct effect measure instead.
Recognize which research scenarios favor a case-control design, specifically rare diseases and conditions with long latency periods where prospective follow-up would be impractical.
Identify the biases that disproportionately threaten case-control studies, especially recall bias and selection bias, and understand the mechanism by which they distort results.
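For reference, the two effect measures can be written with the conventional 2x2 cell labels: a = exposed cases, b = exposed controls (or exposed non-diseased, in a cohort), c = unexposed cases, d = unexposed controls (or unexposed non-diseased). The sampling fractions f_1 (cases) and f_0 (controls) are notation introduced here to show why the investigator's enrollment choices cancel out of the odds ratio.

```latex
\begin{aligned}
\mathrm{OR} &= \frac{a/c}{b/d} = \frac{ad}{bc}
  && \text{odds of exposure in cases vs. controls} \\[4pt]
\mathrm{RR} &= \frac{a/(a+b)}{c/(c+d)}
  && \text{cohort data only: needs true risks} \\[4pt]
\mathrm{OR}_{\text{sampled}} &= \frac{(f_1 a)(f_0 d)}{(f_0 b)(f_1 c)} = \frac{ad}{bc}
  && \text{enrollment fractions cancel} \\[4pt]
\mathrm{RR} &\approx \frac{a/b}{c/d} = \mathrm{OR}
  && \text{rare disease: } a \ll b,\; c \ll d
\end{aligned}
```

The same fractions do not cancel from a/(a+b) or c/(c+d), which is why a "relative risk" computed from case-control counts reflects the enrollment ratio rather than true risk.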

Can you avoid these mistakes?

A researcher wants to study the relationship between a rare form of childhood leukemia and prenatal pesticide exposure. She identifies 50 children with the disease and 200 healthy children, then interviews their mothers about pesticide use during pregnancy. What study design is this, what effect measure should she report, and why can't she report the other common effect measure?

You're given a 2x2 table from a case-control study: 40 cases exposed, 10 cases unexposed, 20 controls exposed, 30 controls unexposed. Calculate the odds ratio. Then explain why calculating the relative risk from this table would be incorrect.

A case-control study finds a strong association between a dietary supplement and liver disease. Critics argue the results are inflated by recall bias. Explain the mechanism: why would recall bias affect cases and controls differently, and in which direction would it push the odds ratio?

For each scenario below, decide whether a case-control or cohort study is more appropriate and justify your answer: (a) studying whether smoking causes lung cancer in a population of 10,000 adults over 20 years; (b) studying risk factors for a disease that affects 1 in 50,000 people per year.
