Common misconceptions

Common mistake
Wrong: Berkson bias occurs because hospitalized patients are sicker than the general population.
Right: Berkson bias occurs because having either the exposure or the disease independently increases the probability of hospitalization, creating a spurious association between them among hospital patients.
Berkson bias is not about disease severity; it is about differential admission rates. The exposure and the disease can each independently raise the chance of being hospitalized, even when the exposure is mild or the disease is not severe. Hospital patients are therefore selected on both factors, so exposed people are overrepresented among them relative to the general population. Selecting on admission creates a statistical association between exposure and disease within the hospital sample that vanishes when you look at the full population. The fix is to draw controls from the population rather than from the hospital.
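A minimal sketch of the mechanism, using made-up numbers: exposure and disease are independent in the population, but each one separately raises the chance of admission. The function names, prevalences, and admission probabilities are all hypothetical, chosen only for illustration.

```python
from itertools import product

def odds_ratio(table):
    """Odds ratio from a 2x2 table keyed by (exposed, diseased) -> weight."""
    return (table[(1, 1)] * table[(0, 0)]) / (table[(1, 0)] * table[(0, 1)])

def berkson_tables(p_exposed=0.20, p_disease=0.10,
                   h_base=0.02, h_exposure=0.10, h_disease=0.30):
    """Build population and hospital 2x2 tables (all inputs are hypothetical).

    h_base:     chance of admission for unrelated reasons
    h_exposure: extra chance of admission caused by the exposure alone
    h_disease:  extra chance of admission caused by the disease alone
    """
    population, hospital = {}, {}
    for e, d in product((0, 1), repeat=2):
        # Exposure and disease are independent in the population by construction.
        p = (p_exposed if e else 1 - p_exposed) * (p_disease if d else 1 - p_disease)
        # Assumption: the three routes to admission act independently.
        p_not_admitted = (1 - h_base) * (1 - h_exposure * e) * (1 - h_disease * d)
        population[(e, d)] = p
        hospital[(e, d)] = p * (1 - p_not_admitted)
    return population, hospital

population, hospital = berkson_tables()
print(f"population OR: {odds_ratio(population):.2f}")
print(f"hospital OR:   {odds_ratio(hospital):.2f}")
```

With these inputs the population odds ratio is 1 (no association), while the odds ratio among admitted patients falls well below 1. The direction and size of the distortion depend on the admission probabilities you assume, so treat the output as an illustration rather than a rule.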
Common mistake
Wrong: The healthy worker effect causes occupational studies to overestimate the harm of workplace exposures.
Right: The healthy worker effect causes occupational studies to underestimate harm because employed workers are healthier than the general population used as the comparison group.
The healthy worker effect makes workplace exposures look safer than they are: it biases toward the null or even toward apparent benefit. People who are employed must be healthy enough to work, whereas the chronically ill and disabled are largely excluded from the workforce and are counted only in the general-population comparison group. So when you compare workers to the general population, the workers look healthier even if their exposure is genuinely harmful, and the study underestimates occupational risk.
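The arithmetic can be sketched with hypothetical numbers. Assume the general population is a mix of people fit enough to work and people who are not, that all workers come from the fit group, and that the exposure truly raises mortality by 20%. Every rate below is invented for illustration, and exposed workers are assumed to be a negligible share of the general population.

```python
def healthy_worker_comparison(true_rr=1.2, fit_mortality=0.010,
                              unfit_mortality=0.030, fit_fraction=0.80):
    """Compare exposed workers with two reference groups (hypothetical rates).

    Returns (ratio vs general population, ratio vs unexposed workers).
    """
    # Workers are drawn from the fit group; exposure multiplies their mortality.
    worker_mortality = fit_mortality * true_rr
    # The general population blends fit and unfit people.
    general_mortality = (fit_fraction * fit_mortality
                         + (1 - fit_fraction) * unfit_mortality)
    ratio_vs_general = worker_mortality / general_mortality
    ratio_vs_unexposed_workers = worker_mortality / fit_mortality
    return ratio_vs_general, ratio_vs_unexposed_workers

vs_general, vs_workers = healthy_worker_comparison()
print(f"vs general population: {vs_general:.2f}")
print(f"vs unexposed workers:  {vs_workers:.2f}")
```

Under these assumptions the comparison with the general population gives a ratio below 1 (apparent benefit), while the comparison with unexposed workers recovers the true ratio of 1.2. This is why an internal comparison group of workers is often preferred.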
Common mistake
Wrong: Cross-sectional studies accurately capture the true relationship between exposure and disease in the population.
Right: Neyman bias occurs because cross-sectional studies miss fatal or rapidly resolving cases, enriching the sample with survivors and underestimating the true exposure-disease association.
Cross-sectional studies measure prevalence, not incidence: they can only count as cases people who are alive and still ill at the time of the survey. Anyone who died rapidly from the disease, or recovered before the survey, is invisible to the study. This enrichment for survivors biases the sample toward milder, longer-lasting cases, which can make an exposure that causes rapidly fatal or fast-resolving disease look less dangerous than it actually is. Neyman bias is particularly important when studying acute or fatal conditions.
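A rough sketch of why duration matters, using the standard steady-state approximation that prevalence odds equal incidence rate times mean disease duration. The incidence rates and durations are hypothetical: the exposure doubles incidence but also shortens the time a case stays alive and ill.

```python
def prevalence_odds(incidence_rate, mean_duration_years):
    """Steady-state approximation: prevalence odds = incidence x mean duration."""
    return incidence_rate * mean_duration_years

def neyman_comparison(inc_exposed=0.020, inc_unexposed=0.010,
                      dur_exposed=1.0, dur_unexposed=4.0):
    """Return (incidence rate ratio, prevalence odds ratio); inputs are hypothetical."""
    incidence_rate_ratio = inc_exposed / inc_unexposed
    prevalence_odds_ratio = (prevalence_odds(inc_exposed, dur_exposed)
                             / prevalence_odds(inc_unexposed, dur_unexposed))
    return incidence_rate_ratio, prevalence_odds_ratio

irr, por = neyman_comparison()
print(f"incidence rate ratio (cohort view):           {irr:.2f}")
print(f"prevalence odds ratio (cross-sectional view): {por:.2f}")
```

Here a cohort study would see the exposure doubling the rate of new disease, while a cross-sectional survey would see exposed people with lower odds of being a current case, because exposed cases leave the prevalent pool four times faster.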
Common mistake
Wrong: Randomization prevents selection bias in observational studies.
Right: Randomization is only available in experimental designs; in observational studies, selection bias is mitigated by random sampling, matching, or using population-based controls.
Randomization is a tool for experimental studies (RCTs): randomly assigning participants to groups balances both known and unknown confounders. It does not apply to observational designs, where exposure assignment is not under the investigator's control. Do not confuse it with random sampling, which governs who enters the study rather than who receives the exposure. In observational studies, you mitigate selection bias through design choices: drawing controls from the same population as cases, using random sampling, matching on key variables, and minimizing loss to follow-up. Conflating randomization with selection bias prevention will cost you points on USMLE Step 1 questions about study design.

What the exam tests

  1. Recognize the definition of selection bias: it arises from systematic differences in how study participants are enrolled, retained, or lost — not from random error — and it distorts the measured association between exposure and outcome.
  2. Understand the Berkson bias mechanism: in hospital-based case-control studies, having either the exposure or the disease independently increases the chance of hospitalization, creating a spurious association among hospital patients that doesn't exist in the general population.
  3. Know the healthy worker effect and its direction: occupational cohort studies underestimate the harm of workplace exposures because employed workers are a healthier-than-average subset of the population, so the general-population comparison group has higher baseline morbidity and mortality.
  4. Identify Neyman (prevalence-incidence) bias in cross-sectional studies: because these studies capture only people who are alive and sick at a single point in time, they systematically miss fatal cases and rapidly resolving disease, leading to underestimation of the true exposure-disease association.
  5. Apply design-level strategies to reduce selection bias in observational studies: random sampling from the population, using population-based controls instead of hospital controls, and minimizing loss to follow-up are the key tools — randomization is not available outside experimental designs.

Can you avoid these mistakes?

A researcher conducts a hospital-based case-control study on the association between smoking and bladder cancer, using patients hospitalized for orthopedic injuries as controls. Smoking is common among orthopedic patients because it increases fall risk and bone fragility. What bias is introduced and in which direction does it distort the odds ratio?
A cohort study follows steel mill workers for 20 years and compares their mortality to the general US population. The study finds that steel workers have lower all-cause mortality than the general population despite known toxic exposures. What is the most likely explanation, and does this bias overestimate or underestimate the harm of the exposure?
A cross-sectional study surveys a community to estimate the prevalence of hepatitis C and its association with intravenous drug use. The study finds a weaker association than expected from prior cohort data. What type of bias could explain this discrepancy, and which types of patients would the cross-sectional design systematically miss?
You are designing a case-control study on the association between pesticide exposure and Parkinson's disease. List two specific design strategies you would use to minimize selection bias, and explain why randomization is not one of them.