Statistical Significance and p-Values

Statistical significance and p-values are tested heavily on the MCAT. The basic procedure: run your experiment, collect data, compute a p-value, and compare it to your significance threshold (α, usually 0.05). If p < α, you reject the null hypothesis. The dominant misconception is that the p-value tells you how likely the null hypothesis is to be true. It does not. The p-value assumes the null is true and asks how surprising your data would be in that world: P(data this extreme | null is true). Flipping that to P(null is true | your data) is a logical error the MCAT directly tests. If p = 0.03, that is not a 3% chance the null is true. It means that if the null were true, the probability of getting data at least as extreme as yours would be 0.03.
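The conditional in that definition becomes concrete with a hypothetical coin experiment (not from the source): the null says the coin is fair, and you observe 61 heads in 100 flips. The minimal Python sketch below computes the two-sided p-value two ways: exactly, and by simulating many experiments in a world where the null is true. Notice that neither calculation ever estimates the probability that the coin is fair.

```python
import math
import random


def two_sided_binomial_p(heads: int, n: int) -> float:
    """Exact two-sided p-value under a fair-coin null (probability of heads = 0.5).

    Sums the null-hypothesis probability of every outcome at least as far
    from n/2 as the observed count: P(data this extreme | null is true).
    """
    observed_gap = abs(heads - n / 2)
    extreme_ways = sum(
        math.comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= observed_gap
    )
    return extreme_ways / 2 ** n


def simulated_p(heads: int, n: int, reps: int = 100_000, seed: int = 0) -> float:
    """Estimate the same quantity by running experiments where the null is true."""
    rng = random.Random(seed)
    observed_gap = abs(heads - n / 2)
    extreme = 0
    for _ in range(reps):
        # Each of the n random bits acts as one independent fair coin flip.
        sim_heads = bin(rng.getrandbits(n)).count("1")
        if abs(sim_heads - n / 2) >= observed_gap:
            extreme += 1
    return extreme / reps


p_exact = two_sided_binomial_p(61, 100)
p_sim = simulated_p(61, 100)
print(f"exact p = {p_exact:.4f}, simulated p = {p_sim:.4f}")
```

Both numbers answer the question "if the coin were fair, how often would I see a result this lopsided?" The simulation version makes the assumption explicit: every simulated experiment is generated with the null built in.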

The MCAT tests this from multiple angles. At the definition level, it checks whether you know what a p-value actually represents. In passage-based questions, it hands you a results table and asks you to interpret which comparisons are significant, and then whether that significance actually matters clinically. These are different questions, and conflating them is one of the most common errors. You'll also see questions that require you to reason through the null vs. alternative hypothesis setup and decide what a given p-value tells you about rejecting or failing to reject the null.

The three big traps: thinking the p-value tells you the probability the null is true (it doesn't — it assumes the null is true and asks about your data), thinking a significant result means a meaningful one (a huge sample can make a 0.1 mmHg blood pressure difference 'significant'), and thinking that failing to reject the null proves there's no effect. The MCAT loves all three of these. Know the distinctions cold.

Common misconceptions

Common mistake

Wrong: The p-value is the probability that the null hypothesis is true.

Right: The p-value is the probability of observing data at least as extreme as the results obtained, assuming the null hypothesis is true.

The p-value does not tell you how likely the null hypothesis is to be true — it takes the null as a given and asks how surprising your data would be in that world. Think of it as a conditional probability: P(data this extreme | null is true). Flipping that to say it's P(null is true | your data) is a logical error the MCAT directly tests. The null's truth or falsity is never assigned a probability in classical hypothesis testing.
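To see why flipping the conditional matters, here is a Python sketch of bookkeeping over a hypothetical set of studies. The inputs are invented assumptions, not anything a classical test supplies: the share of tested null hypotheses that are actually true, and a study power of 0.8. For simplicity it computes P(null true | p < α) rather than the probability given one exact p-value.

```python
def prob_null_given_significant(share_true_null: float, power: float,
                                alpha: float = 0.05) -> float:
    """Fraction of 'significant' results that come from studies where H0 is true.

    share_true_null and power are hypothetical inputs: classical hypothesis
    testing never provides them, which is why a p-value alone cannot give
    P(null is true | data).
    """
    false_positives = share_true_null * alpha        # H0 true, yet p < alpha anyway
    true_positives = (1 - share_true_null) * power   # real effect, and the test caught it
    return false_positives / (false_positives + true_positives)


for share in (0.5, 0.9):
    result = prob_null_given_significant(share, power=0.8)
    print(f"{share:.0%} of tested nulls true -> P(H0 true | p < 0.05) = {result:.1%}")
```

With α fixed at 0.05, the share of significant results that come from a true null works out to about 5.9% in the first scenario and 36% in the second. The p-value threshold never changed; only facts outside the test did.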

Common mistake

Wrong: A statistically significant result is clinically meaningful.

Right: Statistical significance indicates that data this extreme would be unlikely if chance alone were at work, but clinical meaningfulness depends on effect size and context; large samples can make trivial differences statistically significant.

Statistical significance just means your data would be unlikely if chance alone (the null hypothesis) were at work. It says nothing about the size or real-world importance of the effect. With a large enough sample, even a trivially small difference (like a 0.2-point drop in a score) can produce p < 0.05. The MCAT will give you a 'significant' result and then ask whether you should actually care about it, which requires thinking about effect size and clinical context, not just the p-value.
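The sample-size effect can be sketched with a two-sample z-test in Python. The inputs are hypothetical: a 0.2-point difference between group means, a known standard deviation of 10 points in both groups, and equal group sizes. A z-test with a known SD is a simplification of the t-test a real study would more likely use. The effect size d (difference divided by SD) stays fixed; only the sample size changes.

```python
import math


def two_sample_z_test(diff: float, sd: float, n_per_group: int) -> tuple[float, float]:
    """Return (z, two-sided p) for a difference in means with known, equal SDs."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = diff / standard_error
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


DIFF = 0.2  # hypothetical 0.2-point difference in mean score
SD = 10.0   # hypothetical standard deviation of the score

for n in (100, 1_000, 10_000, 20_000, 100_000):
    z, p = two_sample_z_test(DIFF, SD, n)
    print(f"n per group = {n:>7,}: z = {z:5.2f}, p = {p:.3g}, d = {DIFF / SD:.2f}")
```

Under these assumptions the result is not significant at 10,000 per group (p ≈ 0.16), reaches p ≈ 0.046 at 20,000 per group, and is far below 0.05 at 100,000, while d stays at 0.02 throughout. The difference never got bigger; the test just got more sensitive.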

Common mistake

Wrong: Failing to reject the null hypothesis proves that the null hypothesis is true.

Right: Failing to reject the null hypothesis means the data do not provide sufficient evidence against it; it does not prove the null is true.

When p ≥ α, you fail to reject the null — but that is not the same as proving the null is true. It means your data didn't provide strong enough evidence against it, which could be because the null really is true, or because your study lacked the power to detect a real effect. The MCAT expects you to use the phrase 'fail to reject' precisely, and to recognize that absence of evidence is not evidence of absence.
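Low power is easy to see in simulation. The Python sketch below builds a real effect into every study: a hypothetical 3-point true difference, an SD of 10, and 20 participants per group, with the SD treated as known so a z-test applies. It then runs many such studies and counts how often they fail to reject the null, even though the null is false by construction. The analytic power formula is included as a cross-check.

```python
import math
import random

Z_CRIT = 1.959963984540054  # two-sided critical value for alpha = 0.05


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def simulate_fail_to_reject_rate(true_diff: float, sd: float, n_per_group: int,
                                 studies: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated studies with p > 0.05 when a real effect exists."""
    rng = random.Random(seed)
    standard_error = sd * math.sqrt(2 / n_per_group)
    failures = 0
    for _ in range(studies):
        treated = [rng.gauss(true_diff, sd) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        z = (sum(treated) - sum(control)) / n_per_group / standard_error
        if abs(z) < Z_CRIT:  # |z| below the critical value means p > 0.05
            failures += 1
    return failures / studies


def analytic_power(true_diff: float, sd: float, n_per_group: int) -> float:
    """Probability that a two-sided z-test rejects H0 when the effect is real."""
    delta = true_diff / (sd * math.sqrt(2 / n_per_group))
    return normal_cdf(delta - Z_CRIT) + normal_cdf(-delta - Z_CRIT)


fail_rate = simulate_fail_to_reject_rate(3.0, 10.0, 20)
power = analytic_power(3.0, 10.0, 20)
print(f"simulated fail-to-reject rate = {fail_rate:.3f}, analytic power = {power:.3f}")
```

Under these assumptions power is only about 16%, so roughly 84% of studies end with p ≥ 0.05. Reading any one of those results as "no effect" would be wrong, because the effect is there in every single simulated study.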

What the exam tests

Know the precise definition of a p-value: it's the probability of observing data at least as extreme as yours, given that the null hypothesis is true — not the probability that the null hypothesis is true.
Understand the logic of hypothesis testing: what the null and alternative hypotheses represent, and how comparing p to α (typically 0.05) determines whether you reject or fail to reject the null.
Apply hypothesis testing mechanically: given a test statistic or p-value, decide whether the result crosses the significance threshold and what conclusion that supports.
Interpret a data table from a research passage: identify which results are statistically significant and separately evaluate whether those results are clinically meaningful or practically important.
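The mechanical decision rule in the third item can be sketched in a few lines of Python, assuming a two-sided test whose statistic follows a standard normal distribution under the null (a z-test). The function names are illustrative and the z values are made up. Note what the code does not do: nothing in it judges whether an effect is large enough to matter clinically, which is the separate question in the last item.

```python
import math


def p_from_z(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def decide(p: float, alpha: float = 0.05) -> str:
    """Compare p to alpha: reject H0 only when p < alpha."""
    return "reject H0" if p < alpha else "fail to reject H0"


for z in (1.50, 2.10, 2.80):  # hypothetical test statistics
    p = p_from_z(z)
    print(f"z = {z:.2f} -> p = {p:.3f} -> {decide(p)}")
```

The wording of the two outcomes is deliberate: the rule can only reject or fail to reject, never "accept" or "prove" the null.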

Can you avoid these mistakes?

A study compares a new drug to placebo and reports p = 0.03 for the primary outcome. A classmate says 'there's only a 3% chance the null hypothesis is true.' What's wrong with this interpretation, and what does p = 0.03 actually mean?

A clinical trial enrolls 50,000 patients and finds that the treatment group has a systolic blood pressure 0.8 mmHg lower than controls, with p = 0.001. Is this result statistically significant? Is it clinically meaningful? What's the distinction?

Researchers test a new cognitive training program and find no statistically significant difference in test scores (p = 0.21). A team member concludes the program has no effect. Is this conclusion valid? Why or why not?

You see a results table showing four comparisons. Two have p-values of 0.02 and 0.04; two have p-values of 0.08 and 0.31. Using α = 0.05, which comparisons allow you to reject the null hypothesis, and what does that rejection actually mean?
