The Difference Between Correlation and Causation

Two variables that move together do not necessarily affect each other. This single distinction is responsible for a large fraction of statistical errors in everyday reasoning.

“Correlation does not imply causation” is one of the most cited statistical principles in popular discourse, and one of the most poorly applied. The principle is correct, but it is often invoked in ways that miss what it actually means.

What the principle says

If two variables move together (if X tends to increase when Y increases), the correlation between them does not, by itself, tell us whether X causes Y, Y causes X, both are caused by some third factor Z, or the relationship is coincidental. That makes four possible explanations, and only one of them is the claim usually being made: that X causes Y.

The famous example is ice cream sales and drowning deaths. Both rise in summer. There is a strong correlation. Neither causes the other; both are caused by a third factor (warm weather, which produces both more ice cream consumption and more swimming).
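A small simulation makes this structure concrete. The sketch below is illustrative only: the temperature range, effect sizes, and noise levels are invented assumptions, not real data, and `pearson` is a helper written just for this example. Ice cream sales and drownings each depend on temperature and on nothing else, yet the two still come out correlated.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical daily temperatures in degrees Celsius (assumed range).
temperature = [rng.uniform(0, 35) for _ in range(5000)]

# Both outcomes depend on temperature plus independent noise.
# Neither outcome appears in the other's formula.
ice_cream_sales = [2.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.1 * t + rng.gauss(0, 1) for t in temperature]

r_sales_drownings = pearson(ice_cream_sales, drownings)
print(f"correlation(ice cream, drownings) = {r_sales_drownings:.2f}")
```

Under these assumptions the printed correlation is clearly positive even though, by construction, changing ice cream sales would do nothing to drownings.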

Common errors

Two errors are common in everyday reasoning about correlation and causation.

The first is to assume causation when only correlation has been demonstrated. “Studies show that people who eat breakfast are slimmer; therefore, breakfast keeps you slim.” The study may show the correlation. It does not, by itself, establish that breakfast does the work. The slimmer people might eat breakfast for reasons related to their general health habits, and those habits — not breakfast specifically — might be doing the causal work.

The second error is to dismiss any correlation as “just correlation” when in fact careful research has established the causal link. “Smoking is correlated with lung cancer, but correlation isn’t causation” is technically true and practically dangerous. Decades of evidence, including mechanistic studies of how smoking damages lung tissue, have established the causal link. Treating it as “just correlation” misuses the principle.

How to evaluate

When you encounter a claim about cause and effect, several questions are worth asking.

Is there a plausible mechanism? Can you describe how X would cause Y? If the proposed mechanism is implausible, you should be skeptical even of strong correlations.

Has the correlation been confirmed in controlled studies? Randomized controlled trials, where the proposed cause is deliberately varied and the effect measured, are the gold standard. They are not always possible, but where they have been done, they should weigh heavily.
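To see why randomization carries so much weight, consider a toy version of the breakfast example. This is a sketch under stated assumptions: every number is invented, the `simulate_weight_gap` function is hypothetical, and breakfast is given no effect on weight by construction. In the observational world, people with better health habits are more likely to eat breakfast. In the randomized world, a coin flip decides.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

def mean(xs):
    return sum(xs) / len(xs)

def simulate_weight_gap(randomized, n=20000):
    """Mean weight of breakfast eaters minus mean weight of skippers (kg)."""
    eaters, skippers = [], []
    for _ in range(n):
        habits = rng.gauss(0, 1)  # unobserved general health habits
        if randomized:
            # Experiment: a coin flip decides, independent of habits.
            eats_breakfast = rng.random() < 0.5
        else:
            # Observation: healthier habits make breakfast more likely.
            eats_breakfast = habits + rng.gauss(0, 1) > 0
        # Assumption: weight depends on habits only, never on breakfast.
        weight = 75 - 3 * habits + rng.gauss(0, 5)
        (eaters if eats_breakfast else skippers).append(weight)
    return mean(eaters) - mean(skippers)

observational_gap = simulate_weight_gap(randomized=False)
randomized_gap = simulate_weight_gap(randomized=True)
print(f"observational gap: {observational_gap:.2f} kg")
print(f"randomized gap:    {randomized_gap:.2f} kg")
```

In the observational run, breakfast eaters come out noticeably lighter. In the randomized run, the gap shrinks to roughly zero, which matches the true effect built into the simulation. Random assignment breaks the link between habits and breakfast, so any remaining difference can be attributed to breakfast itself.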

Has the relationship survived attempts to find confounding variables? Good research tries to identify and rule out alternative explanations. If a correlation holds up after controlling for plausible confounders, the case for causation is stronger.
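One common way to control for a measured confounder is a partial correlation: remove the part of each variable that the confounder explains, then correlate what is left. The sketch below applies this to simulated ice cream and drowning data. The numbers are invented assumptions and the helper functions are written for this example. The approach also assumes the confounder is measured and acts linearly, which real studies cannot take for granted.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated data: temperature drives both outcomes (assumed effect sizes).
temperature = [rng.uniform(0, 35) for _ in range(5000)]
ice_cream_sales = [2.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.1 * t + rng.gauss(0, 1) for t in temperature]

raw_r = pearson(ice_cream_sales, drownings)
partial_r = pearson(residuals(ice_cream_sales, temperature),
                    residuals(drownings, temperature))
print(f"raw correlation:                     {raw_r:.2f}")
print(f"after controlling for temperature:   {partial_r:.2f}")
```

Here the correlation largely vanishes once temperature is accounted for, which is what we would expect when the confounder explains the whole relationship. Had a substantial correlation survived the adjustment, the case for a direct link would be stronger, though still not proven, since unmeasured confounders could remain.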

What does the broader pattern of evidence say? A single study showing a correlation is weaker evidence than dozens of studies showing the same correlation across different populations and methods.

Why it matters

Mistaken inferences about causation produce real harm. Medical decisions, policy decisions, and personal decisions are made on the basis of inferred causal relationships. When the inference is wrong, the decisions tend to be wrong.

The discipline is not to deny all causal claims based on correlation. It is to require, for each causal claim, that the case for causation be made: that the evidence go beyond the bare correlation to include a plausible mechanism, controlled studies where possible, the ruling out of confounders, and a broad pattern of supporting evidence.

Most well-established causal claims meet these tests. Most poorly established ones do not. Learning to tell which is which is one of the more useful skills for navigating a world that produces statistical claims faster than they can be properly evaluated.
