
Falsifiable Hypothesis Worksheet
A worksheet that takes you through each step of the process in a clear, easy-to-follow format.
TL;DR
Before a single ad goes live or a single email gets written, identify what you're changing and what you're watching.
Your independent variable is the exact action you're taking. Not "improving social media," but something specific: running video ads on LinkedIn targeting nonprofit program directors.
Your dependent variable is the measurable outcome that shifts in response to that action.
Here's a real example. You notice that nonprofit program directors click your LinkedIn ads at a higher rate than other audiences. You want to test whether targeting that segment specifically outperforms the others.
Independent variable: targeting nonprofit program directors as a defined segment.
Dependent variable: demo requests, with a target of a 15% increase compared to the control group, measured over 30 days.
The more specific both variables are before you start, the more useful the data is when you finish. The tighter and more defined your audience segment, the easier it is to detect a real signal. A hypothesis tested against a broad, loosely defined group produces results that are harder to read.
Most teams skip defining a failure threshold. It's the most critical part of the process.
A failure threshold is the specific condition under which your hypothesis is disproved. It should be written before the campaign launches, not after you've seen the numbers.
It sounds like this: "If the demo request rate does not increase by at least 15% over the next 30 days compared to the control group, this hypothesis is disproven."
Think of it as a strategic circuit breaker. It keeps the goalpost fixed. It prevents the natural impulse to reinterpret disappointing data as a partial win. And it gives you something concrete to act on when the test ends.
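One way to keep the goalpost fixed is to record the hypothesis in a form that can't be edited after launch. The sketch below is a minimal illustration in Python, not a prescribed tool: the class name `Hypothesis`, its field names, and the sample rates are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be changed once the test is defined
class Hypothesis:
    independent_variable: str  # the exact action being taken
    dependent_variable: str    # the metric being watched
    min_relative_lift: float   # 0.15 means a 15% lift over the control group
    window_days: int           # how long the test runs

    def failure_sentence(self) -> str:
        """Write out the failure threshold in plain language."""
        return (
            f"If {self.dependent_variable} does not increase by at least "
            f"{self.min_relative_lift:.0%} over the next {self.window_days} days "
            f"compared to the control group, this hypothesis is disproven."
        )

    def is_disproven(self, control_rate: float, challenger_rate: float) -> bool:
        """True when the challenger's relative lift falls short of the threshold."""
        relative_lift = (challenger_rate - control_rate) / control_rate
        return relative_lift < self.min_relative_lift


h = Hypothesis(
    independent_variable="targeting nonprofit program directors as a defined segment",
    dependent_variable="the demo request rate",
    min_relative_lift=0.15,
    window_days=30,
)
print(h.failure_sentence())

# Hypothetical end-of-test rates: 4.0% for control, 4.4% for the challenger.
print(h.is_disproven(control_rate=0.040, challenger_rate=0.044))
```

Because the dataclass is frozen, trying to lower the threshold after the numbers arrive raises an error instead of quietly moving the goalpost.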
You're not trying to prove your idea works. You're trying to reject the null hypothesis, the idea that your change had no effect at all. That's the more honest version of the question, and it's harder to talk yourself out of when the numbers don't cooperate.
Worth noting: a 15% lift is only meaningful if the test ran long enough and reached enough people to produce a reliable result. Ending a test early because it looks promising is just as misleading as moving the goalpost.
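To see why reach matters, it can help to run the same 15% lift through a significance check at two different sample sizes. The sketch below uses a pooled two-proportion z-test, which is one common choice and an assumption here, since no specific method is named above. The function name and all counts are hypothetical.

```python
from math import erf, sqrt


def two_proportion_p_value(conv_control: int, n_control: int,
                           conv_challenger: int, n_challenger: int) -> float:
    """One-sided p-value that the challenger rate exceeds the control rate."""
    rate_control = conv_control / n_control
    rate_challenger = conv_challenger / n_challenger
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_control + conv_challenger) / (n_control + n_challenger)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_challenger))
    if std_err == 0:
        return 1.0  # no conversions (or all conversions): nothing to distinguish
    z = (rate_challenger - rate_control) / std_err
    # Upper tail of the standard normal distribution: 1 - Phi(z).
    return 0.5 * (1 - erf(z / sqrt(2)))


# Same 15% relative lift (4.0% vs 4.6%), very different reach.
p_small = two_proportion_p_value(20, 500, 23, 500)
p_large = two_proportion_p_value(2000, 50_000, 2300, 50_000)
print(f"500 people per group:    p = {p_small:.3f}")
print(f"50,000 people per group: p = {p_large:.6f}")
```

With the small groups the lift is indistinguishable from chance at the usual 0.05 cutoff; with the large groups it is not. The lift is identical in both cases, only the evidence behind it differs.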
If the threshold isn't met, you have a clear path:
Each of those is a direction informed by data, not a guess. If you can't write the failure sentence before you launch, you're not ready to test yet.
Words like "engagement," "interest," and "visibility" feel meaningful. They're not measurable. And if it can't be measured, it can't be tested.
Vague: "Increase customer interest in our services."
Operational: "Increase Contact Us form submissions from nonprofit segment visitors by 20% over 60 days."
The operational version tells you what's being tracked, who it applies to, and when you'll know if it worked. That's the version worth building a campaign around.
Likes are a classic example of a number that moves without meaning much. They don't reliably connect to purchasing behavior or long-term business outcomes. When you're setting up a hypothesis, make sure the dependent variable connects to a real business result.
When data comes in and a result looks positive, slow down before calling it a win.
Ask what else could have produced it.
Seasonal shifts in your audience's behavior. Competitors running a similar campaign in the same window. Platform algorithm changes that affected reach. A news cycle that made your audience more or less receptive. Any of these can create false positives for a weak plan or false negatives for a strong one.
There's also a subtler problem: selection bias. If your LinkedIn ads primarily reach people who already follow you, a 15% lift might not be new growth. It might just be an echo. The control group and the challenger need to reach genuinely comparable audiences.
You don't need a perfectly isolated environment. But you do need a control group and a challenger running at the same time. That's what a well-structured A/B test does. When both groups face the same external conditions simultaneously, the shared noise hits both equally and largely cancels out of the comparison.
A before-and-after comparison can't do that. Too much changes between periods. The only way to isolate what your change actually did is to run it in parallel.
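Running in parallel means each person lands in exactly one group for the whole test window. Where you control assignment yourself (an email list or site visitors, for instance, rather than a platform's built-in split test), a deterministic hash is one simple way to do it. This is a sketch under that assumption; the function name, experiment label, and user IDs are hypothetical.

```python
import hashlib


def assign_group(user_id: str, experiment: str = "segment-test") -> str:
    """Place a user in 'control' or 'challenger', the same way every time."""
    # Hashing the experiment name with the ID keeps splits independent across tests.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return "challenger" if int(digest, 16) % 2 else "control"


# Hypothetical IDs: the split should come out close to 50/50.
groups = [assign_group(f"user-{i}") for i in range(10_000)]
print(groups.count("control"), groups.count("challenger"))
```

Because assignment depends only on the ID and the experiment name, both groups fill up over the same calendar window and see the same seasonality, news cycle, and platform changes.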
Weak hypothesis: "Our new pitch is better." Success criteria: more leads. No floor. No timeline. No comparison point. That's not a test.
Strong hypothesis: "Because meeting-to-proposal rates have stalled, if we lead with the service design case study in our pitch deck, then meeting-to-proposal conversions will increase."
Failure condition: "If conversion stays at or below the 5% baseline over 30 days, the case study was not the value driver."
Success criteria: a 10% conversion rate (a 5-percentage-point increase over the 5% baseline), measured against the control group, within 30 days.
The strong version doesn't guarantee a better result. It guarantees clarity. You'll know what worked, what didn't, and where to go next.
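The success criteria in the strong version are stated in percentage points, while the earlier LinkedIn example used a relative lift. The two are easy to confuse, so here is the arithmetic for the 5% baseline and 10% target spelled out. The function name is hypothetical.

```python
def describe_lift(baseline_rate: float, observed_rate: float) -> tuple[float, float]:
    """Return (percentage-point change, relative lift) for a conversion rate."""
    point_change = (observed_rate - baseline_rate) * 100               # absolute, in points
    relative_lift = (observed_rate - baseline_rate) / baseline_rate    # fraction of baseline
    return point_change, relative_lift


points, relative = describe_lift(baseline_rate=0.05, observed_rate=0.10)
print(f"{points:.1f} percentage points, {relative:.0%} relative lift")
```

Moving from 5% to 10% is a 5-point increase but a 100% relative lift, so a threshold should always say which of the two it means.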
Look at your current marketing plan. Can you name the exact action you're taking, the exact outcome you're watching, and the specific number that would tell you the hypothesis failed? If you can't answer all three, that's where you start.