In essence, data tells us what’s driving conversions, and what ain’t driving diddly squat.
But when you dive into your website analytics, how can you know the data you’re looking at is actually trustworthy?
Assuming you’ve got all your integrations set up properly and nobody has tampered with your tracking, you can assume the data is probably accurate.
So maybe the real question is: How do you know if your analytics data is actually worthwhile?
- Maybe the data says your sales have gone up 10% since you launched a new product page design, but does that mean your new product page is the reason for that boost?
- And if a 10% increase means 11 sales this month versus 10 sales last month, is that a valuable insight?
The answer to these questions lies in statistical significance.
What is Statistical Significance?
If you Google “What is statistical significance”, you’ll get all sorts of overly complicated answers.
But simply put, when data is statistically significant, it means you can have a high level of confidence that it means what you think it means. To be more specific, we usually talk about statistical significance in the context of A/B testing, an increasingly common digital marketing practice.
In A/B testing, we release two versions of an email, ad, landing page, etc. to a sample audience, and use data to decide which one is most effective at reaching our goal.
We can use A/B testing with emails to see which headline gets the higher open rate, with ads to see which one generates more leads, or on a product landing page to test different images, layouts, and CTAs to see what drives more sales.
Let’s say we have two variations of the same email, and we want to test which version gets more opens. The only difference between the two is the headline, which is really all a recipient sees before they open the email. Almost always, one headline will attract more opens than the other (barring the unlikely case of an exact tie).
But is one headline really better than the other? How can we be confident that if we pick the headline that gets more opens, the data will hold when we send it out to our entire list?
In other words, how do we know that the data telling us one headline is better than the other is statistically significant?
There’s a formula for that.
How to Compute Statistical Significance
Statistical significance is represented as a percentage, from 0 to 100, which tells us how confident we can be that the data means what we think it means.
If the data from our email headline A/B test has a statistical significance of 100%, then it means we can be 100% confident that, for example, Headline B is better than Headline A.
It’s like betting odds. If Horse X has run the same track 10 seconds faster than Horse Y in the last 100 races, we can be pretty confident that Horse X’s racing data is statistically significant, and tells us that Horse X is highly likely to beat Horse Y. Horse X might not always beat Horse Y, but most of the time, he will.
Calculating statistical significance requires two pieces of information about each variation you’re testing: a sample size, and a key performance indicator.
First, you want to look at the performance of each. Let’s make up some numbers about our email headlines that we’re A/B testing.
Headline A
- Number of Recipients: 100
- Number of Opens: 24
- Open Rate: 24%

Headline B
- Number of Recipients: 100
- Number of Opens: 25
- Open Rate: 25%
Headline B’s open rate of 25% is better than Headline A’s open rate of 24%. In relative terms, it’s about 4.2% better: divide 25 by 24 and subtract one. Easy.
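That relative-lift arithmetic looks like this as a quick sketch in Python (the variable names are just for illustration):

```python
# Relative improvement (lift) of Headline B's open rate over Headline A's
rate_a = 24 / 100   # Headline A: 24 opens out of 100 recipients
rate_b = 25 / 100   # Headline B: 25 opens out of 100 recipients

lift = rate_b / rate_a - 1  # (25/24) - 1
print(f"{lift:.1%}")        # → 4.2%
```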
If you stopped there, you would say “Let’s go with Headline B!” End of story, right? Nope.
What we really need to know is if that improvement in open rate is statistically significant or not.
This is determined by taking into account our sample size. A small change is unlikely to be statistically significant if we’re looking at a relatively small sample size.
However, when we have a very large sample size, where overall variation in behavior is expected to be small, even minor KPI improvements deserve attention.
You’re probably thinking, “Okay, just tell me how to compute statistical significance already!” Here’s the thing: It’s really math-y, and we’re marketers, not mathematicians.
We could lay out the mathematical approach, which involves standard deviations, variances, z-scores, t-score tables, Chi-squared tests, and more.
Or, we could save you the trouble by letting you know that free online calculators already exist to do all the math work for you.
If we put the above results into the statistical significance calculator, it comes back with a score of only 57%. Why?
Our sample size isn’t that big and the open rate improvement isn’t that great. So you probably don’t want to make a decision about one headline over the other with this data.
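For the curious, here’s a minimal sketch of the kind of math such a calculator might run under the hood. This assumes a one-sided two-proportion z-test, which is one common approach (the exact method a given calculator uses may differ):

```python
from math import sqrt, erf

def significance(n_a, opens_a, n_b, opens_b):
    """One-sided two-proportion z-test: confidence that B's true rate beats A's."""
    p_a, p_b = opens_a / n_a, opens_b / n_b
    p_pool = (opens_a + opens_b) / (n_a + n_b)              # pooled open rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_b - p_a) / se
    return 0.5 * (1 + erf(z / sqrt(2)))                     # normal CDF of z

# Our small test: 100 recipients each, 24 vs. 25 opens
print(f"{significance(100, 24, 100, 25):.0%}")  # roughly 57%
```

With only one extra open out of 100 recipients, the z-score is tiny, so the confidence barely clears a coin flip.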
But what if we had a large audience to test these headlines with? Let’s say we re-run the same test but with two audiences of 10,000 recipients each. Each email gets the exact same open rate — Headline A gets 2400 opens or a 24% open rate, and Headline B gets 2500 opens, a 25% open rate.
On paper the results seem the same. But the audience size is much larger. And the actual number of people who opened one email over the other is larger.
In the first example, Headline B only attracted one more person than Headline A. Maybe that one person ate old cereal that day and it messed with their head and for whatever reason they just liked Headline B. Tomorrow, when they eat normal cereal, they could prefer Headline A.
In the second example with the larger sample size, though, Headline B attracted 100 more people. It’s unlikely they all ate old cereal that day, right?
When we put this new dataset into the statistical significance calculator, that intuition is shown to be true. Now the same variation in our KPI has a statistical significance of 95%.
That means that it is highly likely that Headline B is superior to Headline A, and will perform better when we send out our email to our full list.
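Re-running the same sketch with the larger audience shows why the sample size matters so much. Again, this assumes a one-sided two-proportion z-test as a stand-in for whatever a given calculator actually uses:

```python
from math import sqrt, erf

# Larger test: 10,000 recipients per headline, same 24% vs. 25% open rates
n, opens_a, opens_b = 10_000, 2_400, 2_500
p_a, p_b = opens_a / n, opens_b / n
p_pool = (opens_a + opens_b) / (2 * n)          # pooled open rate
se = sqrt(p_pool * (1 - p_pool) * (2 / n))      # standard error shrinks with n
z = (p_b - p_a) / se
confidence = 0.5 * (1 + erf(z / sqrt(2)))       # one-sided confidence B beats A
print(f"{confidence:.0%}")                      # roughly 95%
```

The open rates haven’t changed at all; only the standard error has shrunk, which is exactly why the same 1-point gap now clears the 95% bar.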
How Important is Statistical Significance?
Most of us don’t have email lists so big that we can test a headline on 20,000 subscribers before sending out the official email. But whatever our sample size, statistical significance can be used to tell us how seriously we should take our data.
The general rule of thumb is you want a significance score of 95% or higher to make decisions based on A/B testing results.
But this is just a suggestion. A score of 90 is still pretty good. The more robust your sample set, the better. A very small dataset is generally not too helpful, regardless of what its significance score is.
The same goes for the timespan over which your data was collected. For example, you might not want to trust statistical significance calculated from data retrieved on one day.
Maybe there was a big football game that day that had fewer people opening their emails. Plus: never forget the potential impact of old cereal. Who knows? A dataset collected over two or more weeks is more likely to be trustworthy.
At the end of the day, it’s people who have to make decisions about what to do with data. Statistical significance is a helpful tool in guiding those decisions, and a big part of our optimization process here at BKMedia.
When we brought our data-driven approach to our client Steuben Press’ Google ads, we were able to generate huge improvements. A 1200% increase in leads, a 288% boost in their conversion rate, and a 45% decrease in the cost per lead. Dang!
We achieved that by taking statistical significance into consideration. But in order to generate meaningful data, we started with A/B testing. Read our guide on how to effectively execute A/B testing in digital marketing »