An interview between conversion strategy specialists Hugh Gage and Craig Sullivan
Structured testing to improve website effectiveness has gained a lot of traction among digital marketers in recent years. Its adoption has been helped by success stories of subtle changes such as Google adding $200m in revenue by testing 41 shades of blue on their ad links, or the $300 million button, or even the $500 million button.
These types of 'quick wins' have encouraged more structured programmes, moving from simplistic conversion rate optimisation split tests to the more compelling realm of driving incremental revenue and profit.
The examples of improvement above are perhaps too tantalising, since they suggest that it's easy to get started and get great results. For many, the real world isn't like that - you have to make the case to colleagues and tests may fail to give these rewards.
As part of writing the Smart Insights briefing on Conversion Rate Optimisation I asked different optimisation specialists for their recommendations on making optimisation and split testing a success. We got such great recommendations from Craig Sullivan of Optimal Visit that we thought it would be worth sharing the full interview. If you don't know Craig, he is one of the UK’s leading figures in web testing and optimisation - I recommend following him on @OptimiseOrDie.
1. In your experience what is the most effective method for obtaining buy-in on a split testing program?
It's a team sport, so involve everyone - even the most annoying people - it teaches everyone the futility of relying on their ego, opinion, assumptions or cherished notions.
AB testing is the killing field for these notions and you need to invite as many as possible. I've found that taking a visible PR approach works wonders.
Get people involved by printing large copies of the tests, putting them on the wall, inviting comment, guesses, predictions, debate. Put it on your intranet, your newsletter and best of all - get people to gamble.
Having a participative and collaborative approach to designing, running and then sharing test results is the key. This means managing the PR of testing all the way from inception and test ideas to the results.
2. What would you say is the single most common mistake people make when split testing?
The biggest mistake is in simply not understanding how long to run a test for and when you should stop it.
There are many problems in this area, but one of the biggest is using the 95% (or 99%) confidence value as a rule for stopping the test. It's a big mistake and very common - tests will often reach statistical significance without fulfilling other important conditions, like testing with awareness of business and purchase cycles, getting enough of a representative sample to measure the difference and testing whole weeks, not partials.
These are all examples of common mistakes and are focused around test running time and declaration. The second most common problem is from tests that are broken or render badly in browsers and devices using the site - that's another topic in itself!
Although it might be almost impossible to choose, if you could only have one, which single source or tool would you select for gathering usable insight prior to developing your hypotheses, and why?
Interviewing people (customers, sales teams, customer service teams) : because although analytics data can tell me 'where' and 'how big' - if it's correctly configured - it won't tell me why.
Asking questions, surveying, usability testing, interviewing or engaging with customers is where to learn about how the customer's headspace intersects with your product or service.
Everything of value is happening at the boundary of that interface so talking to real customers or the people that support them or try to understand them (at the coalface of the company) would be the one thing I would never go without.
3. What would you say is the minimum run-time requirement (to be sure of a robust result)?
I’m thinking in terms of time and number of good outcomes per variant but if there are other variables that you think should be considered then please mention those too.
I normally suggest a sample of at least 250, if not 350, but this is a bad answer in many ways. If you're testing a blue button versus a slightly different shade, this won't be enough to detect the subtle effect. If the change is massive, the sample you need to make that effect detectable will be smaller. So basically, the sample you may need depends on the reaction that your test creative gets - and that's different for every test.
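To illustrate how the required sample depends on the size of the effect, here is a minimal sketch using the standard normal-approximation formula for comparing two conversion rates. It is not Craig's method: the function name, the 5% baseline rate, the lift values and the default 5% significance / 80% power settings are all assumptions chosen for illustration, and the result counts visitors per variant rather than conversions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant (two-sided two-proportion z-test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # critical value for power
    pooled = (p1 + p2) / 2
    spread = (
        z_alpha * sqrt(2 * pooled * (1 - pooled))
        + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    )
    return ceil((spread / (p2 - p1)) ** 2)


# Hypothetical inputs: a 5% baseline conversion rate in both cases.
subtle = sample_size_per_variant(0.05, 0.02)  # a 2% relative lift (slightly different blue)
bold = sample_size_per_variant(0.05, 0.50)    # a 50% relative lift (a massive change)
print(f"subtle change: {subtle} visitors per variant")
print(f"bold change:   {bold} visitors per variant")
```

The subtle change needs a far larger sample than the bold one, which is why a single fixed number is a poor rule of thumb.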
You also might get that minimum 250 sample in one day on your site (if you have high traffic) but this would be a mistake too - as you need to test business cycles. If your weekend traffic is very different, for example, ending a test by excluding that segment would make your sample unrepresentative.
So - the minimum run requirement is for you to have a representative sample (a whole topic area) - test at least two business cycles (week/month), one whole purchase cycle minimum (or as much as you can get) and get clear separation of the results.
This is the +/- bit around the results of the two creatives. I don't rely on the confidence figure as it moves around too much and is not a good guide.
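One way to picture the '+/- bit' is to put an interval around each creative's conversion rate and check whether the two intervals overlap. The sketch below assumes a simple normal-approximation interval and treats 'clear separation' as non-overlapping intervals, which is a deliberately cautious reading rather than a definition Craig gives. The function names and the visitor and conversion counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def conversion_interval(conversions, visitors, confidence=0.95):
    """Return (low, high) for the conversion rate using a normal approximation."""
    rate = conversions / visitors
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(rate * (1 - rate) / visitors)  # the '+/-' around the rate
    return rate - margin, rate + margin


def clearly_separated(interval_a, interval_b):
    """True when the two intervals do not overlap at all."""
    low_a, high_a = interval_a
    low_b, high_b = interval_b
    return high_a < low_b or high_b < low_a


# Hypothetical results: 5,000 visitors per creative.
control = conversion_interval(250, 5000)
close_variant = conversion_interval(270, 5000)
strong_variant = conversion_interval(350, 5000)

print(clearly_separated(control, close_variant))   # intervals overlap
print(clearly_separated(control, strong_variant))  # intervals are apart
```

This check only covers the separation part of the answer. It says nothing about whether the test ran for whole weeks or covered enough business and purchase cycles.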
4. What advice would you give to people who want to supercharge their split testing program by running several tests concurrently but who are also worried about the risk of cross contaminating results?
Integrate with your analytics data and check the recipes. There's no problem running multiple tests as long as the recipe makes sense. If you're testing a 10% offer on the site with a 20% offer, it's going to be confusing if people see these two variants (ingredients) in their overall visit (recipe). What you need to do is check carefully that you don't have clashes - the offer or creative equivalent of Anchovy & Pineapple. As long as you aren't causing incongruous messages then you can unpick the results in your analytics data.
Let's imagine you have 3 tests running: The Homepage, The Product Page and the Basket Page. These tests are 1A, 1B, 2A, 2B, 3A and 3B.
When you integrate with your analytics package, you can then fire an event or pageview that signals a customer has been exposed to the experiment and which bucket they have been allocated to. When you analyse the test data later, you can then ask questions like this:
- What's the best combination of test results that drove conversion? - Ah, that would be 1A, 2B, 3B.
- What's the best combination for people who landed directly on a product page? Ah that would be 2A, 3B.
Being able to unpick the 'recipe level' conversion data as well as 'ingredient level' is pretty cool. You simply can't decode multiple tests unless your testing package supports multiple tests (not always easy to implement) or you integrate with analytics. Besides which, looking at analytics data is always richer than that given by the testing vendors. I'd also say it was more reliable too!
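As a rough sketch of what unpicking 'recipe level' and 'ingredient level' data might look like once the exposure events are in your analytics export, the code below groups visits by the combination of buckets seen. The record layout (a list of bucket labels plus a converted flag), the function names and the sample visits are all hypothetical and not tied to any particular analytics or testing package.

```python
from collections import defaultdict


def conversion_by_recipe(visits):
    """Conversion rate for each full combination of buckets a visitor saw."""
    totals = defaultdict(lambda: [0, 0])  # recipe -> [conversions, visits]
    for buckets, converted in visits:
        recipe = tuple(sorted(buckets))
        totals[recipe][0] += int(converted)
        totals[recipe][1] += 1
    return {recipe: conv / count for recipe, (conv, count) in totals.items()}


def conversion_by_ingredient(visits):
    """Conversion rate for each individual bucket, ignoring the other tests."""
    totals = defaultdict(lambda: [0, 0])  # bucket -> [conversions, visits]
    for buckets, converted in visits:
        for bucket in buckets:
            totals[bucket][0] += int(converted)
            totals[bucket][1] += 1
    return {bucket: conv / count for bucket, (conv, count) in totals.items()}


# Made-up visits: (buckets the visitor was exposed to, did they convert?)
visits = [
    (["1A", "2B", "3B"], True),
    (["1A", "2B", "3B"], False),
    (["2A", "3B"], True),   # landed directly on a product page, never saw test 1
    (["1B", "2A", "3A"], False),
]

recipes = conversion_by_recipe(visits)
ingredients = conversion_by_ingredient(visits)
best_recipe = max(recipes, key=recipes.get)
print(best_recipe, recipes[best_recipe])
```

With real data each recipe would need enough visits to be meaningful, since splitting traffic across combinations shrinks the sample behind every figure.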
5. Where do you stand on micro testing vs big bold changes?
Micro testing is good to break the ice - to convince the boss or allow the team to build a simple test, as proof of concept. However, there is a big caveat - often, small changes (for example, on forms copy) can have huge impacts on conversion. The answer is 'it depends' - on the flow of money, conversion, loss on any site.
Without knowledge of the leak data and where you need to work first, you can't decide the style of testing. You may have, like one client recently, a lead gen form that drove all their business. Micro changes there or across a site-wide test can have huge impacts.
There is one bit of advice - micro testing is for me about the scale of the testing. Bold changes are about your approach to testing and the two aren't always the same. In order to measure any effect you are testing, making bold changes is more likely to 'get a reaction' - positive or negative - from the creative you are testing.
By being less cautious, you reduce the risk of tests with little shift in behaviour and weak statistical evidence to support that anything has happened. Be brave, be bold and be prepared to fail. If you're not failing, you are probably being too cautious or aren't testing enough.
6. What advice would you give organisations with extremely limited resources, both financial and human, but who still want to embark on a split testing program?
With this question I’m trying to decouple best practice from real world pragmatism to understand if there really are any circumstances under which a split testing program should not be attempted.
If you haven't got enough traffic to test, you simply have to be honest with yourself - there aren't enough 'people here' to do anything with. And the secret to increasing that traffic probably isn't initially in simply running AB tests on stuff randomly. There are ALWAYS customers - understanding them, reaching out to them, surveying them, measuring them, comprehending them, empathising with their situation - is a limitless field of useful information for any site owner. I have advice for how to deal with low traffic sites in my slide decks (www.slideshare.net/sullivac) but if there is no way to test, then there isn't enough traffic to do much apart from try to understand how to grow it! Here's an example from last year's Smart Insights conference.
When you are tight for resource then I would recommend having people 'mentor' your testing programme. Hire someone to help you run things and then get some oversight and advice from an expert at regular review points.
Even if you can't afford a testing person, there are small agencies who can either run the entire testing programme for you or work with your team to transfer knowledge and solid practice on running things. Whether you want to build your own internal testing resource or augment the work of an existing team, there are solutions there that won't cost the earth.
I'll mention the traffic thing again - if you don't have enough to AB test - you have a different problem you need to focus on... The intersection of your product with the heads of customers.
My thanks to Craig for giving generously of his time and wisdom in answering these questions and adding to our guide!