A/B testing – and the more complex multivariate testing – is a tool that can be used to determine the effectiveness of design or implementation decisions.
It is a way to test a product or service by comparing it against a different version of the same thing. You start with a metric to measure and then compare variations of the product to see which one performs best against that metric.
For example, if you want to test the effectiveness of a blue "Buy now" button or a green "Buy now" button, you can create variations of the page and split the visitors to your site so that half, group A, see a blue button and half, group B, see a green button. You then track how many people click each of the buttons and compare the results.
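In practice that split is usually done by bucketing each visitor deterministically, so they keep seeing the same variant, and then tallying clicks per variant. Here is a minimal sketch of that idea in TypeScript; the `assignVariant` and `recordClick` helpers are hypothetical, not taken from any particular testing tool.

```typescript
type Variant = "blue" | "green";

// Deterministically bucket a visitor so they always see the same variant.
function assignVariant(visitorId: string): Variant {
  let hash = 0;
  for (const char of visitorId) {
    hash = (hash * 31 + char.charCodeAt(0)) >>> 0;
  }
  return hash % 2 === 0 ? "blue" : "green";
}

// Tally clicks per variant so the click-through rates can be compared later.
const clicks: Record<Variant, number> = { blue: 0, green: 0 };

function recordClick(visitorId: string): void {
  clicks[assignVariant(visitorId)] += 1;
}
```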
Multivariate testing is similar but allows multiple changes to be tested at the same time. Continuing our example, you may want to test whether medium or large text on the "Buy now" button works better, as well as the colour of the button. You now have four variations:
- blue medium text
- blue large text
- green medium text
- green large text
You split your site's traffic across the four variations and again analyse the results to see which one performs best.
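Once each variation has accumulated visitors and clicks, the comparison itself is straightforward. The sketch below is hedged: the `VariationResult` shape and `bestVariation` helper are assumptions for illustration, and a real analysis would also check statistical significance before declaring a winner.

```typescript
type Colour = "blue" | "green";
type TextSize = "medium" | "large";

// Hypothetical per-variation totals gathered during the test.
interface VariationResult {
  colour: Colour;
  textSize: TextSize;
  visitors: number;
  clicks: number;
}

// Return the variation with the highest click-through rate (clicks / visitors).
function bestVariation(results: VariationResult[]): VariationResult {
  return results.reduce((best, current) =>
    current.clicks / current.visitors > best.clicks / best.visitors ? current : best
  );
}
```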
This is a simplified view of the approach, but it should give you an idea of how A/B or multivariate testing can be used.
There is, however, a flaw in this approach: it assumes that everyone is the same. It selects whichever variation gets the best result against the chosen metric, without any consideration of the individuals whose responses are being tested.
Here's an example. You're running the same test of different text sizes and colours for the "Buy now" button, but your audience is made up of people who prefer to use either browser zoom or high contrast mode. To keep it simple, say the audience is split equally between the two groups and that the current version of the button is equally usable by each group. You find that the green large text version performs best, doubling the click-through rate compared to the original version. But without knowing anything about the individuals participating in the test, we don't know what impact each variation has had on them: perhaps the number of clicks from the browser zoom group has quadrupled while clicks from the high contrast mode group have dropped to zero. We've found the most effective variation for our combined audience at the cost of excluding fifty percent of them entirely.
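To make that arithmetic concrete, here is a sketch with made-up figures that match the scenario: 1,000 visitors in each group, a 10% click-through rate for both groups on the original button, and a variation that quadruples the rate for the browser zoom group while dropping the high contrast group to zero. The combined rate still doubles.

```typescript
// Illustrative figures only: 1,000 visitors per group, 10% baseline click-through rate.
const original = {
  zoom: { visitors: 1000, clicks: 100 },
  highContrast: { visitors: 1000, clicks: 100 },
};
const greenLargeText = {
  zoom: { visitors: 1000, clicks: 400 },
  highContrast: { visitors: 1000, clicks: 0 },
};

// Combined click-through rate across both groups.
const combinedRate = (v: typeof original) =>
  (v.zoom.clicks + v.highContrast.clicks) / (v.zoom.visitors + v.highContrast.visitors);

console.log(combinedRate(original));             // 0.10
console.log(combinedRate(greenLargeText));       // 0.20 – doubled overall...
console.log(greenLargeText.highContrast.clicks); // 0 – ...while one group is excluded entirely
```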
This is where User Heroes from Ab11y comes in. By gathering anonymous data, both inferred from interactions and collected through direct questioning, we can determine the impact of each variation on a wide range of population groups based on how they use different assistive technologies and settings, individually and in combination. With that data, you can determine which variation performs best for each group and make a choice that allows everyone to use your site or application, excluding nobody.
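As a rough illustration of what a per-group decision might look like (hypothetical names and data shapes, not Ab11y's actual API), a variation could be required to clear a minimum click-through rate in every group before its overall performance is even considered:

```typescript
// Hypothetical per-group results; not Ab11y's actual data model.
interface GroupResult {
  group: string;             // e.g. "browser zoom", "high contrast mode"
  variation: string;         // e.g. "green large text"
  clickThroughRate: number;  // clicks / visitors for that group and variation
}

// A variation only qualifies if no group falls below the minimum acceptable rate;
// among qualifying variations, the one with the best average rate wins.
function chooseVariation(results: GroupResult[], minimumRate: number): string | undefined {
  const byVariation = new Map<string, GroupResult[]>();
  for (const result of results) {
    const existing = byVariation.get(result.variation) ?? [];
    byVariation.set(result.variation, [...existing, result]);
  }

  let best: { variation: string; average: number } | undefined;
  for (const [variation, groups] of byVariation) {
    if (groups.some((g) => g.clickThroughRate < minimumRate)) continue; // excludes a group
    const average = groups.reduce((sum, g) => sum + g.clickThroughRate, 0) / groups.length;
    if (!best || average > best.average) best = { variation, average };
  }
  return best?.variation;
}
```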