Mobile A/B testing can be a powerful tool for improving your app. It compares two versions of an app to see which one does better, giving you clear data on which version performs best and insight into why users respond the way they do.

Even as A/B testing becomes more prevalent in the mobile industry, many teams still aren’t sure how to implement it effectively in their strategies. There are plenty of guides on how to get started, but they don’t cover many of the pitfalls that can easily be avoided, especially on mobile. Below, we’ve outlined six common mistakes and misunderstandings, as well as how to avoid them.

1. Not tracking events throughout the conversion funnel

This is one of the easiest mistakes to make, and one of the most common in mobile A/B testing today. Oftentimes, teams run tests focused only on increasing a single metric. There’s nothing inherently wrong with that, but they have to be sure the change they’re making isn’t negatively impacting their most important KPIs, such as premium upsells or other metrics that affect the bottom line.

Let’s say, for instance, that your team is trying to increase the number of users signing up for an app. They theorize that removing email registration and offering only Facebook/Twitter logins will increase completed registrations overall, since users no longer have to type out usernames and passwords. They track the number of completed registrations in the variant with email registration and in the variant without it. After testing, they see that the overall number of registrations did in fact increase. The test is considered a success, and the team releases the change to all users.

The problem, though, is that the team doesn’t know how the change affects other important metrics such as engagement, retention, and conversions. Since they only tracked registrations, they don’t know how the change ripples through the rest of the app. What if users who sign in with Twitter delete the app soon after installation? What if users who sign up with Facebook purchase fewer premium features due to privacy concerns?

Avoiding this only requires putting simple checks in place. When running a mobile A/B test, track metrics further down the funnel as well, not just the one the test targets. This gives you a fuller picture of how a change affects user behavior throughout the app and helps you avoid an easy mistake.
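
As a rough illustration, here is one way the analysis side of that check might look in Python. The event names, variant labels, and rows of data are all hypothetical; in practice they would come from your analytics export.

```python
# Minimal sketch: compute each variant's conversion at every funnel stage,
# not just at the stage the test targets. All names and data are made up.
from collections import defaultdict

FUNNEL = ["install", "registration_complete", "first_session_day_7", "premium_purchase"]

# In practice these rows would come from your analytics export.
events = [
    {"user": "u1", "variant": "A", "event": "install"},
    {"user": "u1", "variant": "A", "event": "registration_complete"},
    {"user": "u2", "variant": "B", "event": "install"},
    {"user": "u2", "variant": "B", "event": "registration_complete"},
    {"user": "u2", "variant": "B", "event": "premium_purchase"},
]

reached = defaultdict(set)  # (variant, stage) -> users who reached that stage
for e in events:
    reached[(e["variant"], e["event"])].add(e["user"])

for variant in ("A", "B"):
    installs = len(reached[(variant, "install")]) or 1
    rates = [len(reached[(variant, stage)]) / installs for stage in FUNNEL]
    print(variant, [f"{stage}: {rate:.0%}" for stage, rate in zip(FUNNEL, rates)])
```

A table like this makes it obvious when a variant that wins on registrations is quietly losing on retention or purchases further down the funnel.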

2. Stopping tests too early

Having access to (near) instant analytics is great. I love being able to pull up Google Analytics and see how traffic is driven to certain pages, as well as the overall behavior of users. However, that’s not necessarily a great thing when it comes to mobile A/B testing.

Eager to check in on results, testers often stop tests far too early, as soon as they see a significant difference between the variants. Don’t fall victim to this. Here’s the problem: statistics are only reliable when given time and many data points. Many teams run a test for a few days, constantly checking their dashboards to see progress, and stop the test as soon as the data confirm their hypothesis.

This can result in false positives. Tests need time, and quite a few data points, to be accurate. Imagine you flipped a coin 5 times and got all heads. Unlikely (about a 1-in-32 chance), but not unreasonable, right? You might then falsely conclude that a coin lands on heads 100% of the time. Flip the coin 1,000 times and the chance of getting all heads is vanishingly small; with more flips, your observed rate gets much closer to the true 50% probability. The more data points you have, the more accurate your results will be.
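
To see why peeking is risky, here is a small, self-contained Python simulation (all traffic numbers and rates are assumed) in which both variants have the same true conversion rate, so any “winner” is a false positive. Checking the test daily and stopping at the first significant result flags a winner far more often than looking only once at a fixed horizon.

```python
# Simulation sketch: both variants convert at the same rate, so every
# "significant" result is a false positive. Parameters are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

TRUE_RATE = 0.10          # identical for both variants
USERS_PER_DAY = 500       # per variant, assumed traffic
DAYS = 20
ALPHA = 0.05
N_EXPERIMENTS = 2000

def z_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test; returns the p-value."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - stats.norm.cdf(abs(z)))

peeking_fp = fixed_fp = 0
for _ in range(N_EXPERIMENTS):
    conv_a = conv_b = n = 0
    stopped_early = False
    for day in range(DAYS):
        n += USERS_PER_DAY
        conv_a += rng.binomial(USERS_PER_DAY, TRUE_RATE)
        conv_b += rng.binomial(USERS_PER_DAY, TRUE_RATE)
        if not stopped_early and z_test(conv_a, n, conv_b, n) < ALPHA:
            stopped_early = True          # the team "calls" the test at the first significant peek
    peeking_fp += stopped_early
    fixed_fp += z_test(conv_a, n, conv_b, n) < ALPHA   # look only once, at the end

print(f"False positive rate when peeking daily:  {peeking_fp / N_EXPERIMENTS:.1%}")
print(f"False positive rate with a fixed horizon: {fixed_fp / N_EXPERIMENTS:.1%}")
```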

To help minimize false positives, design an experiment to run until it reaches both a predetermined number of conversions and a predetermined duration. Otherwise, you greatly increase your chances of a false positive, and you don’t want to base future decisions on faulty data because you stopped an experiment early.

So how long should you run an experiment? It depends. To also guard against a false negative (a Type II error), best practice is to decide, before the experiment starts, on the minimum effect size you care about, then use your daily traffic per variant and the level of certainty you want to compute the required sample size and, from it, how long the experiment needs to run.
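
As a sketch of what that pre-test calculation can look like, the snippet below applies the standard two-proportion sample-size formula in Python. The baseline rate, minimum lift, and daily traffic are assumptions you would replace with your own numbers.

```python
# Sizing sketch: derive the sample size per variant, then the test duration,
# from an assumed baseline rate, minimum detectable lift, alpha, and power.
from scipy.stats import norm

baseline = 0.10                 # current registration rate (assumption)
min_lift = 0.02                 # smallest absolute improvement worth detecting (assumption)
alpha, power = 0.05, 0.80
daily_users_per_variant = 500   # assumed traffic per variant

p1, p2 = baseline, baseline + min_lift
z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

# Standard two-proportion sample-size formula
n_per_variant = ((z_alpha + z_beta) ** 2 *
                 (p1 * (1 - p1) + p2 * (1 - p2))) / (p1 - p2) ** 2

days = n_per_variant / daily_users_per_variant
print(f"Need about {n_per_variant:.0f} users per variant (~{days:.0f} days of traffic).")
```

With these assumed numbers the answer is roughly 3,800 users per variant, or about eight days of traffic, and that number is fixed before the experiment starts.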

3. Not creating a test hypothesis

An A/B test is most effective when it’s conducted in a scientific manner. Remember the scientific method taught in elementary school? You want to control extraneous variables, and isolate the changes between variants as much as possible. Most importantly, you want to create a hypothesis.

Our goal with A/B testing is to form a hypothesis about how a change will affect user behavior, then test it in a controlled environment to determine causation. That’s why creating a hypothesis is so important. A hypothesis helps you decide which metrics to track and which signals would indicate a real change in user behavior. Without it, you’re just throwing spaghetti at the wall to see what sticks, instead of gaining a deeper understanding of your users.

To create a good hypothesis, write down which metrics you believe will change and why. If you’re adding an onboarding tutorial to a social app, you might hypothesize that it will decrease the bounce rate and increase engagement metrics such as messages sent. Don’t skip this step!
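
If it helps to make this concrete, here is one lightweight, purely illustrative way to force a hypothesis to be written down before a test starts: a small record naming the change, the expected effect, the primary metric, and the guardrail metrics you’ll watch.

```python
# Illustrative template only: the fields and example values are hypothetical.
from dataclasses import dataclass, field

@dataclass
class TestHypothesis:
    change: str                       # what will be different in the variant
    expected_effect: str              # why you believe behavior will change
    primary_metric: str               # the metric the test is designed to move
    guardrail_metrics: list = field(default_factory=list)  # metrics that must not degrade

onboarding_test = TestHypothesis(
    change="Add a 3-step onboarding tutorial",
    expected_effect="New users understand core features faster",
    primary_metric="day-1 bounce rate (expect a decrease)",
    guardrail_metrics=["messages sent per user", "day-7 retention"],
)
print(onboarding_test)
```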

4. Implementing changes from other apps’ test results

When reading about others’ A/B tests, it’s best to take the results with a grain of salt. What works for a competitor or similar app may not work for your own. Each app’s audience and functionality are unique, so assuming your users will respond the same way is an understandable but critical mistake.

One of our customers wanted to test a change similar to one a competitor had made, to see how it affected users. The customer runs a simple, easy-to-use dating app that lets users scroll through user “cards” and like or dislike other users. If two users like each other, they are matched and put in contact with one another.

The default version of the app used thumbs-up and thumbs-down icons for liking and disliking. The team wanted to test a change they believed would increase engagement by making the like and dislike buttons more emotionally resonant. They saw that a similar application was using heart and X icons instead, so they believed that switching to similar icons would improve clicks, and created an A/B test to find out.

Unexpectedly, the heart and X icons lowered clicks on the like button by 6.0% and clicks on the dislike button by 4.3%. The results were a complete surprise for the team, who had expected the A/B test to confirm their hypothesis; it seemed to make sense that a heart, rather than a thumbs up, would better represent the idea of finding love.

The customer’s team believes that the heart actually represented a level of commitment to the potential match that Asian users reacted negatively to. Clicking a heart symbolizes love for a stranger, while a thumbs-up icon just means you approve of the match.

Instead of copying other apps, use them as a source of test ideas. Borrow an idea, adapt it to your own app with the help of customer feedback, then use A/B testing to validate it and implement the winners.

5. Testing too many variables at once

A very common temptation is for teams to test multiple variables at once to speed up the testing process. Unfortunately, this almost always has the exact opposite effect.

The problem lies with user allocation. In an A/B test, you need enough participants in each group to get a statistically significant result. If you test more than one variable at a time, you end up with exponentially more groups, one for every possible combination, so tests have to run much longer to reach statistical significance and it takes far longer to glean any useful data from them.
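
A quick back-of-the-envelope sketch (with assumed traffic and sample-size numbers) shows how fast the group count, and therefore the test duration, grows when variables are combined.

```python
# Back-of-the-envelope sketch: variables, options, traffic, and per-group
# sample size are all assumptions for illustration.
from itertools import product

variables = {
    "cta_icon": ["thumbs", "heart"],
    "cta_color": ["blue", "green"],
    "onboarding": ["short", "long"],
}
groups = list(product(*variables.values()))
sample_per_group = 3800   # assumed per-group sample needed to reach significance
daily_users = 4000        # assumed total eligible traffic per day

print(f"{len(groups)} groups instead of 2")
print(f"{len(groups) * sample_per_group} users needed in total")
print(f"~{len(groups) * sample_per_group / daily_users:.0f} days instead of "
      f"~{2 * sample_per_group / daily_users:.0f} days for a single-variable test")
```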

Instead of testing multiple variables at once, make only one change per test. Each test finishes much sooner and gives you clear insight into how that one change affects user behavior. There’s a huge advantage to this: you can take the learnings from one test and apply them to all future tests. By making small, iterative changes through testing, you gain further insight into your customers and compound the results with each round of data.

6. Giving up after a failed mobile A/B test

Not every test is going to give you great results to brag about. Mobile A/B testing isn’t a magic solution that spews out amazing statistics every time it’s run. Sometimes you’ll see only marginal returns; other times you’ll see decreases in your key metrics. That doesn’t mean you’ve failed; it just means you need to take what you’ve learned and tweak your hypothesis.

If a change doesn’t give you the expected results, ask yourself and your team why, and then proceed accordingly. Even more importantly, learn from your mistakes. Oftentimes, our failures teach us much more than our successes. If a test hypothesis doesn’t play out as you expect, it may reveal some underlying assumptions you or your team are making.

One of our clients, a restaurant booking app, wanted to display deals from restaurants more prominently. They tested showing the discounts next to search results and discovered that the change was actually decreasing the number of bookings as well as user retention.

Through testing, they discovered something very important: Users trusted them to be unbiased when returning results. With the addition of promotions and discounts, users felt that the app was losing editorial integrity. The team took this insight back to the drawing board and used it to run another test that increased conversions by 28%.

Not every test will give you great results, but every test teaches you something about what works and what doesn’t, and helps you better understand your users.

Conclusion

While mobile A/B testing can be a powerful tool for app optimization, you want to make sure you and your team aren’t falling victim to these common mistakes. Now that you’re better informed, you can push forward with confidence and understand how to use A/B testing to optimize your app and delight your customers.