A.I. & Optimization

Why Is A/B Testing Falling Short for B2B?

If you have a B2B product, you know that traffic to your website or app is orders of magnitude lower than that of B2C counterparts (e.g. on the order of hundreds or thousands of daily visitors, as opposed to millions). The low-traffic nature of B2B websites introduces several challenges for product and marketing development. Here, we discuss those challenges and propose alternative approaches to address them.

Low Traffic & Statistical Significance

Let's say you have a coin and you want to estimate the chance of seeing heads when you flip it. You can take an empirical approach: flip the coin a sufficient number of times and record the results. At the end, you compute the fraction of heads to estimate Pr(Head).

Since this is a random experiment, we need a tool to measure the uncertainty of our estimate. The Law of Large Numbers and the Central Limit Theorem provide a framework for quantifying that uncertainty.

If you flip a coin N times and observe heads S times, you estimate the head probability as follows:


$p = \frac{S}{N}$


Intuitively, one expects that the larger N is, the more confident we can be in our estimate. By the Central Limit Theorem, when N is large enough, the distribution of the estimate S/N is approximately Normal, centered at the true head probability p with variance pq/N, where q = 1 - p. Suppose we want to be 95% confident in our estimate of p. One can show that, for a given pair S and N, the true p falls in the following range with 95% confidence:


$p = \frac{S}{N} \pm 2\sqrt{\frac{1}{N}\frac{S}{N}\left(1-\frac{S}{N}\right)}$
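
As a brief sketch of where this range comes from (a standard Normal-approximation argument, not spelled out in the original post): writing $\hat{p} = S/N$ for the estimate, a notation introduced here only for illustration, the Central Limit Theorem gives approximately


$\Pr\left(\left|\hat{p} - p\right| \le 1.96\sqrt{\frac{pq}{N}}\right) \approx 0.95$


Replacing the unknown p and q under the square root with their estimates S/N and 1 - S/N, and rounding 1.96 up to 2, gives the range shown above.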


Suppose that in N = 1000 trials we observed S = 590 successes. Then p = S/N = 590/1000 = 0.59, and the confidence interval for the estimated p is:


$p = 0.59 \pm 2\sqrt{\frac{0.59 \times 0.41}{1000}} = 0.59 \pm 0.0311$


Thus, with 95% confidence, p falls in the following interval, which is tight:


$p \in [0.5589, 0.6211]$
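
The coin calculation above can be reproduced with a few lines of Python. This is a minimal sketch rather than a reference implementation: the function name `confidence_interval` and its parameter `z` are illustrative choices, and `z=2.0` simply mirrors the rounded multiplier used in the formula (1.96 would be the more precise value for 95%).

```python
import math


def confidence_interval(successes, trials, z=2.0):
    """Approximate confidence interval for a proportion.

    Uses the Normal approximation from the text:
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / trials).
    z=2.0 is the rounded 95% multiplier assumed in this post.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat, margin, p_hat - margin, p_hat + margin


# Coin example from the text: S = 590 heads in N = 1000 flips.
p_hat, margin, lower, upper = confidence_interval(590, 1000)
print(f"p = {p_hat:.2f} +/- {margin:.4f}, interval [{lower:.4f}, {upper:.4f}]")
# prints: p = 0.59 +/- 0.0311, interval [0.5589, 0.6211]
```

Because the margin shrinks with the square root of the number of trials, quadrupling N only halves the width of the interval.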


Now, for the second example, let's assume you have two variations of a landing page (LP) on your website and you want to test which version has the higher conversion rate. Let's assume that your website gets 200 visitors per day. Thus, in an A/B test, each version of the LP gets 100 visitors per day on average. Let's say that after one day you get 3 conversions on version A and 2 conversions on version B. The 95% confidence interval for version A is then:


$p \in [-0.0041, 0.0641]$


Similarly, the 95% confidence interval for version B is:


$p \in [-0.008, 0.048]$
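
The two landing-page intervals can be checked the same way. The sketch below assumes the same Normal-approximation formula with the rounded multiplier 2; the helper name `interval` and the overlap check are illustrative additions, not part of any particular A/B testing tool.

```python
import math


def interval(conversions, visitors, z=2.0):
    """Return (lower, upper) of the approximate 95% interval for a rate."""
    p_hat = conversions / visitors
    margin = z * math.sqrt(p_hat * (1 - p_hat) / visitors)
    return p_hat - margin, p_hat + margin


# One day of traffic from the example: 100 visitors per version.
a_low, a_high = interval(3, 100)  # version A: 3 conversions
b_low, b_high = interval(2, 100)  # version B: 2 conversions

# Two intervals overlap when each one starts before the other ends.
overlap = a_low <= b_high and b_low <= a_high

print(f"A: [{a_low:.4f}, {a_high:.4f}]")
print(f"B: [{b_low:.4f}, {b_high:.4f}]")
print("overlap:", overlap)
```

With these inputs the intervals come out as in the text and the overlap check is true, so one day of data cannot separate the two versions.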


You can see the issue right away. The two confidence intervals overlap, so we clearly don't have a winning version at this traffic level. (The negative lower bounds, which are impossible for a conversion rate, are another sign that the sample is too small for the Normal approximation.) How many samples do you need to be confident in picking the best-converting LP? The rough answer is on the order of thousands. For a low-traffic B2B website, this means waiting on the order of days to weeks before being able to draw any optimization conclusions. Note that for a proper A/B test you should change the design or copy of only one element of your LP at a time. Thus, testing several elements of an LP makes the test last even longer!
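
To get a feel for the "order of thousands" figure, one can invert the interval formula and ask how many visitors are needed for a given margin. The sketch below is a rough sizing exercise under stated assumptions, not a formal power analysis: the baseline rate of 3% and the 100 visitors per version per day come from the example above, while the target margin of 0.005 (half of the one-point gap between 3% and 2%) and the function name `visitors_needed` are assumptions made here for illustration.

```python
import math


def visitors_needed(rate, margin, z=2.0):
    """Smallest N such that z * sqrt(rate * (1 - rate) / N) <= margin.

    Obtained by solving the interval half-width for N:
    N >= z^2 * rate * (1 - rate) / margin^2.
    """
    if not 0 < rate < 1:
        raise ValueError("rate must be strictly between 0 and 1")
    if margin <= 0:
        raise ValueError("margin must be positive")
    return math.ceil(z ** 2 * rate * (1 - rate) / margin ** 2)


rate = 0.03                         # baseline conversion rate (version A)
margin = 0.005                      # assumed target half-width of the interval
visitors_per_version_per_day = 100  # 200 daily visitors split across two versions

n = visitors_needed(rate, margin)
days = math.ceil(n / visitors_per_version_per_day)
print(f"about {n} visitors per version, i.e. about {days} days of traffic")
```

Under these assumptions the answer is a few thousand visitors per version, which at this traffic level is well over a month of testing for a single change.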

Do we have any alternatives?

We face similar challenges when running advertising campaigns, optimizing landing pages, or testing a new feature for a B2B product. However, there may be some alternative ways to approach this low-sample problem. For one B2B client, we needed to run many experiments with real users in which we showed them new features for evaluation. We quickly noticed that A/B testing is not an option when testing with a limited number of users.

We changed our experimental methodology to be more qualitative than quantitative. This meant asking users thoughtful questions to capture their mental models and emotional reactions to the features presented. We started capturing user input in a more qualitative way by focusing on positive and negative feedback. Optimizing the product around the elements that users liked or disliked helps product development move towards a [global] optimal point.

Let us know in the comment section if you have had a similar experience during your product development cycles and how you approached it.