I helped a friend build a Line chatbot a while ago, and I added a feature to it: it helps customers look up their waiting number, so they know how long they still have to wait for service. To look up the waiting number, a customer only needs to provide either their name or their Line ID. An example is shown in the following pictures.
However, I found some situations that keep customers from finding their numbers:
- Some users provide both the name and the Line ID, such as typing
ChengChingLin ar851060
- Other users provide the name or Line ID together with a label saying what it is, like
Line ID: ar851060
or
Name: ChengChingLin
Both cases fail under my original number-searching algorithm, so I needed to fix it. I also wanted to know whether the new algorithm actually works, so I set up an experiment to find out.
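Before the experiment details, here is a minimal sketch of the kind of parsing fix involved. The function names and the regex are my own illustration, not the production code; it only shows the idea of stripping labels and then trying every remaining token as a search key.

```python
import re

# Matches explicit labels such as "Line ID:" or "Name:" (case-insensitive).
# Hypothetical pattern for illustration; the production rule may differ.
LABEL_PATTERN = re.compile(r"(?:line\s*id|name)\s*[:：]\s*", re.IGNORECASE)

def extract_keys(message: str) -> list[str]:
    """Split a message into candidate search keys.

    Covers both failure cases:
      "ChengChingLin ar851060"  -> try both tokens
      "Line ID: ar851060"       -> strip the label, try the value
    """
    cleaned = LABEL_PATTERN.sub(" ", message)
    return cleaned.split()

def find_waiting_number(message: str, records: dict[str, int]) -> int | None:
    """Try each candidate key; records is assumed to index both names and Line IDs."""
    for key in extract_keys(message):
        if key in records:
            return records[key]
    return None

# Example: both problem inputs now resolve to the same waiting number.
records = {"ChengChingLin": 7, "ar851060": 7}
print(find_waiting_number("ChengChingLin ar851060", records))  # 7
print(find_waiting_number("Line ID: ar851060", records))       # 7
```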
Experiment Setup
A/A Testing
Metrics:
- Error Rate: $\cfrac{\text{number of errors}}{\text{number of people using this feature}}$
- Usage Rate: $\cfrac{\text{number of people using this feature}}{\text{number of people who log in to this chatbot}}$
Grouping Methods:
Here, we simply split users into two groups using the user ID from the Line message. A user ID is a 'U' followed by 32 hexadecimal digits (0-9 or a-f).
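The exact split rule is not essential; any deterministic, roughly 50/50 rule keyed on the user ID works. A minimal sketch of one such rule, splitting on the parity of the last hexadecimal digit (my assumption, not necessarily the rule actually used):

```python
def assign_group(user_id: str) -> str:
    """Deterministically assign a Line user to group A or B.

    user_id looks like 'U' followed by 32 hex digits, e.g. 'U0a1b...9f'.
    Splitting on the parity of the last hex digit gives a stateless,
    roughly even split.
    """
    assert user_id.startswith("U") and len(user_id) == 33
    return "A" if int(user_id[-1], 16) % 2 == 0 else "B"
```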
Hypothesis:
$H_0$: There is no difference in error rate or usage rate between the two groups, since the only thing I do is separate people into two groups, and that's all.
Parameters:
- $\alpha$: 0.05
- Power: 0.8
- Number of Samples: 60
- Testing Method: Fisher's exact test
Results:
Group | Number of Errors | Number of Feature Uses | Number of Logins
---|---|---|---
A | 6 | 18 | 33
B | 6 | 20 | 30
Total | 12 | 38 | 63
Usage Rate Result
The p-value is 0.4402, so we do not reject $H_0$ at $\alpha = 0.05$: we do not have enough evidence to show that the two groups differ in usage rate.
Error Rate Result
The p-value is 1, so we do not reject $H_0$ at $\alpha = 0.05$: we do not have enough evidence to show that the two groups differ in error rate.
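Both p-values can be reproduced from the table above with scipy; a minimal sketch, where the 2×2 contingency tables are my reconstruction from the reported counts:

```python
from scipy.stats import fisher_exact

# Usage rate: used the feature vs. logged in but did not use it.
_, p_usage = fisher_exact([[18, 33 - 18],
                           [20, 30 - 20]])
print(p_usage)  # ~0.4402

# Error rate: hit an error vs. searched without error.
_, p_error = fisher_exact([[6, 18 - 6],
                           [6, 20 - 6]])
print(p_error)  # 1.0
```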
A/B Testing
Metrics:
- Error Rate: $\cfrac{\text{number of errors}}{\text{number of people using this feature}}$
- Usage Rate: $\cfrac{\text{number of people using this feature}}{\text{number of people who log in to this chatbot}}$
Grouping Methods:
As in the A/A test, we split users into two groups using the user ID from the Line message.
- Group O: still uses the original algorithm
- Group A: uses the new algorithm
Hypotheses:
Usage rate
$H_0$: There is no difference in usage rate.
$H_1$: There is a difference in usage rate.
Error rate
$H_0$: The error rate in group O is less than or equal to the error rate in group A.
$H_1$: The error rate in group O is greater than the error rate in group A.
Expectation
My expectation is that the new algorithm keeps the usage rate the same while reducing the error rate.
Parameters:
- $\alpha$: 0.05
- Power: 0.8
- Number of Samples: 60
- Testing Method: two-proportion z-test
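As a sanity check on the sample size, the required number per group can be estimated with a power analysis; a minimal sketch using statsmodels, where the baseline and target error rates are my assumptions (roughly the A/A error rate of 12/38, and a hoped-for low rate), not measured values:

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

# Assumed rates for illustration: ~0.32 error rate before the fix
# (12/38 in the A/A test) and a hoped-for ~0.05 after it.
effect = proportion_effectsize(0.32, 0.05)  # Cohen's h

# One-sided alternative, matching H1 (error rate in O greater than in A).
n_per_group = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="larger")
print(round(n_per_group))  # required samples per group under these assumptions
```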
Results:
Group | Number of Errors | Number of Feature Uses | Number of Logins
---|---|---|---
O | 6 | 28 | 165
A | 0 | 23 | 146
Total | 6 | 51 | 311
Usage Rate Result
The p-value is 0.7725, so we do not reject $H_0$ at $\alpha = 0.05$: we do not have enough evidence to show that the two groups differ in usage rate.
Error Rate Result
The p-value is 0.009, so we reject $H_0$ at $\alpha = 0.05$: we have enough evidence to conclude that the error rate in group A is smaller than in group O.
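Both p-values can be reproduced with statsmodels' two-proportion z-test; a minimal sketch, plugging in the counts from the table above:

```python
from statsmodels.stats.proportion import proportions_ztest

# Usage rate: two-sided test of 28/165 (group O) vs. 23/146 (group A).
_, p_usage = proportions_ztest([28, 23], [165, 146], alternative="two-sided")
print(p_usage)  # ~0.7725

# Error rate: one-sided test of 6/28 (O) vs. 0/23 (A),
# with H1 that group O's error rate is greater.
_, p_error = proportions_ztest([6, 0], [28, 23], alternative="larger")
print(p_error)  # ~0.009
```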
Conclusion
The experiment results show that the new algorithm keeps users from running into error messages. The behavior of the new algorithm can be seen below.