I helped a friend build a Line chatbot a while ago, and I added a feature to it: it helps customers look up their waiting number, so they know how long they still have to wait for service. To look up the waiting number, a customer only needs to provide either their name or their Line ID. An example is shown in the following pictures.
However, I found some situations that keep customers from finding their numbers:
- Some users provide both the name and the Line ID, such as typing
ChengChingLin ar851060
- Other users provide the name or Line ID together with a label saying what it is, like
Line ID: ar851060
or
Name: ChengChingLin
Both cases fail under my original number-searching algorithm, so I needed to fix it. I also wanted to know whether the new algorithm actually works, so I set up an experiment to find out.
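Before the experiment details, here is a minimal sketch of the kind of parsing fix involved. The function names and the regex are my own illustration, not the production code; it only shows the idea of stripping labels and then trying every remaining token as a search key.

```python
import re

# Matches explicit labels such as "Line ID:" or "Name:" (case-insensitive).
# Hypothetical pattern for illustration; the production rule may differ.
LABEL_PATTERN = re.compile(r"(?:line\s*id|name)\s*[:：]\s*", re.IGNORECASE)

def extract_keys(message: str) -> list[str]:
    """Split a message into candidate search keys.

    Covers both failure cases:
      "ChengChingLin ar851060"  -> try both tokens
      "Line ID: ar851060"       -> strip the label, try the value
    """
    cleaned = LABEL_PATTERN.sub(" ", message)
    return cleaned.split()

def find_waiting_number(message: str, records: dict[str, int]) -> int | None:
    """Try each candidate key; records is assumed to index both names and Line IDs."""
    for key in extract_keys(message):
        if key in records:
            return records[key]
    return None

# Example: both problem inputs now resolve to the same waiting number.
records = {"ChengChingLin": 7, "ar851060": 7}
print(find_waiting_number("ChengChingLin ar851060", records))  # 7
print(find_waiting_number("Line ID: ar851060", records))       # 7
```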
Experiment Setup
A/A Testing
Metrics:
- Error Rate: $\cfrac{\text{number of errors}}{\text{number of people using this feature}}$
- Usage Rate: $\cfrac{\text{number of people using this feature}}{\text{number of people who log in to this chatbot}}$
Grouping Methods:
Here, we simply split users into two groups using the user ID from the Line message. A user ID is a 'U' followed by 32 hexadecimal digits (0-9 or a-f).
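The exact split rule is not essential; any deterministic, roughly 50/50 rule keyed on the user ID works. A minimal sketch of one such rule, splitting on the parity of the last hexadecimal digit (my assumption, not necessarily the rule actually used):

```python
def assign_group(user_id: str) -> str:
    """Deterministically assign a Line user to group A or B.

    user_id looks like 'U' followed by 32 hex digits, e.g. 'U0a1b...9f'.
    Splitting on the parity of the last hex digit gives a stateless,
    roughly even split.
    """
    assert user_id.startswith("U") and len(user_id) == 33
    return "A" if int(user_id[-1], 16) % 2 == 0 else "B"
```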
Hypothesis:
$H_0$: There is no difference in error rate or usage rate between the two groups, since the only thing I do is separate people into two groups, and that's all.
Parameters:
- $\alpha$: 0.05
- Power: 0.8
- Number of Samples: 60
- Testing Method: Fisher's exact test
Results:
Group | Number of Errors | Number of Feature Uses | Number of Logins
---|---|---|---
A | 6 | 18 | 33
B | 6 | 20 | 30
Total | 12 | 38 | 63
Usage Rate Result
The p-value is 0.4402, so we do not reject $H_0$ at $\alpha = 0.05$: we do not have enough evidence to show that the two groups differ in usage rate.
Error Rate Result
The p-value is 1, so we do not reject $H_0$ at $\alpha = 0.05$: we do not have enough evidence to show that the two groups differ in error rate.
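Both p-values can be reproduced from the table above with scipy; a minimal sketch, where the 2×2 contingency tables are my reconstruction from the reported counts:

```python
from scipy.stats import fisher_exact

# Usage rate: used the feature vs. logged in but did not use it.
_, p_usage = fisher_exact([[18, 33 - 18],
                           [20, 30 - 20]])
print(p_usage)  # ~0.4402

# Error rate: hit an error vs. searched without error.
_, p_error = fisher_exact([[6, 18 - 6],
                           [6, 20 - 6]])
print(p_error)  # 1.0
```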
A/B Testing
Metrics:
- Error Rate: $\cfrac{\text{number of errors}}{\text{number of people using this feature}}$
- Usage Rate: $\cfrac{\text{number of people using this feature}}{\text{number of people who log in to this chatbot}}$
Grouping Methods:
As in the A/A test, we split users into two groups using the user ID from the Line message.
- Group O: still uses the original algorithm
- Group A: uses the new algorithm
Hypotheses:
Usage rate
$H_0$: There is no difference in usage rate.
$H_1$: There is a difference in usage rate.
Error rate
$H_0$: The error rate in group O is less than or equal to the error rate in group A.
$H_1$: The error rate in group O is greater than the error rate in group A.
Expectation
My expectation is that the new algorithm keeps the usage rate the same while reducing the error rate.
Parameters:
- $\alpha$: 0.05
- Power: 0.8
- Number of Samples: 60
- Testing Method: two-proportion z-test
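As a sanity check on the sample size, the required number per group can be estimated with a power analysis; a minimal sketch using statsmodels, where the baseline and target error rates are my assumptions (roughly the A/A error rate of 12/38, and a hoped-for low rate), not measured values:

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

# Assumed rates for illustration: ~0.32 error rate before the fix
# (12/38 in the A/A test) and a hoped-for ~0.05 after it.
effect = proportion_effectsize(0.32, 0.05)  # Cohen's h

# One-sided alternative, matching H1 (error rate in O greater than in A).
n_per_group = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="larger")
print(round(n_per_group))  # required samples per group under these assumptions
```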
Results:
Group | Number of Errors | Number of Feature Uses | Number of Logins
---|---|---|---
O | 6 | 28 | 165
A | 0 | 23 | 146
Total | 6 | 51 | 311
Usage Rate Result
The p-value is 0.7725, so we do not reject $H_0$ at $\alpha = 0.05$: we do not have enough evidence to show that the two groups differ in usage rate.
Error Rate Result
The p-value is 0.009, so we reject $H_0$ at $\alpha = 0.05$: we have enough evidence to conclude that the error rate in group A is smaller than in group O.
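Both p-values can be reproduced with statsmodels' two-proportion z-test; a minimal sketch, plugging in the counts from the table above:

```python
from statsmodels.stats.proportion import proportions_ztest

# Usage rate: two-sided test of 28/165 (group O) vs. 23/146 (group A).
_, p_usage = proportions_ztest([28, 23], [165, 146], alternative="two-sided")
print(p_usage)  # ~0.7725

# Error rate: one-sided test of 6/28 (O) vs. 0/23 (A),
# with H1 that group O's error rate is greater.
_, p_error = proportions_ztest([6, 0], [28, 23], alternative="larger")
print(p_error)  # ~0.009
```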
Conclusion
The experiment results show that the new algorithm keeps users from running into error messages. The behavior of the new algorithm can be seen below.