Handling Outliers - Winsorization


Posted by ar851060 on 2025-05-30


What is Winsorization

Winsorization is a method for handling outliers by capping them instead of deleting them. The process is:

  • Set thresholds for outliers (e.g. values below the 5th percentile or above the 95th percentile).
  • Replace each outlier with the nearest threshold value.

That's it. It is a very simple way to deal with outliers.
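The two steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not necessarily the code behind this post; the function name `winsorize`, its parameters, and the toy data are assumptions made for the example.

```python
from statistics import quantiles


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values outside the given percentiles to the percentile values."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99
    cuts = quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    # Values below `low` become `low`, values above `high` become `high`
    return [min(max(v, low), high) for v in values]


# Toy data (hypothetical): 18 ordinary values plus one low and one high outlier
data = [1, 48, 49, 50, 51, 52, 53, 54, 55, 56,
        57, 58, 59, 60, 61, 62, 63, 64, 65, 200]
capped = winsorize(data)

print(len(data), len(capped))   # the sample size is unchanged
print(capped[0], capped[-1])    # the two extremes are pulled toward the bulk
```

Note that the percentile thresholds are computed from the data itself, so the capped values depend on the interpolation method used for the percentiles.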

Simulation

I randomly generated 100 data points from a normal distribution with mean 100 and standard deviation 10, so the mean should theoretically be 100. I then added three outliers, which brings the number of data points to 103.
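A setup like this could be reproduced roughly as follows. The seed and the three outlier values are assumptions for illustration only (the post does not state them), so the resulting numbers will differ from the ones reported here.

```python
import random
from statistics import mean

random.seed(0)  # hypothetical seed, only for reproducibility

# 100 draws from a normal distribution with mean 100 and standard deviation 10
sample = [random.gauss(100, 10) for _ in range(100)]

# Three artificial outliers; these values are assumptions, not the post's
outliers = [160, 170, 180]
data = sample + outliers

print(len(data))               # 103
print(round(mean(sample), 2))  # mean before adding the outliers
print(round(mean(data), 2))    # mean after adding the outliers (pulled upward)
```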


Because of the outliers, the mean becomes 102.11. The mean is sensitive to outliers, so we can use it to compare different methods of handling them. Trimming deletes the outliers; here, values below the 5th percentile or above the 95th percentile count as outliers. Winsorization instead sets those outliers to the 5th or 95th percentile value.

Method          Mean    Sample size
Trimming        99.74   91
Winsorization   99.63   103
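A comparison of this kind can be sketched as below. The seed and outlier values are again hypothetical, so the means will not match the table exactly, but the sample sizes should come out the same: with linearly interpolated percentiles, 6 of the 103 points fall below the 5th percentile and 6 above the 95th.

```python
import random
from statistics import mean, quantiles

random.seed(0)  # hypothetical seed
# 100 normal draws plus three hypothetical outliers
data = [random.gauss(100, 10) for _ in range(100)] + [160, 170, 180]

# quantiles(n=20) returns the cut points at 5%, 10%, ..., 95%
cuts = quantiles(data, n=20, method="inclusive")
low, high = cuts[0], cuts[-1]

# Trimming: drop everything outside [low, high]
trimmed = [x for x in data if low <= x <= high]

# Winsorization: clamp everything outside [low, high] to the nearest bound
winsorized = [min(max(x, low), high) for x in data]

for name, values in [("Raw", data),
                     ("Trimming", trimmed),
                     ("Winsorization", winsorized)]:
    print(f"{name:<15}{mean(values):>8.2f}{len(values):>6}")
```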

In conclusion, both means are very close to 100.

Why not use the trimming method?

If deleting all the outliers gives similar results, why do we need Winsorization? The key difference between the two methods is the sample size.

With Winsorization, the sample size stays the same as in the original data; with trimming, it is smaller.

Why do we care about the sample size? Most of the time we do not need to. In hypothesis testing, however, a smaller sample size means lower power, all else being equal. To keep the same power, we need a way to deal with outliers that preserves the sample size.
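To get a feel for how sample size affects power, here is a rough sketch using the textbook power formula for a two-sided one-sample z-test. The function name `z_test_power`, the effect size of 3, and the significance level are assumptions chosen for illustration. The sketch also holds the standard deviation fixed, whereas in practice both trimming and Winsorization change the variance, so it isolates only the sample-size effect.

```python
from statistics import NormalDist


def z_test_power(n, effect=3.0, sigma=10.0, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test with n observations."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # The standardized shift of the test statistic grows with sqrt(n)
    shift = effect * n ** 0.5 / sigma
    # Probability that the statistic lands in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


print(round(z_test_power(103), 3))  # full sample, as kept by Winsorization
print(round(z_test_power(91), 3))   # reduced sample, as left by trimming
```

Under these assumptions the second number is smaller than the first, which is the loss of power that comes purely from dropping observations.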



#Winsorization #ab testing #statistics








