Removing Outliers - Winsorization


Posted by ar851060 on 2025-05-30


What is Winsorization?

Winsorization is a method for dealing with outliers. Instead of deleting them, it replaces them with less extreme values. The process is:

  • Set the outlier thresholds (e.g., treat values below the 5th percentile or above the 95th percentile as outliers).
  • Replace each outlier with the nearest threshold value.

That's it; it is a very simple way to deal with outliers.
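The two steps above can be sketched with NumPy as follows. This is a minimal illustration, not the author's code: the function name `winsorize` is hypothetical (it is not SciPy's `scipy.stats.mstats.winsorize`), and the thresholds are computed with `np.percentile`'s default linear interpolation, so other tools may choose slightly different cut points.

```python
import numpy as np


def winsorize(x, lower_pct=5, upper_pct=95):
    """Hypothetical helper: cap values outside the given percentiles.

    Assumption: thresholds come from np.percentile with its default
    (linear interpolation) method.
    """
    x = np.asarray(x, dtype=float)
    # Step 1: compute the lower and upper outlier thresholds.
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    # Step 2: values below lo become lo, values above hi become hi.
    # Nothing is removed, so the output has the same length as the input.
    return np.clip(x, lo, hi)


data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000])
capped = winsorize(data, 10, 90)  # 10%/90% limits used only for this tiny example
print(capped)
```

The extreme value 1000 is pulled down to the 90th-percentile value, and the smallest value is pulled up to the 10th-percentile value, while the array keeps all 10 entries.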

Simulation

I randomly generated 100 data points from a normal distribution with mean 100 and standard deviation 10, so the theoretical mean is 100. I then added three outliers, bringing the total to 103 data points.
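A sketch of this setup, assuming NumPy's random generator. The post does not give a random seed or the values of the three outliers, so the seed and the outlier values below are hypothetical, and the resulting mean will not exactly reproduce the 102.11 reported below.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # hypothetical seed; the post does not give one

# 100 draws from a normal distribution with mean 100 and standard deviation 10
normal_part = rng.normal(loc=100, scale=10, size=100)

# Three outliers; the post does not list their values, so these are hypothetical
outliers = np.array([150.0, 160.0, 170.0])

data = np.concatenate([normal_part, outliers])
print(f"n = {data.size}, mean = {data.mean():.2f}")
```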


Because of the outliers, the mean rises to 102.11. The mean is sensitive to outliers, so we can use it to compare different outlier-handling methods. Trimming deletes the outliers; here we treat data below the 5th percentile and above the 95th percentile as outliers. Winsorization instead sets those outliers to the 5th-percentile or 95th-percentile value.

Method          Mean    Sample size
Trimming        99.74   91
Winsorization   99.63   103

In conclusion, both means are very close to the theoretical value of 100.
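The comparison can be sketched as below, reusing the hypothetical seed and outlier values from the simulation sketch, so the printed means will differ from the table. Thresholds again come from `np.percentile` with its default linear interpolation; this is an assumption, since the post does not say which percentile method was used.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # hypothetical seed
data = np.concatenate([
    rng.normal(loc=100, scale=10, size=100),
    np.array([150.0, 160.0, 170.0]),  # hypothetical outliers
])

# Outlier thresholds: 5th and 95th percentiles of the full sample
lo, hi = np.percentile(data, [5, 95])

# Trimming: drop every value outside [lo, hi]
trimmed = data[(data >= lo) & (data <= hi)]

# Winsorization: clip values outside [lo, hi] to the nearest threshold
winsorized = np.clip(data, lo, hi)

print(f"Raw:           mean = {data.mean():.2f}, n = {data.size}")
print(f"Trimming:      mean = {trimmed.mean():.2f}, n = {trimmed.size}")
print(f"Winsorization: mean = {winsorized.mean():.2f}, n = {winsorized.size}")
```

With 103 distinct values and linear interpolation, the 5th and 95th percentiles fall strictly between sorted values, so trimming drops 6 values from each tail and keeps 91, which is consistent with the sample size in the table. Winsorization keeps all 103.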

Why not use trimming?

If deleting all the outliers gives similar results, why do we need Winsorization? The key difference between the two methods is the sample size.

With Winsorization, we keep the same sample size as the original data; with trimming, the sample size is smaller than the original.

Why do we care about sample size? Most of the time, we don't need to. However, in hypothesis testing, reducing the sample size also reduces statistical power. To keep the same power, we need a method that handles outliers without changing the sample size.
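To illustrate how power depends on sample size, here is a sketch using the textbook power formula for a one-sided, one-sample z-test with known standard deviation, power = Φ(δ√n / σ − z₁₋α). This is an assumption for illustration: an A/B test would usually use a two-sample test, and the effect size of 2 below is hypothetical. Only the standard library's `statistics.NormalDist` is used.

```python
from statistics import NormalDist


def z_test_power(n, effect, sigma, alpha=0.05):
    """Power of a one-sided, one-sample z-test with known sigma.

    Assumption: a simple textbook test chosen only to show how power
    changes with n; it is not the test used in the post.
    """
    std_normal = NormalDist()  # standard normal distribution
    z_crit = std_normal.inv_cdf(1 - alpha)  # critical value for level alpha
    return std_normal.cdf(effect * n ** 0.5 / sigma - z_crit)


# Hypothetical effect of 2 units; sigma = 10 matches the simulation's standard deviation.
# n = 103 (Winsorization) versus n = 91 (trimming), as in the table above.
for n in (103, 91):
    print(n, round(z_test_power(n, effect=2, sigma=10), 3))
```

Holding everything else fixed, the smaller trimmed sample gives a lower power than the full winsorized sample.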



#Winsorization #ab testing #statistics








