How XGBoost Handles Missing Values


Posted by ar851060 on 2024-06-15

If you are familiar with machine learning methods, you must have heard about one of the strongest weapons in ML: XGBoost. It is a powerful weapon with lots of advantages and optimizations. In this article, I only talk about one of those advantages: XGBoost can deal with missing values natively. But how?

Sparsity-Aware Split Finding

In the original paper, the authors came up with a brilliant idea called Sparsity-Aware Split Finding. The algorithm is shown below.

[Figure: the Sparsity-Aware Split Finding algorithm from the original paper]

Let me explain how it works without using lots of mathematical symbols.

  1. When a node needs to be split on a feature, XGBoost separates the data into two groups: rows where that feature is missing and rows where it is present.
  2. It uses only the rows without missing values to enumerate the candidate thresholds.
  3. For each candidate threshold, it tries putting all rows with missing values on one side. The gradient statistics of the side that receives the missing values are not summed row by row; they are calculated as the parent's totals minus the statistics of the other side, which contains no missing values. The Gain of the split is then computed from the two sides.
  4. It does this for both directions and keeps the threshold and direction with the highest Gain.
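For readers who do want a few symbols, here is a rough sketch of the quantity being maximized, in the notation of the original paper as I understand it. G and H are the sums of the first- and second-order gradients over all rows in the node (missing ones included), the subscripts L and R mark the left and right child, and \lambda is the L2 regularization term. Constant factors and the complexity penalty are left out because they do not change which split wins.

$$
\text{score} = \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{G^2}{H + \lambda}
$$

When missing values are sent to the right, G_L and H_L are accumulated over the non-missing rows below the threshold, and the right side is obtained by subtraction:

$$
G_R = G - G_L, \qquad H_R = H - H_L
$$

When missing values are sent to the left, the roles are swapped: G_R and H_R are accumulated over the non-missing rows above the threshold, and G_L = G - G_R, H_L = H - H_R.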

Basically, XGBoost tries putting all data with missing values on each side of every candidate split and keeps the direction that gives the maximum Gain. That direction becomes the default direction for missing values at that node.
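To make the idea concrete, here is a minimal pure-Python sketch for a single feature. It is not XGBoost's actual implementation: the function names (`leaf_score`, `best_split_with_missing`), the use of `None`/NaN as the missing marker, the midpoint thresholds, and the convention that rows with a value below the threshold go left are all my own assumptions for illustration.

```python
import math


def leaf_score(g, h, lam):
    # Quality of a leaf given its gradient sum g and hessian sum h.
    return g * g / (h + lam)


def is_missing(v):
    return v is None or (isinstance(v, float) and math.isnan(v))


def best_split_with_missing(values, grads, hess, lam=1.0):
    """Return (gain, threshold, default_direction) for one feature.

    values: feature values, with None or NaN marking a missing entry.
    grads, hess: per-row first- and second-order gradients.
    Illustrative sketch only; names and conventions are assumptions.
    """
    # Totals over ALL rows, including rows whose value is missing.
    g_total, h_total = sum(grads), sum(hess)
    parent = leaf_score(g_total, h_total, lam)

    # Only rows with a present value are sorted and scanned.
    present = sorted(
        ((v, g, h) for v, g, h in zip(values, grads, hess) if not is_missing(v)),
        key=lambda row: row[0],
    )

    best = (0.0, None, None)  # no split unless the gain is positive

    # Scan 1: ascending order, missing rows implicitly go RIGHT.
    g_left = h_left = 0.0
    for i in range(len(present) - 1):
        v, g, h = present[i]
        g_left += g
        h_left += h
        nxt = present[i + 1][0]
        if v == nxt:
            continue  # cannot cut between equal values
        # Right side = parent totals minus left side (this absorbs missing rows).
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam) - parent
        if gain > best[0]:
            best = (gain, (v + nxt) / 2, "right")

    # Scan 2: descending order, missing rows implicitly go LEFT.
    g_right = h_right = 0.0
    for i in range(len(present) - 1, 0, -1):
        v, g, h = present[i]
        g_right += g
        h_right += h
        prev = present[i - 1][0]
        if v == prev:
            continue
        # Left side = parent totals minus right side (this absorbs missing rows).
        g_left, h_left = g_total - g_right, h_total - h_right
        gain = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam) - parent
        if gain > best[0]:
            best = (gain, (prev + v) / 2, "left")

    return best


# Squared-error toy example: prediction 0, so grad = -y and hess = 1.
values = [1.0, 2.0, None, 3.0, None]
grads = [0, 0, -1, -1, -1]
hess = [1, 1, 1, 1, 1]
print(best_split_with_missing(values, grads, hess))  # → (0.75, 2.5, 'right')
```

In this toy example the rows with missing values have the same gradient as the row with value 3.0, so sending them to the right gives the larger Gain. Note that the missing rows are never visited inside the loops; they only enter through the parent totals, which is why the cost of the scan depends on the number of non-missing entries.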


Ref


#xgboost #machine learning #Missing values








