
Catboost: what are reasonable values for l2_leaf_reg?

Running CatBoost on a large-ish dataset (~1M rows, 500 columns), I get: "Training has stopped (degenerate solution on iteration 0, probably too small l2-regularization, try to increase it)".

How do I guess what the l2 regularization value should be? Is it related to the mean value of y, the number of variables, or the tree depth?

Thanks!

Guy Adini asked Dec 09 '17 12:12



People also ask

What is l2_leaf_reg?

The value of the parameter is added to the leaf denominator for each leaf at every step. Because it sits in the denominator, the higher l2_leaf_reg is, the smaller the value the leaf will obtain. This is fairly intuitive if you think about how L2 regularization shrinks coefficients in a typical linear regression setting.
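
To make the shrinking effect concrete, here is a minimal sketch. It assumes a simplified leaf estimate, the sum of residuals in the leaf divided by (number of samples + l2_leaf_reg). The function name `leaf_value` and the sample numbers are made up for illustration, and CatBoost's actual leaf estimation has more to it than this.

```python
def leaf_value(residuals, l2_leaf_reg):
    """Simplified (assumed) leaf estimate: sum of residuals / (count + l2_leaf_reg)."""
    return sum(residuals) / (len(residuals) + l2_leaf_reg)


# Illustrative residuals for the samples that fall into one leaf.
residuals = [4.0, 5.0, 6.0]

# A larger l2_leaf_reg makes the denominator larger, so the leaf value shrinks toward zero.
for l2 in (0.0, 3.0, 30.0):
    print(l2, leaf_value(residuals, l2))
```

With l2_leaf_reg at 0 the leaf takes the plain mean of its residuals; as the value grows, the same leaf contributes a smaller and smaller correction.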

Can CatBoost handle missing values?

CatBoost can handle missing values internally. None values should be used for missing value representation. If the dataset is read from a file, missing values can be represented as strings like N/A, NAN, None, empty string and the like. Refer to the Missing values processing section for details.

When should I boost my cat?

Handling Categorical features automatically: We can use CatBoost without any explicit pre-processing to convert categories into numbers. CatBoost converts categorical values into numbers using various statistics on combinations of categorical features and combinations of categorical and numerical features.


1 Answer

I don't think you will find an exact answer to your question, because every dataset is different.

However, in my experience, values in the range between 2 and 30 are a good starting point.
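
One way to act on that range is to try a handful of values between 2 and 30 and keep the one with the best validation score. The sketch below is only an illustration under stated assumptions: the helper names (`log_spaced_grid`, `pick_l2_leaf_reg`, `catboost_rmse`), the log spacing, the 200 iterations, and the use of RMSE are all my choices, not anything prescribed by CatBoost. The CatBoost-specific helper imports the library lazily and assumes a regression task with a held-out validation set.

```python
import math


def log_spaced_grid(low=2.0, high=30.0, num=5):
    """Return `num` candidate values spaced evenly on a log scale from low to high."""
    ratio = (high / low) ** (1.0 / (num - 1))
    return [low * ratio ** i for i in range(num)]


def pick_l2_leaf_reg(candidates, score_fn):
    """Score every candidate with score_fn (lower is better) and return (best_value, best_score)."""
    scored = [(score_fn(value), value) for value in candidates]
    best_score, best_value = min(scored)
    return best_value, best_score


def catboost_rmse(l2, X_train, y_train, X_val, y_val):
    """Hypothetical scorer: fit a CatBoostRegressor with the given l2_leaf_reg, return validation RMSE."""
    from catboost import CatBoostRegressor  # imported lazily; requires catboost to be installed

    model = CatBoostRegressor(l2_leaf_reg=l2, iterations=200, verbose=False)
    model.fit(X_train, y_train)
    preds = model.predict(X_val)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, y_val)) / len(y_val))
```

With your own split in hand, something like `pick_l2_leaf_reg(log_spaced_grid(), lambda l2: catboost_rmse(l2, X_train, y_train, X_val, y_val))` would return the best candidate from the grid. On a dataset of about 1M rows it may be cheaper to run the search on a subsample first.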

Vadim answered Nov 18 '22 09:11
