Running catboost on a large-ish dataset (~1M rows, 500 columns), I get: Training has stopped (degenerate solution on iteration 0, probably too small l2-regularization, try to increase it).
How do I guess what the l2 regularization value should be? Is it related to the mean values of y, number of variables, tree depth?
Thanks!
The value of this parameter is added to the leaf value denominator for each leaf at every boosting step. Since it sits in the denominator, the higher l2_leaf_reg is, the smaller the value the leaf will obtain. This is quite intuitive if you think of how L2 regularization works in an ordinary linear regression setting.
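The denominator effect can be seen in a simplified sketch of gradient-boosting leaf estimation (a toy illustration, not CatBoost's actual implementation): a leaf value is roughly the sum of gradients divided by the sample count plus the regularization term, so a larger l2_leaf_reg shrinks the leaf toward zero.

```python
def leaf_value(gradients, l2_leaf_reg):
    """Simplified leaf estimate: sum of gradients over
    (sample count + L2 term). The regularizer sits in the
    denominator, so larger values shrink the leaf toward zero."""
    return sum(gradients) / (len(gradients) + l2_leaf_reg)

grads = [0.5, 0.8, 0.3, 0.4]
print(leaf_value(grads, l2_leaf_reg=3.0))   # moderate shrinkage
print(leaf_value(grads, l2_leaf_reg=30.0))  # much stronger shrinkage
```

With l2_leaf_reg=0 this reduces to the plain mean of the gradients; any positive value pulls the leaf toward zero, which also explains the "degenerate solution" error: if the denominator is effectively zero, increasing the regularization stabilizes it.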
CatBoost can handle missing values internally. None values should be used for missing value representation. If the dataset is read from a file, missing values can be represented as strings like N/A, NAN, None, empty string and the like. Refer to the Missing values processing section for details.
Handling categorical features automatically: we can use CatBoost without any explicit pre-processing to convert categories into numbers. CatBoost converts categorical values into numbers using various statistics on combinations of categorical features and on combinations of categorical and numerical features.
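The core idea behind those statistics can be sketched with a simplified ordered target statistic (a toy version of CatBoost's encoding; the function name, prior, and weights here are illustrative assumptions, not the library's API): each row's category is encoded using only the target values of earlier rows with the same category, plus a smoothing prior, which avoids leaking the row's own label.

```python
def ordered_target_statistic(categories, targets, prior=0.5, prior_weight=1.0):
    """Simplified sketch of an ordered target statistic:
    encode each row's category from the running target sum and count
    of *previous* rows with that category, smoothed by a prior."""
    sums, counts = {}, {}
    encoded = []
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        encoded.append((s + prior * prior_weight) / (c + prior_weight))
        # update running statistics *after* encoding the current row
        sums[cat] = s + y
        counts[cat] = c + 1
    return encoded

cats = ["red", "blue", "red", "red"]
ys = [1, 0, 1, 0]
print(ordered_target_statistic(cats, ys))  # [0.5, 0.5, 0.75, 0.8333...]
```

Note how the first occurrence of each category falls back to the prior alone, and later occurrences move toward that category's observed target mean.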
I don't think you will find an exact answer to your question, because every dataset is different.
However, in my experience, values in the range of 2 to 30 are a good starting point.