From my reading of the LightGBM documentation, one is supposed to declare categorical features when constructing the Dataset. So I have the following code:
cats=['C1', 'C2']
d_train = lgb.Dataset(X, label=y, categorical_feature=cats)
However, I received the following warning:
/app/anaconda3/anaconda3/lib/python3.7/site-packages/lightgbm/basic.py:1243: UserWarning: Using categorical_feature in Dataset.
  warnings.warn('Using categorical_feature in Dataset.')
Why did I get the warning message?
LightGBM offers good accuracy with integer-encoded categorical features. LightGBM applies Fisher (1958) to find the optimal split over categories as described here. This often performs better than one-hot encoding. So we can assume that LightGBM does not one-hot encode these categorical features.
LightGBM lets us specify categorical features directly and handles them internally. We use the categorical_feature parameter to name them. Categorical features must be encoded as non-negative integers (int) less than Int32.MaxValue (2147483647).
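As a rough illustration of the integer-encoding requirement, the sketch below maps string categories to non-negative integer codes with pandas before the data is handed to LightGBM. The column names C1 and C2 come from the question; the sample values are made up for the example, and pandas category codes are only one of several reasonable ways to do this.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "C1": ["red", "blue", "red", "green"],
    "C2": ["a", "b", "b", "a"],
})
cats = ["C1", "C2"]

encoded = df.copy()
for col in cats:
    # Categories are sorted, then numbered 0, 1, 2, ...
    # (missing values would get code -1, but there are none here).
    encoded[col] = encoded[col].astype("category").cat.codes.astype("int32")

print(encoded)
```

The resulting columns hold small non-negative integers, so they satisfy the constraint above and can be listed in categorical_feature.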
Missing value handling: LightGBM uses NA (NaN) to represent missing values by default. Change it to use zero by setting zero_as_missing=true. When zero_as_missing=false (default), the unrecorded values in sparse matrices (and LibSVM files) are treated as zeros.
LightGBM can use categorical features directly (without one-hot encoding), provided they are integer-encoded. It has its own way of dealing with categorical variables: it applies Fisher's method to find the optimal split over categories.
I presume that you get this warning in a call to lgb.train. This function also has an argument categorical_feature, whose default value is 'auto', which means taking the categorical columns from the pandas.DataFrame (documentation). The warning, which is emitted at this line, indicates that, although lgb.train has requested that categorical features be identified automatically, LightGBM will use the features specified in the dataset instead.
To avoid the warning, you can give the same argument categorical_feature to both lgb.Dataset and lgb.train. Alternatively, you can construct the dataset with categorical_feature=None and only specify the categorical features in lgb.train.