I want to use LightGBM to predict the tradeMoney of houses, but I run into trouble when I specify categorical_feature in lgb.Dataset. The output of data.dtypes is as follows:
type(train)
pandas.core.frame.DataFrame
train.dtypes
area float64
rentType object
houseFloor object
totalFloor int64
houseToward object
houseDecoration object
region object
plate object
buildYear int64
saleSecHouseNum int64
subwayStationNum int64
busStationNum int64
interSchoolNum int64
schoolNum int64
privateSchoolNum int64
hospitalNum int64
drugStoreNum int64
And I use LightGBM to train it as follows:
categorical_feats = ['rentType', 'houseFloor', 'houseToward', 'houseDecoration', 'region', 'plate']
folds = KFold(n_splits=5, shuffle=True, random_state=2333)
oof_lgb = np.zeros(len(train))
predictions_lgb = np.zeros(len(test))
feature_importance_df = pd.DataFrame()
for fold_, (trn_idx, val_idx) in enumerate(folds.split(train.values, target.values)):
    print("fold {}".format(fold_))
    trn_data = lgb.Dataset(train.iloc[trn_idx], label=target.iloc[trn_idx], categorical_feature=categorical_feats)
    val_data = lgb.Dataset(train.iloc[val_idx], label=target.iloc[val_idx], categorical_feature=categorical_feats)
    num_round = 10000
    clf = lgb.train(params, trn_data, num_round, valid_sets=[trn_data, val_data], verbose_eval=500, early_stopping_rounds=200)
    oof_lgb[val_idx] = clf.predict(train.iloc[val_idx], num_iteration=clf.best_iteration)
    predictions_lgb += clf.predict(test, num_iteration=clf.best_iteration) / folds.n_splits

print("CV Score: {:<8.5f}".format(r2_score(target, oof_lgb)))
But it still gives the following error even though I have specified categorical_feature:
ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields rentType, houseFloor, houseToward, houseDecoration, region, plate
And here is my environment:
LightGBM version: 2.2.3
Pandas version: 0.24.2
Python version: 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
Could anyone help me, please?
The problem is that LightGBM can only handle categorical features whose dtype is category, not object. In your code, the list of categorical feature names is passed in, and LightGBM encodes category columns into integers internally. But nothing happens to the object columns, so LightGBM complains when it finds that not all features have been transformed into numbers.
So the solution is to convert those columns before your CV loop:

for c in categorical_feats:
    train[c] = train[c].astype('category')

Apply the same conversion to test as well, since it is passed to clf.predict and will otherwise raise the same dtype error.
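To see what the conversion actually does, here is a minimal pandas-only sketch (the toy frame and its values are made up for illustration; the column names mirror the question, and no LightGBM call is made):

```python
import pandas as pd

# Hypothetical toy frame mirroring the question's object-dtype columns
train = pd.DataFrame({
    "rentType": ["whole", "shared", "whole"],
    "area": [45.0, 20.0, 60.0],
})

print(train["rentType"].dtype)   # object -> this is what LightGBM rejects

# The fix: convert object columns to pandas' category dtype
train["rentType"] = train["rentType"].astype("category")

print(train["rentType"].dtype)   # category -> this is what LightGBM accepts
# Under the hood, each category is mapped to an integer code;
# these codes are what LightGBM trains on.
print(train["rentType"].cat.codes.tolist())
```

The integer codes are assigned per column from the sorted set of distinct values, which is why LightGBM needs the category dtype: an object column carries raw strings with no such mapping attached.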