I have the following code:
most_important = features_importance_chi(importance_score_tresh,
df_user.drop(columns = 'CHURN'),churn)
X = df_user.drop(columns = 'CHURN')
churn[churn==2] = 1
y = churn
# handle undersample problem
X,y = handle_undersampe(X,y)
# train the model
X=X.loc[:,X.columns.isin(most_important)].values
y=y.values
parameters = {
'application': 'binary',
'objective': 'binary',
'metric': 'auc',
'is_unbalance': 'true',
'boosting': 'gbdt',
'num_leaves': 31,
'feature_fraction': 0.5,
'bagging_fraction': 0.5,
'bagging_freq': 20,
'learning_rate': 0.05,
'verbose': 0
}
# split data
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
train_data = lightgbm.Dataset(x_train, label=y_train)
test_data = lightgbm.Dataset(x_test, label=y_test)
model = lightgbm.train(parameters,
                       train_data,
                       valid_sets=[train_data, test_data],
                       feature_name=most_important,
                       num_boost_round=5000,
                       early_stopping_rounds=100)
and the function that returns the most_important list:
def features_importance_chi(importance_score_tresh, X, Y):
    model = ExtraTreesClassifier(n_estimators=10)
    model.fit(X, Y.values.ravel())
    feature_list = pd.Series(model.feature_importances_,
                             index=X.columns)
    feature_list = feature_list[feature_list > importance_score_tresh]
    feature_list = feature_list.index.values.tolist()
    return feature_list
The funny thing is that this code raises the following error in Spyder:
LightGBMError: Do not support special JSON characters in feature name.
but it works fine in Jupyter, where I am able to print the list of most important features.
Any idea what could be the reason for this error?
This message comes up often with LightGBM models such as LGBMClassifier(). It typically means that some feature (column) names contain special JSON characters. Simply add these lines at the beginning, right after you load the data into pandas (df is df_user in your case), and the problem goes away:
import re
df = df.rename(columns = lambda x:re.sub('[^A-Za-z0-9_]+', '', x))
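Stripping characters can make two different columns end up with the same name, so it may be worth checking which names are affected and keeping the cleaned names unique. Below is a minimal sketch of that idea; the helper names `find_bad_names` and `sanitize_names` and the sample frame are hypothetical, and the set of characters in `JSON_SPECIAL` is my assumption about what LightGBM rejects, not something taken from the question.

```python
import re

import pandas as pd

# Assumed set of JSON syntax characters that trigger the error.
JSON_SPECIAL = re.compile(r'[",:\[\]{}]')


def find_bad_names(columns):
    """Return the column names that contain a JSON special character."""
    return [c for c in columns if JSON_SPECIAL.search(str(c))]


def sanitize_names(columns):
    """Keep only letters, digits and underscores; suffix repeats to keep names unique."""
    seen = {}
    cleaned = []
    for col in columns:
        name = re.sub(r'[^A-Za-z0-9_]+', '', str(col))
        count = seen.get(name, 0)
        seen[name] = count + 1
        # Simple scheme: the first occurrence keeps the name, later ones get _1, _2, ...
        cleaned.append(name if count == 0 else f"{name}_{count}")
    return cleaned


# Hypothetical frame standing in for df_user.
df = pd.DataFrame({"age[years]": [30], "plan:type": [1], "plantype": [0]})
print(find_bad_names(df.columns))
df.columns = sanitize_names(df.columns)
print(list(df.columns))
```

If you rename the columns this way, do it before calling features_importance_chi, so that the names in most_important (and therefore in feature_name) are the cleaned ones.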