Python's XGBoost: ValueError('feature_names may not contain [, ] or <')

Question

Python's implementation of XGBClassifier does not accept the characters [, ] or < in feature names.

If that occurs, it raises the following:

ValueError('feature_names may not contain [, ] or <')

The obvious solution would seem to be passing the equivalent NumPy arrays and getting rid of the column names altogether, but if the developers haven't done that, there must be a reason.

What use does XGBoost have for the feature names, and what is the downside of simply passing it NumPy arrays instead of pandas DataFrames?

Abhimanu Kumar · Accepted Answer

I know it's late, but I'm writing this answer for others who might face this issue. This error typically happens if your column names contain the symbols [, ] or <. Here is an example:

import pandas as pd
import numpy as np
from xgboost.sklearn import XGBRegressor

# test input data with string, int, and symbol-included columns 
df = pd.DataFrame({'0': np.random.randint(0, 2, size=100),
                   '[test1]': np.random.uniform(0, 1, size=100),
                   'test2': np.random.uniform(0, 1, size=100),
                  3: np.random.uniform(0, 1, size=100)})

target = df.iloc[:, 0]
predictors = df.iloc[:, 1:]

# basic xgb model
xgb0 = XGBRegressor(objective='reg:squarederror')  # 'reg:linear' is the deprecated name for this objective
xgb0.fit(predictors, target)

The code above will throw an error:

ValueError: feature_names may not contain [, ] or <

But if you remove those square brackets from '[test1]' then it works fine. Below is a generic way of removing [, ] or < from your column names:

import re
import pandas as pd
import numpy as np
from xgboost.sklearn import XGBRegressor
regex = re.compile(r"\[|\]|<", re.IGNORECASE)

# test input data with string, int, and symbol-included columns 
df = pd.DataFrame({'0': np.random.randint(0, 2, size=100),
                   '[test1]': np.random.uniform(0, 1, size=100),
                   'test2': np.random.uniform(0, 1, size=100),
                  3: np.random.uniform(0, 1, size=100)})

df.columns = [regex.sub("_", col) if any(x in str(col) for x in set(('[', ']', '<'))) else col for col in df.columns.values]

target = df.iloc[:, 0]
predictors = df.iloc[:, 1:]

# basic xgb model
xgb0 = XGBRegressor(objective='reg:squarederror')  # 'reg:linear' is the deprecated name for this objective
xgb0.fit(predictors, target)
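The renaming step can also be wrapped in a small reusable helper. The following is a minimal sketch, not part of the original answer: the function name sanitize_feature_names and its replacement parameter are hypothetical, and it assumes that casting every column label to a string is acceptable. Unlike the list comprehension above, it also avoids producing duplicate names when two different columns clean up to the same string.

```python
import re

# Characters that the error message says are not allowed in feature names.
_FORBIDDEN = re.compile(r"[\[\]<]")


def sanitize_feature_names(names, replacement="_"):
    """Return string names with forbidden characters replaced.

    Hypothetical helper: if two cleaned names collide, a numeric
    suffix is appended so the result stays unique.
    """
    used = set()
    cleaned = []
    for name in names:
        # Cast to str so integer column labels such as 3 are handled too.
        base = _FORBIDDEN.sub(replacement, str(name))
        candidate, n = base, 1
        # Keep trying suffixes until the name has not been used yet.
        while candidate in used:
            candidate = f"{base}{replacement}{n}"
            n += 1
        used.add(candidate)
        cleaned.append(candidate)
    return cleaned


print(sanitize_feature_names(["0", "[test1]", "test2", 3]))
```

With a DataFrame, this could be applied as df.columns = sanitize_feature_names(df.columns). Note that this turns the integer label 3 into the string '3', which the original list comprehension leaves untouched.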

For more details, see the corresponding check in the XGBoost source file xgboost/core.py. The error is raised when that check fails.
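To see which columns would trip this check before calling fit, the condition described by the error message can be mirrored in plain Python. This is only a sketch based on the wording of the error, not a copy of the library's code; the names FORBIDDEN_CHARS and find_invalid_feature_names are hypothetical.

```python
# The three characters named in the error message.
FORBIDDEN_CHARS = ("[", "]", "<")


def find_invalid_feature_names(names):
    """Return the names that contain at least one forbidden character."""
    return [
        name
        for name in names
        # str() lets non-string labels (e.g. integers) be checked as well.
        if any(ch in str(name) for ch in FORBIDDEN_CHARS)
    ]


bad = find_invalid_feature_names(["0", "[test1]", "test2", 3])
print(bad)
```

Calling find_invalid_feature_names(df.columns) on the example DataFrame would point at '[test1]' as the only offending column.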

Yaqi Li · Answer

This is another regex solution.

import re

regex = re.compile(r"\[|\]|<", re.IGNORECASE)

X_train.columns = [regex.sub("_", col) if any(x in str(col) for x in set(('[', ']', '<'))) else col for col in X_train.columns.values]
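If the original column names are needed later, for example to label results with the names used in the source data, it may help to keep a mapping instead of overwriting the columns directly. The sketch below is an assumption-laden variant, not part of either answer: build_rename_map is a hypothetical helper, and it uses str.translate rather than a regex.

```python
# Translation table mapping each forbidden character to an underscore.
_TABLE = str.maketrans({"[": "_", "]": "_", "<": "_"})


def build_rename_map(names):
    """Map each name containing a forbidden character to its cleaned form."""
    mapping = {}
    for name in names:
        original = str(name)
        cleaned = original.translate(_TABLE)
        # Only record names that actually change.
        if cleaned != original:
            mapping[name] = cleaned
    return mapping


mapping = build_rename_map(["0", "[test1]", "test2", 3])
# Inverse mapping, to translate cleaned names back to the originals.
inverse = {clean: orig for orig, clean in mapping.items()}
print(mapping, inverse)
```

With pandas, the mapping could be applied as X_train = X_train.rename(columns=mapping), and inverse can translate names back afterwards. This simple version does not guard against two columns cleaning up to the same name.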

Tags:

python

pandas

numpy

scikit-learn

xgboost

sapo_cosmico
