How to keep column names after one-hot encoding with sklearn?

Question

I am working on the Titanic Kaggle competition. To deal with categorical data, I've split the data into two sets: one for numerical variables and the other for categorical variables. After applying sklearn's one-hot encoding to the categorical set, I tried to regroup the two datasets, but since the categorical set is an ndarray and the other one is a DataFrame, I used:

np.hstack((X_train_num, X_train_cat))

which works perfectly but I no longer have the names of my variables.

Is there another way to do this while maintaining the names of the variables without using pd.get_dummies()?

Thanks

piRSquared · Accepted Answer

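Try wrapping the encoded array in a DataFrame that shares the numeric frame's index, then joining it back: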

X_train = X_train_num.join(
   pd.DataFrame(X_train_cat, X_train_num.index).add_prefix('cat_')
)
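For illustration, here is a minimal sketch of that join on toy data. The values, the index labels, and the two-column one-hot array are hypothetical stand-ins for the question's X_train_num and X_train_cat, and the array is assumed to be dense (a sparse encoder output would need .toarray() first).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two pieces described in the question.
X_train_num = pd.DataFrame(
    {"Age": [22.0, 38.0, 26.0], "Fare": [7.25, 71.28, 7.93]},
    index=[10, 11, 12],
)
# Assumed to be a dense one-hot array with one row per training row.
X_train_cat = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])

# Reuse the numeric frame's index so the rows line up, then prefix the
# default integer column labels (0, 1, ...) to get readable names.
X_train = X_train_num.join(
    pd.DataFrame(X_train_cat, index=X_train_num.index).add_prefix("cat_")
)

print(list(X_train.columns))  # ['Age', 'Fare', 'cat_0', 'cat_1']
```

The numeric columns keep their original names, and the encoded columns get positional names, which is enough to tell them apart but does not say which category each one stands for.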

Ami Tavory · Answer

Well, as you stated in your question, there's pd.get_dummies, which I think is the best choice here. Having said that, you could use

pd.concat([X_train_num, pd.DataFrame(X_train_cat, index=X_train_num.index)], axis=1)

If you like, you could also give the new columns useful names with

pd.concat([X_train_num, pd.DataFrame(X_train_cat, index=X_train_num.index, columns=cols)], axis=1)

and cols can be whatever list of strings you want (of the appropriate length).

Tags: python, pandas, one-hot-encoding, scikit-learn, data-science

user2486276
