
Preserve column order while one-hot encoding using pandas.get_dummies

What is the best/most Pythonic way to one-hot encode categorical features in a Pandas data frame while preserving the original order of the columns from which the categories (new column names) are extracted?

For example, if I have three columns in my data frame (df0): ["Col_continuous", "Col_categorical", "Labels"], and I use

df1hot = pd.get_dummies(df0, columns = ["Col_categorical"])

the new data frame has the newly created columns appearing after the "Labels" column. I want the new columns in between "Col_continuous" and "Labels".
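A minimal reproduction of the default behaviour, using a small made-up frame (the data values are hypothetical). get_dummies keeps the non-encoded columns in their original order and appends the dummy columns at the end:

```python
import pandas as pd

# Hypothetical data with the column layout described above
df0 = pd.DataFrame({
    "Col_continuous": [1.5, 2.0, 3.25],
    "Col_categorical": ["a", "b", "a"],
    "Labels": [0, 1, 0],
})

# Default behaviour: dummy columns are appended after the remaining columns
df1hot = pd.get_dummies(df0, columns=["Col_categorical"])
print(list(df1hot.columns))
# ['Col_continuous', 'Labels', 'Col_categorical_a', 'Col_categorical_b']
```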

For robustness, I want the order preserved when dealing with data frames whose categorical columns are arbitrarily interleaved with the rest of the columns. For example, for ["Cont1", "Cat1", "Cont2", "Cont3", "Cat2", "Labels"], I want the new columns resulting from "Cat1" to be between "Cont1" and "Cont2". Assume that I already have a variable, say categoricalCols, which is a list of the names of the categorical features.
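One possible approach for arbitrary column positions, sketched under the assumption that the default "<column>_<value>" naming is acceptable: walk the original columns in order, encode each categorical one separately, and concatenate the pieces. The helper name get_dummies_in_place and the sample data are hypothetical; extra keyword arguments (for example dtype or drop_first) are forwarded to pd.get_dummies, but prefix is set per column and must not be passed.

```python
import pandas as pd

def get_dummies_in_place(df, categorical_cols, **dummy_kwargs):
    """Hypothetical helper: one-hot encode categorical_cols, placing each
    block of dummy columns where its source column used to be."""
    pieces = []
    for col in df.columns:
        if col in categorical_cols:
            # Encode only this column; prefix=col yields names like "Cat1_x"
            pieces.append(pd.get_dummies(df[col], prefix=col, **dummy_kwargs))
        else:
            # Keep non-categorical columns unchanged (double brackets keep a DataFrame)
            pieces.append(df[[col]])
    # All pieces share df's index, so they line up row by row
    return pd.concat(pieces, axis=1)


# Hypothetical data matching the column layout in the question
df0 = pd.DataFrame({
    "Cont1": [0.1, 0.2, 0.3],
    "Cat1": ["x", "y", "x"],
    "Cont2": [1, 2, 3],
    "Cont3": [4.0, 5.0, 6.0],
    "Cat2": ["p", "p", "q"],
    "Labels": [0, 1, 1],
})
categoricalCols = ["Cat1", "Cat2"]

df1hot = get_dummies_in_place(df0, categoricalCols)
print(list(df1hot.columns))
# ['Cont1', 'Cat1_x', 'Cat1_y', 'Cont2', 'Cont3', 'Cat2_p', 'Cat2_q', 'Labels']
```

This calls get_dummies once per categorical column instead of once overall, which is simple to reason about but may be slower on frames with many categorical columns.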

Edit 1: changed df1hot = pd.get_dummies(df0, columns = ["Col_continuous"]) to df1hot = pd.get_dummies(df0, columns = ["Col_categorical"]) thanks to Juan C's comment.

Edit 2: added paragraph starting with "For robustness,..."

strangeloop asked Oct 20 '25 22:10



1 Answer

IIUC (if I understand correctly), I would go with something like this:

df1hot = df1hot[['Col_continuous', *[i for i in df1hot.columns if i.startswith('Col_categorical_')], 'Labels']]

Indexing with this list reorders the columns so that every column created by get_dummies sits between 'Col_continuous' and 'Labels'.
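The same reordering idea can be generalised to arbitrary column positions. A sketch, assuming get_dummies was called with its default "_" separator and that no categorical column name is a prefix of another column's name; the helper reorder_after_get_dummies and the sample data are hypothetical:

```python
import pandas as pd

def reorder_after_get_dummies(original_cols, encoded, categorical_cols, prefix_sep="_"):
    """Hypothetical helper: move each block of dummy columns back to the
    position its source column had in original_cols."""
    original = set(original_cols)
    # Columns created by get_dummies (anything not present before encoding)
    dummy_cols = [c for c in encoded.columns if c not in original]
    new_order = []
    for col in original_cols:
        if col in categorical_cols:
            # Assumes default naming "<col><prefix_sep><value>"
            new_order.extend(c for c in dummy_cols if c.startswith(f"{col}{prefix_sep}"))
        else:
            new_order.append(col)
    # Selecting columns by list reorders them (the data moves with the labels)
    return encoded[new_order]


# Hypothetical data with the three-column layout from the question
df0 = pd.DataFrame({
    "Col_continuous": [1.5, 2.0, 3.25],
    "Col_categorical": ["a", "b", "a"],
    "Labels": [0, 1, 0],
})
df1hot = pd.get_dummies(df0, columns=["Col_categorical"])
df1hot = reorder_after_get_dummies(df0.columns, df1hot, ["Col_categorical"])
print(list(df1hot.columns))
# ['Col_continuous', 'Col_categorical_a', 'Col_categorical_b', 'Labels']
```

Passing categoricalCols as the third argument handles the multi-column case the same way.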

Juan C answered Oct 22 '25 12:10
