
LabelBinarizer for multiple columns in data frame

I have a CSV file with 25 columns: some are numeric, some are categorical, and some hold free-text values such as the names of actors and directors. I want to fit regression models on this data. To do so, I have to convert the categorical string columns to numeric values using LabelBinarizer from scikit-learn. How can I use LabelBinarizer on a dataframe that has multiple categorical columns?

SampleData

Essentially I want to binarize the labels and add them to the dataframe.

In the code below, I have collected the list of columns I want to binarize, but I can't figure out how to add the new columns back to the df.

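from sklearn.preprocessing import LabelBinarizer
label_binarizer = LabelBinarizer()
categorylist = ['color', 'language', 'country', 'content_rating']
for col in categorylist:
    # tempdf is a NumPy array of indicator columns for df[col]
    tempdf = label_binarizer.fit_transform(df[col])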

In the next step, I want to add tempdf to df and drop the original column df[col].

aks_Nin asked Nov 07 '16 02:11


1 Answer

You can do this in a one-liner with pd.get_dummies:

tempdf = pd.get_dummies(df, columns=categorylist)
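As a minimal sketch of what that call does, assuming a small made-up dataframe (the values and the budget column below are hypothetical, not the asker's data): each listed column is dropped and replaced by one indicator column per distinct value, and the remaining columns are kept as they are.

```python
import pandas as pd

# Hypothetical stand-in for the asker's CSV data.
df = pd.DataFrame({
    "color": ["Color", "Black and White", "Color"],
    "language": ["English", "French", "English"],
    "budget": [100.0, 20.0, 55.0],
})
categorylist = ["color", "language"]

# Each column in categorylist is replaced by indicator columns named
# "<column>_<value>"; columns not listed (here: budget) are left unchanged.
tempdf = pd.get_dummies(df, columns=categorylist)

print(tempdf.columns.tolist())
```

So there is no separate step for adding the new columns back and dropping the originals: the returned dataframe already has both done.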

Otherwise, you can use a FeatureUnion with FunctionTransformer, as in the answer to "sklearn pipeline - how to apply different transformations on different columns".
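A rough sketch of the FeatureUnion route, not taken from the linked answer: the sample data, the select helper and the to_dense step are all assumptions for illustration. It also swaps in OneHotEncoder for LabelBinarizer, since LabelBinarizer is designed for target labels and does not fit the transformer interface a pipeline expects.

```python
import pandas as pd
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder

# Hypothetical stand-in for the asker's CSV data.
df = pd.DataFrame({
    "color": ["Color", "Black and White", "Color"],
    "language": ["English", "French", "English"],
    "budget": [100.0, 20.0, 55.0],
})
categorical = ["color", "language"]
numeric = ["budget"]


def select(columns):
    """Hypothetical helper: a transformer that keeps only the given columns."""
    return FunctionTransformer(lambda X: X[columns])


# OneHotEncoder returns a sparse matrix by default; convert it to a dense
# array so both branches of the union can be stacked side by side.
to_dense = FunctionTransformer(lambda X: X.toarray())

union = FeatureUnion([
    ("numeric", select(numeric)),
    ("categorical", make_pipeline(select(categorical), OneHotEncoder(), to_dense)),
])

# Result is a NumPy array: the numeric column first, then the indicator columns.
features = union.fit_transform(df)
print(features.shape)
```

Unlike pd.get_dummies, the fitted union remembers the categories it saw, so the same encoding can be applied to new data with union.transform.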

EDIT: As @dukebody added in the comments, you can also use the sklearn-pandas package, whose purpose is to apply different transformations to each dataframe column.

maxymoo answered Sep 23 '22 08:09