 

Easy way to apply transformation from `pandas.get_dummies` to new data?

Tags: python, pandas

Suppose I have a data frame `data` with string columns that I want to convert to indicator variables. I use `pandas.get_dummies(data)` to produce a dataset that I can use for building a model.

Now I have a single new observation that I want to run through my model. I can't simply use `pandas.get_dummies(new_data)`, because the new observation doesn't contain all of the classes and so won't produce the same indicator columns. Is there a good way to do this?
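To make the mismatch concrete, here is a minimal sketch. The frames `data` and `new_data` below are made-up examples, not the asker's real data; they only show that encoding the new observation on its own yields fewer columns than encoding the training data.

```python
import pandas as pd

# Hypothetical training data and a single new observation (illustrative only)
data = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
new_data = pd.DataFrame({'cat': ['a'], 'val': [1]})

# Encode each frame independently
train_cols = list(pd.get_dummies(data).columns)
new_cols = list(pd.get_dummies(new_data).columns)

# Indicator columns the model was trained on but the new observation lacks
missing = [c for c in train_cols if c not in new_cols]
print(missing)
```

A model fitted on the columns in `train_cols` cannot accept a row that has only the columns in `new_cols`, which is the problem described above.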

Ellis Valentiner asked Feb 11 '15 22:02



1 Answer

You can create the dummies from the single new observation and then reindex that frame's columns using the columns from the original indicator matrix:

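    import pandas as pd

    df = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
    # Dummies for the single new observation
    df1 = pd.get_dummies(pd.DataFrame({'cat': ['a'], 'val': [1]}))
    # Dummies for the original data
    dummies_frame = pd.get_dummies(df)
    # Align to the original columns, filling missing indicators with 0
    df1.reindex(columns=dummies_frame.columns, fill_value=0)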

returns:

       val  cat_a  cat_b  cat_c  cat_d
    0    1      1      0      0      0
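If you need to do this repeatedly, the same idea can be wrapped in a small function. This is only a sketch: `encode_like` is a hypothetical helper name, not part of pandas, and passing `dtype=int` to `get_dummies` is my assumption so that the indicators are 0/1 integers regardless of pandas version (newer releases default to booleans).

```python
import pandas as pd

def encode_like(train_dummies, new_data):
    """Hypothetical helper: dummy-encode new_data, then align it to the
    columns of the training indicator matrix."""
    new_dummies = pd.get_dummies(new_data, dtype=int)
    # Columns missing from new_data are added and filled with 0;
    # columns not seen in training are dropped by the reindex.
    return new_dummies.reindex(columns=train_dummies.columns, fill_value=0)

df = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
dummies_frame = pd.get_dummies(df, dtype=int)

# Second row uses a category ('e') that never appeared in the training data
new_obs = pd.DataFrame({'cat': ['a', 'e'], 'val': [1, 3]})
encoded = encode_like(dummies_frame, new_obs)
print(encoded)
```

Note that a category unseen in training (here `'e'`) ends up with zeros in every indicator column, because its own column is discarded by the reindex. Whether that is acceptable depends on your model.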
JAB answered Sep 21 '22 22:09