Suppose I have a data frame, data, with string columns that I want converted to indicators. I use pandas.get_dummies(data) to convert it into a dataset that I can use for building a model.
Now I have a single new observation that I want to run through my model. Obviously I can't just call pandas.get_dummies(new_data), because the new observation doesn't contain all of the classes and so won't produce the same indicator columns. Is there a good way to do this?
get_dummies() is used for data manipulation. It converts categorical data into dummy or indicator variables.
To convert data types in pandas, there are three basic options: use astype() to force an appropriate dtype, write a custom function to convert the data, or use pandas functions such as to_numeric() or to_datetime().
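A minimal sketch of those three options side by side. The data frame and its column names (count, price, day) are made up for illustration, and parse_price is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "count": ["1", "2", "3"],
    "price": ["1.5", "n/a", "3.0"],
    "day": ["2021-01-01", "2021-01-02", "2021-01-03"],
})

# 1. astype() forces a dtype and raises if any value cannot be converted.
df["count"] = df["count"].astype(int)


# 2. A custom function applied element by element (hypothetical helper).
def parse_price(text):
    """Return the price as a float, or None when the text is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


price_custom = df["price"].apply(parse_price)

# 3. pandas helpers; errors="coerce" turns unparseable values into NaN.
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df["day"] = pd.to_datetime(df["day"])
```

astype() is the strictest of the three, so it is probably the right first choice when the column is already clean. The other two are more forgiving of messy input.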
To convert your categorical variables to dummy variables in Python, you can use the pandas get_dummies() function. For example, if you have the categorical variable "Gender" in a data frame called "df", you can use the following code to make dummy variables: df_dc = pd.get_dummies(df, columns=['Gender']).
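Here is a small runnable version of that call, as a sketch. Only the "Gender" column name comes from the text above. The "Age" column and the values are invented for illustration, and dtype=int is passed so the result shows 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical data frame; only the "Gender" column name comes from the text.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "Age": [34, 28, 45],
})

# Only the listed columns are encoded; "Age" passes through unchanged.
# dtype=int asks for 0/1 integer columns instead of the default dtype.
df_dc = pd.get_dummies(df, columns=["Gender"], dtype=int)
print(df_dc)
```

The original "Gender" column is replaced by one indicator column per level, named with the column name as a prefix (Gender_Female, Gender_Male).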
get_dummies() has a drop_first parameter that controls whether the first level is dropped as the reference category, that is, whether you get k or k-1 dummies out of k categorical levels.
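To make the k versus k-1 distinction concrete, here is a short sketch on a made-up series with three levels. The names levels, full and reduced are illustrative.

```python
import pandas as pd

# Hypothetical categorical series with k = 3 levels: a, b, c.
levels = pd.Series(["a", "b", "c", "a"], name="cat")

# Default: one indicator column per level (k columns).
full = pd.get_dummies(levels, prefix="cat", dtype=int)

# drop_first=True drops the first level, leaving k-1 columns;
# a row of all zeros then stands for the dropped reference level "a".
reduced = pd.get_dummies(levels, prefix="cat", dtype=int, drop_first=True)

print(list(full.columns))
print(list(reduced.columns))
```

Dropping the reference level is mostly useful for models such as unregularised linear regression, where keeping all k indicators alongside an intercept makes the columns perfectly collinear.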
You can create the dummies from the single new observation, and then reindex that frame's columns using the columns from the original indicator matrix:
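    import pandas as pd

    df = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
    # Single new observation; dtype=int keeps 0/1 output (pandas 2.0+ defaults to bool)
    df1 = pd.get_dummies(pd.DataFrame({'cat': ['a'], 'val': [1]}), dtype=int)

    dummies_frame = pd.get_dummies(df, dtype=int)
    # Align to the original columns, filling the missing indicators with 0
    df1.reindex(columns=dummies_frame.columns, fill_value=0)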
returns:
       val  cat_a  cat_b  cat_c  cat_d
    0    1      1      0      0      0
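If you need to do this repeatedly, the same reindex idea can be wrapped in a small function. This is only a sketch: encode_like is a hypothetical helper, not part of pandas, and it assumes you kept the column index of the training indicator matrix. It also shows what happens when the new observation contains a class the training data never had.

```python
import pandas as pd

train = pd.DataFrame({"cat": ["a", "b", "c", "d"], "val": [1, 2, 5, 10]})
train_dummies = pd.get_dummies(train, dtype=int)


def encode_like(new_data, reference_columns):
    """Dummy-encode new_data and align it to the training columns.

    Hypothetical helper: indicator columns missing from new_data are
    filled with 0, and columns for classes that were not in the training
    data are dropped, because reindex keeps only reference_columns.
    """
    encoded = pd.get_dummies(new_data, dtype=int)
    return encoded.reindex(columns=reference_columns, fill_value=0)


# "e" was never seen in training, so the row ends up with all-zero indicators.
new_row = pd.DataFrame({"cat": ["e"], "val": [7]})
aligned = encode_like(new_row, train_dummies.columns)
print(aligned)
```

Silently dropping an unseen class may or may not be what you want. If it should be an error instead, compare the encoded columns against reference_columns before reindexing.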