
Create dummies from a column for a subset of data that doesn't contain all the category values in that column

I am handling a subset of a large data set.

There is a column named "type" in the dataframe. The "type" column is expected to have values like [1,2,3,4].

In a certain subset, I find the "type" column only contains some of those values, such as [1,4]:

In [1]: df
Out[1]:
   type
0     1
1     4

When I create dummies from column "type" on that subset, it turns out like this:

In [3]: import pandas as pd
In [4]: pd.get_dummies(df["type"], prefix="type")
Out[4]:
   type_1  type_4
0       1       0
1       0       1

It doesn't have the columns named "type_2" and "type_3". What I want is:

 Out[6]:        type_1 type_2 type_3 type_4
            0      1      0       0      0
            1      0      0       0      1

Is there a solution for this?

jessie tio asked Mar 09 '23 06:03



2 Answers

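What you need to do is make the 'type' column a pd.Categorical and specify the full list of categories. get_dummies then emits a column for every declared category, including those that do not occur in the subset: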

pd.get_dummies(pd.Categorical(df.type, [1, 2, 3, 4]), prefix='type')

   type_1  type_2  type_3  type_4
0       1       0       0       0
1       0       0       0       1
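
As a self-contained sketch of the approach above, the snippet below rebuilds the two-row subset from the question and encodes it against the full list of levels. The variable names (all_types, typed, dummies) are illustrative only, and dtype=int is an added assumption so that the result prints as 0/1 on pandas versions where get_dummies returns booleans by default.

```python
import pandas as pd

# The subset from the question: only types 1 and 4 are present.
df = pd.DataFrame({"type": [1, 4]})

# Declare every expected level up front (assumed to be 1 through 4).
all_types = [1, 2, 3, 4]
typed = pd.Categorical(df["type"], categories=all_types)

# One column per declared level, whether or not it occurs in the data.
# dtype=int is an assumption made here to get 0/1 instead of booleans.
dummies = pd.get_dummies(typed, prefix="type", dtype=int)
print(dummies)
```

One thing to watch: a value in the column that is not listed in categories becomes missing in the Categorical, so with the default dummy_na=False its row comes out as all zeros. It is worth checking that the level list really is complete.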
piRSquared answered May 16 '23 09:05



Another solution with reindex and add_prefix (reindex replaces reindex_axis, which newer pandas versions no longer provide):

df1 = (pd.get_dummies(df["type"])
         .reindex(columns=[1, 2, 3, 4], fill_value=0)
         .add_prefix('type'))
print (df1)
   type1  type2  type3  type4
0      1      0      0      0
1      0      0      0      1

Or categorical solution:

df1 = pd.get_dummies(df["type"].astype(pd.CategoricalDtype(categories=[1, 2, 3, 4])), prefix='type')
print (df1)
   type_1  type_2  type_3  type_4
0       1       0       0       0
1       0       0       0       1
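
Since the question is about many subsets of one large data set, the categorical solution can be wrapped so that every subset is encoded against the same level list. This is only a sketch: type_dummies, TYPE_LEVELS, subset_a and subset_b are hypothetical names, the levels 1 through 4 are taken from the question, and dtype=int is an assumption to keep 0/1 output on pandas versions that default to booleans.

```python
import pandas as pd

# Assumed full set of values the "type" column can take.
TYPE_LEVELS = [1, 2, 3, 4]


def type_dummies(frame, levels=TYPE_LEVELS):
    """Hypothetical helper: one-hot encode frame['type'] against a fixed level list."""
    cat_dtype = pd.CategoricalDtype(categories=levels)
    return pd.get_dummies(frame["type"].astype(cat_dtype), prefix="type", dtype=int)


# Two subsets that each see only part of the levels.
subset_a = pd.DataFrame({"type": [1, 4]})
subset_b = pd.DataFrame({"type": [2, 3, 3]})

dummies_a = type_dummies(subset_a)
dummies_b = type_dummies(subset_b)

# Both results carry the same columns in the same order, so they stack cleanly.
combined = pd.concat([dummies_a, dummies_b], ignore_index=True)
print(combined)
```

Because the column layout no longer depends on which values happen to appear, the encoded subsets can be concatenated or fed to the same model without realigning columns afterwards.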
jezrael answered May 16 '23 08:05
