I have a table where each row can belong to multiple categories, for example:
test = pd.DataFrame({
            'name': ['a', 'b'],
            'category': [['cat1', 'cat2'],['cat1', 'cat3']]
    })
How can I convert each category to a dummy variable in such a way that the above table becomes,
test_res = pd.DataFrame({
        'name': ['a', 'b'],
        'cat1': [1, 1],
        'cat2': [1, 0],
        'cat3': [0, 1]
    })
I tried pd.get_dummies(test['category']) but get the following error,
TypeError: unhashable type: 'list'
To convert categorical variables to dummy variables in Python, you can use the pandas get_dummies() function. For example, if your DataFrame df has a categorical column "Gender", the following code creates the dummy variables: df_dc = pd.get_dummies(df, columns=['Gender'])
pd.get_dummies() converts categorical data into dummy (indicator) variables, producing one indicator column per category.
The drop_first parameter specifies whether to drop the first category of the categorical variable being encoded. By default, drop_first=False, so get_dummies creates one dummy variable for every level of the input categorical variable.
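As a rough illustration of the drop_first behaviour described above, here is a minimal sketch. The three-row "Gender" frame is made up for the example, and dtype=int is passed only so the result is 0/1 regardless of pandas version.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({"Gender": ["F", "M", "F"]})

# Default (drop_first=False): one indicator column per level.
full = pd.get_dummies(df, columns=["Gender"], dtype=int)

# drop_first=True: the first level ("F") is dropped, leaving k-1 columns.
reduced = pd.get_dummies(df, columns=["Gender"], drop_first=True, dtype=int)

print(full)
print(reduced)
```

Dropping one level is mainly useful when the dummies feed a linear model, where the full set of columns is redundant.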
pd.get_dummies fails on a column of lists because lists are not hashable. You can still use pandas.get_dummies, but first convert the list column to a new DataFrame:
print (pd.DataFrame(test.category.values.tolist()))
      0     1
0  cat1  cat2
1  cat1  cat3
# dtype=int keeps the 0/1 output; pandas 2.0+ returns True/False by default
print (pd.get_dummies(pd.DataFrame(test.category.values.tolist()), prefix_sep='', prefix='', dtype=int))
   cat1  cat2  cat3
0     1     1     0
1     1     0     1
Finally, add the name column with concat:
print (pd.concat([pd.get_dummies(pd.DataFrame(test.category.values.tolist()),
                                 prefix_sep='', prefix='', dtype=int),
                  test[['name']]], axis=1))
   cat1  cat2  cat3 name
0     1     1     0    a
1     1     0     1    b
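One caveat with the approach above: it encodes each list position as its own column, so if the same category appears in different positions across rows (say 'cat1' first in one row and second in another), the result can contain duplicate column labels. A possible variant that does not depend on position is sketched below, using Series.explode (available from pandas 0.25). The reordered sample data is an assumption made to show the case.

```python
import pandas as pd

# Same shape as the question's data, but 'cat1' sits in a different
# position in the second row (assumed here to illustrate the caveat).
test = pd.DataFrame({
    "name": ["a", "b"],
    "category": [["cat1", "cat2"], ["cat3", "cat1"]],
})

# One row per (original row, category); the original index is repeated.
exploded = test["category"].explode()

# Encode each category, then collapse back to one row per original index.
dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

result = test[["name"]].join(dummies)
print(result)
```

Taking the max per original index marks a category as present if it occurs anywhere in that row's list.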
Another solution with Series.str.get_dummies:
print (test.category.astype(str).str.strip('[]'))
0    'cat1', 'cat2'
1    'cat1', 'cat3'
Name: category, dtype: object
df = test.category.astype(str).str.strip('[]').str.get_dummies(', ')
df.columns = df.columns.str.strip("'")
print (df)
   cat1  cat2  cat3
0     1     1     0
1     1     0     1
print (pd.concat([df, test[['name']]], axis=1))
   cat1  cat2  cat3 name
0     1     1     0    a
1     1     0     1    b
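A shorter variant of the same idea is to join each list into a delimited string and split it again with Series.str.get_dummies, which avoids stripping brackets and quotes. This is only a sketch, and it assumes that no category name contains the '|' delimiter.

```python
import pandas as pd

test = pd.DataFrame({
    "name": ["a", "b"],
    "category": [["cat1", "cat2"], ["cat1", "cat3"]],
})

# Join each list into "cat1|cat2", then split on "|" into indicator columns.
# Assumes "|" never occurs inside a category name.
dummies = test["category"].str.join("|").str.get_dummies(sep="|")

result = pd.concat([test[["name"]], dummies], axis=1)
print(result)
```

Here the name column is placed first so the result matches the layout asked for in the question.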