I am looking for a pythonic way to handle the following problem.
The pandas.get_dummies() method is great for creating dummies from a categorical column of a dataframe. For example, if the column has values in ['A', 'B'], get_dummies() creates 2 dummy variables and assigns 0 or 1 accordingly.
Now, I need to handle this situation. A single column, let's call it 'label', has values like ['A', 'B', 'C', 'D', 'A*C', 'C*D']. get_dummies() creates 6 dummies, but I only want 4 of them, so that a row could have multiple 1s.
Is there a way to handle this in a pythonic way? I could only think of some step-by-step algorithm to get it, but that would not include get_dummies(). Thanks
Edited, hope it is more clear!
For example, if you have the categorical variable “Gender” in a dataframe called “df”, you can create dummy variables with: df_dc = pd.get_dummies(df, columns=['Gender']). If you have multiple categorical variables, simply add each variable name as a string to the list.
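As a minimal sketch of the call described above (the sample data and the extra "Age" column are made up for illustration; the dummy dtype depends on the pandas version, so it is cast to int here):

```python
import pandas as pd

# Hypothetical sample frame; only the "Gender" column is categorical.
df = pd.DataFrame({"Gender": ["F", "M", "F"], "Age": [31, 45, 27]})

# Only the columns named in `columns` are encoded; the rest pass through.
df_dc = pd.get_dummies(df, columns=["Gender"])

# Newer pandas versions return booleans, older ones small integers,
# so cast the dummy columns explicitly if 0/1 is needed.
dummy_cols = ["Gender_F", "Gender_M"]
df_dc[dummy_cols] = df_dc[dummy_cols].astype(int)

print(df_dc)
```

The original "Gender" column is replaced by one column per category, prefixed with the column name.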
Split a column by delimiter into multiple columns: apply the pandas Series str.split() function to the “Address” column and pass the delimiter (a comma in this case) on which you want to split. Also make sure to pass expand=True so the pieces come back as separate columns.
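A short sketch of that split, assuming an "Address" column whose values always contain the same number of comma-separated parts (the addresses and the new column names are invented for the example):

```python
import pandas as pd

# Hypothetical addresses, each with exactly three comma-separated parts.
df = pd.DataFrame(
    {"Address": ["12 Main St,Springfield,IL", "9 Oak Ave,Portland,OR"]}
)

# expand=True returns a DataFrame with one column per piece
# instead of a Series of lists.
parts = df["Address"].str.split(",", expand=True)

# Illustrative names for the resulting columns.
parts.columns = ["street", "city", "state"]

print(parts)
```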
Replace multiple values in a column: with the DataFrame.replace() method you can replace several values with new strings in one call, for example by passing a dict that maps old values to new ones. Called on the whole DataFrame, it searches every column and replaces each specified value; call it on a single column to restrict the replacement to that column.
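For illustration, a minimal sketch that replaces two values in a single column using a dict mapping (the column name and replacement strings are made up):

```python
import pandas as pd

# Hypothetical column with a few repeated labels.
df = pd.DataFrame({"label": ["A", "B", "C", "A"]})

# Map old values to new ones; values not in the dict are left unchanged.
# Calling replace on the column limits the change to that column.
df["label"] = df["label"].replace({"A": "alpha", "B": "beta"})

print(df)
```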
I know it's been a while since this question was asked, but there is (at least now there is) a one-liner that is supported by the documentation:
    In [4]: df
    Out[4]:
           label
    0  (a, c, e)
    1     (a, d)
    2       (b,)
    3     (d, e)

    In [5]: df['label'].str.join(sep='*').str.get_dummies(sep='*')
    Out[5]:
       a  b  c  d  e
    0  1  0  1  0  1
    1  1  0  0  1  0
    2  0  1  0  0  0
    3  0  0  0  1  1
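Applied to the data from the question, where the labels are already '*'-delimited strings rather than tuples, the str.join step should not be needed. A minimal sketch, assuming the column is named 'label' as in the question:

```python
import pandas as pd

# The example values from the question, already joined with '*'.
df = pd.DataFrame({"label": ["A", "B", "C", "D", "A*C", "C*D"]})

# Series.str.get_dummies splits each string on `sep` and creates
# one indicator column per distinct piece.
dummies = df["label"].str.get_dummies(sep="*")

# Attach the indicator columns to the original frame.
result = df.join(dummies)

print(result)
```

This gives four dummy columns (A, B, C, D), and rows such as 'A*C' carry a 1 in more than one of them.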