Suppose I have data that looks like:
import pandas as pd

data = {'Name':['Tom', 'Bob', 'Dan', 'Jack'],
        'Color1':['red', 'red', 'black', 'blue'],
        'Color2':['blue', 'green', 'green', 'white'],
        'Color3':['orange', 'purple', 'white', 'red'],
        'Color4':['', 'yellow', 'purple', '']
       }
df = pd.DataFrame(data)
I want to set dummy variables for each person: if a specific color is listed for a person in any of Color1, Color2, Color3, or Color4, that person receives a 1; otherwise that person receives a 0. However, I'm not interested in a dummy variable for every color that appears. I only want variables for the colors red, black, and yellow.
Thus the expected output would be:
result = {'Name':['Tom', 'Bob', 'Dan', 'Jack'],
'hasRed':[1, 1, 0, 1],
'hasBlack':[0, 0, 1, 0],
'hasYellow':[0, 1, 0, 0]
}
result_df = pd.DataFrame(result)
I know pandas has a get_dummies function, but I don't think it can be used across multiple columns for only a specific set of values, as I need here. Any suggestions on how to do this?
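Let us try melting the dataframe to long format, filtering to the colors of interest, and then using crosstab:
colors = ['red', 'black', 'yellow']
# one row per (Name, color column), keeping only the wanted colors
tmp = (df.melt('Name')
         .loc[lambda x: x['value'].isin(colors)]
      )
# count occurrences per person and color, then prefix the color columns
pd.crosstab(tmp['Name'], tmp['value']).add_prefix('has_').reset_index()
Output:
value  Name  has_black  has_red  has_yellow
0       Bob          0        1           1
1       Dan          1        0           0
2      Jack          0        1           0
3       Tom          0        1           0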
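One caveat with the melt and crosstab route: crosstab only returns rows for names that match at least one of the selected colors, so a person with none of them would disappear from the result, and the rows come back sorted by name rather than in the original order. As an alternative, here is a minimal sketch that compares the color columns directly and keeps every row. The names `color_cols` and `result`, and the `has<Color>` column naming, are choices made for this illustration and are not part of the answer above.

```python
import pandas as pd

data = {'Name': ['Tom', 'Bob', 'Dan', 'Jack'],
        'Color1': ['red', 'red', 'black', 'blue'],
        'Color2': ['blue', 'green', 'green', 'white'],
        'Color3': ['orange', 'purple', 'white', 'red'],
        'Color4': ['', 'yellow', 'purple', '']}
df = pd.DataFrame(data)

colors = ['red', 'black', 'yellow']                     # colors of interest
color_cols = ['Color1', 'Color2', 'Color3', 'Color4']   # columns to search (illustrative name)

# Start from the Name column so every person is kept, in the original order.
result = df[['Name']].copy()
for color in colors:
    # True where any of the color columns equals this color, then cast to 0/1.
    result['has' + color.capitalize()] = (
        df[color_cols].eq(color).any(axis=1).astype(int)
    )

print(result)
```

Because each flag comes from `any(axis=1)`, a color listed more than once for the same person still gives 1, whereas crosstab would report the count.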