I have a data frame that looks like this:
import pandas as pd
df = pd.DataFrame({"value": [4, 5, 3], "item1": [0, 1, 0], "item2": [1, 0, 0], "item3": [0, 0, 1]})
df
   value  item1  item2  item3
0      4      0      1      0
1      5      1      0      0
2      3      0      0      1
What I want to do is replace each 1 in the one-hot encoded columns with that row's value from the "value" column, and then delete the "value" column. The resulting data frame should look like this:
df_out = pd.DataFrame({"item1": [0, 5, 0], "item2": [4, 0, 0], "item3": [0, 0, 3]})
   item1  item2  item3
0      0      4      0
1      5      0      0
2      0      0      3
DataFrame.replace() replaces values in a DataFrame: every occurrence of one value is swapped for another, across all columns. The method takes to_replace, value, inplace, limit, regex and method as parameters and returns a new DataFrame. With inplace=True it modifies the existing DataFrame object and returns None.
To replace a value in a single column, call replace() on that column and pass the value to look for and its replacement.
To replace several values in one column, pass multiple values (for example a dict of old-to-new pairs) to replace() on that column. Called on the DataFrame itself, replace() searches every column and replaces every specified value.
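As a small illustration of replace() as described above, here is a sketch on a made-up frame. The column names and values are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical data, only to illustrate replace()
colors = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "M", "red"]})

# Replace one value everywhere in the frame; this returns a new DataFrame
everywhere = colors.replace("red", "crimson")

# Replace several values in a single column using a dict of old -> new
one_column = colors.copy()
one_column["color"] = one_column["color"].replace({"red": "crimson", "blue": "navy"})
```

Note that replace() matches on cell contents, so it cannot by itself look up a different replacement per row, which is why the answers below multiply instead.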
Why not just multiply?
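df.mul(df.pop('value'), axis=0)
   item1  item2  item3
0      0      4      0
1      5      0      0
2      0      0      3
Note the axis=0: it scales each row by its own value. A bare df.pop('value').values * df lines the values up with the columns instead; it only runs here because the frame happens to be 3x3, and it scales each column rather than each row.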
DataFrame.pop has the nice effect of removing a column in place and returning it, so you can do this in a single step.
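To see pop and the row-wise multiplication together, here is a minimal sketch. It assumes a fourth, made-up row so that the frame is not square, which makes the direction of the multiplication unambiguous.

```python
import pandas as pd

df = pd.DataFrame({
    "value": [4, 5, 3, 7],   # the last row is an invented extra row
    "item1": [0, 1, 0, 0],
    "item2": [1, 0, 0, 1],
    "item3": [0, 0, 1, 0],
})

# pop() removes "value" from df in place and returns it as a Series
value = df.pop("value")

# axis=0 aligns the Series with the row index, so each row is scaled by its own value
result = df.mul(value, axis=0)
print(result)
```

Keeping the popped column as a Series (rather than a bare array) also means pandas aligns on the index, so the result stays correct if the rows are not in default order.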
If the "item*" columns can hold anything besides 0 and 1, cast them to bools before multiplying (pop "value" first so that it is not cast as well):
value = df.pop('value')
df.astype(bool).mul(value, axis=0)
   item1  item2  item3
0      0      4      0
1      5      0      0
2      0      0      3
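The bool cast only matters when the indicator columns are not strictly 0/1. A short sketch under that assumption, with invented counts in the item columns:

```python
import pandas as pd

# Hypothetical frame where the indicator columns hold counts rather than 0/1
df = pd.DataFrame({"value": [4, 5, 3], "item1": [0, 2, 0], "item2": [3, 0, 0], "item3": [0, 0, 9]})

value = df.pop("value")            # take "value" out first so it is not cast to bool
mask = df.astype(bool)             # any non-zero entry becomes True
result = mask.mul(value, axis=0)   # True * v == v, False * v == 0, applied row by row
print(result)
```

Without the cast, the same multiplication would return 10, 12 and 27 in the non-zero cells instead of the plain values.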
If your DataFrame has other columns, then do this:
df
   value  name  item1  item2  item3
0      4  John      0      1      0
1      5  Mike      1      0      0
2      3  Stan      0      0      1
# cols = df.columns[df.columns.str.startswith('item')]
cols = df.filter(like='item').columns
df[cols] = df[cols].mul(df.pop('value'), axis=0)
df
   name  item1  item2  item3
0  John      0      4      0
1  Mike      5      0      0
2  Stan      0      0      3
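If this comes up repeatedly, the steps can be wrapped in a small helper. The function name scale_one_hot and its parameters are hypothetical, chosen for this sketch; it works on a copy so the caller's frame is left untouched.

```python
import pandas as pd

def scale_one_hot(df, value_col="value", prefix="item"):
    """Return a copy of df with the prefix* columns scaled row-wise by value_col,
    and value_col removed. (Hypothetical helper, not part of pandas.)"""
    out = df.copy()
    cols = out.columns[out.columns.str.startswith(prefix)]
    value = out.pop(value_col)
    out[cols] = out[cols].mul(value, axis=0)
    return out

df = pd.DataFrame({
    "value": [4, 5, 3],
    "name": ["John", "Mike", "Stan"],
    "item1": [0, 1, 0],
    "item2": [1, 0, 0],
    "item3": [0, 0, 1],
})
scaled = scale_one_hot(df)
print(scaled)
```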
You could do something like:
df = pd.DataFrame({'item1': df['value']*df['item1'], 'item2': df['value']*df['item2'], 'item3': df['value']*df['item3']})
EDIT: As @coldspeed points out in the comments, this does not scale well to many columns, so it is better to loop over them:
cols = ['item1','item2','item3']
for c in cols:
    df[c] *= df['value']
df.drop('value',axis=1,inplace=True)