I have a dataframe as below,
idx vol loc
0 1 2 unit 68
1 1 1 unit 179
2 1 2 unit 345
3 2 1 unit 233
4 2 1 unit 235
5 2 2 unit 313
6 2 1 unit 313
I aim to do one-hot encoding for the 'loc' column with the 'vol' values filling in the one-hot encoded columns. Keen to use 'get_dummies' and 'groupby' methods in pandas.
My expected output is as below. My challenge is I am unable to fill the values from the 'vol' column in the newly created one-hot encoded columns.
Tried to spend 2 days but still have not found a workaround for this. Hope you can give me a hint.
Expected
loc unit 179 unit 233 unit 235 unit 312 unit 313 unit 345 unit 68
id
1 1 0 0 0 0 2 2
2 0 1 1 0 3 0 0
MWE
import pandas as pd
data = {'idx': [1, 1, 1, 2, 2, 2, 2],
'vol': [2, 1, 2, 1, 1, 2, 1],
'loc': ['unit 68', 'unit 179', 'unit 345', 'unit 233', 'unit 235', 'unit 313', 'unit 313']}
df = pd.DataFrame(data)
# Syntax
# pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None) → ’DataFrame’
df = pd.get_dummies(df, columns=['loc'], prefix='', prefix_sep='')
print(df)
df = df.groupby('idx').agg(sum)
print(df)
Output
idx vol unit 179 unit 233 unit 235 unit 313 unit 345 unit 68
0 1 2 0 0 0 0 0 1
1 1 1 1 0 0 0 0 0
2 1 2 0 0 0 0 1 0
3 2 1 0 1 0 0 0 0
4 2 1 0 0 1 0 0 0
5 2 2 0 0 0 1 0 0
6 2 1 0 0 0 1 0 0
vol unit 179 unit 233 unit 235 unit 313 unit 345 unit 68
idx
1 5 1 0 0 0 1 1
2 5 0 1 1 2 0 0
You can do
df=df.set_index('idx')
s=df['loc'].str.get_dummies().mul(df['vol'],axis=0).sum(level=0)
unit179 unit233 unit235 unit313 unit345 unit68
idx
1 1 0 0 0 2 2
2 0 1 1 3 0 0
IIUC,
df.pivot_table('vol', 'idx', 'loc', aggfunc='sum', fill_value=0)
Output:
loc unit 179 unit 233 unit 235 unit 313 unit 345 unit 68
idx
1 1 0 0 0 2 2
2 0 1 1 3 0 0
Add totals also,
df.pivot_table('vol', 'idx', 'loc', aggfunc='sum',
fill_value=0, margins=True, margins_name='vol')
Output:
loc unit 179 unit 233 unit 235 unit 313 unit 345 unit 68 vol
idx
1 1 0 0 0 2 2 5
2 0 1 1 3 0 0 5
vol 1 1 1 3 2 2 10
Drop row grand total:
df.pivot_table('vol', 'idx', 'loc', aggfunc='sum',
fill_value=0, margins=True, margins_name='vol')\
.drop('vol')
Output:
loc unit 179 unit 233 unit 235 unit 313 unit 345 unit 68 vol
idx
1 1 0 0 0 2 2 5
2 0 1 1 3 0 0 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With