I have a pandas data frame like this:
ts | thing_0 | qty_0 | thing_1 | qty_1 | thing_2 | qty_2
--------------------------------------------------------
1 | dog | 5 | cat | 3 | mouse | 1
2 | house | 6 | dog | 4 | cat | 2
...
I want to transform this in a way that the thing
s become columns and the qty
s become the cell values. Like this:
ts | dog | cat | mouse | house
------------------------------
1 | 5 | 3 | 1 | 0
2 | 4 | 2 | 0 | 6
...
Currently, I'm doing this transformation manually by iterating over the df.values
array manually, but this is very slow. Is there a faster way to implement this with pandas means?
I have seen df.pivot, but couldn't find a way to describe the relationship between thing_0
and qty_0
.
You could use lreshape
to coalesce the thing
and qty
columns:
In [10]: pd.lreshape(df, {'thing':['thing_0','thing_1','thing_2',], 'qty':['qty_0','qty_1','qty_2']})
Out[10]:
ts thing qty
0 1 dog 5
1 2 house 6
2 1 cat 3
3 2 dog 4
4 1 mouse 1
5 2 cat 2
Then pivot
to create the desired DataFrame:
import pandas as pd
df = pd.DataFrame({'qty_0': [5, 6], 'qty_1': [3, 4], 'qty_2': [1, 2], 'thing_0': ['dog', 'house'], 'thing_1': ['cat', 'dog'], 'thing_2': ['mouse', 'cat'], 'ts': [1, 2]})
reshaped = pd.lreshape(df, {'thing':['thing_0','thing_1','thing_2',],
'qty':['qty_0','qty_1','qty_2']})
result = reshaped.pivot(index='ts', columns='thing', values='qty')
print(result)
yields
thing cat dog house mouse
ts
1 3.0 5.0 0.0 1.0
2 2.0 4.0 6.0 0.0
I think jezrael's solution is better since it takes advantage of the regularity of the column names you wish to coalesce. df.columns.str.split('_', expand=True)
is more general and less repetitive than
{'thing':['thing_0','thing_1','thing_2',],
'qty':['qty_0','qty_1','qty_2']}
lreshape
might be helpful in situations where the column names you wish to coalesce
are irregular.
You can create MultiIndex
from columns by str.split
and then reshape by stack
and unstack
:
df = df.set_index('ts')
df.columns = df.columns.str.split('_', expand=True)
df = df.stack().reset_index(level=1, drop=True)
.set_index('thing', append=True)['qty'].unstack(fill_value=0)
print (df)
thing cat dog house mouse
ts
1 3 5 0 1
2 2 4 6 0
Another solution with pivot
instead unstack
:
df = df.set_index('ts')
df.columns = df.columns.str.split('_', expand=True)
df = df.stack().reset_index()
df = df.pivot(index='ts', columns='thing', values='qty').fillna(0).astype(int)
print (df)
thing cat dog house mouse
ts
1 3 5 0 1
2 2 4 6 0
And third solution - dynamically create dict for lreshape
and then use unstack
:
Notice - columns have to be sorted, if not add df = df.sort_index(axis=1)
t = [x for x in df.columns if x.startswith('thing')]
q = [x for x in df.columns if x.startswith('qty')]
df = pd.lreshape(df, {'thing':t, 'qty':q})
.set_index(['ts','thing'])['qty'].unstack(fill_value=0)
print (df)
thing cat dog house mouse
ts
1 3 5 0 1
2 2 4 6 0
EDIT:
lreshape
is now undocumented, but is possible in future will by removed (with pd.wide_to_long too).
Possible solution is merging all 3 functions to one - maybe melt
, but now it is not implementated. Maybe in some new version of pandas. Then my answer will be updated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With