I'm trying to reshape a dataframe, but I'm not able to get the results I need. The dataframe looks like this:
m r s p O W N
1 4 3 1 2.81 3.70 3.03
1 4 4 1 2.14 2.82 2.31
1 4 5 1 1.47 1.94 1.59
1 4 3 2 0.58 0.78 0.60
1 4 4 2 0.67 0.00 0.00
1 4 5 2 1.03 2.45 1.68
1 4 3 3 1.98 1.34 1.81
1 4 4 3 0.00 0.04 0.15
1 4 5 3 0.01 0.00 0.26
I need to reshape the dataframe so it will look like this:
m r s p O W N p O W N p O W N
1 4 3 1 2.81 3.70 3.03 2 0.58 0.78 0.60 3 1.98 1.34 1.81
1 4 4 1 2.14 2.82 2.31 2 0.67 0.00 0.00 3 0.00 0.04 0.15
1 4 5 1 1.47 1.94 1.59 2 1.03 2.45 1.68 3 0.01 0.00 0.26
I tried to use the pivot_table function:
df.pivot_table(index=['m','r','s'], columns=['p'], values=['O','W','N'])
but I'm not able to get quite what I want. Does anyone know how to do this?
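For what it's worth, the pivot_table attempt is close. Its columns come out as (value, p) rather than (p, value), and the value names may come back sorted (N, O, W). The sketch below is one way to finish the job. It assumes the sample data is rebuilt in a DataFrame named `long` (a hypothetical name), and it relies on pivot_table's default mean aggregation, which returns each value unchanged as long as every (m, r, s, p) combination appears only once.

```python
import pandas as pd

# The sample data from the question (the name `long` is just for illustration).
long = pd.DataFrame({
    "m": [1] * 9,
    "r": [4] * 9,
    "s": [3, 4, 5] * 3,
    "p": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "O": [2.81, 2.14, 1.47, 0.58, 0.67, 1.03, 1.98, 0.00, 0.01],
    "W": [3.70, 2.82, 1.94, 0.78, 0.00, 2.45, 1.34, 0.04, 0.00],
    "N": [3.03, 2.31, 1.59, 0.60, 0.00, 1.68, 1.81, 0.15, 0.26],
})

# pivot_table aggregates with the mean by default; with one row per
# (m, r, s, p) the mean is simply that row's value. Columns are (value, p).
wide = long.pivot_table(index=["m", "r", "s"], columns="p", values=["O", "W", "N"])

# Move p to the outer column level ...
wide = wide.swaplevel(axis=1)

# ... and fix the column order explicitly: p first, then O, W, N within each p.
order = pd.MultiIndex.from_product(
    [sorted(long["p"].unique()), ["O", "W", "N"]], names=["p", None]
)
wide = wide.reindex(columns=order)
print(wide)
```

The explicit reindex is used instead of a plain sort so that the O, W, N order does not depend on how pivot_table happens to order the value columns.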
As someone who fancies himself pretty handy with pandas, I still find the pivot_table and melt functions confusing. I prefer to stick with a well-defined, unique index and use the stack and unstack methods of the dataframe itself.
First, do you really need to repeat the p column like that? I can see its value when presenting data, but IMO pandas isn't really set up to work that way. We could shoehorn it in, but let's first see whether a simpler solution gets you what you need.
Here's what I would do:
from io import StringIO
import pandas
datatable = StringIO("""\
m r s p O W N
1 4 3 1 2.81 3.70 3.03
1 4 4 1 2.14 2.82 2.31
1 4 5 1 1.47 1.94 1.59
1 4 3 2 0.58 0.78 0.60
1 4 4 2 0.67 0.00 0.00
1 4 5 2 1.03 2.45 1.68
1 4 3 3 1.98 1.34 1.81
1 4 4 3 0.00 0.04 0.15
1 4 5 3 0.01 0.00 0.26""")
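df = (
    pandas.read_table(datatable, sep=r'\s+')  # whitespace-separated columns
    .set_index(['m', 'r', 's', 'p'])          # unique key for every row
    .unstack(level='p')                       # move p into the columns: (value, p)
)
# Put p on the outer column level, then group the columns by p.
# DataFrame.sort was removed from pandas; use sort_index instead.
# sort_remaining=False keeps O, W, N in their original order within each p.
df.columns = df.columns.swaplevel(0, 1)
df = df.sort_index(axis=1, level='p', sort_remaining=False)
print(df)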
Which prints:
p 1 2 3
O W N O W N O W N
m r s
1 4 3 2.81 3.70 3.03 0.58 0.78 0.60 1.98 1.34 1.81
4 2.14 2.82 2.31 0.67 0.00 0.00 0.00 0.04 0.15
5 1.47 1.94 1.59 1.03 2.45 1.68 0.01 0.00 0.26
So now the columns are a MultiIndex, and you can access, for example, all of the values where p = 2 with df[2] or df.xs(2, level='p', axis=1), which gives:
O W N
m r s
1 4 3 0.58 0.78 0.60
4 0.67 0.00 0.00
5 1.03 2.45 1.68
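A small sketch comparing the two selections, plus selecting several p values at once. It assumes the same reshaped frame, rebuilt here under the hypothetical name `wide`. Note the difference in result shape: a single label drops the p level, while a list of labels keeps it.

```python
import pandas as pd

long = pd.DataFrame({
    "m": [1] * 9,
    "r": [4] * 9,
    "s": [3, 4, 5] * 3,
    "p": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "O": [2.81, 2.14, 1.47, 0.58, 0.67, 1.03, 1.98, 0.00, 0.01],
    "W": [3.70, 2.82, 1.94, 0.78, 0.00, 2.45, 1.34, 0.04, 0.00],
    "N": [3.03, 2.31, 1.59, 0.60, 0.00, 1.68, 1.81, 0.15, 0.26],
})

# Same reshape as above: p on the outer column level, O/W/N inside each p.
wide = (
    long.set_index(["m", "r", "s", "p"])
    .unstack(level="p")
    .swaplevel(axis=1)
    .sort_index(axis=1, level="p", sort_remaining=False)
)

# Both select the p == 2 block and drop the p level from the columns.
block_a = wide[2]
block_b = wide.xs(2, level="p", axis=1)
print(block_a.equals(block_b))

# A list of outer-level labels selects several p blocks and keeps the p level.
subset = wide.loc[:, [1, 3]]
print(subset)
```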
Similarly, you can get all of the W columns with df.xs('W', level=1, axis=1). (We use level=1 because that column level does not have a name, so we refer to it by position instead.)
p 1 2 3
m r s
1 4 3 3.70 0.78 1.34
4 2.82 0.00 0.04
5 1.94 2.45 0.00
You can query the rows (the m, r and s index levels) the same way by using axis=0, which is the default for xs.
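For example, here is a sketch of selecting the rows where s == 4 across every p block. It assumes the reshaped frame is rebuilt under the hypothetical name `wide`; `drop_level=False` is shown as the way to keep the s level in the result.

```python
import pandas as pd

long = pd.DataFrame({
    "m": [1] * 9,
    "r": [4] * 9,
    "s": [3, 4, 5] * 3,
    "p": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "O": [2.81, 2.14, 1.47, 0.58, 0.67, 1.03, 1.98, 0.00, 0.01],
    "W": [3.70, 2.82, 1.94, 0.78, 0.00, 2.45, 1.34, 0.04, 0.00],
    "N": [3.03, 2.31, 1.59, 0.60, 0.00, 1.68, 1.81, 0.15, 0.26],
})

wide = (
    long.set_index(["m", "r", "s", "p"])
    .unstack(level="p")
    .swaplevel(axis=1)
    .sort_index(axis=1, level="p", sort_remaining=False)
)

# axis=0 is the default: select rows where the 's' index level equals 4.
# The s level is dropped, leaving (m, r) as the row index.
rows = wide.xs(4, level="s")
print(rows)

# Keep the s level in the result instead.
print(wide.xs(4, level="s", drop_level=False))
```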
If you really need the p values repeated as columns, add them manually and then reindex the columns:
# Add a constant 'p' column inside each p block.
for p in df.columns.get_level_values('p').unique():
    df[p, 'p'] = p
# Put the columns in the desired order: p, O, W, N within each block.
cols = pandas.MultiIndex.from_product([[1, 2, 3], list('pOWN')])
df = df.reindex(columns=cols)
print(df)
1 2 3
p O W N p O W N p O W N
m r s
1 4 3 1 2.81 3.70 3.03 2 0.58 0.78 0.60 3 1.98 1.34 1.81
4 1 2.14 2.82 2.31 2 0.67 0.00 0.00 3 0.00 0.04 0.15
5 1 1.47 1.94 1.59 2 1.03 2.45 1.68 3 0.01 0.00 0.26
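If you prefer to avoid mutating the frame in a loop, a possible alternative (a sketch, not the approach above) is to build each p block with its own p column and glue them together with pd.concat, whose dict keys become the outer column level. The names `wide`, `blocks` and `with_p` are hypothetical.

```python
import pandas as pd

long = pd.DataFrame({
    "m": [1] * 9,
    "r": [4] * 9,
    "s": [3, 4, 5] * 3,
    "p": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "O": [2.81, 2.14, 1.47, 0.58, 0.67, 1.03, 1.98, 0.00, 0.01],
    "W": [3.70, 2.82, 1.94, 0.78, 0.00, 2.45, 1.34, 0.04, 0.00],
    "N": [3.03, 2.31, 1.59, 0.60, 0.00, 1.68, 1.81, 0.15, 0.26],
})

wide = (
    long.set_index(["m", "r", "s", "p"])
    .unstack(level="p")
    .swaplevel(axis=1)
    .sort_index(axis=1, level="p", sort_remaining=False)
)

ps = wide.columns.get_level_values("p").unique()

# One frame per p: its O/W/N block with a constant 'p' column in front.
blocks = {p: wide[p].assign(p=p)[["p", "O", "W", "N"]] for p in ps}

# concat with a dict uses the keys (the p values) as the outer column level.
with_p = pd.concat(blocks, axis=1)
print(with_p)
```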