I want to produce tidy data from an Excel file that has three levels of "merged" headers.
Pandas reads the file just fine, with multilevel headers:
df = pandas.read_excel('test.xlsx', header=[0, 1, 2])
For repeatability, you can copy-paste this:
df = pandas.DataFrame({
    ('Unnamed: 0_level_0', 'Unnamed: 0_level_1', 'a'): {1: 'aX', 2: 'aY'},
    ('Unnamed: 1_level_0', 'Unnamed: 1_level_1', 'b'): {1: 'bX', 2: 'bY'},
    ('Unnamed: 2_level_0', 'Unnamed: 2_level_1', 'c'): {1: 'cX', 2: 'cY'},
    ('level1_1', 'level2_1', 'level3_1'): {1: 1, 2: 10},
    ('level1_1', 'level2_1', 'level3_2'): {1: 2, 2: 20},
    ('level1_1', 'level2_2', 'level3_1'): {1: 3, 2: 30},
    ('level1_1', 'level2_2', 'level3_2'): {1: 4, 2: 40},
    ('level1_2', 'level2_1', 'level3_1'): {1: 5, 2: 50},
    ('level1_2', 'level2_1', 'level3_2'): {1: 6, 2: 60},
    ('level1_2', 'level2_2', 'level3_1'): {1: 7, 2: 70},
    ('level1_2', 'level2_2', 'level3_2'): {1: 8, 2: 80},
})
I want to normalise this so that the level headings end up as values in variable columns, while retaining a, b and c as ordinary columns.
Without the multi-level headers, I would do pandas.melt(df, id_vars=['a', 'b', 'c']) to get what I want. pandas.melt(df) gives me the three variable columns I want, but obviously doesn't retain the a, b and c columns.
It should be as simple as:
wide_df = pandas.read_excel(xlfile, sheetname, header=[0, 1, 2], index_col=[0, 1, 2, 3])
long_df = wide_df.stack().stack().stack()
Here's an example with a mock-up CSV file (note the 4th row to label the index and the first column to label the header levels):
from io import StringIO
from textwrap import dedent
import pandas
mockcsv = StringIO(dedent("""\
num,,,this1,this1,this1,this1,that1,that1,that1,that1
let,,,thisA,thisA,thatA,thatA,thisB,thisB,thatB,thatB
animal,,,cat,dog,bird,lizard,cat,dog,bird,lizard
a,b,c,,,,,,,,
a1,b1,c1,x1,x2,x3,x4,x5,x6,x7,x8
a1,b1,c2,y1,y2,y3,y4,y5,y6,y7,y8
a1,b2,c1,z1,z2,z3,z4,z5,z6,z7,z8
"""))
wide_df = pandas.read_csv(mockcsv, index_col=[0, 1, 2], header=[0, 1, 2])
long_df = wide_df.stack().stack().stack()
So wide_df looks like this:
num        this1                           that1
let        thisA           thatA           thisB           thatB
animal       cat     dog    bird  lizard     cat     dog    bird  lizard
a  b  c
a1 b1 c1      x1      x2      x3      x4      x5      x6      x7      x8
      c2      y1      y2      y3      y4      y5      y6      y7      y8
   b2 c1      z1      z2      z3      z4      z5      z6      z7      z8
And long_df looks like this:
a   b   c   animal  let    num
a1  b1  c1  bird    thatA  this1    x3
                    thatB  that1    x7
            cat     thisA  this1    x1
                    thisB  that1    x5
            dog     thisA  this1    x2
                    thisB  that1    x6
            lizard  thatA  this1    x4
                    thatB  that1    x8
        c2  bird    thatA  this1    y3
                    thatB  that1    y7
            cat     thisA  this1    y1
                    thisB  that1    y5
            dog     thisA  this1    y2
                    thisB  that1    y6
            lizard  thatA  this1    y4
                    thatB  that1    y8
    b2  c1  bird    thatA  this1    z3
                    thatB  that1    z7
            cat     thisA  this1    z1
                    thisB  that1    z5
            dog     thisA  this1    z2
                    thisB  that1    z6
            lizard  thatA  this1    z4
                    thatB  that1    z8
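If you want a completely flat table rather than a MultiIndex Series, one more step finishes the job. A minimal sketch, where 'value' is just the column name I chose:
tidy = long_df.rename('value').reset_index()   # columns: a, b, c, animal, let, num, value
print(tidy.head())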
With the literal data shown in the OP, you can get at this without modifying anything by doing the following:
index_names = ['a', 'b', 'c']
col_names = ['Level1', 'Level2', 'Level3']
df = (
    pandas.read_excel('Book1.xlsx', header=[0, 1, 2], index_col=[0, 1, 2, 3])
    .reset_index(level=0, drop=True)           # drop the extra first index level
    .rename_axis(index_names, axis='index')    # name the a/b/c index levels
    .rename_axis(col_names, axis='columns')    # name the three header levels
    .stack()
    .stack()
    .stack()                                   # fold all three header levels into the index
    .to_frame()
)
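As a hedged follow-up (the 'value' name here is my own choice, not something pandas produces): the .to_frame() call leaves the data in a single column named 0, so one more line gives a completely flat table.
tidy = df.rename(columns={0: 'value'}).reset_index()   # a, b, c, Level1, Level2, Level3, value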
I think the tricky part will be inspecting each of your files to figure out what index_names should be.
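One way to do that inspection, sketched here with the same illustrative 'Book1.xlsx' name used above, is to read the sheet with no header at all, so the header rows show up as ordinary rows you can look at:
raw = pandas.read_excel('Book1.xlsx', header=None)   # illustrative file name, as above
print(raw.head(5))   # the first few rows show how many header levels there are
                     # and which row, if any, labels the index columns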
Split the DataFrame into two portions so that melting and joining them back together is easy.
import pandas as pd

first_half = df.iloc[:, :3]    # the a, b, c columns
second_half = df.iloc[:, 3:]   # the level1_*/level2_*/level3_* columns
Melt the second fragment.
melt_second_half = pd.melt(second_half)
Repeat the rows of the first fragment; the number of repeats is the number of rows in the melted DataFrame divided by the number of rows in the first fragment.
repeats = int(melt_second_half.shape[0]/first_half.shape[0])
first_reps = pd.concat([first_half] * repeats, ignore_index=True)
# Keep only the innermost header level ('a', 'b', 'c') as the column names.
col_names = first_reps.columns.get_level_values(2)
melt_first_half = pd.DataFrame(first_reps.values, columns=col_names)
Concatenate both parts back together and sort the resulting DataFrame by the value column.
df_concat = pd.concat([melt_first_half, melt_second_half], axis=1)
df_concat.sort_values('value').reset_index(drop=True)
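Putting those steps together, here is a self-contained sketch run against the df built from the copy-paste dict in the question; the final rename is an assumption about how melt labels unnamed header levels (variable_0, variable_1, variable_2 in the pandas versions I have used) and can be skipped if your version names them differently.
import pandas as pd

first_half = df.iloc[:, :3]    # the a, b, c columns
second_half = df.iloc[:, 3:]   # the level1_*/level2_*/level3_* columns

melt_second_half = pd.melt(second_half)

repeats = int(melt_second_half.shape[0] / first_half.shape[0])
first_reps = pd.concat([first_half] * repeats, ignore_index=True)

col_names = first_reps.columns.get_level_values(2)
melt_first_half = pd.DataFrame(first_reps.values, columns=col_names)

df_concat = pd.concat([melt_first_half, melt_second_half], axis=1)
tidy = df_concat.sort_values('value').reset_index(drop=True)

# Optional: friendlier names for the melted header-level columns (assumes the
# variable_0/1/2 naming for unnamed levels; adjust if yours differ).
tidy = tidy.rename(columns={'variable_0': 'level1',
                            'variable_1': 'level2',
                            'variable_2': 'level3'})
print(tidy.head())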