
MultiIndex only some of the columns in Pandas

I have a CSV file that is generated in a format I cannot change. The file has a two-row (multi-index) header and looks like this.

enter image description here

The end goal is to turn the top row (hours) into an index, and index it with the "ID" column, so that the data looks like this.

enter image description here

I have imported the file into pandas...

import pandas as pd

myfile = 'c:/temp/myfile.csv'
# header=[0, 1] reads both header rows into a two-level column index
df = pd.read_csv(myfile, header=[0, 1])
pd.set_option('display.multi_sparse', False)
df.columns = pd.MultiIndex.from_tuples(list(df.columns), names=['hour', 'field'])
df

But that gives me three unnamed fields:

enter image description here

My final step is to stack on hour:

df.stack(level=['hour'])

But I am missing the step that comes before that: how can I move the other columns into the index, even though there is a blank multi-index entry above them?

asked Mar 11 '16 23:03 by Sir Larry Wildman



1 Answer

I believe the lines you are missing are lines 3 and 4 below:

import pandas as pd
df = pd.read_csv('temp.csv', header=[0, 1])
df.columns = [c for _, c in df.columns[:3]] + list(df.columns[3:])  # step 1
df = df.set_index(list(df.columns[:3]), append=True)  # step 2
df.columns = pd.MultiIndex.from_tuples(list(df.columns), names=['hour', 'field'])

  1. Convert the first three column headers from tuples to plain strings by dropping the first (unnamed) value of each tuple.
  2. Shelter these three columns from the stack by moving them into the index.

After you perform the stack, you may reset the index if you like.

e.g.

Before

  (Unnamed: 0_level_0, Date)  (Unnamed: 1_level_0, id)  \
0                  3/11/2016                         5   
1                  3/11/2016                         6   

  (Unnamed: 2_level_0, zone)  (100, p1)  (100, p2)  (200, p1)  (200, p2)  
0                        abc      0.678      0.787      0.337      0.979  
1                        abc      0.953      0.559      0.776      0.520  

After

field                        p1     p2
  Date      id zone hour              
0 3/11/2016 5  abc  100   0.678  0.787
                    200   0.337  0.979
1 3/11/2016 6  abc  100   0.953  0.559
                    200   0.776  0.520
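
The Before/After above can be reproduced end to end with a small sketch. It assumes a recent pandas release in which header=[0, 1] already returns MultiIndex columns. The inline csv_text string and the names id_cols, stacked and flat are hypothetical stand-ins for the real file and are not part of the original post.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV: the first header row holds the
# hours (blank above the three identifier columns), the second row holds
# the field names.
csv_text = """,,,100,100,200,200
Date,id,zone,p1,p2,p1,p2
3/11/2016,5,abc,0.678,0.787,0.337,0.979
3/11/2016,6,abc,0.953,0.559,0.776,0.520
"""

df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Step 1: keep only the second-row label for the first three columns,
# leaving the remaining (hour, field) tuples untouched.
id_cols = [field for _, field in df.columns[:3]]
df.columns = id_cols + list(df.columns[3:])

# Step 2: move the identifier columns into the index so that stack()
# does not touch them.
df = df.set_index(id_cols, append=True)

# Rebuild a proper two-level column index from the remaining tuples.
df.columns = pd.MultiIndex.from_tuples(list(df.columns), names=['hour', 'field'])

# Stack the hour level into the row index: one row per (id, hour) pair.
stacked = df.stack(level='hour')

# Optionally turn the index levels back into ordinary columns.
flat = stacked.reset_index(level=id_cols + ['hour'])
print(flat)
```

Here stacked keeps Date, id, zone and hour as index levels, matching the "After" output, while flat shows the optional reset of the index mentioned above. The hour labels come from the header row, so they may be strings rather than integers.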
answered Sep 28 '22 10:09 by hilberts_drinking_problem