How to make first row turn into second level MultiIndex

I have an existing DataFrame that looks like this:

     1   |   1   |   1   |   2   |   2   |   2   |   2
 --------------------------------------------------------
  | abc  |  def  |  ghi  |  jkl  |  mno  |  pqr  |  stu
  | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
  | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
  | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
  | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
  | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00

I've been trying to do this for some time, without success.

The repeated ones and twos are already a one-level MultiIndex. I know that if I add another level they will merge together, but I am having a hard time turning that first row into the second level of the MultiIndex.

Is there a simple way of doing this?

Desired output:

             1           |               2             
  | abc  |  def  |  ghi  |  jkl  |  mno  |  pqr  |  stu
 --------------------------------------------------------
  | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
  | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
  | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
  | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
  | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00

Any help would be much appreciated! Thanks.

Omar Omeiri asked Mar 05 '23 03:03


2 Answers

The solution proposed by Jezrael requires some corrections:

  1. df.columns and df.iloc[0] should be passed together, as a single list, in the first argument of from_arrays, not as two separate arguments.

  2. The source of the second MultiIndex level (df.iloc[0]) should be followed by .values. Otherwise that level inherits the name 0, which is the index label of the first row.

  3. The resulting MultiIndex should be assigned to df.columns, not to df itself.
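Point 2 is easy to check on a small frame. The sketch below uses a made-up two-column DataFrame (not the asker's data) and compares the level names produced with and without .values:

```python
import pandas as pd

# Hypothetical miniature of the question's frame: row 0 holds the future sub-headers.
df = pd.DataFrame([["abc", "def"], [1.0, 2.0]], columns=[1, 2])

# Passing the row as a Series: from_arrays picks up the Series name (the row label 0).
with_series = pd.MultiIndex.from_arrays([df.columns, df.iloc[0]])

# Passing the underlying array: there is no name to pick up.
with_values = pd.MultiIndex.from_arrays([df.columns, df.iloc[0].values])

print(list(with_series.names))  # second level carries the name 0
print(list(with_values.names))  # both levels are unnamed
```

The labels themselves are identical in both cases; only the level name differs.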

So the whole solution should be:

df.columns = pd.MultiIndex.from_arrays([df.columns, df.iloc[0].values])
df = df.iloc[1:]
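
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample frame is an assumption that imitates the table in the question (fewer rows), and the last step (reset_index plus astype(float)) is an optional extra that is not part of the solution above:

```python
import pandas as pd

# Hypothetical frame imitating the question: repeated top-level labels,
# with row 0 holding the names that should become the second header level.
df = pd.DataFrame(
    [
        ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu"],
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    ],
    columns=[1, 1, 1, 2, 2, 2, 2],
)

# Build the two-level header from the current labels plus the first row.
df.columns = pd.MultiIndex.from_arrays([df.columns, df.iloc[0].values])

# Drop the row that was promoted into the header.
df = df.iloc[1:]

# Optional: renumber the rows and convert the object-dtype columns to float,
# since the mixed string/number rows were originally stored as objects.
df = df.reset_index(drop=True).astype(float)

print(df.columns.tolist())  # pairs such as (1, 'abc'), (2, 'stu')
print(df[1])                # sub-frame with the columns abc, def, ghi
```

Selecting df[1] or df[2] then returns the group of columns under that top-level label, and df[(2, "stu")] returns a single column.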
Valdi_Bo answered Mar 16 '23 02:03


I think you need MultiIndex.from_arrays, and then drop the first row by slicing with DataFrame.iloc:

df = pd.MultiIndex.from_arrays(df.columns, df.iloc[0])
df = df.iloc[1:]
jezrael answered Mar 16 '23 02:03