
set_index equivalent for column headings

Tags: python, pandas

In Pandas, if I have a DataFrame that looks like:

            0       1       2       3       4       5       6
0                2013    2012    2011    2010    2009    2008
1     January   3,925   3,463   3,289   3,184   3,488   4,568
2    February   3,632   2,983   2,902   3,053   3,347   4,527
3       March   3,909   3,166   3,217   3,175   3,636   4,594
4       April   3,903   3,258   3,146   3,023   3,709   4,574
5         May   4,075   3,234   3,266   3,033   3,603   4,511
6        June   4,038   3,272   3,316   2,909   3,057   4,081
7        July           3,661   3,359   3,062   3,354   4,215
8      August           3,942   3,417   3,077   3,395   4,139
9   September           3,703   3,169   3,095   3,100   3,752
10    October           3,727   3,469   3,179   3,375   3,874
11   November           3,722   3,145   3,159   3,213   3,567
12   December           3,866   3,251   3,199   3,324   3,362
13      Total  23,482  41,997  38,946  37,148  40,601  49,764

I can convert the first column to be the index using:

In [55]: df.set_index([0])
Out[55]: 
                1       2       3       4       5       6
0                                                        
             2013    2012    2011    2010    2009    2008
January     3,925   3,463   3,289   3,184   3,488   4,568
February    3,632   2,983   2,902   3,053   3,347   4,527
March       3,909   3,166   3,217   3,175   3,636   4,594
April       3,903   3,258   3,146   3,023   3,709   4,574
May         4,075   3,234   3,266   3,033   3,603   4,511
June        4,038   3,272   3,316   2,909   3,057   4,081
July                3,661   3,359   3,062   3,354   4,215
August              3,942   3,417   3,077   3,395   4,139
September           3,703   3,169   3,095   3,100   3,752
October             3,727   3,469   3,179   3,375   3,874
November            3,722   3,145   3,159   3,213   3,567
December            3,866   3,251   3,199   3,324   3,362
Total      23,482  41,997  38,946  37,148  40,601  49,764

My question is how to convert the first row to be the column headings? The closest I can get is:

In [53]: df.set_index([0]).rename(columns=df.loc[0])
Out[53]: 
             2013    2012    2011    2010    2009    2008
0                                                        
             2013    2012    2011    2010    2009    2008
January     3,925   3,463   3,289   3,184   3,488   4,568
February    3,632   2,983   2,902   3,053   3,347   4,527
March       3,909   3,166   3,217   3,175   3,636   4,594
April       3,903   3,258   3,146   3,023   3,709   4,574
May         4,075   3,234   3,266   3,033   3,603   4,511
June        4,038   3,272   3,316   2,909   3,057   4,081
July                3,661   3,359   3,062   3,354   4,215
August              3,942   3,417   3,077   3,395   4,139
September           3,703   3,169   3,095   3,100   3,752
October             3,727   3,469   3,179   3,375   3,874
November            3,722   3,145   3,159   3,213   3,567
December            3,866   3,251   3,199   3,324   3,362
Total      23,482  41,997  38,946  37,148  40,601  49,764

but then I have to go in and remove the first row.

Alex Rothberg asked Oct 01 '13 02:10


2 Answers

The best way to handle this is to avoid getting into this situation.

How was df created? For example, if you used read_csv or a variant, then header=0 will tell read_csv to parse the first line as the column names.
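
As a minimal sketch of that idea, assuming the data came from a CSV file (the question does not say), the snippet below reads a small made-up sample with header=0 so the first line becomes the column names. The sample text, index_col=0, and thousands="," are illustrative assumptions, not details from the question.

```python
import io

import pandas as pd

# Hypothetical CSV text shaped like the table in the question
csv_text = (
    ',2013,2012\n'
    'January,"3,925","3,463"\n'
    'February,"3,632","2,983"\n'
)

df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,       # first line supplies the column names
    index_col=0,    # first column (the month names) becomes the index
    thousands=",",  # parse "3,925" as the number 3925
)
print(df)
```

With the header parsed at read time, there is no stray label row to remove afterwards.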


Given df as you have it, I don't think there is an easier way to fix it than what you've described. To remove the first row, you could use df.iloc:

df = df.iloc[1:]
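
Putting the pieces together, here is one possible sketch (not necessarily the only way) of promoting the first row to column labels on a frame that already exists. The small frame named raw is a made-up stand-in for the df in the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's df: row 0 holds the year labels
raw = pd.DataFrame([
    ["", "2013", "2012"],
    ["January", "3,925", "3,463"],
    ["February", "3,632", "2,983"],
])

# Drop the label row, then move the first column into the index
body = raw.iloc[1:].set_index(0)
# Take the column labels from the first row, skipping the index column
body.columns = raw.iloc[0, 1:].tolist()
# Clear the leftover index name (0) inherited from the old column label
body.index.name = None
print(body)
```

The values stay as strings such as "3,925" here; converting them to numbers would be a separate step.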
unutbu answered Sep 19 '22 14:09


I'm not sure if this is more efficient, but you could try building a new data frame with the correct index and default column names from your problem data frame, and then rename its columns using the first row of the problem data frame. For example:

import pandas as pd

# Sample frame whose first row holds the real column labels
data = {'0': [' ', 'Jan', 'Feb', 'Mar', 'April'],
        '1': ['2013', 3926, 3456, 3245, 1254],
        '2': ['2012', 3346, 4342, 1214, 4522],
        '3': ['2011', 3946, 4323, 1214, 8922]}

DF = pd.DataFrame(data)
# .ix was removed in pandas 1.0; .iloc does the same positional slicing here
DF2 = DF.iloc[1:, 1:].set_index(DF.iloc[1:, 0])
DF2.columns = DF.iloc[0, 1:]
DF2
Woody Pride answered Sep 17 '22 14:09