Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to change header row in a python dataframe

Tags:

python

pandas

I'm having trouble changing the header row in an existing DataFrame using pandas in python. After importing pandas and the csv file I set a header row as None in order to be able to remove duplicate dates after transposing. However this leaves me with a row header (and in fact an index column) that I do not want.

df = pd.read_csv(spreadfile, header=None)

df2 = df.T.drop_duplicates([0], take_last=True)
del df2[1]

indcol = df2.ix[:,0]
df3 = df2.reindex(indcol)

The above unimaginative code however fails on two counts. The index column is now the required one however all entries are now NaN. My understanding of python is not yet good enough to recognise what python is doing. The desired output below is what I need, any help would be greatly appreciated!

df2 before reindexing:

     0             2             3             4             5
0        NaN  XS0089553282  XS0089773484  XS0092157600  XS0092541969
1  01-May-14         131.7         165.1         151.8          88.9
3  02-May-14           131         164.9         151.7          88.5
5  05-May-14         131.1           165         151.8          88.6
7  06-May-14         129.9         163.4         151.2          87.1

df2 after reindexing:

             0    2    3    4    5
0                                 
NaN        NaN  NaN  NaN  NaN  NaN
01-May-14  NaN  NaN  NaN  NaN  NaN
02-May-14  NaN  NaN  NaN  NaN  NaN
05-May-14  NaN  NaN  NaN  NaN  NaN
06-May-14  NaN  NaN  NaN  NaN  NaN

df2 desired:

       XS0089553282  XS0089773484  XS0092157600  XS0092541969
01-May-14         131.7         165.1         151.8          88.9
02-May-14           131         164.9         151.7          88.5
05-May-14         131.1           165         151.8          88.6
06-May-14         129.9         163.4         151.2          87.1
like image 680
Oaka13 Avatar asked Nov 05 '15 09:11

Oaka13


2 Answers

Assign the columns directly:

indcol = df2.ix[:,0]
df2.columns = indcol

The problem with reindex is it'll use the existing index and column values of your df, so your passed in new column values don't exist, hence why you get all NaNs

A simpler approach to what you're trying to do:

In [147]:
# take the cols and index values of interest
cols = df.loc[0, '2':]
idx = df['0'].iloc[1:]
print(cols)
print(idx)

2    XS0089553282
3    XS0089773484
4    XS0092157600
5    XS0092541969
Name: 0, dtype: object

1    01-May-14
3    02-May-14
5    05-May-14
7    06-May-14
Name: 0, dtype: object

In [157]:
# drop the first row and the first column
df2 = df.drop('0', axis=1).drop(0)
# overwrite the index values
df2.index = idx.values
df2

Out[157]:
               2      3      4     5
01-May-14  131.7  165.1  151.8  88.9
02-May-14    131  164.9  151.7  88.5
05-May-14  131.1    165  151.8  88.6
06-May-14  129.9  163.4  151.2  87.1

In [158]:
# now overwrite the column values    
df2.columns = cols.values
df2

Out[158]:
          XS0089553282 XS0089773484 XS0092157600 XS0092541969
01-May-14        131.7        165.1        151.8         88.9
02-May-14          131        164.9        151.7         88.5
05-May-14        131.1          165        151.8         88.6
06-May-14        129.9        163.4        151.2         87.1
like image 126
EdChum Avatar answered Oct 03 '22 11:10

EdChum


In [310]:
cols = df.iloc[0 , 1:]
cols
Out[310]:
1    XS0089553282
2    XS0089773484
3    XS0092157600
4    XS0092541969
Name: 0, dtype: object

In [311]:
df.drop(0 , inplace=True)
df
Out[311]:
           0    1       2          3    4
1   01-May-14   131.7   165.1   151.8   88.9
2   02-May-14   131     164.9   151.7   88.5
3   05-May-14   131.1   165     151.8   88.6
4   06-May-14   129.9   163.4   151.2   87.1

In [312]:
df.set_index(0 , inplace=True)
df

Out[312]:
    0           1   2           3   4       
01-May-14   131.7   165.1   151.8   88.9
02-May-14   131     164.9   151.7   88.5
05-May-14   131.1   165     151.8   88.6
06-May-14   129.9   163.4   151.2   87.1

In [315]:

df
df.columns = cols
df
Out[315]:
            XS0089553282    XS0089773484    XS0092157600    XS0092541969                
01-May-14   131.7                  165.1    151.8           88.9
02-May-14   131                    164.9    151.7           88.5
05-May-14   131.1                    165    151.8           88.6
06-May-14   129.9                  163.4    151.2           87.1
like image 23
Nader Hisham Avatar answered Oct 03 '22 09:10

Nader Hisham