I'm having trouble changing the header row in an existing DataFrame using pandas in python. After importing pandas and the csv file I set a header row as None in order to be able to remove duplicate dates after transposing. However this leaves me with a row header (and in fact an index column) that I do not want.
df = pd.read_csv(spreadfile, header=None)
df2 = df.T.drop_duplicates([0], take_last=True)
del df2[1]
indcol = df2.ix[:,0]
df3 = df2.reindex(indcol)
The above unimaginative code however fails on two counts. The index column is now the required one however all entries are now NaN. My understanding of python is not yet good enough to recognise what python is doing. The desired output below is what I need, any help would be greatly appreciated!
df2 before reindexing:
0 2 3 4 5
0 NaN XS0089553282 XS0089773484 XS0092157600 XS0092541969
1 01-May-14 131.7 165.1 151.8 88.9
3 02-May-14 131 164.9 151.7 88.5
5 05-May-14 131.1 165 151.8 88.6
7 06-May-14 129.9 163.4 151.2 87.1
df2 after reindexing:
0 2 3 4 5
0
NaN NaN NaN NaN NaN NaN
01-May-14 NaN NaN NaN NaN NaN
02-May-14 NaN NaN NaN NaN NaN
05-May-14 NaN NaN NaN NaN NaN
06-May-14 NaN NaN NaN NaN NaN
df2 desired:
XS0089553282 XS0089773484 XS0092157600 XS0092541969
01-May-14 131.7 165.1 151.8 88.9
02-May-14 131 164.9 151.7 88.5
05-May-14 131.1 165 151.8 88.6
06-May-14 129.9 163.4 151.2 87.1
Assign the columns directly:
indcol = df2.ix[:,0]
df2.columns = indcol
The problem with reindex
is it'll use the existing index and column values of your df, so your passed in new column values don't exist, hence why you get all NaN
s
A simpler approach to what you're trying to do:
In [147]:
# take the cols and index values of interest
cols = df.loc[0, '2':]
idx = df['0'].iloc[1:]
print(cols)
print(idx)
2 XS0089553282
3 XS0089773484
4 XS0092157600
5 XS0092541969
Name: 0, dtype: object
1 01-May-14
3 02-May-14
5 05-May-14
7 06-May-14
Name: 0, dtype: object
In [157]:
# drop the first row and the first column
df2 = df.drop('0', axis=1).drop(0)
# overwrite the index values
df2.index = idx.values
df2
Out[157]:
2 3 4 5
01-May-14 131.7 165.1 151.8 88.9
02-May-14 131 164.9 151.7 88.5
05-May-14 131.1 165 151.8 88.6
06-May-14 129.9 163.4 151.2 87.1
In [158]:
# now overwrite the column values
df2.columns = cols.values
df2
Out[158]:
XS0089553282 XS0089773484 XS0092157600 XS0092541969
01-May-14 131.7 165.1 151.8 88.9
02-May-14 131 164.9 151.7 88.5
05-May-14 131.1 165 151.8 88.6
06-May-14 129.9 163.4 151.2 87.1
In [310]:
cols = df.iloc[0 , 1:]
cols
Out[310]:
1 XS0089553282
2 XS0089773484
3 XS0092157600
4 XS0092541969
Name: 0, dtype: object
In [311]:
df.drop(0 , inplace=True)
df
Out[311]:
0 1 2 3 4
1 01-May-14 131.7 165.1 151.8 88.9
2 02-May-14 131 164.9 151.7 88.5
3 05-May-14 131.1 165 151.8 88.6
4 06-May-14 129.9 163.4 151.2 87.1
In [312]:
df.set_index(0 , inplace=True)
df
Out[312]:
0 1 2 3 4
01-May-14 131.7 165.1 151.8 88.9
02-May-14 131 164.9 151.7 88.5
05-May-14 131.1 165 151.8 88.6
06-May-14 129.9 163.4 151.2 87.1
In [315]:
df
df.columns = cols
df
Out[315]:
XS0089553282 XS0089773484 XS0092157600 XS0092541969
01-May-14 131.7 165.1 151.8 88.9
02-May-14 131 164.9 151.7 88.5
05-May-14 131.1 165 151.8 88.6
06-May-14 129.9 163.4 151.2 87.1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With