I am trying to add column headers to csv file that I have parsed into a dataframe withing Pandas.
dfTrades = pd.read_csv('pnl1.txt',delim_whitespace=True,header=None,);
dfTrades = dfTrades.drop(dfTrades.columns[[3,4,6,8,10,11,13,15,17,18,25,27,29,32]], axis=1) # Note: zero indexed
dfTrades = dfTrades.set_index([dfTrades.index]);
df = pd.DataFrame(dfTrades,columns=['TradeDate',
'TradeTime',
'CumPnL',
'DailyCumPnL',
'RealisedPnL',
'UnRealisedPnL',
'CCYCCY',
'CCYCCYPnLDaily',
'Position',
'CandleOpen',
'CandleHigh',
'CandleLow',
'CandleClose',
'CandleDir',
'CandleDirSwings',
'TradeAmount',
'Rate',
'PnL/Trade',
'Venue',
'OrderType',
'OrderID'
'Code']);
print df
The structure of the data is:
01/10/2015 05:47.3 190 190 -648 838 EURNOK -648 0 0 611 -1137 -648 H 2 -1000000 9.465 -648 INTERNAL IOC 287 AS
What Pandas returns is:
TradeDate TradeTime CumPnL DailyCumPnL RealisedPnL UnRealisedPnL \
0 NaN NaN NaN NaN NaN NaN ...
I would appreciate any advice on the issue.
Thanks
Ps. Thanks to Ed for his answer. I have tried your suggestion with
df = dfTrades.columns=['TradeDate',
'TradeTime',
'CumPnL',
'DailyCumPnL',
'RealisedPnL',
'UnRealisedPnL',
'CCYCCY',
'CCYCCYPnLDaily',
'Position',
'CandleOpen',
'CandleHigh',
'CandleLow',
'CandleClose',
'CandleDir',
'CandleDirSwings',
'TradeAmount',
'Rate',
'PnL/Trade',
'Venue',
'OrderType',
'OrderID'
'Code'];
But now the problem has morphed to:
ValueError: Length mismatch: Expected axis has 22 elements, new values have 21 elements
I have taken the shape of the matrix and got: dfTrades.shape
(12056, 22)
So sadly i still need some help :(
Assign directly to the columns:
df.columns = ['TradeDate',
'TradeTime',
'CumPnL',
'DailyCumPnL',
'RealisedPnL',
'UnRealisedPnL',
'CCYCCY',
'CCYCCYPnLDaily',
'Position',
'CandleOpen',
'CandleHigh',
'CandleLow',
'CandleClose',
'CandleDir',
'CandleDirSwings',
'TradeAmount',
'Rate',
'PnL/Trade',
'Venue',
'OrderType',
'OrderID'
'Code']
What you're doing is reindexing and because the columns don't agree get all NaNs as you're passing the df as the data it will align on existing column names and index values.
You can see the same semantic behaviour here:
In [240]:
df = pd.DataFrame(data= np.random.randn(5,3), columns = np.arange(3))
df
Out[240]:
0 1 2
0 1.037216 0.761995 0.153047
1 -0.602141 -0.114032 -0.323872
2 -1.188986 0.594895 -0.733236
3 0.556196 0.363965 -0.893846
4 0.547791 -0.378287 -1.171706
In [242]:
df1 = pd.DataFrame(df, columns = list('abc'))
df1
Out[242]:
a b c
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
Alternatively you can pass the np array as the data:
df = pd.DataFrame(dfTrades.values,columns=['TradeDate',
In [244]:
df1 = pd.DataFrame(df.values, columns = list('abc'))
df1
Out[244]:
a b c
0 1.037216 0.761995 0.153047
1 -0.602141 -0.114032 -0.323872
2 -1.188986 0.594895 -0.733236
3 0.556196 0.363965 -0.893846
4 0.547791 -0.378287 -1.171706
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With