I am working in Python (pandas specifically) to analyze a dataset. (Python is awesome; the power of open source is amazing.) I am having trouble with a specific part of my dataset.
I have the following data set:
time,contract,ticker,expiry,strike,quote,price,volume
08:01:08,C,PXA,20100101,4000,A,57.8,60
08:01:11,C,PXA,20100101,4000,A,58.4,60
08:01:12,C,PXA,20100101,4000,A,58,60
08:01:16,C,PXA,20100101,4000,A,58.4,60
08:01:16,C,PXA,20100101,4000,A,58,60
08:01:21,C,PXA,20100101,4000,A,58.4,60
08:01:21,C,PXA,20100101,4000,A,58,60
and it goes on ...
I am using pandas to load the data. After that, I would like to take a volume-weighted average of the rows that share a duplicate timestamp.
That is, since there are two asks at time 08:01:16, I would like to compute the volume-weighted average price, which would be (58.4*60 + 58*60)/(60+60), and the average of the volume column, which would be (60+60)/2.
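For the duplicated 08:01:16 rows, a quick sanity check of the expected numbers in plain Python (values taken straight from the rows above):

# Expected result for the two 08:01:16 rows shown above
prices = [58.4, 58.0]
volumes = [60, 60]

vw_price = sum(p * v for p, v in zip(prices, volumes)) / sum(volumes)  # 58.2
avg_volume = sum(volumes) / len(volumes)                               # 60.0
print(vw_price, avg_volume)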
In [28]: a = pd.read_csv('aa.csv')
In [29]: a
Out[29]:
time contract ticker expiry strike quote price volume
0 08:01:08 C PXA 20100101 4000 A 57.8 60
1 08:01:11 C PXA 20100101 4000 A 58.4 60
2 08:01:12 C PXA 20100101 4000 A 58.0 60
3 08:01:16 C PXA 20100101 4000 A 58.4 60
4 08:01:16 C PXA 20100101 4000 A 58.0 60
5 08:01:21 C PXA 20100101 4000 A 58.4 60
6 08:01:21 C PXA 20100101 4000 A 58.0 60
In [30]: pd.DataFrame([{'time': k,
                        'price': (v.price * v.volume).sum() / v.volume.sum(),
                        'volume': v.volume.mean()}
                       for k, v in a.groupby('time')],
                      columns=['time', 'price', 'volume'])
Out[30]:
time price volume
0 08:01:08 57.8 60
1 08:01:11 58.4 60
2 08:01:12 58.0 60
3 08:01:16 58.2 60
4 08:01:21 58.2 60
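As an alternative, here is a minimal sketch that avoids building the DataFrame from a list comprehension and instead uses groupby with named aggregation (assuming the same aa.csv layout as above; named aggregation needs pandas 0.25 or later):

import pandas as pd

a = pd.read_csv('aa.csv')

# Pre-compute price * volume so the weighted average reduces to a ratio of sums.
a['pv'] = a['price'] * a['volume']

out = a.groupby('time', as_index=False).agg(
    pv_sum=('pv', 'sum'),
    vol_sum=('volume', 'sum'),
    volume=('volume', 'mean'),
)
out['price'] = out['pv_sum'] / out['vol_sum']
out = out[['time', 'price', 'volume']]
print(out)

This produces the same per-timestamp volume-weighted price and mean volume as the answer above, and it stays vectorized instead of iterating over the groups in Python.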