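import numpy
import pandas
def afun(group):
    # Rank the prices inside one group: 0 for the lowest price, 1 for the next, and so on.
    aa = len(group)
    group = group.sort_values()  # sort a copy rather than mutating the group in place
    return pandas.DataFrame({'score': numpy.arange(aa), 'price': group})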
df = pandas.DataFrame({
    'stock': numpy.repeat(['AAPL', 'GOOG', 'YHOO'], 3),
    'date': numpy.tile(pandas.date_range('5/5/2015', periods=3, freq='D'), 3),
    'price': numpy.random.randn(9).cumsum() + 10,
    'price2': numpy.random.randn(9).cumsum() + 10})
df = df.set_index(['stock', 'date'])
agroupDf = df.groupby(level='date')
tt = agroupDf['price'].apply(afun)
The value of the variable tt is shown in the figure.
My question is: why does tt have two 'date' columns, and how can I avoid the second 'date' column?

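The first 'date' is the group key that groupby adds as the outermost index level. The second is the 'date' level of the original index, which is still present in each group that afun returns.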
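To see the two levels directly, you can look at tt.index.names. Below is a minimal sketch of that check. It assumes fixed prices instead of the random ones in the question (so the result is reproducible), drops the unused price2 column, and uses the illustrative name flat for the cleaned-up result; none of these come from the original post.

```python
import numpy
import pandas

def afun(group):
    # Rank prices inside one group: 0 for the lowest price, 1 for the next, ...
    ordered = group.sort_values()
    return pandas.DataFrame({'score': numpy.arange(len(ordered)), 'price': ordered})

# Fixed, made-up prices (assumption) so the example is reproducible.
df = pandas.DataFrame({
    'stock': numpy.repeat(['AAPL', 'GOOG', 'YHOO'], 3),
    'date': numpy.tile(pandas.date_range('5/5/2015', periods=3, freq='D'), 3),
    'price': [10.41, 10.43, 11.27, 12.61, 11.19, 11.78, 12.83, 11.99, 11.20],
}).set_index(['stock', 'date'])

tt = df.groupby(level='date')['price'].apply(afun)

# Outer 'date' is the group key; the inner 'stock'/'date' pair is the original index.
print(list(tt.index.names))  # expected: ['date', 'stock', 'date']

# Drop the outermost level (the group key) by position, since the name is duplicated.
flat = tt.droplevel(0)
print(list(flat.index.names))  # expected: ['stock', 'date']
```

Dropping by position rather than by name avoids any ambiguity, because two levels share the name 'date'.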
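Changing things around, this time grouping by stock (reset_index first, because 'stock' and 'date' are no longer columns after the earlier set_index):
df = df.reset_index().set_index(['date', 'stock'])
agroupDf = df.groupby(level='stock')
tt = agroupDf['price'].apply(afun)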
tt
                           price  score
stock date       stock
AAPL  2015-05-05 AAPL   9.333143      0
      2015-05-06 AAPL   9.680022      1
      2015-05-07 AAPL   9.870889      2
GOOG  2015-05-06 GOOG  10.030032      0
      2015-05-05 GOOG  10.229084      1
      2015-05-07 GOOG  10.571631      2
YHOO  2015-05-07 YHOO   9.996925      0
      2015-05-05 YHOO  10.342180      1
      2015-05-06 YHOO  10.586120      2
I think you want this:
df = df.reset_index().set_index('stock')  # keep 'date' as an ordinary column
agroupDf = df.groupby('date')
tt = agroupDf['price'].apply(afun)
tt
                     price  score
date       stock
2015-05-05 AAPL  10.414396      0
           GOOG  12.608225      1
           YHOO  12.830496      2
2015-05-06 AAPL  10.428767      0
           GOOG  11.189663      1
           YHOO  11.988177      2
2015-05-07 YHOO  11.202677      0
           AAPL  11.274440      1
           GOOG  11.780654      2
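If you would rather keep both 'stock' and 'date' in the index, another option that may work is to pass group_keys=False to groupby, so the group key is not added as an extra level. This is only a sketch: the fixed prices are made-up values for reproducibility, and the handling of group_keys inside apply has changed between pandas releases, so it is worth checking the result on your version.

```python
import numpy
import pandas

def afun(group):
    # Rank prices inside one group: 0 for the lowest price, 1 for the next, ...
    ordered = group.sort_values()
    return pandas.DataFrame({'score': numpy.arange(len(ordered)), 'price': ordered})

# Fixed, made-up prices (assumption) so the example is reproducible.
df = pandas.DataFrame({
    'stock': numpy.repeat(['AAPL', 'GOOG', 'YHOO'], 3),
    'date': numpy.tile(pandas.date_range('5/5/2015', periods=3, freq='D'), 3),
    'price': [10.41, 10.43, 11.27, 12.61, 11.19, 11.78, 12.83, 11.99, 11.20],
}).set_index(['stock', 'date'])

# group_keys=False asks groupby not to prepend the group key to the result index.
tt = df.groupby(level='date', group_keys=False)['price'].apply(afun)

print(list(tt.index.names))
print(tt)
```

With this variant the original two-level index is kept as it was, so no level has to be dropped afterwards.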