Pandas groupby changes return type when reassembling groups

I have a dataframe:

import pandas as pd

df = pd.DataFrame({'c': [0, 1, 1, 2, 2, 2],
                   'date': pd.to_datetime(['2016-01-01', '2016-02-01', '2016-03-01',
                                           '2016-04-01', '2016-05-01', '2016-06-01'])})

For each row, I would like to compute a number: the month number of the date (Jan=1, Feb=2, etc.) plus the size of that row's group (the first group has 1 member, the second has 2, and so on).

So it should return something like:

c       date  num
0 2016-01-01    2
1 2016-02-01    4
1 2016-03-01    5
2 2016-04-01    7
2 2016-05-01    8
2 2016-06-01    9

I created a function:

def testlambda(x):
    print(x)
    return x.dt.month.astype('int') + len(x)

And used groupby + transform:

df['num'] = df.groupby(['c'])['date'].transform(lambda x: testlambda(x))

But the new column still has a datetime dtype, even though my function returns integers.

How can I get an integer column instead?

iwbabn asked Mar 10 '23 05:03


1 Answer

Use DataFrameGroupBy.transform() instead of SeriesGroupBy.transform(); the latter tries to cast the result back to the dtype of the source column (datetime here). Selecting the column with a list, [['date']], gives a DataFrameGroupBy:

In [131]: def testlambda(x):
     ...:     #print(x)
     ...:     return x.dt.month.astype('int') + len(x)
     ...:

In [132]: df
Out[132]:
   c       date
0  0 2016-01-01
1  1 2016-02-01
2  1 2016-03-01
3  2 2016-04-01
4  2 2016-05-01
5  2 2016-06-01

#                                      v        v - that's the only difference
In [133]: df['num'] = df.groupby(['c'])[['date']].transform(lambda x: testlambda(x))

In [134]: df
Out[134]:
   c       date  num
0  0 2016-01-01    2
1  1 2016-02-01    4
2  1 2016-03-01    5
3  2 2016-04-01    7
4  2 2016-05-01    8
5  2 2016-06-01    9
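
If only the final column is needed, the cast can also be sidestepped without a custom function. The following is a minimal sketch and not part of the answer above: it assumes the same df as in the question, and group_size is a name introduced here purely for illustration. It computes the group sizes and the month numbers separately, so no datetime column ever passes through a user function in transform().

```python
import pandas as pd

df = pd.DataFrame({
    'c': [0, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2016-01-01', '2016-02-01', '2016-03-01',
                            '2016-04-01', '2016-05-01', '2016-06-01']),
})

# Size of each row's group, broadcast back onto the original index.
# group_size is a hypothetical helper name, not from the original post.
group_size = df.groupby('c')['c'].transform('size')

# Month number plus group size; both operands are integer Series,
# so the result has an integer dtype.
df['num'] = df['date'].dt.month + group_size

print(df['num'].tolist())  # [2, 4, 5, 7, 8, 9]
```

Because the addition happens outside the groupby, this form should not depend on how a given pandas version casts transform() results.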
MaxU - stop WAR against UA answered Apr 25 '23 19:04