I have a dataframe:
import pandas as pd
df = pd.DataFrame({'c':[0,1,1,2,2,2], 'date':pd.to_datetime(['2016-01-01','2016-02-01','2016-03-01','2016-04-01','2016-05-01','2016-06-01'])})
For each row, I would like to compute a number equal to the month number of the date (Jan=1, Feb=2, etc.) plus the size of that row's group (the 1st group has 1 member, the 2nd has 2, etc.).
So it should return something like:
c date num
0 2016-01-01 2
1 2016-02-01 4
1 2016-03-01 5
2 2016-04-01 7
2 2016-05-01 8
2 2016-06-01 9
I created a function:
def testlambda(x):
    print(x)
    return x.dt.month.astype('int') + len(x)
And used groupby + transform:
df['num'] = df.groupby(['c'])['date'].transform(lambda x: testlambda(x))
But the new column still has a datetime dtype, even though my function returns integers.
How can I get an integer column here?
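Use DataFrameGroupBy.transform() instead of SeriesGroupBy.transform(). The latter tries to cast the result back to the dtype of the source column (datetime64 here, at least in the pandas version used for this answer). Selecting the column with double brackets, [['date']], gives a DataFrameGroupBy, which keeps the integer result: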
In [131]: def testlambda(x):
     ...:     #print(x)
     ...:     return x.dt.month.astype('int') + len(x)
     ...:
In [132]: df
Out[132]:
c date
0 0 2016-01-01
1 1 2016-02-01
2 1 2016-03-01
3 2 2016-04-01
4 2 2016-05-01
5 2 2016-06-01
# Note the double brackets in [['date']]: that's the only difference
In [133]: df['num'] = df.groupby(['c'])[['date']].transform(lambda x: testlambda(x))
In [134]: df
Out[134]:
c date num
0 0 2016-01-01 2
1 1 2016-02-01 4
2 1 2016-03-01 5
3 2 2016-04-01 7
4 2 2016-05-01 8
5 2 2016-06-01 9
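As a side note, the same numbers can be computed without a custom function, which sidesteps the dtype question entirely. The sketch below is an alternative to the approach above, not a replacement for it: it takes the month directly from the column and adds the group size obtained with transform('size'). The name group_size is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'c': [0, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2016-01-01', '2016-02-01', '2016-03-01',
                            '2016-04-01', '2016-05-01', '2016-06-01']),
})

# Size of each row's group, aligned to the original index
# (group_size is an illustrative name, not from the original post).
group_size = df.groupby('c')['c'].transform('size')

# The month number is a plain column operation; only the size needs groupby.
df['num'] = df['date'].dt.month + group_size

print(df['num'].tolist())  # [2, 4, 5, 7, 8, 9]
```

Because both operands are integer Series, the result stays an integer column regardless of how transform treats datetime inputs.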