I want to add a column to this table that holds, for each customer_id, the difference between the max date and the min date.
Input:
action_date customer_id
2017-08-15 1
2017-08-21 1
2017-08-21 1
2017-09-02 1
2017-08-28 2
2017-09-29 2
2017-10-15 3
2017-10-30 3
2017-12-05 3
And get this table
Output:
action_date customer_id diff
2017-08-15 1 18
2017-08-21 1 18
2017-08-21 1 18
2017-09-02 1 18
2017-08-28 2 32
2017-09-29 2 32
2017-10-15 3 51
2017-10-30 3 51
2017-12-05 3 51
I tried this code, but it produces lots of NaNs:
group = df.groupby(by='customer_id')
df['diff'] = (group['action_date'].max() - group['action_date'].min()).dt.days
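The NaNs most likely come from index alignment: the aggregated result is indexed by customer_id (1, 2, 3), while df is indexed by row position (0 to 8), so only the rows whose labels happen to match get a value. A minimal sketch of one way around this, assuming action_date is already a datetime column, is to map the per-customer result back onto the customer_id column (the name span is only illustrative):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "action_date": pd.to_datetime([
        "2017-08-15", "2017-08-21", "2017-08-21", "2017-09-02",
        "2017-08-28", "2017-09-29",
        "2017-10-15", "2017-10-30", "2017-12-05",
    ]),
    "customer_id": [1, 1, 1, 1, 2, 2, 3, 3, 3],
})

group = df.groupby("customer_id")["action_date"]

# One value per customer; the index of this Series is customer_id, not the row index
span = (group.max() - group.min()).dt.days

# Look up each row's customer_id in span instead of assigning span directly
df["diff"] = df["customer_id"].map(span)
print(df)
```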
First, calculate the difference between the two dates. Second, convert the difference to the unit you want to use: 'D' for days or 'W' for weeks. 'M' (month) and 'Y' (year) are not fixed-length durations, so recent pandas versions may not accept them as timedelta units.
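As a rough illustration of the two steps, the sketch below subtracts two timestamps and then divides the resulting Timedelta by a fixed-length unit. The dates are taken from the question's customer 1; only days and weeks are shown, since months and years do not have a fixed length.

```python
import pandas as pd

start = pd.Timestamp("2017-08-15")
end = pd.Timestamp("2017-09-02")

# Step 1: the difference is a Timedelta
delta = end - start

# Step 2: divide by a fixed-length unit to get a plain number
days = delta / pd.Timedelta(days=1)
weeks = delta / pd.Timedelta(weeks=1)

print(days, weeks)
```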
Take a dataframe with two date columns between which you want the difference. Use df.dates1 - df.dates2 to find the difference between the two dates, and then convert the result to months.
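A small sketch of this, with made-up values for the dates1 and dates2 columns. Because a month is not a fixed number of days, the month figure here is computed as a count of calendar months from the year and month parts, which is one possible interpretation and not necessarily the one the text above intends.

```python
import pandas as pd

# Hypothetical data; only the column names dates1/dates2 come from the text above
df = pd.DataFrame({
    "dates1": pd.to_datetime(["2017-12-05", "2017-09-29"]),
    "dates2": pd.to_datetime(["2017-10-15", "2017-08-28"]),
})

# Element-wise difference gives a timedelta column
df["delta"] = df["dates1"] - df["dates2"]
df["days"] = df["delta"].dt.days

# Calendar-month difference (ignores the day of the month)
df["months"] = (
    (df["dates1"].dt.year - df["dates2"].dt.year) * 12
    + (df["dates1"].dt.month - df["dates2"].dt.month)
)
print(df)
```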
The subtract() function performs element-wise subtraction of a dataframe and other. It is essentially the same as dataframe - other, but with support for substituting a fill value for missing data in one of the inputs.
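To make the difference from the plain operator concrete, here is a minimal sketch with two small made-up frames (a, b and the column x are illustrative). With fill_value, a value missing in only one input is replaced before the subtraction; without it, the result is NaN.

```python
import pandas as pd

a = pd.DataFrame({"x": [10.0, None, 30.0]})
b = pd.DataFrame({"x": [1.0, 2.0, None]})

# Plain operator: NaN wherever either side is missing
plain = a - b

# subtract() with fill_value: a one-sided missing value is treated as 0
filled = a.subtract(b, fill_value=0)

print(plain)
print(filled)
```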
You can use the transform method:
In [23]: df['diff'] = df.groupby('customer_id') \
['action_date'] \
.transform(lambda x: (x.max()-x.min()).days)
In [24]: df
Out[24]:
action_date customer_id diff
0 2017-08-15 1 18
1 2017-08-21 1 18
2 2017-08-21 1 18
3 2017-09-02 1 18
4 2017-08-28 2 32
5 2017-09-29 2 32
6 2017-10-15 3 51
7 2017-10-30 3 51
8 2017-12-05 3 51
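A possible variation, sketched here as an alternative and not as part of the answer above, is to call transform with the built-in "max" and "min" aggregations and subtract the two results. This avoids the Python lambda and should give the same column, assuming action_date is a datetime column.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "action_date": pd.to_datetime([
        "2017-08-15", "2017-08-21", "2017-08-21", "2017-09-02",
        "2017-08-28", "2017-09-29",
        "2017-10-15", "2017-10-30", "2017-12-05",
    ]),
    "customer_id": [1, 1, 1, 1, 2, 2, 3, 3, 3],
})

dates = df.groupby("customer_id")["action_date"]

# transform returns a result aligned with df's own index, so no NaNs appear
df["diff"] = (dates.transform("max") - dates.transform("min")).dt.days
print(df)
```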