I'm working with the following pandas DataFrame df:
Customer_ID | Transaction_ID
ABC 2016-05-06-1234
ABC 2017-06-08-3456
ABC 2017-07-12-5678
ABC 2017-12-20-6789
BCD 2016-08-23-7891
BCD 2016-09-21-2345
BCD 2017-10-23-4567
The date is unfortunately hidden in the Transaction_ID string, so I edited the DataFrame this way:
#year of transaction
df['year'] = df['Transaction_ID'].astype(str).str[:4]
#date of transaction
df['date'] = df['Transaction_ID'].astype(str).str[:10]
#format date
df['date']=pd.to_datetime(df['date'], format='%Y-%m-%d')
#calculate visit number per year
df['visit_nr_yr'] = df.groupby(['Customer_ID', 'year']).cumcount()+1
Now the df looks like this:
Customer_ID | Transaction_ID | year | date | visit_nr_yr
ABC 2016-05-06-1234 2016 2016-05-06 1
ABC 2017-06-08-3456 2017 2017-06-08 1
ABC 2017-07-12-5678 2017 2017-07-12 2
ABC 2017-12-20-6789 2017 2017-12-20 3
BCD 2016-08-23-7891 2016 2016-08-23 1
BCD 2016-09-21-2345 2016 2016-09-21 2
BCD 2017-10-23-4567 2017 2017-10-23 1
I need to calculate the following:
First, I would like to add the column "days_bw_visits_yr" (days between visits within a year, calculated per Customer_ID):
Customer_ID | Transaction_ID | year | date | visit_nr_yr | days_bw_visits_yr
ABC 2016-05-06-1234 2016 2016-05-06 1 NaN
ABC 2017-06-08-3456 2017 2017-06-08 1 NaN
ABC 2017-07-12-5678 2017 2017-07-12 2 34
ABC 2017-12-20-6789 2017 2017-12-20 3 161
BCD 2016-08-23-7891 2016 2016-08-23 1 NaN
BCD 2016-09-21-2345 2016 2016-09-21 2 29
BCD 2017-10-23-4567 2017 2017-10-23 1 NaN
Please note that I avoided 0s on purpose and kept the NaNs, in case somebody had two visits on the same day.
Next, I want to calculate the average days between visits by visit number (so between visits 1 and 2, and between visits 2 and 3, within a year). I'm looking for this output:
avg_days_bw_visits_1_2 | avg_days_bw_visits_2_3
31.5 161
Finally, I want to calculate the average days between visits in general:
output: 203.8
#the days between visits are 398, 34, 161, 29, 397 and the average of those numbers is 203.8
I'm stuck at how to create the column "days_bw_visits_yr". NaNs have to be excluded from the math.
Python has a built-in datetime module that helps with many date-related problems. Subtracting one date object from another gives a timedelta, and its days attribute is the number of days between the two dates.
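As a small illustration of that subtraction, here is a sketch using two of the visit dates from the question. The variable names are made up for this example.

```python
from datetime import date

# Two visit dates taken from the sample data in the question.
first_visit = date(2017, 6, 8)
second_visit = date(2017, 7, 12)

# Subtracting two date objects yields a datetime.timedelta.
gap = second_visit - first_visit

# The .days attribute holds the whole number of days in the gap.
print(gap.days)  # 34
```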
Use df.dates1 - df.dates2 to find the difference between two date columns, then convert the result to months if needed.
To get a column average (mean) from a pandas DataFrame, use either the mean() or the describe() method. DataFrame.mean() returns the mean of the values over the requested axis.
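A minimal sketch of both calls on a single column, using the within-year gaps from the question plus one missing value. The frame name gaps is only for illustration. Both methods skip NaN by default, which matches the requirement that NaNs be excluded from the math.

```python
import pandas as pd

# Within-year gaps from the question, plus one missing value.
gaps = pd.DataFrame({"days_bw_visits": [34.0, 161.0, 29.0, None]})

# mean() ignores NaN by default (skipna=True).
col_mean = gaps["days_bw_visits"].mean()

# describe() reports count, mean, std, min, quartiles and max;
# its count covers only the non-missing values.
summary = gaps["days_bw_visits"].describe()

print(col_mean)
print(summary["count"], summary["mean"])
```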
To select rows between two dates in a pandas DataFrame, first create a boolean mask, mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date), where start_date and end_date mark the start and end of the date range. Then use the mask to select the rows that fall within the range.
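The masking idea could look roughly like this on the visit dates from the question. This is a sketch: the column is called date here rather than InsertedDates, the range bounds are arbitrary, and indexing with .loc is just one way to apply the mask.

```python
import pandas as pd

visits = pd.DataFrame({
    "Customer_ID": ["ABC", "ABC", "ABC", "ABC", "BCD", "BCD", "BCD"],
    "date": pd.to_datetime([
        "2016-05-06", "2017-06-08", "2017-07-12", "2017-12-20",
        "2016-08-23", "2016-09-21", "2017-10-23",
    ]),
})

# Arbitrary example range: after 1 Jan 2017, up to and including 31 Dec 2017.
start_date = pd.Timestamp("2017-01-01")
end_date = pd.Timestamp("2017-12-31")

# Boolean Series: True where the date lies inside the range.
mask = (visits["date"] > start_date) & (visits["date"] <= end_date)

# Keep only the rows where the mask is True.
in_range = visits.loc[mask]
print(in_range)
```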
You can get the previous visit date (grouped by customer and year) by shifting the "date" column down by 1:
df['previous_visit'] = df.groupby(['Customer_ID', 'year'])['date'].shift()
From this, days between visits is simply the difference:
df['days_bw_visits'] = df['date'] - df['previous_visit']
To calculate the mean, convert the timedelta values to days (missing values become NaN):
df['days_bw_visits'] = df['days_bw_visits'].dt.days
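Put together, the steps above might look like this on the sample data. This is a sketch that assumes the rows are already sorted by date within each customer, as they are in the question; otherwise the frame would need sorting first.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer_ID": ["ABC", "ABC", "ABC", "ABC", "BCD", "BCD", "BCD"],
    "Transaction_ID": [
        "2016-05-06-1234", "2017-06-08-3456", "2017-07-12-5678",
        "2017-12-20-6789", "2016-08-23-7891", "2016-09-21-2345",
        "2017-10-23-4567",
    ],
})

# The first 10 characters of the ID hold the date.
df["date"] = pd.to_datetime(df["Transaction_ID"].str[:10], format="%Y-%m-%d")
df["year"] = df["date"].dt.year

# Previous visit of the same customer in the same year (NaT for a first visit).
df["previous_visit"] = df.groupby(["Customer_ID", "year"])["date"].shift()

# Timedelta -> number of days; NaT turns into NaN.
df["days_bw_visits"] = (df["date"] - df["previous_visit"]).dt.days

print(df[["Customer_ID", "date", "days_bw_visits"]])
```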
Average days between visits, first per visit number and then across all within-year gaps (mean() skips NaN values):
df.groupby('visit_nr_yr')['days_bw_visits'].agg('mean')
df['days_bw_visits'].mean()
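One thing to watch: the overall figure of 203.8 in the question includes the gaps that cross a year boundary (398 and 397 days), so it cannot come from the within-year column. The sketch below assumes that the overall average is meant per customer regardless of year, and therefore groups by Customer_ID alone for that step. The names days_bw_visits_all, overall_avg and within_year_avg are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer_ID": ["ABC", "ABC", "ABC", "ABC", "BCD", "BCD", "BCD"],
    "Transaction_ID": [
        "2016-05-06-1234", "2017-06-08-3456", "2017-07-12-5678",
        "2017-12-20-6789", "2016-08-23-7891", "2016-09-21-2345",
        "2017-10-23-4567",
    ],
})
df["date"] = pd.to_datetime(df["Transaction_ID"].str[:10], format="%Y-%m-%d")
df["year"] = df["date"].dt.year

# Visit number and gap to the previous visit, per customer and year.
df["visit_nr_yr"] = df.groupby(["Customer_ID", "year"]).cumcount() + 1
prev_in_year = df.groupby(["Customer_ID", "year"])["date"].shift()
df["days_bw_visits_yr"] = (df["date"] - prev_in_year).dt.days

# Mean gap per visit number; visit 1 has no gap (all NaN), so drop it.
avg_by_visit = df.groupby("visit_nr_yr")["days_bw_visits_yr"].mean().dropna()
print(avg_by_visit)  # visit 2 -> 31.5, visit 3 -> 161.0

# Mean of the within-year gaps only: (34 + 161 + 29) / 3.
within_year_avg = df["days_bw_visits_yr"].mean()

# Assumption: the overall average ignores the year, so group by customer only.
prev_any_year = df.groupby("Customer_ID")["date"].shift()
days_bw_visits_all = (df["date"] - prev_any_year).dt.days
overall_avg = days_bw_visits_all.mean()
print(overall_avg)  # mean of 398, 34, 161, 29, 397
```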
Source DF:
In [96]: df
Out[96]:
Customer_ID Transaction_ID
0 ABC 2016-05-06-1234
1 ABC 2017-06-08-3456
2 ABC 2017-07-12-5678
3 ABC 2017-12-20-6789
4 BCD 2016-08-23-7891
5 BCD 2016-09-21-2345
6 BCD 2017-10-23-4567
Solution:
df['Date'] = pd.to_datetime(df.Transaction_ID.str[:10])
df['visit_nr_yr'] = df.groupby(['Customer_ID', df['Date'].dt.year]).cumcount()+1
df['days_bw_visits_yr'] = \
df.groupby(['Customer_ID', df['Date'].dt.year])['Date'].diff().dt.days
Result:
In [98]: df
Out[98]:
Customer_ID Transaction_ID Date visit_nr_yr days_bw_visits_yr
0 ABC 2016-05-06-1234 2016-05-06 1 NaN
1 ABC 2017-06-08-3456 2017-06-08 1 NaN
2 ABC 2017-07-12-5678 2017-07-12 2 34.0
3 ABC 2017-12-20-6789 2017-12-20 3 161.0
4 BCD 2016-08-23-7891 2016-08-23 1 NaN
5 BCD 2016-09-21-2345 2016-09-21 2 29.0
6 BCD 2017-10-23-4567 2017-10-23 1 NaN