I have a DataFrame of order data in which each row is a separate order.
A customer counts as a recurring customer once they have placed their third order. I want to find out what percentage of the total order value comes from recurring customers.
The DataFrame looks like this:
import pandas as pd
df = pd.DataFrame(
{
"date_created": ["2019-11-16", "2019-11-16", "2019-11-16", "2019-11-16", "2019-11-16", "2019-11-16"],
"customer_id": ["1733", "6356", "6457", "6599", "6637", "6638"],
"total": ["746.02", "1236.60", "1002.32", "1187.21", "1745.03", "2313.14"],
"recurring_customer": ["False", "False", "False", "False", "False", "False"],
}
)
By resampling the data to monthly data:
df_monthly = df.resample('1M').mean()
I got the following output:
df_monthly = pd.DataFrame(
{
"date_created": ["2019-11-30", "2019-12-31", "2020-01-31", "2020-02-29", "2020-03-31", "2020-04-30"],
"customer_id": ["4987.02", "5291.56", "5702.13", "6439.27", "7263.11", "8080.91"],
"total": ["2915.25", "2550.85", "2486.72", "2515.81", "2633.77", "2558.19"],
"recurring_customer": ["0.009050", "0.016667", "0.075630", "0.138122", "0.130045", "0.175503"],
}
)
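A note on the sample frame above: it stores every value as a string, while `resample` needs a `DatetimeIndex` and numeric or boolean columns to aggregate. The following is a minimal sketch of one way to convert it, assuming the real data may also arrive as text (only three of the posted rows are repeated here to keep it short).

```python
import pandas as pd

# First three sample rows from the question, with every value stored as a string.
df = pd.DataFrame(
    {
        "date_created": ["2019-11-16", "2019-11-16", "2019-11-16"],
        "customer_id": ["1733", "6356", "6457"],
        "total": ["746.02", "1236.60", "1002.32"],
        "recurring_customer": ["False", "False", "False"],
    }
)

df["date_created"] = pd.to_datetime(df["date_created"])
df["total"] = pd.to_numeric(df["total"])
# Compare against the literal text; astype(bool) would turn the string "False" into True.
df["recurring_customer"] = df["recurring_customer"] == "True"
# resample() operates on a DatetimeIndex.
df = df.set_index("date_created")

print(df.dtypes)
```

If the columns are already typed correctly, this step can be skipped entirely.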
So the real question is: what percentage of each month's total order value comes from recurring customers?
The desired output should look something like this:
| date_created | customer_id | total | recurring_customer | recurring_customer_total | recurring_customer_total_percentage |
| ------------ | ----------- | ------- | ------------------ | ------------------------ | ----------------------------------- |
| 2019-11-30 | 4987.02 | 2915.25 | 0.009050 | ?????? | ?????? |
| 2019-12-31 | 5291.56 | 2550.85 | 0.016667 | ?????? | ?????? |
| 2020-01-31 | 5702.13 | 2486.72 | 0.075630 | ?????? | ?????? |
| 2020-02-29 | 6439.27 | 2515.81 | 0.138122 | ?????? | ?????? |
| 2020-03-31 | 7263.11 | 2633.77 | 0.130045 | ?????? | ?????? |
| 2020-04-30 | 8080.91 | 2558.19 | 0.175503 | ?????? | ?????? |
Note that I can't simply multiply the recurring_customer fraction by the total value, because I assume that recurring customers contribute a lot more to the total value than customers who aren't recurring.
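To make that distinction concrete, here is a small sketch with made-up numbers (the `orders` frame is purely hypothetical). The mean of a boolean column gives the share of orders placed by recurring customers, which can differ a lot from their share of the order value.

```python
import pandas as pd

# Hypothetical toy month: four orders, one of them from a recurring customer.
orders = pd.DataFrame(
    {
        "total": [100.0, 100.0, 100.0, 700.0],
        "recurring_customer": [False, False, False, True],
    }
)

# Share of orders placed by recurring customers (what .mean() reports).
order_share = orders["recurring_customer"].mean()

# Share of order value contributed by recurring customers.
recurring_value = orders.loc[orders["recurring_customer"], "total"].sum()
value_share = recurring_value / orders["total"].sum()

print(order_share)  # 0.25
print(value_share)  # 0.7
```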
I tried the np.where() function on the daily dataframe, where :
I think those are the steps I need to follow, the only problem is that I don't really know how to get there.
Thanks in advance!
I'm fairly new to Python, but I've managed to answer my own question. I can't say this is the best, easiest, or fastest way, but it worked for me.
First, I made a new DataFrame that contains only the rows of the original DataFrame where the 'recurring_customer' column is True. I did that with the following code:
df_recurring_customers = df.loc[df['recurring_customer'] == True]
It gave me the following dataframe:
df_recurring_customers.head()
pd.DataFrame(
{
"date_created": ["2019-11-25", "2019-11-28", "2019-12-02", "2019-12-09", "2019-12-11"],
"customer_id": ["577", "6457", "577", "6647", "840"],
"total": ["33891.12", "81.98", "9937.68", "1166.28", "2969.60"],
"recurring_customer": ["True", "True", "True", "True", "True"],
}
)
Then I resampled the values using:
df_recurring_customers_monthly_sum = df_recurring_customers.resample('1M').sum()
I then dropped the 'number' and 'customer_id' columns, which carried no useful information. The next step was to join the two DataFrames 'df_monthly' and 'df_recurring_customers_monthly_sum' using:
df_total = df_recurring_customers_monthly_sum.join(df_monthly)
This gave me:
| date_created | total | recurring_customer_total |
| ------------ | ---------- | ------------------------ |
| 2019-11-30 | 644272.02 | 33973.10 |
| 2019-12-31 | 612205.99 | 15775.29 |
| 2020-01-31 | 887761.60 | 61612.27 |
| 2020-02-29 | 910724.75 | 125315.31 |
| 2020-03-31 | 1174662.59 | 125315.31 |
| 2020-04-30 | 1399332.26 | 248277.97 |
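One detail the join step glosses over: both frames carry a column named 'total', and `DataFrame.join` raises an error on overlapping column names unless a suffix is supplied or a column is renamed first. The 'total' figures in the table also look like monthly sums rather than means. A minimal sketch of the join under those assumptions, using the first two rows of the table (the name `df_monthly_sum` is hypothetical):

```python
import pandas as pd

idx = pd.to_datetime(["2019-11-30", "2019-12-31"]).rename("date_created")

# Monthly sums over all orders (hypothetical name; first two rows of the table).
df_monthly_sum = pd.DataFrame({"total": [644272.02, 612205.99]}, index=idx)
# Monthly sums over recurring customers only.
df_recurring_customers_monthly_sum = pd.DataFrame(
    {"total": [33973.10, 15775.29]}, index=idx
)

# Rename before joining so the two 'total' columns do not collide.
df_total = df_monthly_sum.join(
    df_recurring_customers_monthly_sum.rename(
        columns={"total": "recurring_customer_total"}
    )
)
print(df_total)
```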
Then I calculated the percentage:
df_total['recurring_customer_total_percentage'] = (df_total['recurring_customer_total'] / df_total['total']) * 100
Which gave me:
| date_created | total | recurring_customer_total | recurring_customer_total_percentage |
| ------------ | ---------- | ------------------------ | ----------------------------------- |
| 2019-11-30 | 644272.02 | 33973.10 | 5.273099 |
| 2019-12-31 | 612205.99 | 15775.29 | 2.576794 |
| 2020-01-31 | 887761.60 | 61612.27 | 6.940182 |
| 2020-02-29 | 910724.75 | 125315.31 | 13.759954 |
| 2020-03-31 | 1174662.59 | 125315.31 | 13.967221 |
| 2020-04-30 | 1399332.26 | 248277.97 | 17.742603 |
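For comparison, the filter, resample, and join steps can probably be collapsed into a single resample. This is only a sketch on made-up orders, not the code that produced the tables above. It assumes the columns are already typed (datetime index, float totals, boolean flag) and passes `pd.offsets.MonthEnd()` instead of the `'1M'` string, since recent pandas releases deprecate the `'M'` alias in favour of `'ME'`.

```python
import pandas as pd

# Hypothetical orders, already typed (datetime, float, bool).
df = pd.DataFrame(
    {
        "date_created": pd.to_datetime(
            ["2019-11-16", "2019-11-25", "2019-12-02", "2019-12-09"]
        ),
        "total": [900.0, 100.0, 300.0, 100.0],
        "recurring_customer": [False, True, True, False],
    }
).set_index("date_created")

# Keep the order value for recurring customers, use zero for everyone else.
df["recurring_customer_total"] = df["total"].where(df["recurring_customer"], 0.0)

# Sum both columns per calendar month, labelled by month end.
df_total = (
    df[["total", "recurring_customer_total"]]
    .resample(pd.offsets.MonthEnd())
    .sum()
)
df_total["recurring_customer_total_percentage"] = (
    df_total["recurring_customer_total"] / df_total["total"] * 100
)
print(df_total)
```

With these toy numbers, November comes out at 10% and December at 75%. Because both sums come from the same resample, no join or column renaming is needed.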