I have a df:
date category subcategory order_id product_id branch
2021-05-04 A aa 10 5 web
2021-06-04 A dd 10 2 web
2021-05-06 B aa 18 3 shop
2021-07-06 A aa 50 10 web
2021-07-06 C cc 10 15 web
2021-07-05 A ff 101 30 shop
2021-10-04 D aa 100 15 shop
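For reproducibility, the frame can be built like this (a minimal sketch; I parse date up front since resample needs a DatetimeIndex):

import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2021-05-04', '2021-06-04', '2021-05-06',
                            '2021-07-06', '2021-07-06', '2021-07-05',
                            '2021-10-04']),
    'category': ['A', 'A', 'B', 'A', 'C', 'A', 'D'],
    'subcategory': ['aa', 'dd', 'aa', 'aa', 'cc', 'ff', 'aa'],
    'order_id': [10, 10, 18, 50, 10, 101, 100],
    'product_id': [5, 2, 3, 10, 15, 30, 15],
    'branch': ['web', 'web', 'shop', 'web', 'web', 'shop', 'shop'],
})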
I am trying to answer the question of which categories and subcategories are bought together, per branch type, weekly. I am thinking of grouping by order_id and aggregating category & subcategory into lists, like so:
a = (df.set_index('date')
       .groupby(['order_id', 'branch'])
       .resample('W-MON', label='left')
       .agg({'category': list, 'subcategory': list}))
Which returns:

                               category     subcategory
order_id branch date
10       web    2021-05-04    [A, A, A]    [aa, dd, aa]
18       shop   ...           ...          ...
50       web    ...           ...          ...
100      shop   ...           ...          ...
101      shop   ...           ...          ...
I am trying to build a structure which would show the frequency of each variation of the categories and subcategories bought each week per branch, something similar to this:
                          date
branch                    2021-05-04  2021-05-11  ...
web   category            3, [A, A, A]
                          2, [A, A]
                          2, [A, A, B, B]
      subcategory         5, [aa, dd, aa]
                          4, [dd, aa]
                          1, [dd]
shop  category            3, [A, A, A]
                          2, [A, A]
                          2, [A, A, B, B]
      subcategory         5, [aa, dd, aa]
                          4, [dd, aa]
                          1, [dd]
Here the number before each list denotes the number of times a certain combination of categories and subcategories was bought in the same order. I am unsure how to achieve such a structure, or a similar one that would show the weekly combination frequencies by branch. The order of the product_ids within an order does not matter, as the final basket is the same. So the goal is to see the frequency of categories, subcategories & product_ids bought in the same order, weekly. If 2 different orders contain the same products, the aggregated result would show 2, [A, B] [aa, bb] [5, 2], where the lists hold the category, subcategory & product_id combinations.
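Conceptually, the order-invariance just means each basket can be reduced to a sorted tuple before counting; a minimal sketch of what I mean (plain Python, hypothetical baskets):

from collections import Counter

# a basket is order-invariant: ('B', 'A') and ('A', 'B') are the same
# combination, so sort each basket into a canonical tuple before counting
baskets = [('A', 'B'), ('B', 'A'), ('A',)]
freq = Counter(tuple(sorted(b)) for b in baskets)
# Counter({('A', 'B'): 2, ('A',): 1})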
This is what you need:
import pandas as pd

# df = pd.read_excel('demo.xlsx')
df['date'] = pd.to_datetime(df['date'])

# label each row with its year and week number so orders are bucketed weekly
df['year_week'] = df['date'].dt.strftime('%Y_%U')

# sort the items so every basket lists its contents in a canonical order:
# [B, A] and [A, B] must stringify identically to count as the same combination
df = df.sort_values(['category', 'subcategory', 'product_id'])

# one row per (week, order) holding the basket's categories, subcategories
# and products as lists
a = (df.groupby(['year_week', 'order_id'], sort=False)
       .agg({'category': list, 'subcategory': list, 'product_id': list})
       .reset_index())

# lists are unhashable, so stringify them before grouping on the combinations
a['category'] = a['category'].astype(str)
a['subcategory'] = a['subcategory'].astype(str)
a['product_id'] = a['product_id'].astype(str)

# count how many orders share the same combination in each week
df = pd.pivot_table(a,
                    index=['year_week', 'category', 'subcategory', 'product_id'],
                    values='order_id', aggfunc='count').reset_index()
df.rename({'order_id': 'count'}, axis=1, inplace=True)
The output then has one row per year_week and basket combination, with a count column (I added a few more entries on top of the sample that you provided while testing).
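If you prefer not to pivot, the same weekly counts can also be read off with DataFrame.value_counts (a sketch, assuming pandas 1.1+ where the subset argument exists):

# count identical (week, basket) combinations directly on the aggregated frame
counts = (a.value_counts(['year_week', 'category', 'subcategory', 'product_id'])
           .rename('count')
           .reset_index())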
Some things in your explanation are not crystal clear, but let me know if this fully answers your question.