I have a dataframe df like below
NETWORK config_id APPLICABLE_DAYS Case Delivery
0 Grocery 5399 SUN 10 1
1 Grocery 5399 MON 20 2
2 Grocery 5399 TUE 30 3
3 Grocery 5399 WED 40 4
I want to implode( combine Applicable_days from multiple rows into single row like below) and get the average case and delivery per config_id
NETWORK config_id APPLICABLE_DAYS Avg_Cases Avg_Delivery
0 Grocery 5399 SUN,MON,TUE,WED 90 10
using the groupby on network,config_id i can get the avg_cases and avg_delivery like below.
df.groupby(['network','config_id']).agg({'case':'mean','delivery':'mean'})
But How do i be able to join APPLICABLE_DAYS while performing this aggregation?
implode` as a opposite function to `df. explode` · Issue #45459 · pandas-dev/pandas · GitHub.
Pandas DataFrame: explode() functionThe explode() function is used to transform each element of a list-like to a row, replicating the index values. Exploded lists to rows of the subset columns; index will be duplicated for these rows. Raises: ValueError - if columns of the frame are not unique.
Column(s) to explode. For multiple columns, specify a non-empty list with each element be str or tuple, and all specified columns their list-like data on same row of the frame must have matching length. If True, the resulting index will be labeled 0, 1, …, n - 1. New in version 1.1.
If you want the "opposite" of explode, then that means bringing it into a list in Solution #1. You can also join as a string in Solution #2:
Use lambda x: x.tolist() for the 'APPLICABLE_DAYS' column within your .agg groupby function:
df = (df.groupby(['NETWORK','config_id'])
.agg({'APPLICABLE_DAYS': lambda x: x.tolist(),'Case':'mean','Delivery':'mean'})
.rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
.reset_index())
df
Out[1]:
NETWORK config_id APPLICABLE_DAYS Avg_Cases Avg_Delivery
0 Grocery 5399 [SUN, MON, TUE, WED] 25 2.5
Use lambda x: ",".join(x) for the 'APPLICABLE_DAYS' column within your .agg groupby function:
df = (df.groupby(['NETWORK','config_id'])
.agg({'APPLICABLE_DAYS': lambda x: ",".join(x),'Case':'mean','Delivery':'mean'})
.rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
.reset_index())
df
Out[1]:
NETWORK config_id APPLICABLE_DAYS Avg_Cases Avg_Delivery
0 Grocery 5399 SUN,MON,TUE,WED 25 2.5
If you are looking for the sum, then you can just change mean to sum for the Cases and Delivery columns.
Your results look more like a sum, than average; The solution below uses named aggregation :
df.groupby(["NETWORK", "config_id"]).agg(
APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
Total_Cases=("Case", "sum"),
Total_Delivery=("Delivery", "sum"),
)
APPLICABLE_DAYS Total_Cases Total_Delivery
NETWORK config_id
Grocery 5399 SUN,MON,TUE,WED 100 10
If it is the mean, then you can change the 'sum' to 'mean' :
df.groupby(["NETWORK", "config_id"]).agg(
APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
Avg_Cases=("Case", "mean"),
Avg_Delivery=("Delivery", "mean"),
)
APPLICABLE_DAYS Avg_Cases Avg_Delivery
NETWORK config_id
Grocery 5399 SUN,MON,TUE,WED 25 2.5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With