Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to implode(reverse of pandas explode) based on a column

I have a dataframe df like below

  NETWORK       config_id       APPLICABLE_DAYS  Case    Delivery  
0   Grocery     5399            SUN               10       1        
1   Grocery     5399            MON               20       2       
2   Grocery     5399            TUE               30       3        
3   Grocery     5399            WED               40       4       

I want to implode( combine Applicable_days from multiple rows into single row like below) and get the average case and delivery per config_id

  NETWORK       config_id       APPLICABLE_DAYS      Avg_Cases    Avg_Delivery 
0   Grocery     5399            SUN,MON,TUE,WED         90           10

using the groupby on network,config_id i can get the avg_cases and avg_delivery like below.

df.groupby(['network','config_id']).agg({'case':'mean','delivery':'mean'})

But How do i be able to join APPLICABLE_DAYS while performing this aggregation?

like image 377
krishna koti Avatar asked Oct 06 '20 23:10

krishna koti


People also ask

What is the opposite of pandas explode?

implode` as a opposite function to `df. explode` · Issue #45459 · pandas-dev/pandas · GitHub.

How do you explode a column in pandas?

Pandas DataFrame: explode() functionThe explode() function is used to transform each element of a list-like to a row, replicating the index values. Exploded lists to rows of the subset columns; index will be duplicated for these rows. Raises: ValueError - if columns of the frame are not unique.

How do I explode two columns in a DataFrame?

Column(s) to explode. For multiple columns, specify a non-empty list with each element be str or tuple, and all specified columns their list-like data on same row of the frame must have matching length. If True, the resulting index will be labeled 0, 1, …, n - 1. New in version 1.1.


2 Answers

If you want the "opposite" of explode, then that means bringing it into a list in Solution #1. You can also join as a string in Solution #2:

Use lambda x: x.tolist() for the 'APPLICABLE_DAYS' column within your .agg groupby function:

df = (df.groupby(['NETWORK','config_id'])
      .agg({'APPLICABLE_DAYS': lambda x: x.tolist(),'Case':'mean','Delivery':'mean'})
      .rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
      .reset_index())
df
Out[1]: 
   NETWORK  config_id       APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
0  Grocery       5399  [SUN, MON, TUE, WED]         25           2.5

Use lambda x: ",".join(x) for the 'APPLICABLE_DAYS' column within your .agg groupby function:

 df = (df.groupby(['NETWORK','config_id'])
      .agg({'APPLICABLE_DAYS': lambda x: ",".join(x),'Case':'mean','Delivery':'mean'})
      .rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
      .reset_index())
df
Out[1]: 
   NETWORK  config_id       APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
0  Grocery       5399       SUN,MON,TUE,WED         25           2.5

If you are looking for the sum, then you can just change mean to sum for the Cases and Delivery columns.

like image 155
David Erickson Avatar answered Oct 24 '22 01:10

David Erickson


Your results look more like a sum, than average; The solution below uses named aggregation :

    df.groupby(["NETWORK", "config_id"]).agg(
    APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
    Total_Cases=("Case", "sum"),
    Total_Delivery=("Delivery", "sum"),
)

                        APPLICABLE_DAYS       Total_Cases   Total_Delivery
NETWORK config_id           
Grocery 5399                SUN,MON,TUE,WED           100      10

If it is the mean, then you can change the 'sum' to 'mean' :

df.groupby(["NETWORK", "config_id"]).agg(
    APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
    Avg_Cases=("Case", "mean"),
    Avg_Delivery=("Delivery", "mean"),
)

                    APPLICABLE_DAYS   Avg_Cases Avg_Delivery
NETWORK config_id           
Grocery 5399         SUN,MON,TUE,WED      25      2.5
like image 29
sammywemmy Avatar answered Oct 24 '22 01:10

sammywemmy