
How to implode (reverse of pandas explode) based on a column

Tags:

pandas

numpy

explode

implode

I have a dataframe df like the one below:

   NETWORK  config_id APPLICABLE_DAYS  Case  Delivery
0  Grocery       5399             SUN    10         1
1  Grocery       5399             MON    20         2
2  Grocery       5399             TUE    30         3
3  Grocery       5399             WED    40         4

I want to implode it (combine APPLICABLE_DAYS from multiple rows into a single row, as below) and get the average Case and Delivery per config_id:

   NETWORK  config_id  APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
0  Grocery       5399  SUN,MON,TUE,WED         90            10

Using groupby on NETWORK and config_id, I can get the average Case and Delivery like this:

df.groupby(['NETWORK','config_id']).agg({'Case':'mean','Delivery':'mean'})

But how can I also join APPLICABLE_DAYS while performing this aggregation?

377

asked Oct 06 '20 23:10

krishna koti

2 Answers

If you want the "opposite" of explode, collect the values of each group into a list (Solution #1). You can also join them into a single string (Solution #2).

Solution #1: use lambda x: x.tolist() for the 'APPLICABLE_DAYS' column within your .agg groupby call:

df = (df.groupby(['NETWORK','config_id'])
      .agg({'APPLICABLE_DAYS': lambda x: x.tolist(),'Case':'mean','Delivery':'mean'})
      .rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
      .reset_index())
df
Out[1]: 
   NETWORK  config_id       APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
0  Grocery       5399  [SUN, MON, TUE, WED]       25.0           2.5
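To illustrate why the list version is the closest thing to a reverse of explode, here is a minimal sketch that implodes and then explodes again. The frame is rebuilt from the question's sample data; the names imploded and restored are just illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "NETWORK": ["Grocery"] * 4,
    "config_id": [5399] * 4,
    "APPLICABLE_DAYS": ["SUN", "MON", "TUE", "WED"],
    "Case": [10, 20, 30, 40],
    "Delivery": [1, 2, 3, 4],
})

# Implode: collect the days of each group into a list.
imploded = (
    df.groupby(["NETWORK", "config_id"])
      .agg({"APPLICABLE_DAYS": lambda x: x.tolist(),
            "Case": "mean",
            "Delivery": "mean"})
      .reset_index()
)

# Explode: expand the list column back to one row per day.
restored = imploded.explode("APPLICABLE_DAYS").reset_index(drop=True)
```

Only the APPLICABLE_DAYS column makes the round trip. Case and Delivery in restored hold the group mean on every row, because the per-row values are lost in the aggregation.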

Solution #2: use lambda x: ",".join(x) for the 'APPLICABLE_DAYS' column within your .agg groupby call:

df = (df.groupby(['NETWORK','config_id'])
      .agg({'APPLICABLE_DAYS': lambda x: ",".join(x),'Case':'mean','Delivery':'mean'})
      .rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
      .reset_index())
df
Out[1]: 
   NETWORK  config_id  APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
0  Grocery       5399  SUN,MON,TUE,WED       25.0           2.5

If you are looking for the sum, just change mean to sum for the Case and Delivery columns.
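One caveat with the string version: ",".join only accepts strings, so a group containing NaN or numeric values would typically raise a TypeError. Below is a hedged sketch of one way around that, combined with the sum variant. The missing day in the data and the helper name join_days are assumptions made up for this illustration; they are not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data with one missing day.
df = pd.DataFrame({
    "NETWORK": ["Grocery"] * 4,
    "config_id": [5399] * 4,
    "APPLICABLE_DAYS": ["SUN", np.nan, "TUE", "WED"],
    "Case": [10, 20, 30, 40],
    "Delivery": [1, 2, 3, 4],
})

def join_days(days: pd.Series) -> str:
    """Join the non-missing values with commas, casting each to str first."""
    return ",".join(days.dropna().astype(str))

out = (
    df.groupby(["NETWORK", "config_id"])
      .agg({"APPLICABLE_DAYS": join_days, "Case": "sum", "Delivery": "sum"})
      .reset_index()
)
```

Whether dropping the missing entries is the right behavior depends on your data; you could instead fill them with a placeholder before joining.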

155

answered Oct 24 '22 01:10

David Erickson

Your expected results look more like a sum than an average. The solution below uses named aggregation:

df.groupby(["NETWORK", "config_id"]).agg(
    APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
    Total_Cases=("Case", "sum"),
    Total_Delivery=("Delivery", "sum"),
)

                   APPLICABLE_DAYS  Total_Cases  Total_Delivery
NETWORK config_id
Grocery 5399       SUN,MON,TUE,WED          100              10

If it is the mean you want, change 'sum' to 'mean':

df.groupby(["NETWORK", "config_id"]).agg(
    APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
    Avg_Cases=("Case", "mean"),
    Avg_Delivery=("Delivery", "mean"),
)

                   APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
NETWORK config_id
Grocery 5399       SUN,MON,TUE,WED       25.0           2.5
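The named aggregation above leaves NETWORK and config_id in the index. If you want them as ordinary columns, as in the layout shown in the question, a minimal sketch is to chain reset_index(). The frame is rebuilt from the question's sample data, and the name flat is just illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "NETWORK": ["Grocery"] * 4,
    "config_id": [5399] * 4,
    "APPLICABLE_DAYS": ["SUN", "MON", "TUE", "WED"],
    "Case": [10, 20, 30, 40],
    "Delivery": [1, 2, 3, 4],
})

# Named aggregation, then move the group keys from the index back to columns.
flat = (
    df.groupby(["NETWORK", "config_id"])
      .agg(
          APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
          Avg_Cases=("Case", "mean"),
          Avg_Delivery=("Delivery", "mean"),
      )
      .reset_index()
)
```

Named aggregation sets the output column names directly, so no separate rename step is needed.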

answered Oct 24 '22 01:10

sammywemmy
