
How do I get non-aggregated columns using groupby in Pandas? [closed]

I have a sample data frame like this:

Id  application is_a  is_b  is_c   reason  subid  record
100 app_1       False False False  test1   4      record100
100 app_2       True  False False  test2   3      record100
100 app_3       True  True  False  test3   5      record100
101 app_1       False False False  test1   3      record101
101 app_2       True  False False  test2   4      record101

After grouping by Id, concatenating application, and taking the max of subid and of every column starting with is_, the resulting data frame looks like this:

Id  application       is_a  is_b  is_c    subid
100 app_1,app_2,app_3 True  True  False   5
101 app_1,app_2       True  False False   4

I also want the non-aggregated columns reason and record, taken from the row whose subid matches the group max, like this:

Id  application       is_a  is_b  is_c    subid  reason record
100 app_1,app_2,app_3 True  True  False   5      test3  record100
101 app_1,app_2       True  False False   4      test2  record101

How can I do that with Pandas?

N9909 asked Sep 08 '25 15:09



1 Answer

First, preserve the non-aggregated fields reason and record:

Compute the max of subid per Id group, then extract reason and record from the rows whose subid matches that max:

import pandas as pd

# True for rows whose subid equals the max subid within their Id group
max_subid = df.groupby('Id')['subid'].transform('max') == df['subid']
df_max = df[max_subid][['Id', 'reason', 'record', 'subid']]
print(df_max)
Id  reason  record    subid
100 test3   record100 5
101 test2   record101 4
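
If you want to run this step on its own, here is a minimal sketch that rebuilds the sample frame from the question and applies the same mask. The frame construction is an assumption (the question only shows the printed table), and `mask` is just an illustrative name.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (assumption:
# the question does not show how the frame was actually created).
df = pd.DataFrame({
    "Id": [100, 100, 100, 101, 101],
    "application": ["app_1", "app_2", "app_3", "app_1", "app_2"],
    "is_a": [False, True, True, False, True],
    "is_b": [False, False, True, False, False],
    "is_c": [False, False, False, False, False],
    "reason": ["test1", "test2", "test3", "test1", "test2"],
    "subid": [4, 3, 5, 3, 4],
    "record": ["record100"] * 3 + ["record101"] * 2,
})

# transform('max') broadcasts each group's max back onto every row, so the
# comparison yields a boolean mask aligned with the original rows.
mask = df.groupby("Id")["subid"].transform("max") == df["subid"]
df_max = df.loc[mask, ["Id", "reason", "record", "subid"]]
print(df_max)
```

Keep in mind that if two rows in the same Id tie for the max subid, this mask keeps both of them.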

Then perform the same aggregation you did before:

is_cols = [col for col in df.columns if col.startswith('is_')]
df_grouped = (
    df.groupby('Id')
      .agg({
          'application': lambda x: ','.join(x),
          **{col: 'max' for col in is_cols},
          'subid': 'max'
      })
      .reset_index()
)
print(df_grouped)
Id  application       is_a is_b  is_c  subid
100 app_1,app_2,app_3 True True  False 5
101 app_1,app_2       True False False 4

Finally, merge the two results on Id and subid to get the result you want:

df_final = pd.merge(df_grouped, df_max, on=['Id', 'subid'], how='left')
print(df_final)
Id         application  is_a   is_b   is_c  subid reason     record
100  app_1,app_2,app_3  True   True  False      5  test3  record100
101        app_1,app_2  True  False  False      4  test2  record101
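
One caveat, in case it matters for your data: if two rows in the same Id share the max subid, the mask approach keeps both, and the merge then returns two rows for that Id. A possible variant, sketched below, uses idxmax to pick exactly one row per group (the first occurrence of the max). This is a swapped-in alternative rather than the method above; the sample frame is reconstructed from the question, and `best_rows` is an illustrative name.

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumed construction).
df = pd.DataFrame({
    "Id": [100, 100, 100, 101, 101],
    "application": ["app_1", "app_2", "app_3", "app_1", "app_2"],
    "is_a": [False, True, True, False, True],
    "is_b": [False, False, True, False, False],
    "is_c": [False, False, False, False, False],
    "reason": ["test1", "test2", "test3", "test1", "test2"],
    "subid": [4, 3, 5, 3, 4],
    "record": ["record100"] * 3 + ["record101"] * 2,
})

is_cols = [c for c in df.columns if c.startswith("is_")]

# Same aggregation as in the answer: join applications, max of flags and subid.
df_grouped = (
    df.groupby("Id")
      .agg({
          "application": lambda s: ",".join(s),
          **{c: "max" for c in is_cols},
          "subid": "max",
      })
      .reset_index()
)

# idxmax returns the index label of the first max subid in each group,
# so exactly one row per Id is selected even when there are ties.
best_rows = df.loc[df.groupby("Id")["subid"].idxmax(), ["Id", "reason", "record"]]

# validate="one_to_one" makes pandas raise if either side has duplicate Ids.
df_final = df_grouped.merge(best_rows, on="Id", how="left", validate="one_to_one")
print(df_final)
```

Since each Id appears once in both frames, merging on Id alone is enough here, and the validate argument acts as a cheap sanity check on that assumption.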
Aren answered Sep 10 '25 06:09
