New column in pandas - adding series to dataframe by applying a list groupby

Given the following df:

  Id other  concat
0  A     z       1
1  A     y       2
2  B     x       3
3  B     w       4
4  B     v       5
5  B     u       6

I want a result with a new column that holds each group's values as a list:

  Id other  concat           new
0  A     z       1        [1, 2]
1  A     y       2        [1, 2]
2  B     x       3  [3, 4, 5, 6]
3  B     w       4  [3, 4, 5, 6]
4  B     v       5  [3, 4, 5, 6]
5  B     u       6  [3, 4, 5, 6]

This is similar to these questions:

grouping rows in list in pandas groupby

Replicating GROUP_CONCAT for pandas.DataFrame

However, I want to take the grouping you get from df.groupby('Id')['concat'].apply(list), which is a Series smaller than the dataframe, and apply it back to the original dataframe.

I have tried the code below, but it does not apply this to the dataframe:

import pandas as pd
df = pd.DataFrame( {'Id':['A','A','B','B','B','C'], 'other':['z','y','x','w','v','u'], 'concat':[1,2,5,5,4,6]})
df.groupby('Id')['concat'].apply(list)

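I know that transform can be used to apply groupings to dataframes, but it does not work in this case: transform(list) gives back the original scalar values, and assigning the result of apply(list) directly gives NaN.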

>>> df['new_col'] = df.groupby('Id')['concat'].transform(list)
>>> df
  Id  concat other  new_col
0  A       1     z        1
1  A       2     y        2
2  B       5     x        5
3  B       5     w        5
4  B       4     v        4
5  C       6     u        6
>>> df['new_col'] = df.groupby('Id')['concat'].apply(list)
>>> df
  Id  concat other new_col
0  A       1     z     NaN
1  A       2     y     NaN
2  B       5     x     NaN
3  B       5     w     NaN
4  B       4     v     NaN
5  C       6     u     NaN

248

asked Nov 04 '16 22:11

chase

2 Answers

groupby with join

df.join(df.groupby('Id').concat.apply(list).to_frame('new'), on='Id')
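A related sketch, not part of the answer above: the direct assignment in the question produces NaN because the grouped Series is indexed by the Id values ('A', 'B', 'C'), while df is indexed by row number, and column assignment aligns on the index. Looking each row's Id up in the grouped Series with Series.map sidesteps that. The variable name lists is only illustrative, and the data is the frame from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': ['A', 'A', 'B', 'B', 'B', 'C'],
    'other': ['z', 'y', 'x', 'w', 'v', 'u'],
    'concat': [1, 2, 5, 5, 4, 6],
})

# One list per group; this Series is indexed by the Id values, not by row number.
lists = df.groupby('Id')['concat'].apply(list)

# Look up each row's Id in that Series, so the result lines up with df's own index.
df['new'] = df['Id'].map(lists)
print(df)
```

Like the join, this broadcasts each group's list to every row of that group; it assigns the column in place rather than returning a new frame.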


114

answered Oct 11 '22 14:10

piRSquared

A less elegant (and slower) solution, included here just as an alternative.

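def func(gr):
    # Work on a copy so the group passed in by groupby is not mutated.
    gr = gr.copy()
    # Repeat the group's full list of values once per row of the group.
    gr['new'] = [list(gr['concat'])] * len(gr.index)
    return gr
df.groupby('Id').apply(func)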

%timeit df.groupby('Id').apply(func)
100 loops, best of 3: 4.18 ms per loop

%timeit df.join(df.groupby('Id').concat.apply(list).to_frame('new'), on='Id')
1000 loops, best of 3: 1.69 ms per loop
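The %timeit lines above are IPython magics. As a rough sketch of how a similar comparison could be run from a plain script, the standard-library timeit module can be used. The helper names with_join and with_map are hypothetical, and with_map is a Series.map variant added here for comparison, not one of the answers' methods. No timings are quoted because they depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    'Id': ['A', 'A', 'B', 'B', 'B', 'C'],
    'other': ['z', 'y', 'x', 'w', 'v', 'u'],
    'concat': [1, 2, 5, 5, 4, 6],
})


def with_join():
    # The join approach: attach the per-group lists by matching on 'Id'.
    lists = df.groupby('Id')['concat'].apply(list)
    return df.join(lists.to_frame('new'), on='Id')


def with_map():
    # Hypothetical variant: look up each row's Id in the per-group Series.
    lists = df.groupby('Id')['concat'].apply(list)
    return df.assign(new=df['Id'].map(lists))


runs = 200
for fn in (with_join, with_map):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```

On a six-row frame the numbers mostly reflect fixed overhead, so a larger frame would be needed before drawing conclusions about relative speed.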

answered Oct 11 '22 14:10

Dennis Golomazov

Tags: python, pandas, dataframe, group-concat, pandas-groupby
