Concatenate rows of pandas DataFrame with same id

Say I have a pandas DataFrame such as:

   A  B  id
0  1  1   0
1  2  1   0
2  3  2   1
3  0  2   1

Say I want to combine rows that share the same id, collecting the values in the other columns into lists. The DataFrame above would then become:

     A       B     id
0  [1, 2]  [1, 1]   0
1  [3, 0]  [2, 2]   1

since the first two rows share the same id, as do the last two. Does pandas have a function to do this? I am aware of the pandas groupby command, but I would like the return type to be a DataFrame as well. Thanks.

asked Jan 13 '16 at 20:01 by Alex


People also ask

What is the difference between pandas concat and merge?

The concat function concatenates DataFrames along rows or columns; you can think of it as stacking multiple DataFrames. The merge function combines DataFrames based on the values in shared key columns, like a database join. merge is more flexible for combining related data because it matches rows by key values (with inner, left, right, or outer joins) instead of simply stacking them.
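A minimal sketch of the difference, using two small hypothetical frames named left and right (the names and data are illustrative only):

```python
import pandas as pd

# Two small illustrative frames that share a "key" column.
left = pd.DataFrame({"key": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "y": ["B", "C", "D"]})

# concat stacks the frames along the rows (axis=0). Columns are aligned by
# name, and cells with no matching column in a given frame become NaN.
stacked = pd.concat([left, right], ignore_index=True)

# merge joins rows whose "key" values match (an inner join by default).
joined = pd.merge(left, right, on="key")

print(stacked)
print(joined)
```

stacked keeps all six rows, while joined keeps only the keys present in both frames (2 and 3).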

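Is pd.concat faster than append?

DataFrame.append returned a new DataFrame on every call, so appending inside a loop copied all of the accumulated data on each iteration. Collecting the pieces in a list and calling concat once does the job in a single operation, which makes it faster. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat is its replacement.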
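A minimal sketch contrasting the two patterns using pd.concat alone (so it also runs on pandas versions without DataFrame.append). The per-iteration frames in pieces are hypothetical:

```python
import pandas as pd

# Hypothetical pieces, e.g. one small DataFrame produced per loop iteration.
pieces = [pd.DataFrame({"A": [i], "B": [i * 10]}) for i in range(5)]

# Slow pattern: growing a DataFrame inside the loop copies everything
# accumulated so far on every iteration (quadratic total copying).
slow = pieces[0]
for piece in pieces[1:]:
    slow = pd.concat([slow, piece], ignore_index=True)

# Preferred pattern: collect the pieces first, then concatenate once.
fast = pd.concat(pieces, ignore_index=True)

print(fast.equals(slow))  # True
```

Both produce the same frame; the difference only shows up in running time as the number of pieces grows.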

How do I concatenate values in pandas?

You can concatenate two or more text (string) columns of a pandas DataFrame simply with the + operator. Note that when you apply + to numeric columns, it performs addition instead of concatenation.
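A short sketch with a hypothetical DataFrame showing both behaviours of +, plus astype(str) to get concatenation from numeric columns:

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Ada", "Alan"],
    "last": ["Lovelace", "Turing"],
    "n1": [1, 2],
    "n2": [3, 4],
})

# + on string (object) columns concatenates element-wise.
full_name = df["first"] + " " + df["last"]

# + on numeric columns adds element-wise.
total = df["n1"] + df["n2"]

# Converting to str first gives concatenation instead of addition.
joined_digits = df["n1"].astype(str) + df["n2"].astype(str)

print(full_name.tolist())      # ['Ada Lovelace', 'Alan Turing']
print(total.tolist())          # [4, 6]
print(joined_digits.tolist())  # ['13', '24']
```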


1 Answer

You could use groupby for that, combining the GroupBy agg method with the tolist method of a pandas Series:

In [762]: df.groupby('id').agg(lambda x: x.tolist())
Out[762]: 
         A       B
id                
0   [1, 2]  [1, 1]
1   [3, 0]  [2, 2]
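The session above assumes df already holds the question's data. A self-contained sketch that builds it and applies the same aggregation; passing the built-in list to agg is shown as an equivalent shorthand for the lambda:

```python
import pandas as pd

# The example data from the question.
df = pd.DataFrame({"A": [1, 2, 3, 0], "B": [1, 1, 2, 2], "id": [0, 0, 1, 1]})

# Collect each column's values per id into a Python list.
df1 = df.groupby("id").agg(lambda x: x.tolist())

# Passing the built-in list as the aggregation function gives the same result.
df2 = df.groupby("id").agg(list)

print(df1)
print(df1.to_dict() == df2.to_dict())  # True
```

Within each group the values keep their original row order, so id 1 gives [3, 0] rather than a sorted list.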

groupby followed by agg returns a DataFrame, as you want:

In [763]: df1 = df.groupby('id').agg(lambda x: x.tolist())

In [764]: type(df1)
Out[764]: pandas.core.frame.DataFrame
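To address the concern about return types: groupby on its own returns an intermediate DataFrameGroupBy object, and it is the agg call that produces the DataFrame. A small sketch checking this on the question's data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 0], "B": [1, 1, 2, 2], "id": [0, 0, 1, 1]})

grouped = df.groupby("id")               # intermediate DataFrameGroupBy object
df1 = grouped.agg(lambda x: x.tolist())  # aggregating produces a DataFrame

print(type(grouped).__name__)             # DataFrameGroupBy
print(isinstance(df1, pd.DataFrame))      # True
print(list(df1.columns), df1.index.name)  # ['A', 'B'] id
```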

To match your expected result exactly, you could additionally call reset_index, or pass as_index=False to groupby:

In [768]: df.groupby('id', as_index=False).agg(lambda x: x.tolist())
Out[768]: 
   id       A       B
0   0  [1, 2]  [1, 1]
1   1  [3, 0]  [2, 2]

In [771]: df1.reset_index()
Out[771]: 
   id       A       B
0   0  [1, 2]  [1, 1]
1   1  [3, 0]  [2, 2]
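A self-contained sketch confirming that both options give the same frame. The final column reordering to A, B, id (the layout shown in the question) is an optional extra step, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 0], "B": [1, 1, 2, 2], "id": [0, 0, 1, 1]})

# Option 1: keep "id" as a regular column while grouping.
r1 = df.groupby("id", as_index=False).agg(lambda x: x.tolist())

# Option 2: aggregate with "id" as the index, then move it back into a column.
r2 = df.groupby("id").agg(lambda x: x.tolist()).reset_index()

# Optional: reorder the columns to match the question's layout (A, B, id).
expected_layout = r1[["A", "B", "id"]]

print(r1.to_dict(orient="list") == r2.to_dict(orient="list"))  # True
print(expected_layout)
```

The comparison goes through to_dict(orient="list") so the list-valued cells are compared as plain Python lists.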
answered Oct 18 '22 at 02:10 by Anton Protopopov