
pandas groupby and join lists

Tags:

python

pandas

I have a DataFrame df with two columns. I want to group by one column and join the lists belonging to the same group, for example:

column_a, column_b
1,        [1,2,3]
1,        [2,5]
2,        [5,6]

After the process:

column_a, column_b
1,        [1,2,3,2,5]
2,        [5,6]

I want to keep all the duplicates. I have the following questions:

  • The dtypes of the DataFrame are object. convert_objects() doesn't convert column_b to a list automatically. How can I do this?
  • What does the function in df.groupby(...).apply(lambda x: ...) apply to? What is the form of x? A list?
  • What is the solution to my main problem?
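To illustrate the second question, here is a minimal sketch using the example data above. Iterating over a GroupBy yields (key, sub-DataFrame) pairs; in the whole-frame form df.groupby(...).apply(f), f receives those same sub-DataFrames. When a single column is selected first, f receives one Series per group instead. The helper record_type is hypothetical and only records what it is given.

```python
import pandas as pd

# The example frame from the question; column_b holds Python lists
df = pd.DataFrame({
    "column_a": [1, 1, 2],
    "column_b": [[1, 2, 3], [2, 5], [5, 6]],
})

# Iterating over a GroupBy yields (key, sub-DataFrame) pairs
for key, group in df.groupby("column_a"):
    print(key, type(group).__name__, len(group))

# With one column selected, the function passed to apply receives
# a Series per group (the column_b values of that group)
seen_types = []

def record_type(x):
    # Hypothetical helper: remember the type of x, return the group size
    seen_types.append(type(x).__name__)
    return len(x)

sizes = df.groupby("column_a")["column_b"].apply(record_type)
print(sizes.to_dict())
print(set(seen_types))
```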

Thanks in advance.

asked May 21 '14 at 21:05 by fast tooth



People also ask

How do I turn a Groupby into a list?

You can group DataFrame rows into a list with pandas: call DataFrame.groupby() on the column of interest, select the column you want as a list from each group, and then call .apply(list) to get the list for every group.
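A minimal sketch of the steps just described, assuming a hypothetical frame with team and player columns:

```python
import pandas as pd

# Hypothetical data: one row per (team, player)
df = pd.DataFrame({
    "team": ["red", "red", "blue", "red"],
    "player": ["ann", "bob", "cy", "dee"],
})

# Group on the column of interest, select the column to collect,
# and turn each group's values into a Python list
players_by_team = df.groupby("team")["player"].apply(list)
print(players_by_team)
```

The result is a Series indexed by the group keys (sorted by default), with one list per group.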

How do I concatenate strings in pandas Groupby?

To concatenate strings from several rows with pandas groupby, you can use the transform method: group by the columns name and month, select the text column from the grouped data, and call transform with a lambda function that joins the strings together.
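A minimal sketch of that transform approach, assuming hypothetical rows with the name, month, and text columns mentioned above. Because transform returns a result aligned to the original rows, every row of a group receives the joined string, so duplicates are dropped afterwards.

```python
import pandas as pd

# Hypothetical data using the column names from the text
df = pd.DataFrame({
    "name": ["ann", "ann", "bob"],
    "month": [1, 1, 1],
    "text": ["hello", "world", "hi"],
})

# Each row of a (name, month) group gets that group's joined text
df["text"] = df.groupby(["name", "month"])["text"].transform(lambda x: " ".join(x))

# Keep one row per group
result = df.drop_duplicates()
print(result)
```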

How do I group by in pandas?

The "Hello, World!" of pandas GroupBy: you call .groupby() and pass the name of the column you want to group on, which is "state". Then you use ["last_name"] to specify the column on which you want to perform the actual aggregation.
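A minimal sketch of that pattern, assuming a small hypothetical table with "state" and "last_name" columns and using count() as the aggregation:

```python
import pandas as pd

# Hypothetical data with the column names used in the text
df = pd.DataFrame({
    "state": ["CA", "NY", "CA", "TX", "CA"],
    "last_name": ["Lee", "Kim", "Diaz", "Ng", "Shah"],
})

# Group on "state", then aggregate the "last_name" column within each group
counts = df.groupby("state")["last_name"].count()
print(counts)
```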

How do I concatenate rows in pandas?

Use pandas.concat() to concatenate two or more pandas DataFrames along rows or columns. When you concat() two DataFrames along rows, the result is a new DataFrame containing the rows of both; in effect, one DataFrame is appended to the other.
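A minimal sketch of both directions, assuming two small hypothetical frames:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
df2 = pd.DataFrame({"a": [3], "b": ["z"]})

# Stack along rows (axis=0, the default); ignore_index renumbers the rows 0..n-1
rows = pd.concat([df1, df2], ignore_index=True)

# Place side by side along columns; rows are aligned on the index,
# so df2's missing second row is filled with NaN
cols = pd.concat([df1, df2.rename(columns={"a": "c", "b": "d"})], axis=1)

print(rows)
print(cols)
```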


2 Answers

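object dtype is a catch-all dtype that basically means not int, float, bool, datetime, or timedelta. convert_objects tries to convert a column to one of those dtypes, so it has nothing to do here: the column is already storing your values as Python lists, and there is no separate list dtype to convert them to.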
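A quick way to confirm this, sketched with the question's example values: the column reports object dtype, while each cell is an ordinary list.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2],
    "b": [[1, 2, 3], [2, 5], [5, 6]],
})

# pandas reports the column as object dtype...
print(df["b"].dtype)          # object
# ...but each cell already holds an ordinary Python list
print(type(df["b"].iloc[0]))  # <class 'list'>
```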

You want

In [63]: df
Out[63]:
   a          b    c
0  1  [1, 2, 3]  foo
1  1     [2, 5]  bar
2  2     [5, 6]  baz

In [64]: df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
Out[64]:
         c                b
a
1  foo bar  [1, 2, 3, 2, 5]
2      baz           [5, 6]
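The same session as a self-contained script, for copying into a file. It rebuilds the frame shown in Out[63] and applies the same per-column aggregation; the printed column order may differ from the output above depending on the pandas version.

```python
import pandas as pd

# Rebuild the frame from the session above
df = pd.DataFrame({
    "a": [1, 1, 2],
    "b": [[1, 2, 3], [2, 5], [5, 6]],
    "c": ["foo", "bar", "baz"],
})

# Per column: 'sum' adds a group's lists with +, the lambda joins its strings
result = df.groupby("a").agg({"b": "sum", "c": lambda x: " ".join(x)})
print(result)
```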

This groups the data frame by the values in column a. Read more about groupby.

The 'sum' on column b does a regular list sum (concatenation), just like [1, 2, 3] + [2, 5], with the result [1, 2, 3, 2, 5].
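In plain Python, what the per-group 'sum' amounts to for group 1 can be sketched like this (an illustration of list addition, not pandas' internal code):

```python
import operator
from functools import reduce

group_1 = [[1, 2, 3], [2, 5]]

# + on two lists returns a new, concatenated list; duplicates are kept
print([1, 2, 3] + [2, 5])              # [1, 2, 3, 2, 5]

# The builtin sum needs an empty-list start value; its default start of 0 would raise TypeError
print(sum(group_1, []))                # [1, 2, 3, 2, 5]

# reduce with operator.add folds the lists the same way
print(reduce(operator.add, group_1))   # [1, 2, 3, 2, 5]
```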

answered Sep 22 '22 at 04:09 by TomAugspurger



df.groupby('column_a').agg('sum')

This works because of operator overloading: summing lists concatenates them. The index of the resulting DataFrame will be the values from column_a.
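One alternative worth knowing, offered here as a sketch rather than as part of either answer: summing lists copies the accumulated list at every +, which grows quadratically for groups with many lists. Flattening each group with itertools.chain.from_iterable builds the result in one pass and does not rely on + being defined for the cell type. The data is the question's example.

```python
from itertools import chain

import pandas as pd

# The question's frame
df = pd.DataFrame({
    "column_a": [1, 1, 2],
    "column_b": [[1, 2, 3], [2, 5], [5, 6]],
})

# Flatten each group's lists in one pass instead of repeated + concatenation
joined = df.groupby("column_a")["column_b"].apply(
    lambda lists: list(chain.from_iterable(lists))
)

# As with the sum approach, the index holds the column_a values
print(joined)
```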

answered Sep 23 '22 at 04:09 by qwwqwwq
