
pandas: groupby and aggregate without losing the column which was grouped

I have a pandas dataframe as below. For each Id I can have multiple Names and Sub-ids.

Id      NAME   SUB_ID
276956  A      5933
276956  B      5934
276956  C      5935
287266  D      1589

I want to condense the dataframe so that there is only one row per Id, with all the names and sub-ids for that Id collected into a single set on that row:

Id      NAME           SUB_ID
276956  set(A,B,C)     set(5933,5934,5935)
287266  set(D)         set(1589)

I tried grouping by Id and then aggregating over all the other columns:

df.groupby('Id').agg(lambda x: set(x)) 

But the resulting dataframe does not have Id as a column. When you iterate over a groupby, the key is returned as the first element of each tuple, but I guess it is lost when you aggregate. Is there a way to get the dataframe I am looking for, that is, to group by and aggregate without losing the column that was grouped on?
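To make the symptom concrete, here is a minimal sketch that rebuilds the sample data by hand. The DataFrame constructor call and the variable names are assumptions; the question only shows the printed table. It suggests that Id is not dropped by the aggregation but ends up as the index of the result rather than as a column.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (construction is assumed).
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# Same call as in the question.
result = df.groupby("Id").agg(lambda x: set(x))

# The grouping key is carried by the index, not by a regular column.
print(result.index.name)        # Id
print(result.columns.tolist())  # ['NAME', 'SUB_ID']
```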

Fizi asked Sep 11 '16 23:09



2 Answers

If you don't want the grouping column to become the index, groupby has an argument for that, which avoids a separate reset_index call:

df.groupby('Id', as_index=False).agg(lambda x: set(x)) 
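As a rough end-to-end sketch of this approach, assuming the sample data from the question is rebuilt by hand (the constructor call and variable names are illustrative, not from the original post):

```python
import pandas as pd

# Sample data rebuilt from the table in the question (construction is assumed).
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# as_index=False asks groupby to keep the key as an ordinary column.
result = df.groupby("Id", as_index=False).agg(lambda x: set(x))

print(result.columns.tolist())  # ['Id', 'NAME', 'SUB_ID']
print(result)
```

With this flag the result gets a default integer index and Id stays available as a regular column.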
Zeugma answered Sep 23 '22 02:09


The groupby column becomes the index. You can simply reset the index to get it back:

In [4]: df.groupby('Id').agg(lambda x: set(x)).reset_index()
Out[4]:
       Id       NAME              SUB_ID
0  276956  {A, C, B}  {5933, 5934, 5935}
1  287266        {D}              {1589}
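The same session as a standalone script might look like the sketch below. The DataFrame construction and the names grouped, flat and names_for_first_id are assumptions added for illustration. Since sets are unordered, the printed order of their elements (such as {A, C, B} above) can vary, so the check compares set contents rather than printed text.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (construction is assumed).
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# After aggregation, Id is the index of the grouped frame.
grouped = df.groupby("Id").agg(lambda x: set(x))

# reset_index moves Id back into the columns and restores a default index.
flat = grouped.reset_index()
print(flat.columns.tolist())  # ['Id', 'NAME', 'SUB_ID']

# Compare by set equality, not by how the set happens to print.
names_for_first_id = flat.loc[flat["Id"] == 276956, "NAME"].iloc[0]
print(names_for_first_id == {"A", "B", "C"})  # True
```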
chrisaycock answered Sep 22 '22 02:09