pandas: groupby and aggregate without losing the column which was grouped

I have a pandas DataFrame as shown below. Each Id can have multiple Names and Sub-ids.

Id      NAME   SUB_ID
276956  A      5933
276956  B      5934
276956  C      5935
287266  D      1589

I want to condense the dataframe so that there is only one row per Id, with all the names and sub_ids under that Id collected into a single set on that row:

Id      NAME           SUB_ID
276956  set(A,B,C)     set(5933,5934,5935)
287266  set(D)         set(1589)

I tried to group by Id and then aggregate over all the other columns:

df.groupby('Id').agg(lambda x: set(x))

But the resulting dataframe does not have the Id column. When you iterate over a groupby, the id is returned as the first value of each tuple, but I guess it is lost when you aggregate. Is there a way to get the dataframe I am looking for, that is, to group by and aggregate without losing the column that was grouped on?

asked Sep 11 '16 23:09

Fizi

2 Answers

If you don't want the grouping column to become the index, groupby has an as_index argument for exactly that, which saves you a separate reset_index call:

df.groupby('Id', as_index=False).agg(lambda x: set(x))
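For anyone who wants to try this end to end, here is a minimal sketch. It assumes the sample data from the question, rebuilt by hand as a DataFrame; the variable name `condensed` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question (rebuilt by hand, an assumption)
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# as_index=False keeps the grouping key "Id" as an ordinary column
# instead of moving it into the index of the result
condensed = df.groupby("Id", as_index=False).agg(lambda x: set(x))

print(condensed.columns.tolist())
print(condensed)
```

Sets are unordered, so the elements inside each cell may print in a different order from run to run.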

answered Sep 23 '22 02:09

Zeugma

The groupby column becomes the index. You can simply reset the index to get it back:

In [4]: df.groupby('Id').agg(lambda x: set(x)).reset_index()
Out[4]:
       Id       NAME              SUB_ID
0  276956  {A, C, B}  {5933, 5934, 5935}
1  287266        {D}              {1589}
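To see where the Id actually goes, a small sketch along these lines may help. It again assumes the question's sample data rebuilt by hand, and the names `grouped` and `restored` are just for illustration.

```python
import pandas as pd

# Sample data copied from the question (rebuilt by hand, an assumption)
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

grouped = df.groupby("Id").agg(lambda x: set(x))

# The grouping key is not lost: it is now the index of the result
print(grouped.index.name)
print(grouped.columns.tolist())

# reset_index() moves the index back into a regular column
restored = grouped.reset_index()
print(restored.columns.tolist())
```

So the Id is still there after the aggregation, only stored as the index rather than as a column.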

answered Sep 22 '22 02:09

chrisaycock
