I have a pandas dataframe as below. For each Id I can have multiple Names and Sub-ids.
Id      NAME  SUB_ID
276956  A     5933
276956  B     5934
276956  C     5935
287266  D     1589
I want to condense the dataframe so that there is only one row for each Id, with all the names and sub-ids under that Id collected into a single set per column on that row:
Id      NAME        SUB_ID
276956  set(A,B,C)  set(5933,5934,5935)
287266  set(D)      set(1589)
I tried to group by Id and then aggregate over all the other columns:
df.groupby('Id').agg(lambda x: set(x))
But the resulting dataframe does not have the Id column. When you iterate over a groupby, the Id is returned as the first value of each tuple, but I guess that is lost when you aggregate. Is there a way to get the dataframe I am looking for, that is, to group by and aggregate without losing the column that was grouped on?
If you don't want the grouped column to become the index, pass as_index=False to groupby, which avoids a separate reset_index call:
df.groupby('Id', as_index=False).agg(lambda x: set(x))
The groupby column becomes the index. You can simply reset the index to get it back:
In [4]: df.groupby('Id').agg(lambda x: set(x)).reset_index()
Out[4]:
       Id       NAME              SUB_ID
0  276956  {A, C, B}  {5933, 5934, 5935}
1  287266        {D}              {1589}
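For reference, here is a minimal self-contained sketch that rebuilds the sample data from the question and runs both suggestions side by side. The variable names via_reset and via_flag are only illustrative, and the order of elements inside each printed set may differ from run to run, since sets are unordered.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# Option 1: aggregate, then move the Id index back into a column.
via_reset = df.groupby("Id").agg(lambda x: set(x)).reset_index()

# Option 2: ask groupby not to use Id as the index in the first place.
via_flag = df.groupby("Id", as_index=False).agg(lambda x: set(x))

print(via_reset)
print(via_flag)
```

Both results should have Id as an ordinary column alongside NAME and SUB_ID, with one row per Id.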