Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to do a pandas groupby operation on one column but keep the other in the resulting dataframe

Tags:

My question is about groupby operation with pandas. I have the following DataFrame :

In [4]: df = pd.DataFrame({"A": range(4), "B": ["PO", "PO", "PA", "PA"], "C": ["Est", "Est", "West", "West"]})  In [5]: df Out[5]:     A   B     C 0  0  PO   Est 1  1  PO   Est 2  2  PA  West 3  3  PA  West 

This is what I would like to do : I want to group by column B and do a sum on column A. But at the end, I would like column C to still be in the DataFrame. If I do :

In [8]: df.groupby(by="B").aggregate(pd.np.sum) Out[8]:      A B     PA  5 PO  1 

It does the job but column C is missing. I can also do this :

In [9]: df.groupby(by=["B", "C"]).aggregate(pd.np.sum) Out[9]:           A B  C       PA West  5 PO Est   1 

or

In [11]: df.groupby(by=["B", "C"], as_index=False).aggregate(pd.np.sum) Out[11]:      B     C  A 0  PA  West  5 1  PO   Est  1 

But in both cases it group by B AND C and not just B and keeps the C value. Is what I want to do irrelevant or is there a way to do it ?

like image 220
Ger Avatar asked Nov 03 '16 08:11

Ger


People also ask

Can you use Groupby with multiple columns in pandas?

groupby() can take the list of columns to group by multiple columns and use the aggregate functions to apply single or multiple aggregations at the same time.

How do you split a Groupby in pandas?

Step 1: split the data into groups by creating a groupby object from the original DataFrame; Step 2: apply a function, in this case, an aggregation function that computes a summary statistic (you can also transform or filter your data in this step); Step 3: combine the results into a new DataFrame.

How do I get every column except one pandas?

To select all columns except one column in Pandas DataFrame, we can use df. loc[:, df. columns != <column name>].

What are the three phases of the pandas Groupby () function?

The “group by” process: split-apply-combine (1) Splitting the data into groups. (2). Applying a function to each group independently, (3) Combining the results into a data structure.


1 Answers

try to use DataFrameGroupBy.agg() method with dict of {column -> function}:

In [6]: df.groupby('B').agg({'A':'sum', 'C':'first'}) Out[6]:        C  A B PA  West  5 PO   Est  1 

From docs:

Function to use for aggregating groups. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. If passed a dict, the keys must be DataFrame column names.

or something like this depending on your goals:

In [8]: df = pd.DataFrame({"A": range(4), "B": ["PO", "PO", "PA", "PA"], "C": ["Est1", "Est2", "West1", "West2"]})  In [9]: df.groupby('B').agg({'A':'sum', 'C':'first'}) Out[9]:         C  A B PA  West1  5 PO   Est1  1  In [10]: df['sum_A'] = df.groupby('B')['A'].transform('sum')  In [11]: df Out[11]:    A   B      C  sum_A 0  0  PO   Est1      1 1  1  PO   Est2      1 2  2  PA  West1      5 3  3  PA  West2      5 
like image 179
MaxU - stop WAR against UA Avatar answered Sep 21 '22 06:09

MaxU - stop WAR against UA