
Python pandas: Add column to grouped DataFrame with method chaining

First, let me say that I'm new to pandas.

I am trying to add a new column to a DataFrame. I can do this as shown in my example below, but I want to do it by chaining methods so that I don't have to assign intermediate variables. Let me first show what I want to achieve and what I have done so far:

In [1]:
import numpy as np
from pandas import Series,DataFrame
import pandas as pd

In [2]:
np.random.seed(10)
df=pd.DataFrame(np.random.randint(1,5,size=(10, 3)), columns=list('ABC'))
df

Out [2]:
   A  B  C
0  2  2  1
1  4  1  2
2  4  1  2
3  2  1  2
4  2  3  1
5  2  1  3
6  1  3  1
7  4  1  1
8  4  4  3
9  1  4  3

In [3]:
filtered_DF = df[df['B']<2].copy()
grouped_DF = filtered_DF.groupby('A')
filtered_DF['C_Share_By_Group'] =filtered_DF.C.div(grouped_DF.C.transform("sum"))
filtered_DF

Out [3]:
   A  B  C  C_Share_By_Group
1  4  1  2               0.4
2  4  1  2               0.4
3  2  1  2               0.4
5  2  1  3               0.6
7  4  1  1               0.2

I want to achieve the same thing by chaining methods. In R, with the dplyr package, I would be able to do something like:

df %>% 
  filter(B<2) %>%
  group_by(A) %>% 
  mutate('C_Share_By_Group'=C/sum(C))

The pandas documentation says that mutate in R (dplyr) is equivalent to assign in pandas, but assign doesn't work on a grouped object. When I try to call assign on a grouped DataFrame, I get an error:

"AttributeError: Cannot access callable attribute 'assign' of 'DataFrameGroupBy' objects, try using the 'apply' method"
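The failure can be reproduced with a small sketch like the one below. The frame and its values are made up for illustration, and the exact wording of the message depends on the pandas version; what stays the same is that a DataFrameGroupBy object exposes no assign method.

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
df = pd.DataFrame({'A': [1, 1, 2], 'C': [1, 3, 4]})

grouped = df.groupby('A')
try:
    grouped.assign(C_Share_By_Group=1)
    error = None
except AttributeError as exc:
    # The exact message text varies between pandas versions.
    error = exc

print(type(grouped).__name__, '->', error)
```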

I have tried the following, but don't know how to add the new column, or if it is even possible to achieve this by chaining methods:

(df.loc[df.B<2]
   .groupby('A')
    #****WHAT GOES HERE?**** apply(something)?
)
LauH asked May 10 '16 14:05



1 Answer

You can use assign with a lambda that computes the group sums on the filtered DataFrame:

# The lambda receives the filtered frame, so group sums only cover rows with B < 2.
print(df[df['B'] < 2].assign(C_Share_By_Group=lambda d:
                       d.C
                         .div(d.groupby('A')
                           .C
                           .transform("sum"))))

   A  B  C  C_Share_By_Group
1  4  1  2               0.4
2  4  1  2               0.4
3  2  1  2               0.4
5  2  1  3               0.6
7  4  1  1               0.2
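As a variation, the whole pipeline can be written as a single chain that mirrors the dplyr version step by step. This is only a sketch: the data is copied by hand from Out [2] in the question rather than regenerated from the random seed, and using query for the filter step is a choice made here, not part of the original answer.

```python
import pandas as pd

# Data copied from Out [2] in the question (default integer index 0..9).
df = pd.DataFrame({
    'A': [2, 4, 4, 2, 2, 2, 1, 4, 4, 1],
    'B': [2, 1, 1, 1, 3, 1, 3, 1, 4, 4],
    'C': [1, 2, 2, 2, 1, 3, 1, 1, 3, 3],
})

result = (
    df.query('B < 2')  # filter(B < 2)
      .assign(         # group_by(A) %>% mutate(...)
          # d is the filtered frame; transform('sum') returns one group sum
          # per row, aligned to d's index, so the division is row-wise.
          C_Share_By_Group=lambda d: d['C'] / d.groupby('A')['C'].transform('sum')
      )
)
print(result)
```

The key point is that the grouping happens inside the lambda, on the frame that assign passes in, so no intermediate variable is needed. If you prefer not to use query, .loc[lambda d: d['B'] < 2] should work as the filter step as well.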
jezrael answered Oct 16 '22 18:10