First, let me say that I'm new to pandas.
I am trying to create a new column in a DataFrame. I can do this as shown in my example below, but I want to do it by chaining methods so that I don't have to assign intermediate variables. Let me first show what I want to achieve and what I have done so far:
In [1]:
import numpy as np
from pandas import Series,DataFrame
import pandas as pd
In [2]:
np.random.seed(10)
df=pd.DataFrame(np.random.randint(1,5,size=(10, 3)), columns=list('ABC'))
df
Out[2]:
   A  B  C
0  2  2  1
1  4  1  2
2  4  1  2
3  2  1  2
4  2  3  1
5  2  1  3
6  1  3  1
7  4  1  1
8  4  4  3
9  1  4  3
In [3]:
filtered_DF = df[df['B'] < 2].copy()
grouped_DF = filtered_DF.groupby('A')
filtered_DF['C_Share_By_Group'] = filtered_DF.C.div(grouped_DF.C.transform("sum"))
filtered_DF
Out[3]:
   A  B  C  C_Share_By_Group
1  4  1  2               0.4
2  4  1  2               0.4
3  2  1  2               0.4
5  2  1  3               0.6
7  4  1  1               0.2
I want to achieve the same thing by chaining methods. In R, with the dplyr package, I could do something like:
df %>% 
  filter(B<2) %>%
  group_by(A) %>% 
  mutate('C_Share_By_Group'=C/sum(C))
The pandas documentation says that mutate in R (dplyr) is equivalent to assign in pandas, but assign doesn't work on a grouped object.
When I try to call assign on a grouped DataFrame, I get an error:
"AttributeError: Cannot access callable attribute 'assign' of 'DataFrameGroupBy' objects, try using the 'apply' method"
I have tried the following, but I don't know how to add the new column, or whether it is even possible to achieve this by chaining methods:
(df.loc[df.B<2]
   .groupby('A')
    #****WHAT GOES HERE?**** apply(something)?
)
You can try assign:
print(df[df['B'] < 2].assign(C_Share_By_Group=lambda x:
                       x.C
                        .div(x.groupby('A')
                              .C
                              .transform("sum"))))
   A  B  C  C_Share_By_Group
1  4  1  2               0.4
2  4  1  2               0.4
3  2  1  2               0.4
5  2  1  3               0.6
7  4  1  1               0.2
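For reference, the same idea can be written as one fully chained expression. The version below is only a sketch: it swaps the boolean mask for `query`, and it hard-codes the ten rows shown in Out[2] instead of calling `np.random.seed`, so that the result does not depend on the NumPy version. The names `result` and `d` are just illustrative.

```python
import pandas as pd

# The ten rows printed in Out[2] of the question, entered by hand (assumption:
# this replaces the seeded random data so the example is reproducible).
df = pd.DataFrame(
    {
        "A": [2, 4, 4, 2, 2, 2, 1, 4, 4, 1],
        "B": [2, 1, 1, 1, 3, 1, 3, 1, 4, 4],
        "C": [1, 2, 2, 2, 1, 3, 1, 1, 3, 3],
    }
)

result = (
    df.query("B < 2")  # filter step, like dplyr's filter(B < 2)
      .assign(
          # The lambda receives the already-filtered frame, so the group sums
          # are computed only over the rows that survived the filter.
          C_Share_By_Group=lambda d: d["C"] / d.groupby("A")["C"].transform("sum")
      )
)
print(result)
```

The key point is that `assign` is called on the filtered DataFrame, not on the GroupBy object; the grouping happens inside the lambda, and `transform("sum")` returns a Series aligned to the original rows, so the division works element by element.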