Substitute for mutate (dplyr package) in python pandas

Is there a Python pandas function similar to R's dplyr::mutate(), which can add a new column to grouped data by applying a function to one of its columns? A detailed explanation of the problem follows:

I generated sample data using this code:

x <- data.frame(country = rep(c("US", "UK"), 5), state = c(letters[1:10]), pop=sample(10000:50000,10))

Now I want to add a new column holding the maximum population within each country (US and UK). I can do this with the following R code...

x <- group_by(x, country)
x <- mutate(x,max_pop = max(pop))
x <- arrange(x, country)

...or equivalently, using the R dplyr pipe operator:

x %>% group_by(country) %>% mutate(max_pop = max(pop)) %>% arrange(country)

So my question is: how do I do this in Python using pandas? I tried the following, but it did not work:

x['max_pop'] = x.groupby('country').pop.apply(max)

asked Dec 14 '16 16:12

saurav shekhar

1 Answer

You want to use transform. It returns an object with the same index as the object being grouped, which makes it easy to assign the result back as a new column when that object is a DataFrame.

x['max_pop'] = x.groupby('country').pop.transform('max')
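To illustrate why the apply attempt from the question yields nothing useful while transform works, here is a minimal sketch. It assumes a small hand-made frame and uses .max() as a stand-in for the per-group reduction that apply(max) performs. The names per_group, per_row and misaligned are illustrative only. Bracket indexing (['pop']) is used instead of attribute access because DataFrame also has a method named pop.

```python
import pandas as pd

x = pd.DataFrame({
    "country": ["US", "UK", "US", "UK"],
    "state": ["a", "b", "c", "d"],
    "pop": [37088, 46987, 17116, 20484],
})

# A per-group reduction returns one value per group, indexed by country.
per_group = x.groupby("country")["pop"].max()
print(per_group.index.tolist())  # ['UK', 'US']

# transform broadcasts the group result back onto the original row index.
per_row = x.groupby("country")["pop"].transform("max")
print(per_row.tolist())  # [37088, 46987, 37088, 46987]

# Assigning the reduced Series aligns on index labels; 'UK'/'US' do not
# match the row labels 0..3, so every value in the new column is NaN.
misaligned = x.assign(max_pop=per_group)
print(misaligned["max_pop"].isna().all())  # True
```

The key point is index alignment: pandas matches assigned values by label, so only a result that shares the frame's row index fills the new column as intended.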

Setup

import pandas as pd 

x = pd.DataFrame(dict(
    country=['US','UK','US','UK'],
    state=['a','b','c','d'],
    pop=[37088, 46987, 17116, 20484]
))
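For completeness, the whole dplyr chain from the question (group_by, mutate, arrange) could be sketched as one pandas method chain. This is one possible translation, not the only one: assign plays the role of mutate and sort_values the role of arrange. The stable sort and the reset_index call are my own choices for a tidy result, and the name result is illustrative.

```python
import pandas as pd

x = pd.DataFrame(dict(
    country=["US", "UK", "US", "UK"],
    state=["a", "b", "c", "d"],
    pop=[37088, 46987, 17116, 20484],
))

result = (
    # mutate(max_pop = max(pop)) within each country
    x.assign(max_pop=x.groupby("country")["pop"].transform("max"))
    # arrange(country); a stable sort keeps the original order within a country
    .sort_values("country", kind="stable")
    # optional: renumber the rows after sorting
    .reset_index(drop=True)
)

print(result)
```

Unlike the in-place assignment x['max_pop'] = ..., assign returns a new DataFrame and leaves x unchanged, which is closer to how the piped R version behaves.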

answered Oct 14 '22 23:10

piRSquared

Tags: python, pandas, r, dplyr
