This is probably a duplicate, but I have spent too much time googling this without any luck. Assume I have a data frame:
import pandas as pd

data = {"letters": ["a", "a", "a", "b", "b", "b"],
        "boolean": [True, True, True, True, True, False],
        "numbers": [1, 2, 3, 1, 2, 3]}
df = pd.DataFrame(data)
df
I want to 1) group by letters, 2) take the mean of numbers if all values in boolean have the same value. In R I would write:
library(dplyr)

df %>%
  group_by(letters) %>%
  mutate(
    condition = n_distinct(boolean) == 1,
    numbers = ifelse(condition, mean(numbers), numbers)
  ) %>%
  select(-condition)
This would result in the following output:
# A tibble: 6 x 3
# Groups: letters [2]
letters boolean numbers
<chr> <lgl> <dbl>
1 a TRUE 2
2 a TRUE 2
3 a TRUE 2
4 b TRUE 1
5 b TRUE 2
6 b FALSE 3
How would you do it using Python pandas?
We can use a lazy groupby and transform:
g = df.groupby('letters')
# rows whose group is all-True in 'boolean' get the group mean; other rows keep their value
df.loc[g['boolean'].transform('all'), 'numbers'] = g['numbers'].transform('mean')
Output:
letters boolean numbers
0 a True 2
1 a True 2
2 a True 2
3 b True 1
4 b True 2
5 b False 3
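Note that transform('all') marks the groups where every boolean is True. If you want the question's literal condition (all values in boolean identical, whether True or False), a small variation with nunique should work. A minimal sketch on the example data (the names g and mask are just illustrative, and numbers is cast to float so the group means fit the column):

import pandas as pd

df = pd.DataFrame({"letters": ["a", "a", "a", "b", "b", "b"],
                   "boolean": [True, True, True, True, True, False],
                   "numbers": [1, 2, 3, 1, 2, 3]})

df['numbers'] = df['numbers'].astype(float)  # so the float group means can be assigned cleanly
g = df.groupby('letters')
# True for rows whose group holds only one distinct boolean value
mask = g['boolean'].transform('nunique') == 1
df.loc[mask, 'numbers'] = g['numbers'].transform('mean')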
Another way would be to use np.where: where a group has only one unique boolean value, take the mean; where it doesn't, keep the original numbers. Code below:
import numpy as np

df['numbers'] = np.where(df.groupby('letters')['boolean'].transform('nunique') == 1,
                         df.groupby('letters')['numbers'].transform('mean'),
                         df['numbers'])
letters boolean numbers
0 a True 2.0
1 a True 2.0
2 a True 2.0
3 b True 1.0
4 b True 2.0
5 b False 3.0
Alternatively, mask out the rows where the condition does not apply as you compute the mean:
m = df.groupby('letters')['boolean'].transform('nunique') == 1
df.loc[m, 'numbers'] = df[m].groupby('letters')['numbers'].transform('mean')
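As a quick sanity check on the example data, the mask-and-assign version and the np.where version end up with the same frame once the dtype is aligned. A minimal sketch (out_mask and out_where are just illustrative names):

import numpy as np
import pandas as pd

df = pd.DataFrame({"letters": ["a", "a", "a", "b", "b", "b"],
                   "boolean": [True, True, True, True, True, False],
                   "numbers": [1, 2, 3, 1, 2, 3]})

m = df.groupby('letters')['boolean'].transform('nunique') == 1

# mask-and-assign version (cast to float first so the means fit the column dtype)
out_mask = df.copy()
out_mask['numbers'] = out_mask['numbers'].astype(float)
out_mask.loc[m, 'numbers'] = out_mask[m].groupby('letters')['numbers'].transform('mean')

# np.where version
out_where = df.copy()
out_where['numbers'] = np.where(m, df.groupby('letters')['numbers'].transform('mean'), df['numbers'])

pd.testing.assert_frame_equal(out_mask, out_where)  # no error: identical results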
Since you are comparing directly to R, I would prefer to use siuba rather than pandas:
from siuba import mutate, if_else, _, select, group_by, ungroup

df1 = df >> \
    group_by(_.letters) >> \
    mutate(condition = _.boolean.unique().size == 1,
           numbers = if_else(_.condition, _.numbers.mean(), _.numbers)) >> \
    ungroup() >> select(-_.condition)
print(df1)
letters boolean numbers
0 a True 2.0
1 a True 2.0
2 a True 2.0
3 b True 1.0
4 b True 2.0
5 b False 3.0
Note that >> is the pipe, and I added \ in order to continue on the next line. Also note that to refer to the variables you use _.variable.
It seems your R code has an issue: in R, you should rather use condition = all(boolean) instead of the code you have. That translates to condition = boolean.all() in Python.
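For completeness, a minimal pandas sketch of that all(boolean) reading (it mirrors the transform('all') answer above; g and mask are just illustrative names):

import pandas as pd

df = pd.DataFrame({"letters": ["a", "a", "a", "b", "b", "b"],
                   "boolean": [True, True, True, True, True, False],
                   "numbers": [1, 2, 3, 1, 2, 3]})

g = df.groupby('letters')
# condition per group: every value in 'boolean' is True, i.e. all(boolean)
mask = g['boolean'].transform('all')
# keep the original numbers where the condition fails, otherwise use the group mean
df['numbers'] = df['numbers'].where(~mask, g['numbers'].transform('mean'))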