
Python pandas equivalent to R groupby mutate

In R, when I have a data frame with, say, 4 columns (call it df) and I want to compute each row's ratio to the sum product of its group, I can do it like this:

    # generate data
    df = data.frame(a=c(1,1,0,1,0), b=c(1,0,0,1,0), c=c(10,5,1,5,10), d=c(3,1,2,1,2))

    | a   b   c    d |
    | 1   1   10   3 |
    | 1   0   5    1 |
    | 0   0   1    2 |
    | 1   1   5    1 |
    | 0   0   10   2 |

    # compute sum product ratio
    df = df %>%
      group_by(a, b) %>%
      mutate(ratio = c / sum(c * d))

    | a   b   c    d  ratio |
    | 1   1   10   3  0.286 |
    | 1   1   5    1  0.143 |
    | 1   0   5    1  1     |
    | 0   0   1    2  0.045 |
    | 0   0   10   2  0.455 |

But in Python I have to resort to loops. There should be a more elegant way than raw loops in pandas. Does anyone have any ideas?

asosnovsky asked Dec 02 '16 01:12




1 Answer

You can do this with similar syntax using groupby() and apply():

df['ratio'] = df.groupby(['a','b'], group_keys=False).apply(lambda g: g.c/(g.c * g.d).sum()) 
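As a possible alternative that avoids apply(), here is a minimal sketch using groupby() with transform("sum"), which broadcasts each group's sum of c*d back onto its rows. It rebuilds the data frame from the question so it runs on its own; the intermediate name group_total is only illustrative and is not part of the original answer.

```python
import pandas as pd

# Same data as the R example in the question
df = pd.DataFrame({
    "a": [1, 1, 0, 1, 0],
    "b": [1, 0, 0, 1, 0],
    "c": [10, 5, 1, 5, 10],
    "d": [3, 1, 2, 1, 2],
})

# Row-wise product c*d, summed within each (a, b) group and
# broadcast back to the original rows (group_total is a hypothetical name)
group_total = (df["c"] * df["d"]).groupby([df["a"], df["b"]]).transform("sum")

# Equivalent of mutate(ratio = c / sum(c * d)) after group_by(a, b)
df["ratio"] = df["c"] / group_total

print(df)
```

Unlike the R output above, the rows stay in their original order, since transform() returns a result aligned with the original index.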

Psidom answered Sep 25 '22 08:09
