
Rank variable by group (dplyr)

Tags:

r

dplyr

I have a data frame with columns x1, x2 and group, and I would like to generate a new data frame with an extra column, rank, that gives the position of x1 within its group.

There is a related question here, but the accepted answer does not seem to work anymore.

Up to this point, everything works:

    library(dplyr)
    data(iris)
    by_species <- iris %>%
      arrange(Species, Sepal.Length) %>%
      group_by(Species)

But when I try to get the ranks by group:

    by_species <- mutate(by_species, rank = row_number())

The error is:

Error in rank(x, ties.method = "first", na.last = "keep") :
argument "x" is missing, with no default

Update

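The problem was a conflict between dplyr and plyr: both packages export a function named mutate, and whichever package is loaded last masks the other's version. To reproduce the error, load dplyr first and then plyr: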

    library(dplyr)
    library(plyr)
    data(iris)
    by_species <- iris %>%
      arrange(Species, Sepal.Length) %>%
      group_by(Species) %>%
      mutate(rank = row_number())
    # Error in rank(x, ties.method = "first", na.last = "keep") :
    #  argument "x" is missing, with no default
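One way to confirm this kind of masking is to ask R which namespace the bare name mutate currently resolves to. The following is a minimal sketch, assuming only dplyr is attached in the session; the helper name which_pkg is hypothetical and introduced only for illustration.

```r
library(dplyr)

# Hypothetical helper: report the package whose namespace defines a function.
which_pkg <- function(f) environmentName(environment(f))

# With only dplyr attached this reports "dplyr". If plyr were attached
# afterwards, the bare name `mutate` would resolve to plyr's version instead.
mutate_owner <- which_pkg(mutate)
print(mutate_owner)
```

If the result is not "dplyr", another package is masking the function and the grouped pipeline will not behave as expected.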

After unloading plyr, it works as expected:

    detach("package:plyr", unload=TRUE)
    by_species <- iris %>%
      arrange(Species, Sepal.Length) %>%
      group_by(Species) %>%
      mutate(rank = row_number())

    by_species %>% filter(rank <= 3)

    ##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  rank
    ##          (dbl)       (dbl)        (dbl)       (dbl)     (fctr) (int)
    ## 1          4.3         3.0          1.1         0.1     setosa     1
    ## 2          4.4         2.9          1.4         0.2     setosa     2
    ## 3          4.4         3.0          1.3         0.2     setosa     3
    ## 4          4.9         2.4          3.3         1.0 versicolor     1
    ## 5          5.0         2.0          3.5         1.0 versicolor     2
    ## 6          5.0         2.3          3.3         1.0 versicolor     3
    ## 7          4.9         2.5          4.5         1.7  virginica     1
    ## 8          5.6         2.8          4.9         2.0  virginica     2
    ## 9          5.7         2.5          5.0         2.0  virginica     3
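If detaching a package is not practical, an alternative is to qualify the call with its namespace so that dplyr's version is used regardless of load order. This is a sketch rather than part of the original post; the object name by_species_ns is an assumption chosen to avoid overwriting by_species.

```r
library(dplyr)

# dplyr::mutate names the function explicitly, so it does not matter
# whether another attached package also exports a function called mutate.
by_species_ns <- iris %>%
  arrange(Species, Sepal.Length) %>%
  group_by(Species) %>%
  dplyr::mutate(rank = row_number())

# Keep the three smallest Sepal.Length rows of each species.
top3_ns <- by_species_ns %>% filter(rank <= 3)
```

The same qualification can be applied to any other function name the two packages share.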
asked Jan 23 '16 19:01 by alberto


2 Answers

For future readers: ranking within a group can also be done in base R. Using the OP's iris example and ranking by Sepal.Length:

    # ORDER BY SPECIES AND SEPAL.LENGTH
    iris <- iris[with(iris, order(Species, Sepal.Length)), ]

    # RUN A ROW COUNT FOR RANK BY SPECIES GROUP
    iris$rank <- sapply(1:nrow(iris),
                        function(i) sum(iris[1:i, c('Species')]==iris$Species[i]))

    # FILTER DATA FRAME BY TOP 3
    iris <- iris[iris$rank <= 3,]
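A more compact base R variant can be sketched with ave(), which applies a function within each level of a grouping variable. This swaps in a different technique from the row count above; the object names ranked and top3 are assumptions, used so that the built-in iris data is not overwritten.

```r
# Work on a sorted copy so the built-in iris data stays untouched.
ranked <- iris[order(iris$Species, iris$Sepal.Length), ]

# ave() applies FUN to Sepal.Length separately within each Species;
# ties.method = "first" breaks ties by row order.
ranked$rank <- ave(ranked$Sepal.Length, ranked$Species,
                   FUN = function(x) rank(x, ties.method = "first"))

# Keep the three lowest-ranked rows of each species.
top3 <- ranked[ranked$rank <= 3, ]
```

Unlike the row count, this does not rescan the data frame for every row, and the rank comes from rank() itself rather than from row position.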
answered Oct 04 '22 16:10 by Parfait


The following produces the desired result by calling rank() on Sepal.Length explicitly, with ties.method = "first" so that ties are broken by row order:

    library(dplyr)

    by_species <- iris %>% arrange(Species, Sepal.Length) %>%
        group_by(Species) %>%
        mutate(rank = rank(Sepal.Length, ties.method = "first"))

    by_species %>% filter(rank <= 3)
    ##Source: local data frame [9 x 6]
    ##Groups: Species [3]
    ##
    ##  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  rank
    ##         (dbl)       (dbl)        (dbl)       (dbl)     (fctr) (int)
    ##1          4.3         3.0          1.1         0.1     setosa     1
    ##2          4.4         2.9          1.4         0.2     setosa     2
    ##3          4.4         3.0          1.3         0.2     setosa     3
    ##4          4.9         2.4          3.3         1.0 versicolor     1
    ##5          5.0         2.0          3.5         1.0 versicolor     2
    ##6          5.0         2.3          3.3         1.0 versicolor     3
    ##7          4.9         2.5          4.5         1.7  virginica     1
    ##8          5.6         2.8          4.9         2.0  virginica     2
    ##9          5.7         2.5          5.0         2.0  virginica     3

    by_species %>% slice(1:3)
    ##Source: local data frame [9 x 6]
    ##Groups: Species [3]
    ##
    ##  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  rank
    ##         (dbl)       (dbl)        (dbl)       (dbl)     (fctr) (int)
    ##1          4.3         3.0          1.1         0.1     setosa     1
    ##2          4.4         2.9          1.4         0.2     setosa     2
    ##3          4.4         3.0          1.3         0.2     setosa     3
    ##4          4.9         2.4          3.3         1.0 versicolor     1
    ##5          5.0         2.0          3.5         1.0 versicolor     2
    ##6          5.0         2.3          3.3         1.0 versicolor     3
    ##7          4.9         2.5          4.5         1.7  virginica     1
    ##8          5.6         2.8          4.9         2.0  virginica     2
    ##9          5.7         2.5          5.0         2.0  virginica     3
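The choice of tie handling matters whenever a group contains repeated values, as Sepal.Length does here. As a small illustration, not taken from the original answer, dplyr's ranking helpers can be compared on a toy vector; the vector x is made up for the example.

```r
library(dplyr)

# Toy vector with one tie (the two 4.4 values).
x <- c(4.4, 4.3, 4.4, 4.9)

r_first <- row_number(x)  # ties broken by position:       2 1 3 4
r_min   <- min_rank(x)    # ties share the lowest rank:     2 1 2 4
r_dense <- dense_rank(x)  # like min_rank, but no gaps:     2 1 2 3
```

Filtering on rank <= 3 therefore returns exactly three rows per group only with the position-based variant; the other two may return more rows when ties occur.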
answered Oct 04 '22 14:10 by steveb