I have a data frame (df) like this:
group col1 col2
x     a    22
x     a    23
x     b    16
x     b    18
y     a    11
y     a    12
y     a    16
y     a    45
y     b    24
The desired output is:
group col1 col2 rank
x     a    22   1
x     a    23   2
x     b    16   0
x     b    18   0
y     a    11   1
y     a    12   2
y     a    16   3
y     a    45   4
y     b    24   0
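Namely, within each group, the rows with col1 == "a" should be ranked by col2 in ascending order (1, 2, 3, ...), and the rows with col1 == "b" should get rank 0.
How can I do that in R? Any help would be much appreciated. Thanks a lot.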
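For a quick reproducible check, here is a base-R sketch (not one of the answers below) that rebuilds df from the table above, assuming group and col1 are character columns and col2 is numeric. It ranks col2 within each group/col1 combination with ave() and rank(ties.method = "min"), then sets the "b" rows to 0.

```r
# Rebuild the example data (column types are an assumption)
df <- data.frame(
  group = c("x", "x", "x", "x", "y", "y", "y", "y", "y"),
  col1  = c("a", "a", "b", "b", "a", "a", "a", "a", "b"),
  col2  = c(22, 23, 16, 18, 11, 12, 16, 45, 24),
  stringsAsFactors = FALSE
)

# Rank col2 within each (group, col1) combination; tied values share the lowest rank
df$rank <- ave(df$col2, df$group, df$col1,
               FUN = function(v) rank(v, ties.method = "min"))

# Rows where col1 is "b" are not ranked
df$rank[df$col1 == "b"] <- 0

print(df)
```

Here ties.method = "min" gives the same ranks as dplyr's min_rank() for non-missing values.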
A variable in an R data frame can be ranked with the rank function. For example, if a data frame df contains a column x, the ranks of the values in x are given by rank(df$x).
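A minimal sketch with a hypothetical column x. Note that rank() gives tied values the average of their positions by default.

```r
# Hypothetical example data with a tie (two 30s)
df <- data.frame(x = c(10, 30, 20, 30))

# Default ties.method = "average": both 30s get (3 + 4) / 2 = 3.5
r <- rank(df$x)
print(r)  # 1.0 3.5 2.0 3.5
```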
To get the top values of a column in an R data frame, combine sort and head: sort(df$x, decreasing = TRUE) puts the values in decreasing order, and head then keeps the first n of them.
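A small sketch on hypothetical data. The second line uses order() instead of sort() so that whole rows, not just the values, are kept.

```r
# Hypothetical example data
df <- data.frame(x = c(5, 42, 17, 8, 23))

# Sort in decreasing order, then keep the first 3 values
top3 <- head(sort(df$x, decreasing = TRUE), 3)
print(top3)  # 42 23 17

# Same idea for whole rows: order the rows by x (decreasing), then take the first 3
top_rows <- head(df[order(df$x, decreasing = TRUE), , drop = FALSE], 3)
```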
To rank based on multiple columns, the easiest way is the frank function from the data.table package. The trickiest part is setting descending order: if a column should be ranked in descending order, prefix it with a minus sign in the frank call.
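A sketch, assuming the data.table package is installed; the table and the column names a and b are hypothetical. frank() receives the data.table followed by the columns to rank on, and -b means b is used in descending order.

```r
library(data.table)

# Hypothetical data: rank by a (ascending), break ties in a by b (descending)
dt <- data.table(a = c(1, 1, 2, 2), b = c(3, 5, 5, 1))
r <- frank(dt, a, -b, ties.method = "first")
print(r)
```

Within a == 1, the row with b = 5 comes before the row with b = 3 because of the minus sign.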
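The rank() function in R returns the sample ranks of the values in a vector. Ties (equal values) and missing values can be handled in several ways. Syntax: rank(x, na.last = TRUE, ties.method = c("average", "first", "last", "random", "max", "min")). Parameters: x is a numeric, complex, character, or logical vector; na.last controls missing values (TRUE puts them last, FALSE puts them first, NA removes them, "keep" keeps them with rank NA); ties.method controls how tied values are ranked.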
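A short sketch on a hypothetical vector showing how na.last and ties.method change the result:

```r
v <- c(3, NA, 1, 3)

r_default <- rank(v)                      # 2.5 4.0 1.0 2.5: NA ranked last, tied 3s share 2.5
r_keep    <- rank(v, na.last = "keep")    # 2.5  NA 1.0 2.5: NA stays NA
r_min     <- rank(v, ties.method = "min") # 2 4 1 2: tied 3s both get the lower rank
```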
You could try
library(dplyr)
df %>%
  group_by(group, col1) %>%
  mutate(rank = replace(min_rank(col2), col1 == 'b', 0))
# group col1 col2 rank
#1 x a 22 1
#2 x a 23 2
#3 x b 16 0
#4 x b 18 0
#5 y a 11 1
#6 y a 12 2
#7 y a 16 3
#8 y a 45 4
#9 y b 24 0
If you don't want gaps between ranks when there are ties, replace min_rank with dense_rank.
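A minimal sketch of the difference on a hypothetical vector with a tie, assuming dplyr is installed:

```r
library(dplyr)

x <- c(10, 20, 20, 30)
gaps    <- min_rank(x)   # 1 2 2 4: the tie uses up rank 3
no_gaps <- dense_rank(x) # 1 2 2 3: no gap after the tie
```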
Or, instead of replace, multiply the ranks by a logical vector (FALSE for the "b" rows, which turns their rank into 0):
res <- df %>%
  group_by(group, col1) %>%
  mutate(rank = (col1 != 'b') * min_rank(col2))
as.data.frame(res) # convert the grouped tibble back to a plain data.frame
# group col1 col2 rank
#1 x a 22 1
#2 x a 23 2
#3 x b 16 0
#4 x b 18 0
#5 y a 11 1
#6 y a 12 2
#7 y a 16 3
#8 y a 45 4
#9 y b 24 0
Or using data.table (v >= 1.9.5):
library(data.table)
setDT(df)[, rank := frank(col2, ties.method = "dense"),
          by = .(group, col1)][col1 == "b", rank := 0L][]
# group col1 col2 rank
# 1: x a 22 1
# 2: x a 23 2
# 3: x b 16 0
# 4: x b 18 0
# 5: y a 11 1
# 6: y a 12 2
# 7: y a 16 3
# 8: y a 45 4
# 9: y b 24 0
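Or, as @Arun suggested, you can skip one grouping step by first setting rank to 0 for every row and then ranking col2 only in the rows where col1 != "b" (since col1 contains only "a" and "b", grouping by group alone is enough):
setDT(df)[, rank := 0L][col1 != "b", rank := frank(col2, ties.method = "dense"), by = group][]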