
How to convert a numeric variable into grouped quantile positions?

I have a data frame with a grouping variable. Within each group, I want to convert variable 'x' from its raw value to its quantile position (20%, 40%, 60%, 80%, 100%, i.e. 1, 2, 3, 4, 5).

Here is an example of the data I'm using:

df <- data.frame(x=c(1, 5, 21, 24, 43, 47, 56, 59, 68, 69, 11, 15, 25, 27, 48, 49, 51, 55, 61, 67),
                 y=c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B"))

This is what I have tried:

df$z <- aggregate(df$x, by = list(df$y), FUN = function(x) quantile(x, probs = c(0.2, 0.4, 0.6, 0.8, 1), na.rm = T))

In essence, I would like to create a new variable that looks like this:

df$z <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
Marco Pastor Mayo asked Mar 16 '26 03:03


2 Answers

On a grouped data.frame you can use dplyr::ntile():

library(dplyr)

df %>%
  group_by(y) %>%
  mutate(z = ntile(x, 5))

# A tibble: 20 x 3
# Groups:   y [2]
       x y         z
   <dbl> <fct> <int>
 1     1 A         1
 2     5 A         1
 3    21 A         2
 4    24 A         2
 5    43 A         3
 6    47 A         3
 7    56 A         4
 8    59 A         4
 9    68 A         5
10    69 A         5
11    11 B         1
12    15 B         1
13    25 B         2
14    27 B         2
15    48 B         3
16    49 B         3
17    51 B         4
18    55 B         4
19    61 B         5
20    67 B         5
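
If dplyr is not available, a similar equal-sized bucketing can be sketched in base R with rank() and ave(). This only approximates what ntile() does and is not dplyr's own implementation; the helper name rank_bucket is made up for illustration, and it assumes x has no missing values. When a group's size is not a multiple of the number of buckets, the bucket sizes may differ from ntile()'s.

```r
df <- data.frame(
  x = c(1, 5, 21, 24, 43, 47, 56, 59, 68, 69,
        11, 15, 25, 27, 48, 49, 51, 55, 61, 67),
  y = rep(c("A", "B"), each = 10)
)

# Hypothetical helper: rank the values, then map the ranks onto n buckets.
# ties.method = "first" gives every value a distinct rank.
rank_bucket <- function(u, n = 5) {
  as.integer(ceiling(n * rank(u, ties.method = "first") / length(u)))
}

# ave() applies the helper within each level of y and keeps the row order.
df$z <- ave(df$x, df$y, FUN = rank_bucket)
df$z
```

For the example data this reproduces the z column requested in the question.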
Ritchie Sacramento answered Mar 17 '26 16:03


We can use cut() with the group-wise quantiles as the breaks:

library(dplyr)  
df %>%
   group_by(y) %>%
   mutate(z = as.integer(cut(x, breaks = c(-Inf, 
       quantile(x, probs = c(0.2, 0.4, 0.6, 0.8, 1), na.rm = TRUE)))))
# A tibble: 20 x 3
# Groups:   y [2]
#       x y         z
#   <dbl> <fct> <int>
# 1     1 A         1
# 2     5 A         1
# 3    21 A         2
# 4    24 A         2
# 5    43 A         3
# 6    47 A         3
# 7    56 A         4
# 8    59 A         4
# 9    68 A         5
#10    69 A         5
#11    11 B         1
#12    15 B         1
#13    25 B         2
#14    27 B         2
#15    48 B         3
#16    49 B         3
#17    51 B         4
#18    55 B         4
#19    61 B         5
#20    67 B         5

Or, in base R, with ave():

with(df, ave(x, y, FUN = function(u) as.integer(cut(u, breaks = c(-Inf,
          quantile(u, probs = c(0.2, 0.4, 0.6, 0.8, 1), na.rm = TRUE))))))
#[1] 1 1 2 2 3 3 4 4 5 5 1 1 2 2 3 3 4 4 5 5
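
As a rough illustration of why the aggregate() attempt in the question cannot be assigned to df$z while ave() can, the sketch below compares the shapes of the two results. The variable names probs, agg and z are only illustrative.

```r
df <- data.frame(
  x = c(1, 5, 21, 24, 43, 47, 56, 59, 68, 69,
        11, 15, 25, 27, 48, 49, 51, 55, 61, 67),
  y = rep(c("A", "B"), each = 10)
)
probs <- c(0.2, 0.4, 0.6, 0.8, 1)

# aggregate() collapses each group to a single row of quantile values.
agg <- aggregate(df$x, by = list(df$y),
                 FUN = function(u) quantile(u, probs = probs))
nrow(agg)   # one row per group, so it cannot fill a 20-row column

# ave() also calls FUN per group, but returns a vector as long as x,
# with each result placed back at its original row.
z <- ave(df$x, df$y, FUN = function(u) {
  as.integer(cut(u, breaks = c(-Inf, quantile(u, probs = probs))))
})
length(z)   # same length as df$x
```

So aggregate() is a summarising tool (groups in, one row per group out), whereas ave() behaves like a grouped mutate().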

NOTE: This answer follows the quantile-based approach that the OP asked about.
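
One practical consequence of binning on quantile values shows up with heavily tied data. The sketch below uses a made-up vector (not from the question) in which two quantiles coincide, so cut() rejects the breaks. Rank-based bucketing such as ntile() does not hit this error, although it will then place equal values in different buckets.

```r
# Made-up example data with many ties (not from the question).
x <- c(1, 1, 1, 1, 1, 1, 2, 3, 4, 5)

breaks <- c(-Inf, quantile(x, probs = c(0.2, 0.4, 0.6, 0.8, 1)))

# The 20% and 40% quantiles are both 1 here, so the breaks contain a duplicate.
has_dup <- anyDuplicated(breaks) > 0
has_dup

# cut() stops with an error when the breaks are not unique;
# tryCatch() captures the message instead of aborting the script.
res <- tryCatch(cut(x, breaks = breaks),
                error = function(e) conditionMessage(e))
res
```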

akrun answered Mar 17 '26 17:03


