Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to calculate percentile with group by?

I have a data.table with over ten thousand of rows and it looks like this:

DT1 <- data.table(ID = 1:10,
                  result_2010 = c("TRUE", "FALSE", "TRUE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "TRUE", "FALSE"),
                  result_2011 = c("FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "TRUE"),
                  years = c(15, 16.5, 31, 1, 40.2, 0.3, 12, 22.7, 19, 12))

    ID result_2010 result_2011 years
 1:  1        TRUE       FALSE  15.0
 2:  2       FALSE        TRUE  16.5
 3:  3        TRUE       FALSE  31.0
 4:  4       FALSE       FALSE   1.0
 5:  5       FALSE       FALSE  40.2
 6:  6        TRUE       FALSE   0.3
 7:  7       FALSE        TRUE  12.0
 8:  8       FALSE       FALSE  22.7
 9:  9        TRUE       FALSE  19.0
10: 10       FALSE        TRUE  12.0

For "result_2010" and "result_2011" I want to make a percentile analysis of the "years" but only if the value in for the individual is "TRUE". The code that I tried seems to work, but it gives back the same results for "result_2010" and "result_2011", which is for sure incorrect:

DT1 %>%
  group_by(result_2010 == "TRUE") %>%
  summarise("10.quantile"= round(quantile(years,c(.10)),digits=1),
            "25.quantile"= round(quantile(years,c(.25)),digits=1),
            "Median"= round(quantile(years,c(.50)),digits=1),
            "75.quantile"= round(quantile(years,c(.75)),digits=1),
            "90.quantile"= round(quantile(years,c(.90)),digits=1),
            "Mean" = round(mean(years),digits=1))
DT1 %>%
  group_by(result_2011 == "TRUE") %>%
  summarise("10.quantile"= round(quantile(years,c(.10)),digits=1),
            "25.quantile"= round(quantile(years,c(.25)),digits=1),
            "Median"= round(quantile(years,c(.50)),digits=1),
            "75.quantile"= round(quantile(years,c(.75)),digits=1),
            "90.quantile"= round(quantile(years,c(.90)),digits=1),
            "Mean" = round(mean(years),digits=1))

Could anyone help how to correct my code?

like image 951
Besz15 Avatar asked Dec 18 '22 11:12

Besz15


1 Answers

Using melt and aggregate.

library(data.table)
melt(DT1, c(1, 4), 2:3) |>
  transform(variable=substring(variable, 8)) |>
  subset(value == TRUE) |>
  with(aggregate(list(q=years), list(year=variable), \(x)
                 c(quantile(x), mean=mean(x))))
#   year   q.0%  q.25%  q.50%  q.75% q.100% q.mean
# 1 2010  0.300 11.325 17.000 22.000 31.000 16.325
# 2 2011 12.000 12.000 12.000 14.250 16.500 13.500

Note: Please use R>=4.1 for the |> pipes and \(x) function shorthand notation (or write function(x)).

like image 194
jay.sf Avatar answered Feb 11 '23 16:02

jay.sf