I have a data.table with over ten thousand of rows and it looks like this:
DT1 <- data.table(ID = 1:10,
result_2010 = c("TRUE", "FALSE", "TRUE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "TRUE", "FALSE"),
result_2011 = c("FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "TRUE"),
years = c(15, 16.5, 31, 1, 40.2, 0.3, 12, 22.7, 19, 12))
ID result_2010 result_2011 years
1: 1 TRUE FALSE 15.0
2: 2 FALSE TRUE 16.5
3: 3 TRUE FALSE 31.0
4: 4 FALSE FALSE 1.0
5: 5 FALSE FALSE 40.2
6: 6 TRUE FALSE 0.3
7: 7 FALSE TRUE 12.0
8: 8 FALSE FALSE 22.7
9: 9 TRUE FALSE 19.0
10: 10 FALSE TRUE 12.0
For "result_2010" and "result_2011" I want to make a percentile analysis of the "years" but only if the value in for the individual is "TRUE". The code that I tried seems to work, but it gives back the same results for "result_2010" and "result_2011", which is for sure incorrect:
DT1 %>%
group_by(result_2010 == "TRUE") %>%
summarise("10.quantile"= round(quantile(years,c(.10)),digits=1),
"25.quantile"= round(quantile(years,c(.25)),digits=1),
"Median"= round(quantile(years,c(.50)),digits=1),
"75.quantile"= round(quantile(years,c(.75)),digits=1),
"90.quantile"= round(quantile(years,c(.90)),digits=1),
"Mean" = round(mean(years),digits=1))
DT1 %>%
group_by(result_2011 == "TRUE") %>%
summarise("10.quantile"= round(quantile(years,c(.10)),digits=1),
"25.quantile"= round(quantile(years,c(.25)),digits=1),
"Median"= round(quantile(years,c(.50)),digits=1),
"75.quantile"= round(quantile(years,c(.75)),digits=1),
"90.quantile"= round(quantile(years,c(.90)),digits=1),
"Mean" = round(mean(years),digits=1))
Could anyone help how to correct my code?
Using melt
and aggregate
.
library(data.table)
melt(DT1, c(1, 4), 2:3) |>
transform(variable=substring(variable, 8)) |>
subset(value == TRUE) |>
with(aggregate(list(q=years), list(year=variable), \(x)
c(quantile(x), mean=mean(x))))
# year q.0% q.25% q.50% q.75% q.100% q.mean
# 1 2010 0.300 11.325 17.000 22.000 31.000 16.325
# 2 2011 12.000 12.000 12.000 14.250 16.500 13.500
Note: Please use R>=4.1 for the |>
pipes and \(x)
function shorthand notation (or write function(x)
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With