I want to calculate the quantiles of each row of a data frame and return the result as a matrix. Since I want to calculate an arbitrary number of quantiles (and I imagine it is faster to calculate them all at once rather than re-running the function), I tried using an approach I found in this question:
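```r
library(dplyr)

# 2 rows x 500 columns of binomial draws
df <- as.data.frame(matrix(rbinom(1000, 10, 0.5), nrow = 2))

# For each row, count how often each value from min(df) to max(df) occurs
interim_res <- df %>%
  rowwise() %>%
  do(out = sapply(min(df):max(df), function(i) sum(i == .)))
interim_res <- interim_res[[1]] %>% do.call(rbind, .) %>% as.data.frame(.)
```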
This makes sense, but when I try to apply the same pattern to the `quantile()` function, like this:
```r
interim_res <- df %>%
  rowwise() %>%
  do(out = quantile(., probs = c(0.1, 0.5, 0.9)))
interim_res <- interim_res[[1]] %>% do.call(rbind, .) %>% as.data.frame(.)
```
I get this error message:

```
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
  'x' must be atomic
```
Why am I getting an error with `quantile()` and not with `sum()`? How should I fix this?
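The `.` inside `do()` is a data frame (here, a single row of `df`), which is why you get the error: `quantile()` sorts its input, and the sort fails because a data frame is not an atomic vector. `sum(i == .)` works because comparing a number with a data frame returns a logical matrix, which `sum()` accepts. Converting the row to a vector with `unlist()` fixes it. This works: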
```r
# unlist() turns the one-row data frame into an atomic vector for quantile()
df %>%
  rowwise() %>%
  do(data.frame(as.list(quantile(unlist(.), probs = c(0.1, 0.5, 0.9)))))
```
but it risks being horrendously slow, because it builds a new one-row data frame for every row of `df`. Why not just:
```r
apply(df, 1, quantile, probs = c(0.1, 0.5, 0.9))
```
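Note that `apply()` returns one column per row of the input and one row per probability, so to get a matrix with one row per data row (as the question asks), transpose the result with `t()`. A minimal sketch on a small, hypothetical data frame (`small_df` is made up for illustration):

```r
# Hypothetical small data frame: two rows, five numeric columns
small_df <- data.frame(a = c(1, 10), b = c(2, 20), c = c(3, 30),
                       d = c(4, 40), e = c(5, 50))

probs <- c(0.1, 0.5, 0.9)

# apply() returns a length(probs) x nrow(small_df) matrix (one column per row);
# t() flips it so each row of the result matches a row of small_df
res <- t(apply(small_df, 1, quantile, probs = probs))

# Row 1 holds 1.4, 3 and 4.6; row 2 holds 14, 30 and 46 (columns "10%", "50%", "90%")
print(res)
```

The column names come from `quantile()` itself, so the result is already labelled by probability.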
Here are some timings on a larger data set (1,000 rows by 100 columns):
```r
df <- as.data.frame(matrix(rbinom(100000, 10, 0.5), nrow = 1000))

library(microbenchmark)
# Named expressions so the output rows are labelled "dplyr" and "apply"
microbenchmark(
  dplyr = df %>% rowwise() %>%
    do(data.frame(as.list(quantile(unlist(.), probs = c(0.1, 0.5, 0.9))))),
  apply = apply(df, 1, quantile, probs = c(0.1, 0.5, 0.9)),
  times = 5
)
```
This produces:
```
  expr       min        lq      mean    median        uq       max neval
 dplyr 2375.2319 2376.6658 2446.4070 2419.4561 2454.6017 2606.0794     5
 apply  224.7869  231.7193  246.7137  233.4757  245.0718  298.5144     5
```
If you go the `apply()` route, you should probably stick with a matrix from the start, since `apply()` coerces a data frame to a matrix (via `as.matrix()`) anyway.
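A minimal sketch of the matrix-first approach, assuming the data can be generated directly as a matrix (or converted once with `as.matrix()` if it already lives in a data frame). The names `m`, `q`, `q2` and `one_row` are illustrative only. The last lines also check the point about `do()`: a single row of a data frame is not atomic, but its `unlist()`ed values are.

```r
set.seed(1)  # only to make the example reproducible

# Build the data as a matrix from the start (same shape as the benchmark data)
m <- matrix(rbinom(100000, 10, 0.5), nrow = 1000)
probs <- c(0.1, 0.5, 0.9)

# One row per row of m, one column per requested probability
q <- t(apply(m, 1, quantile, probs = probs))

# If the data already lives in a data frame, convert it once with as.matrix()
df <- as.data.frame(m)
q2 <- t(apply(as.matrix(df), 1, quantile, probs = probs))

# A single row taken from a data frame is still a (list-based) data frame,
# which is why the do() version needed unlist() before calling quantile()
one_row <- df[1, ]
is.atomic(one_row)          # FALSE
is.atomic(unlist(one_row))  # TRUE
```

Both routes give the same quantile values; converting once simply keeps the coercion out of the per-row work.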