I have a df of strata, each of which has 1000 samples from a posterior distribution of the estimates from that stratum.
mydf <- as.data.frame(lapply(seq(1, 1000), rnorm, n=100))
colnames(mydf) <- paste('s', seq(1, ncol(mydf)), sep='')
I want to add columns for a few quantiles of the distribution for each row. In classic R, I'd write this.
quants <- t(apply(mydf, 1, quantile, probs=c(.025, .5, .975)))
colnames(quants) <- c('s_lo', 's_med', 's_hi')
mydf <- cbind(mydf, quants)
I suspect there's a direct way to do this in dplyr
(maybe rowwise
?) but my attempts have failed. Ideas?
dplyr
is not optimized for row-based calculations like that. Though you can do this with rowwise()
, I recommend against it: performance will be abysmal. Your best speed will likely be with something that expects a matrix
, and can operate on the rows. I suggest apply
.
Instead of dealing with a 100x1000 data.frame
, for brevity I'll go with 5 columns:
set.seed(2)
mydf <- as.data.frame(lapply(seq(1, 5), rnorm, n=10))
colnames(mydf) <- paste('s', seq(1, ncol(mydf)), sep='')
Converting to a matrix
is only reasonable if all columns are of the same class
. In this case, they are all numeric
so we are safe. (If you have non-numeric columns in the dataframe, extract only the ones you need here and bind them back in later.)
mymtx <- as.matrix(mydf)
apply(mymtx, 1, quantile, c(0.1, 0.9))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# 10% 1.028912 1.430939 1.999521 0.305907 1.753824 0.03267599 1.934381 1.270504 2.995816 1.489634
# 90% 4.950067 3.807735 4.881554 6.123989 4.886388 5.55628806 4.207605 4.184460 4.406384 3.782134
One notable with using apply
like this is that the result is in row-based form, perhaps transposed from what one would expect. Simply wrap it in t(...)
and you'll see the columns you might expect.
This can be recombined with the original dataframe using cbind
or similar function.
This can be done in a pipeline like so:
mydf %>%
bind_cols(as.data.frame(t(apply(., 1, quantile, c(0.1, 0.9)))))
# s1 s2 s3 s4 s5 10% 90%
# 1 0.1030855 2.4176508 5.0908192 4.738939 4.616414 1.02891157 4.950067
# 2 1.1848492 2.9817528 1.8000742 4.318960 3.040897 1.43093918 3.807735
# 3 2.5878453 1.6073046 4.5896382 5.076164 4.158295 1.99952092 4.881554
# 4 -0.1303757 0.9603310 4.9546516 3.715842 6.903547 0.30590700 6.123989
# 5 0.9197482 3.7822290 3.0049378 3.223325 5.622494 1.75382406 4.886388
# 6 1.1324203 -0.3110691 0.5482936 3.404340 6.990920 0.03267599 5.556288
# 7 1.7079547 2.8786046 3.4772373 2.274020 4.694516 1.93438093 4.207605
# 8 0.7603020 2.0358067 2.4034418 3.097416 4.909156 1.27050387 4.184460
# 9 2.9844739 3.0128287 3.7922033 3.440938 4.815839 2.99581584 4.406384
# 10 0.8612130 2.4322652 3.2896367 3.753487 3.801232 1.48963385 3.782134
I'll leave the column naming up to you.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With