
Calculating percentiles for multiple columns in R

Tags:

r

I need to calculate the quantiles at the following probability values: 0.05, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99 and 1, for 100 variables (every column except time).

The data structure is as below (the dataset is named df):

time  Var1  var2  var3  .....  var100
1     100   230   378   .....  300
2     200   145   129   .....  240
3     150   235   200   .....  690

I am using the logic below.

percentiles <- do.call("rbind",tapply(df[2:100],quantile,probs=c(0,0.05,0.25,0.50,0.75,0.90,0.95,0.99,1),na.rm=TRUE))

Since tapply works only on vectors, calling it separately for each of the 100 variables would be impractical.

bnair asked Mar 10 '16 10:03


1 Answer

Why use tapply? Just using apply seems fine here, e.g.:

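quants <- c(0, 0.05, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99, 1)
# df[-1] drops the time column and keeps all 100 variables (df[2:100] would stop at column 100, i.e. at the 99th variable)
apply(df[-1], 2, quantile, probs = quants, na.rm = TRUE)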
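
To see what this returns, here is a minimal sketch on a made-up data frame. The values are just the first three variables from the sample rows in the question, and df[-1] is used to drop the first (time) column. The sapply line is an alternative I am adding for comparison, not part of the approach above; it should give the same matrix because sapply loops over the columns of a data frame directly.

```r
# Toy stand-in for the asker's data: a time column plus three variables
df <- data.frame(
  time = 1:3,
  Var1 = c(100, 200, 150),
  var2 = c(230, 145, 235),
  var3 = c(378, 129, 200)
)

quants <- c(0, 0.05, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99, 1)

# Column-wise quantiles: MARGIN = 2 applies quantile() to each column
percentiles <- apply(df[-1], 2, quantile, probs = quants, na.rm = TRUE)

# Alternative: sapply iterates over the data frame's columns
percentiles2 <- sapply(df[-1], quantile, probs = quants, na.rm = TRUE)

# One row per probability, one column per variable
print(dim(percentiles))

# Medians of each variable
print(percentiles["50%", ])

# Transpose to get one row per variable instead
by_variable <- t(percentiles)
print(by_variable)
```

The result has one row per probability ("0%", "5%", ..., "100%") and one column per variable. If you would rather have one row per variable, which is what the rbind in the question seems to be aiming for, t(percentiles) transposes it.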
David Heckmann answered Sep 24 '22 06:09