
How to apply Quantile on a dataframe

I have a data.frame and I want to bin its values into quartiles to make the data simpler:

> head(Quartile)
             GSM1321374 GSM1321375 GSM1321376 GSM1321377 GSM1321378 GSM1321379
1415670_at    11.203302  11.374616  10.876187   11.23639   11.02051  10.926481
1415671_at    11.196427  11.492769  11.493717   11.01683   11.15016  11.576188
1415672_at    11.550974  11.267559  11.800991   11.57551   10.93359  11.222779
1415673_at    11.293390  10.978280  11.367316   10.45135   10.35822  10.234964
1415674_a_at   9.254073  10.572670   9.361991   11.26998   10.21125  10.245857
1415675_at     9.922985   9.228195   9.798156   10.02844   10.19928   9.749947

I applied the following function and it did the job.

quantfun <- function(x) as.integer(cut(x, quantile(x, probs=0:4/4), include.lowest=TRUE))
a <- apply(Quartile,1,quantfun)
b <- t(a)
colnames(b) <- colnames(Quartile)

And the output is:

> head(b)
             GSM1321374 GSM1321375 GSM1321376 GSM1321377 GSM1321378 GSM1321379
1415670_at            3          4          1          4          2          1
1415671_at            2          3          4          1          1          4
1415672_at            3          2          4          4          1          1
1415673_at            4          3          4          2          1          1
1415674_a_at          1          4          1          4          2          3
1415675_at            3          1          2          4          4          1

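But the problem is that it computes the quantiles separately for each row (apply is called with MARGIN = 1), and I want one uniform set of quantiles for the whole data.frame. The quantiles of individual columns differ from one another as well: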

> duration = Quartile$GSM1321374
> quantile(duration)
       0%       25%       50%       75%      100% 
 9.254073  9.922985 11.120381 11.203302 11.550974 
> duration = Quartile$GSM1321375
> quantile(duration)
       0%       25%       50%       75%      100% 
 9.228195 10.572670 10.946407 11.267559 11.492769 
user3253470 asked Oct 20 '22 00:10


1 Answer

Find the quartile ranges of your data frame first to get your bins:

quantile(unlist(Quartile))
       0%       25%       50%       75%      100% 
 9.228195 10.229036 10.997555 11.275832 11.800991 

We now have the range for each group (e.g. 9.228 to 10.229 for the first quartile). Then create the quartile data frame:

Quartile[] <- matrix(quantfun(unlist(Quartile)), nrow(Quartile))

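We are using the fact that unlist(Quartile) flattens the data frame into a single vector, column by column. Because matrix() also fills column by column, each bin number lands back in the position of its original value, and assigning to Quartile[] keeps the row and column names. If you would like to leave the original data frame intact and use a copy: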

Quartile2 <- Quartile
Quartile2[] <- matrix(quantfun(unlist(Quartile2)), nrow(Quartile2))
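As a self-contained check, here is a minimal sketch that rebuilds only the six rows shown by head(Quartile) in the question (an assumption: the real data frame is presumably larger) and applies the same quantfun over the pooled values. The names breaks and roundtrip are introduced here for illustration only.

```r
# Sample data: only the six rows printed by head(Quartile) in the question.
Quartile <- data.frame(
  GSM1321374 = c(11.203302, 11.196427, 11.550974, 11.293390, 9.254073, 9.922985),
  GSM1321375 = c(11.374616, 11.492769, 11.267559, 10.978280, 10.572670, 9.228195),
  GSM1321376 = c(10.876187, 11.493717, 11.800991, 11.367316, 9.361991, 9.798156),
  GSM1321377 = c(11.23639, 11.01683, 11.57551, 10.45135, 11.26998, 10.02844),
  GSM1321378 = c(11.02051, 11.15016, 10.93359, 10.35822, 10.21125, 10.19928),
  GSM1321379 = c(10.926481, 11.576188, 11.222779, 10.234964, 10.245857, 9.749947),
  row.names = c("1415670_at", "1415671_at", "1415672_at",
                "1415673_at", "1415674_a_at", "1415675_at")
)

# Same binning function as in the question: quartile number (1..4) of each value.
quantfun <- function(x) {
  as.integer(cut(x, quantile(x, probs = 0:4/4), include.lowest = TRUE))
}

# One set of breaks computed over every value in the data frame.
breaks <- quantile(unlist(Quartile))
print(breaks)

# unlist() and matrix() both work column by column, so the round trip
# puts every value back in its original cell.
roundtrip <- matrix(unlist(Quartile), nrow(Quartile))
stopifnot(all(roundtrip == as.matrix(Quartile)))

# Bin the pooled values, then reshape into a copy of the data frame.
Quartile2 <- Quartile
Quartile2[] <- matrix(quantfun(unlist(Quartile2)), nrow(Quartile2))
print(Quartile2)

# With 36 pooled values, the quartile bins should be evenly filled.
print(table(unlist(Quartile2)))
```

Because the breaks come from the pooled values, a given number means the same range in every row and column, which is the uniform binning the question asks for.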
Pierre L answered Oct 31 '22 21:10
