Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: splitting dataset into quartiles/deciles. What is the right method? [duplicate]

Tags:

dataframe

plot

r

I am very new with R, so hoping I can get some pointers on how to achieve the desired manipulation of my data.

I have an array of data with three variables.

  gene_id       fpkm  meth_val
1 100629094     0.000 0.0063
2 100628995     0.000 0.0000
3 102655614   111.406 0.0021

I'd like to plot the average meth_val after stratifying my gene_ids based on fpkm into quartiles or deciles.

Once I load my data into a dataframe...

data <- read.delim("myfile.tsv", sep='\t')

I can determine the fpkm deciles using:

quantile(data$fpkm, prob = seq(0, 1, length = 11), type = 5

which yields

          0%          10%          20%          30%          40%          50%
0.000000e+00 9.783032e-01 7.566164e+00 3.667630e+01 1.379986e+02 3.076280e+02
         60%          70%          80%          90%         100%
5.470552e+02 8.875592e+02 1.486200e+03 2.974264e+03 1.958740e+05

From there, I'd like to essentially split the dataframe into 10 groups based on whether the fpkm_val fits into one of these deciles. Then I'd like to plot the meth_val of each decile in ggplot as a box plot and perform a statistical test across deciles.

The main thing I'm really stuck on is how to split my dataset in the proper way. Any assistance would be hugely appreciated!

Thanks a bunch!

like image 252
user1995839 Avatar asked Oct 09 '14 08:10

user1995839


People also ask

How do you divide data into deciles in R?

To place each data value into a decile, we can use the ntile(x, ngroups) function from the dplyr package in R. What is this? The way to interpret the output is as follows: The data value 56 falls between the percentile 0% and 10%, thus it falls in the first decile.

How do you separate quartiles from data?

While data points can be dotted all over a graph at random, to organise them into quartiles you'll need to plot them on a number line. They're listed in ascending order and then divided into four quarters. Quartiles are quite similar to a median, simply dividing the data into four equal parts rather than two.

How do you calculate deciles?

There are several formulae in vogue to calculate decile, and this method is one of the simplest one where each decile is calculated by adding one to the number of data in the population, then divide the sum by ten and then finally multiply the result by the rank of the decile, i.e., 1 for D1, 2 for D2… 9 for D9.


3 Answers

Another way would be ntile() in dplyr.

library(tidyverse)

foo <- data.frame(a = 1:100,
                  b = runif(100, 50, 200),
                  stringsAsFactors = FALSE)

foo %>%
    mutate(quantile = ntile(b, 10))

#  a         b quantile
#1 1  93.94754        2
#2 2 172.51323        8
#3 3  99.79261        3
#4 4  81.55288        2
#5 5 116.59942        5
#6 6 128.75947        6
like image 93
jazzurro Avatar answered Nov 07 '22 09:11

jazzurro


Perhaps easier like this:

data$qunatil = cut( data$fpkm, quantile(data$fpkm, prob = seq(0, 1, length = 11), type = 5) )

like image 30
Adii_ Avatar answered Nov 07 '22 08:11

Adii_


you can try using Hmisc library and cut2 function. You can cut vector into different groups by stating the cutpoints. Here is an example:

library(Hmisc)
data <- data.frame(gene_id=sample(c("A","B","D", 100), 100, replace=TRUE),
               fpkm=abs(rnorm(100, 100, 10)),
               meth_val=abs(rnorm(100, 10, 1)))
quantiles <- quantile(data$fpkm, prob = seq(0, 1, length = 11), type = 5)
data$cutted <- cut2(data$fpkm, cuts = as.numeric(quantiles))

And you will get the same data frame with additional columns for split:

    gene_id      fpkm  meth_val        cutted
1         B 102.16511  8.477469 [100.4,103.2)
2         A 110.59269  9.256172 [106.4,110.9)
3         B  93.15691 10.560936 [ 92.9, 95.3)
4         B 105.74879 10.301358 [103.2,106.4)
5         A  96.12755 11.336484 [ 95.3, 96.8)
6         B 106.29204  8.286120 [103.2,106.4)
...

Moreover you can cut using cut2 specifying by quantiles groups too. Read more ?cut2.

like image 22
adomasb Avatar answered Nov 07 '22 08:11

adomasb