
split a vector by percentile

Tags: split, r, vector

I need to split a sorted vector of unknown length in R into "top 10%, ..., bottom 10%". For example, if I have vector <- order(c(1:98928)), I want to split it into 10 different vectors, each one representing approximately 10% of the total length.

I've tried using split <- split(vector, 1:10), but since I don't know the length of the vector, I get this warning whenever the length is not a multiple of 10:

data length is not a multiple of split variable

And even if the length is a multiple and the function works, split() does not keep the order of my original vector. This is what split() gives:

split(c(1:10) , 1:2)
$`1`
[1] 1 3 5 7 9

$`2`
[1]  2  4  6  8 10

And this is what I want:

$`1`
[1] 1 2 3 4 5

$`2`
[1]  6  7  8  9 10

I'm a newbie in R and I've been trying lots of things without success. Does anyone know how to do this?

Akiru asked Dec 04 '22 00:12


1 Answer

Problem statement

Break a sorted vector x every 10% into 10 chunks.

Note that there are two interpretations of this:

  1. Cutting by vector index:

    split(x, floor(10 * seq.int(0, length(x) - 1) / length(x)))
    
  2. Cutting by vector values (say, quantiles):

    split(x, cut(x, quantile(x, probs = 0:10 / 10, names = FALSE), include.lowest = TRUE))
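
As a rough illustration of the first interpretation, the index-based expression can be wrapped in a small helper so the number of chunks becomes a parameter. The name `split_by_index` and its argument `n` are made up here for illustration; they are not part of base R. This is only a sketch, assuming `x` is already sorted:

```r
# Hypothetical helper (not a base R function): split x into n contiguous chunks.
split_by_index <- function(x, n = 10) {
  # Zero-based position of each element, scaled to [0, n) and floored,
  # gives a chunk id in 0..(n - 1) that never decreases along x,
  # so the original order is kept inside and across chunks.
  chunk_id <- floor(n * (seq_along(x) - 1) / length(x))
  split(x, chunk_id)
}

# The small case from the question: two halves, in order.
split_by_index(1:10, 2)

# A length that is not a multiple of 10: chunk sizes differ by at most one.
lengths(split_by_index(1:98928))
```

Because the grouping vector has the same length as `x`, nothing is recycled, which is why the "not a multiple" message from the question does not appear.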
    

In what follows, I demonstrate both approaches using this data:

set.seed(0); x <- sort(round(rnorm(23),1))

Note that the example data are normally distributed rather than uniformly distributed, so cutting by index and cutting by value give substantially different results.
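
One way to see the difference at a glance is to compare chunk sizes rather than chunk contents. The sketch below reuses the two expressions from this answer on the same data; the names `by_index`, `by_value`, `breaks` and `sizes` are chosen here only for illustration:

```r
set.seed(0); x <- sort(round(rnorm(23), 1))

# Interpretation 1: cut by position in the vector.
by_index <- split(x, floor(10 * seq.int(0, length(x) - 1) / length(x)))

# Interpretation 2: cut by value at the deciles of x.
breaks <- quantile(x, probs = 0:10 / 10, names = FALSE)
by_value <- split(x, cut(x, breaks, include.lowest = TRUE))

# One row per method, one column per chunk.
sizes <- rbind(by_index = unname(lengths(by_index)),
               by_value = unname(lengths(by_value)))
sizes
```

Both rows sum to `length(x)`, but only the index-based row is guaranteed to be nearly even; the value-based sizes depend on how many tied values fall between consecutive deciles.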

Result

cutting by index

#$`0`
#[1] -1.5 -1.2 -1.1
#
#$`1`
#[1] -0.9 -0.9
#
#$`2`
#[1] -0.8 -0.4
#
#$`3`
#[1] -0.3 -0.3 -0.3
#
#$`4`
#[1] -0.3 -0.2
#
#$`5`
#[1] 0.0 0.1
#
#$`6`
#[1] 0.3 0.4 0.4
#
#$`7`
#[1] 0.4 0.8
#
#$`8`
#[1] 1.3 1.3
#
#$`9`
#[1] 1.3 2.4

cutting by quantile

#$`[-1.5,-1.06]`
#[1] -1.5 -1.2 -1.1
#
#$`(-1.06,-0.86]`
#[1] -0.9 -0.9
#
#$`(-0.86,-0.34]`
#[1] -0.8 -0.4
#
#$`(-0.34,-0.3]`
#[1] -0.3 -0.3 -0.3 -0.3
#
#$`(-0.3,-0.2]`
#[1] -0.2
#
#$`(-0.2,0.14]`
#[1] 0.0 0.1
#
#$`(0.14,0.4]`
#[1] 0.3 0.4 0.4 0.4
#
#$`(0.4,0.64]`
#numeric(0)
#
#$`(0.64,1.3]`
#[1] 0.8 1.3 1.3 1.3
#
#$`(1.3,2.4]`
#[1] 2.4
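
The empty `(0.4,0.64]` chunk appears because no value falls strictly inside that interval. If empty chunks are unwanted, the `drop` argument of `split()` can omit them. The following is a minimal sketch, assuming it is acceptable to end up with fewer than 10 chunks; the values are typed in from the printed results above, and the variable names are illustrative:

```r
# The 23 sorted values, copied from the printed results above.
x <- c(-1.5, -1.2, -1.1, -0.9, -0.9, -0.8, -0.4, -0.3, -0.3, -0.3, -0.3, -0.2,
       0.0, 0.1, 0.3, 0.4, 0.4, 0.4, 0.8, 1.3, 1.3, 1.3, 2.4)

breaks <- quantile(x, probs = 0:10 / 10, names = FALSE)
bins <- cut(x, breaks, include.lowest = TRUE)

all_chunks <- split(x, bins)                    # keeps the empty interval
nonempty_chunks <- split(x, bins, drop = TRUE)  # omits intervals with no values

length(all_chunks)
length(nonempty_chunks)
```

Dropping empty levels only changes which chunks are listed; the values and their order are unaffected.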
Zheyuan Li answered Dec 20 '22 16:12