
Group similar numbers of a vector

Tags: r

Let's say that I have the following vector:

c(4, 5, 5, 8, 12, 12, 12, 13, 15, 15, 18, 19, 20, 23, 37, 37, 37, 37, 37, 41)

and I would like to "group" its elements by value: numbers that differ by at most 3 should be considered part of the same group. That is, for each number appearing in the vector, I would like to get all the numbers that are close to it. For example,

4  -->  c(4,5,5)
5  -->  c(4,5,5,8)
8  -->  c(5,5,8)
12 -->  c(12,12,12,13,15,15)

etc

It would also be useful to get their indices. Is there a smart way to achieve this?

Ruggero asked Jun 16 '15 11:06



3 Answers

You can use this little function:

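similar <- function(vec, val, bound = 3, index = FALSE) {
    # positions of all elements of vec within `bound` of `val`
    close.index <- which(abs(vec - val) <= bound)
    # return the positions if requested, otherwise the values
    if (index) return(close.index)
    return(vec[close.index])
}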

x <- c(4, 5, 5, 8, 12, 12, 12, 13, 15, 15, 18, 19, 20, 23, 37, 37, 37, 37, 37, 41)
similar(x, 5)
# [1] 4 5 5 8
similar(x, 5, index = TRUE)
# [1] 1 2 3 4
similar(x, 5, bound = 7)
# [1]  4  5  5  8 12 12 12
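
To get the neighbours of every distinct value at once, as the question asks, the function could be mapped over unique(x). The following is only a minimal sketch: the names `vals`, `groups` and `group_index` are introduced here for illustration, and `similar` is repeated so that the snippet runs on its own.

```r
# Same helper as above, repeated so this snippet is self-contained.
similar <- function(vec, val, bound = 3, index = FALSE) {
    close.index <- which(abs(vec - val) <= bound)
    if (index) return(close.index)
    vec[close.index]
}

x <- c(4, 5, 5, 8, 12, 12, 12, 13, 15, 15, 18, 19, 20, 23, 37, 37, 37, 37, 37, 41)
vals <- unique(x)

# One list element per distinct value, named by that value.
groups <- setNames(lapply(vals, function(v) similar(x, v)), vals)
group_index <- setNames(lapply(vals, function(v) similar(x, v, index = TRUE)), vals)

groups[["8"]]
# [1] 5 5 8
group_index[["8"]]
# [1] 2 3 4
```

Because the list names are the values converted to character, a group is looked up with a string such as "8" rather than the number 8.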
blakeoft answered Nov 12 '22 12:11



Perhaps not the most elegant version, but does this do what you want?

x <- c(4, 5, 5, 8, 12, 12, 12, 13, 15, 15, 18, 19, 20, 23, 37, 37, 37, 37, 37, 41)
vals <- unique(x)
# print indices (seq_along is safe even if vals is empty)
for (i in seq_along(vals)) print(which((x >= vals[i] - 3) & (x <= vals[i] + 3)))
# print values
for (i in seq_along(vals)) print(x[which((x >= vals[i] - 3) & (x <= vals[i] + 3))])

[1] 1 2 3
[1] 1 2 3 4
[1] 2 3 4
[1]  5  6  7  8  9 10
[1]  5  6  7  8  9 10
[1]  5  6  7  8  9 10 11
[1]  9 10 11 12 13
[1] 11 12 13
[1] 11 12 13 14
[1] 13 14
[1] 15 16 17 18 19
[1] 20

[1] 4 5 5
[1] 4 5 5 8
[1] 5 5 8
[1] 12 12 12 13 15 15
[1] 12 12 12 13 15 15
[1] 12 12 12 13 15 15 18
[1] 15 15 18 19 20
[1] 18 19 20
[1] 18 19 20 23
[1] 20 23
[1] 37 37 37 37 37
[1] 41

It is indeed a little more elegant to use abs():

for (i in seq_along(vals)) print(which(abs(x - vals[i]) <= 3))
for (i in seq_along(vals)) print(x[which(abs(x - vals[i]) <= 3)])
Daniel answered Nov 12 '22 13:11



Here is a short solution that returns all the groups as a list:

x = c(4, 5, 5, 8, 12, 12, 12, 13, 15, 15, 18, 19, 20, 23, 37, 37, 37, 37, 37, 41)

m = unique(x)
setNames(apply(abs(outer(m,m,'-')), 2, function(u) m[u<=3]),m)

#$`4`
#[1] 4 5

#$`5`
#[1] 4 5 8

#$`8`
#[1] 5 8

#$`12`
#[1] 12 13 15

#$`13`
#[1] 12 13 15

#$`15`
#[1] 12 13 15 18

#$`18`
#[1] 15 18 19 20

#$`19`
#[1] 18 19 20

#$`20`
#[1] 18 19 20 23

#$`23`
#[1] 20 23

#$`37`
#[1] 37

#$`41`
#[1] 41

For the indices, the same idea applies:

setNames(apply(abs(outer(m,m,'-')), 2, function(u) which(x %in% m[u<=3])),m)
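
Note that the values version above is built from unique(x), so a repeated number appears only once in each group. If duplicates should be kept, one option is to compare the distinct values against the full vector instead. This is a sketch under that assumption; `d`, `groups_dup` and `index_dup` are illustrative names. It uses lapply rather than apply, since apply returns a matrix instead of a list whenever all groups happen to have the same length.

```r
x <- c(4, 5, 5, 8, 12, 12, 12, 13, 15, 15, 18, 19, 20, 23, 37, 37, 37, 37, 37, 41)
m <- unique(x)

# d[i, j] is the distance between x[i] and the j-th distinct value m[j].
d <- abs(outer(x, m, "-"))

# Walk over the columns explicitly; lapply always returns a list.
groups_dup <- setNames(lapply(seq_along(m), function(j) x[d[, j] <= 3]), m)
index_dup  <- setNames(lapply(seq_along(m), function(j) which(d[, j] <= 3)), m)

groups_dup[["12"]]
# [1] 12 12 12 13 15 15
index_dup[["12"]]
# [1]  5  6  7  8  9 10
```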
Colonel Beauvel answered Nov 12 '22 13:11
