
Clustering values with given threshold

I have several vectors:

a <- c(1.1, 2.9, 3.9, 5.2)
b <- c(1.0, 1.9, 4.0, 5.1)
c <- c(0.9, 2.1, 3.1, 4.1, 5.0, 11.13)

They can have different lengths.

I want to combine them into a single vector. Where values from all of the vectors, or from any pair of them, are similar, the result should contain their average; where a value occurs in only one vector, it should be kept as is. Two values count as similar if they differ by at most a threshold of 0.2.

My explanation could be a bit confusing, but here is the general vector I want to obtain:

d <- c(1, 2, 3, 4, 5.1, 11.13)

I have around 12 vectors and about 2000 values in each vector.

I would be grateful for any help.

asked Feb 20 '26 at 01:02 by MarinaZav


1 Answer

This looks like a clustering problem, with values grouped by distance. You can try the code below, which links values that are at most 0.2 apart and averages each connected group:

library(igraph)

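# Pool all values into one sorted vector
v <- sort(c(a, b, c))

# Link values whose distance is within the threshold (plus a small
# floating-point tolerance), then average each connected component
tapply(
    v,
    membership(components(graph_from_adjacency_matrix(as.matrix(dist(v)) <= 0.2 + sqrt(.Machine$double.eps)))),
    mean
)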

which gives

    1     2     3     4     5     6
 1.00  2.00  3.00  4.00  5.10 11.13
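
With about 12 vectors of 2000 values each, the full distance matrix has roughly 24,000 x 24,000 entries, which may be heavy on memory. As an alternative sketch (not from the code above), the same grouping can probably be obtained in base R: for sorted one-dimensional data, the connected groups split exactly where the gap between neighbouring values exceeds the threshold. The names `threshold`, `tol`, `group` and `d` are illustrative choices.

```r
a <- c(1.1, 2.9, 3.9, 5.2)
b <- c(1.0, 1.9, 4.0, 5.1)
c <- c(0.9, 2.1, 3.1, 4.1, 5.0, 11.13)

threshold <- 0.2
# Small tolerance so that gaps of "exactly" 0.2 survive floating-point error
tol <- sqrt(.Machine$double.eps)

# Pool all values into one sorted vector
v <- sort(c(a, b, c))

# Start a new group whenever the gap to the previous value exceeds the threshold
group <- cumsum(c(TRUE, diff(v) > threshold + tol))

# Average the values within each group
d <- as.vector(tapply(v, group, mean))
d
```

Both approaches chain neighbours together: values such as 1.0, 1.2 and 1.4 would end up in one group even though the first and last differ by 0.4. If that is not wanted, a different grouping rule would be needed.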

answered Feb 22 '26 at 20:02 by ThomasIsCoding


