Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare two user defined curves and score their similarity

I have a set of 2 curves (each with a few hundreds to a couple thousands datapoints) that I want to compare and get some similarity "score". Actually, I have >100 of those sets to compare... I am familiar with R (or at least bioconductor) and would like to use it.

I tried the ccf() function but I'm not too happy about it.

For example, if I compare c1 to the following curves:

c1 <- c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5)

c1b <- c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5) # perfect match! ideally score of 1

c1c <- c(1, 0.2, 0.1, 0.1, 0.5, 0.9, 0.5) # total opposite, ideally score of -1? (what would 0 be though?)

c2 <- c(0, 0.9, 0.9, 0.9, 0, 0.3, 0.3, 0.9) #pretty good, score of ???

Note that the vectors don't have the same size and it needs to be normalized, somehow... Any idea? If you look at those 2 lines, they are fairly similar and I think that in a first step, measuring the area under the 2 curves and subtracting would do. I look at the post "Shaded area under 2 curves in R" but that is not quite what I need.

A second issue (optional) is that for lines that have the same profile but different amplitude, I would like to score those as very similar even though the area under them would be big:

c1 <- c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5)

c4 <- c(0, 0.6, 0.7, 0.7, 0.3, 0.1, 0.3) # very good, score of ??

I hope that a biologist pretending to formulate problem to programmer is OK...

I'd be happy to provide some real life examples if needed.

Thanks in advance!

like image 940
Olivier Avatar asked Jan 20 '26 17:01

Olivier


1 Answers

They don't form curves in the usual meaning of paired x.y values unless they are of equal length. The first three are of equal length and after packaging in a matrix the rcorr function in HMisc package returns:

> rcorr(as.matrix(dfrm))[[1]]
    c1 c1b c1c
c1   1   1  -1
c1b  1   1  -1
c1c -1  -1   1   # as desired if you scaled them to 0-1

The correlation of the c1 and c4 vectors:

> cor( c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5),
  c(0, 0.6, 0.7, 0.7, 0.3, 0.1, 0.3) )
[1] 0.9874975
like image 151
IRTFM Avatar answered Jan 23 '26 08:01

IRTFM