
Process for comparing two datasets

I have two datasets at a time (in the form of vectors), and I plot them on the same axes to see how they relate to each other. I specifically look for places where both graphs have a similar shape (i.e. places where both have a seemingly positive or negative gradient over approximately the same intervals). Example:

[image: example plot]

So far I have been working through the data graphically, but the amount of data is so large that plotting every pair of sets to check how they correlate takes far too much time.

Are there any ideas, scripts or functions that might help automate this process somewhat?

user718531 asked Jun 21 '11 14:06



1 Answer

The first thing to think about is the criteria you want to use to establish similarity. There are many ways to measure similarity, and the more precisely you can describe what "similar" should mean in your problem, the easier it will be to implement, regardless of the programming language.

Having said that, here are some things you could look at:

  • correlation of the two datasets
  • difference of the derivative of the datasets (but I don't think it would be robust enough)
  • spectral analysis as mentioned by @thron of three
  • etc. ...
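As a rough illustration of the first two ideas, here is a minimal sketch in Python with NumPy. The language is an assumption, since the question does not name one, and the function names (`pearson`, `windowed_correlation`, `slope_agreement`) and the window sizes are made up for this example. It assumes both vectors have the same length and are sampled on the same grid.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    return float(np.corrcoef(a, b)[0, 1])

def windowed_correlation(a, b, window=50):
    """Pearson correlation in consecutive, non-overlapping windows.

    High values mark stretches where the two curves move together.
    A window in which either vector is constant yields NaN.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_windows = min(len(a), len(b)) // window
    return np.array([
        np.corrcoef(a[i * window:(i + 1) * window],
                    b[i * window:(i + 1) * window])[0, 1]
        for i in range(n_windows)
    ])

def slope_agreement(a, b, smooth=5):
    """Fraction of samples where the smoothed slopes share a sign.

    The moving average is a crude guard against noise (assumed choice);
    the derivative itself is a finite difference from np.gradient.
    """
    kernel = np.ones(smooth) / smooth
    da = np.convolve(np.gradient(np.asarray(a, dtype=float)), kernel, mode="same")
    db = np.convolve(np.gradient(np.asarray(b, dtype=float)), kernel, mode="same")
    return float(np.mean(np.sign(da) == np.sign(db)))
```

Looping over many pairs and keeping only those whose scores pass a threshold of your choosing could replace most of the manual plotting, leaving only the borderline pairs to inspect by eye. Comparing slope signs is sensitive to noise, which is the robustness concern raised in the list above, so the smoothing width needs tuning for real data.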

Knowing the origin of the datasets and their variability can also help a lot in formulating robust enough algorithms.

Aabaz answered Sep 21 '22 23:09
