
How to normalize sequence of numbers?

I am working on a user behavior project. Based on user interaction, I have collected some data. The sequence is mostly well behaved: it increases and decreases smoothly over time. But there are small discrepancies, which cause problems. Please refer to the graph below:

Plotted sequence

You can also find the data here:

2.0789 2.09604 2.11472 2.13414 2.15609 2.17776 2.2021 2.22722 2.25019 2.27304 2.29724 2.31991 2.34285 2.36569 2.38682 2.40634 2.42068 2.43947 2.45099 2.46564 2.48385 2.49747 2.49031 2.51458 2.5149 2.52632 2.54689 2.56077 2.57821 2.57877 2.59104 2.57625 2.55987 2.5694 2.56244 2.56599 2.54696 2.52479 2.50345 2.48306 2.50934 2.4512 2.43586 2.40664 2.38721 2.3816 2.36415 2.33408 2.31225 2.28801 2.26583 2.24054 2.2135 2.19678 2.16366 2.13945 2.11102 2.08389 2.05533 2.02899 2.00373 1.9752 1.94862 1.91982 1.89125 1.86307 1.83539 1.80641 1.77946 1.75333 1.72765 1.70417 1.68106 1.65971 1.64032 1.62386 1.6034 1.5829 1.56022 1.54167 1.53141 1.52329 1.51128 1.52125 1.51127 1.50753 1.51494 1.51777 1.55563 1.56948 1.57866 1.60095 1.61939 1.64399 1.67643 1.70784 1.74259 1.7815 1.81939 1.84942 1.87731 1.89895 1.91676 1.92987

I want to smooth out this sequence. The technique should be able to eliminate values with the characteristics of X and Y, i.e. errors in an otherwise monotonically increasing or decreasing stretch.

If it cannot eliminate them, the technique should at least shift them so that the series is not affected by the errors.

What I have tried without success:

  1. I tried testing the difference between consecutive values. This works in some special cases, but for a sequence like the one presented here, the differences between numbers are not distinctive enough to let me cut out the errors.

  2. I tried applying a counter with some threshold X: a change is accepted only once the counter reaches X; otherwise the point is mapped to the previous point. Here I have great trouble deciding on the value of X, because the data is based on user interaction, which I do not control. If the user interaction is such that its plot is a zigzag pattern, I end up with a 'no user movement data detected at all' situation.

Please share the techniques that you are aware of.

PS: The data made available in this example is one particular case. There is no typical pattern in which the numbers are going to occur, but we expect some range to be continuous in all the examples. The solution I am seeking is generic.

Adorn asked Oct 20 '22 20:10



1 Answer

I do not know how much effort you want to put into this problem, but if you want theoretical guarantees, topological persistence seems well suited to it, imho. Basically, with this method you can filter local maxima/minima by fixing a scale, and there are theoretical proofs that if your sampling is close to your function, then you extract the correct number of maxima with persistence. You can look at these slides (mainly pages 7-9) to get an idea of the method.

Basically, if you take your points as a landscape and imagine a water level starting from the maximum height and decreasing, you get some peaks. Every peak has a time when it is born, which is when it emerges from the water, and a time when it dies, which is when it merges with a higher peak. A persistence diagram draws one point for every peak, with its x/y coordinates being its times of birth/death (by assumption the first peak does not die and is not shown). If a peak is a global maximum, it will be further from the diagonal in the persistence diagram than a peak that is only a local maximum. To remove local maxima, you have to remove the peaks close to the diagonal. There are four local maxima in your example, as you can see in the persistence diagram of your data (thanks for providing the data btw), and two global ones (the first peak is not pictured in a persistence diagram):

Persistence diagram of your function
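The falling-water description above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions, not the code used to produce the diagrams: the function names `peak_persistence` and `significant_peaks` and the `min_persistence` threshold are hypothetical, and ties between equal values are broken by position in the sequence.

```python
def peak_persistence(values):
    """Return (index, birth, death) for every local maximum that eventually
    merges into a higher one. The global maximum never dies and is omitted."""
    n = len(values)
    # Visit samples from highest to lowest, like a falling water level.
    order = sorted(range(n), key=lambda i: -values[i])
    rank = {idx: r for r, idx in enumerate(order)}
    parent = [None] * n   # union-find forest; None means still under water
    peak_of = {}          # component root -> index of its highest sample

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        peak_of[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # The component whose peak emerged later (the lower peak) dies.
                if rank[peak_of[ri]] > rank[peak_of[rj]]:
                    young, old = ri, rj
                else:
                    young, old = rj, ri
                dying = peak_of[young]
                if dying != i:  # the sample that just surfaced is not a peak
                    pairs.append((dying, values[dying], values[i]))
                parent[young] = old
    return pairs


def significant_peaks(values, min_persistence):
    """Keep only peaks whose birth - death is at least min_persistence
    (hypothetical threshold: the distance from the diagonal)."""
    return [(i, b, d) for i, b, d in peak_persistence(values)
            if b - d >= min_persistence]
```

Each returned tuple is one point of the persistence diagram; `birth - death` is how far that point sits from the diagonal, so choosing `min_persistence` amounts to fixing the scale below which a bump is treated as noise. Minima can be handled the same way by negating the sequence.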

If you add noise to your data like this:

(image: noisy version of the data)

You will still get a very decent persistence diagram that lets you filter local maxima as you want:

(image: persistence diagram of the noisy data)

Please ask if you want more details or references.

geoalgo answered Nov 01 '22 11:11
