
Fast algorithm to find elements in time range of list

Tags: java, arrays, list

The problem: I have a list of data points, each with a time and a value (time as a long in milliseconds, value as a double). I need to calculate several averages over different time ranges.
I receive up to 50 values per second (sometimes only a few) and need to hold the last 10 seconds, i.e. up to 500 values.

What I want is: calculate average of values where time >= begin and time <= end.

I can ensure that no timestamp occurs twice, so the time could be used as a key.

Currently I store the values in an array with a position marker that wraps back to 0 once it reaches 500, so old entries are recycled (a circular buffer). I can change that easily.

I am not sure what the fastest approach would be, e.g. a manual search of the array, or a List, HashMap, or other Collection (with a Comparator?). I could not find a (Java) list-like structure with a built-in search for "key >= x" or "value >= x".
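For what it's worth, java.util.TreeMap (through its NavigableMap interface) does offer exactly this kind of range lookup via subMap; whether its per-insert overhead is acceptable here is a separate question. A minimal sketch with made-up timestamps from the sample data:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// time (ms) -> measured value; timestamps are unique, as stated above
TreeMap<Long, Double> samples = new TreeMap<>();
samples.put(1385408750828L, 1024.6668701172);
samples.put(1385408751350L, 1024.6668701172);
samples.put(1385408751859L, 1024.6668701172);
samples.put(1385408752373L, 1024.6683349609);

// view of all entries with begin <= time <= end; locating the
// range is O(log n), iterating it is linear in the range size
NavigableMap<Long, Double> window =
        samples.subMap(1385408751000L, true, 1385408752000L, true);

double sum = 0;
for (double v : window.values()) {
    sum += v;
}
double avg = sum / window.size();
```

The subMap view is backed by the original map, so no copying happens when selecting the range.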

Performance is more important than nice or easy coding.

Would be nice to be pointed in the right direction.

I compute the average of the last 10 values each time a new value comes in; that alone is 30-50 calculations per second and is the most important data. I need to distinguish small measurement errors from actual changes. Additionally, I calculate the average of each 1/10th of a second (this may be dropped), and finally the average of the last second and the average of the last 10 seconds. That adds another 12 average calculations per second. Reducing the number of calculations is not really an option.

As this is a bit abstract, here is an example of what the data looks like (Avg is calculated over the last 10 values; it is shown for illustration, not as program logic).

value           Avg timeReading timeReadingISO
1024,6668701172 -       1385408750828   2013-11-25 19:45:50
1024,6668701172 -       1385408751350   2013-11-25 19:45:51
1024,6668701172 -       1385408751859   2013-11-25 19:45:51
1024,6683349609 -       1385408752373   2013-11-25 19:45:52
1024,6683349609 -       1385408752878   2013-11-25 19:45:52
1024,6689453125 -       1385408753385   2013-11-25 19:45:53
1024,6689453125 -       1385408753895   2013-11-25 19:45:53
1024,6721191406 -       1385408754406   2013-11-25 19:45:54
1024,6721191406 -       1385408754912   2013-11-25 19:45:54
1024,6774902344 -       1385408755432   2013-11-25 19:45:55
1024,6774902344 1024,67 1385408755994   2013-11-25 19:45:55
1024,6774902344 1024,67 1385408756502   2013-11-25 19:45:56
1024,6837158203 1024,67 1385408757012   2013-11-25 19:45:57
1024,6837158203 1024,67 1385408757520   2013-11-25 19:45:57
1024,689453125  1024,68 1385408758028   2013-11-25 19:45:58
1024,689453125  1024,68 1385408758536   2013-11-25 19:45:58
1024,6938476563 1024,68 1385408759055   2013-11-25 19:45:59
1024,6938476563 1024,68 1385408759560   2013-11-25 19:45:59
1024,6990966797 1024,68 1385408760075   2013-11-25 19:46:00
1024,6990966797 1024,69 1385408760579   2013-11-25 19:46:00
1024,7038574219 1024,69 1385408761086   2013-11-25 19:46:01
1024,7038574219 1024,69 1385408761596   2013-11-25 19:46:01
1024,7111816406 1024,69 1385408762103   2013-11-25 19:46:02
1024,7111816406 1024,70 1385408762606   2013-11-25 19:46:02
1024,7111816406 1024,70 1385408763112   2013-11-25 19:46:03
1024,7111816406 1024,70 1385408763622   2013-11-25 19:46:03
1024,7172851563 1024,70 1385408764128   2013-11-25 19:46:04
1024,7172851563 1024,71 1385408764637   2013-11-25 19:46:04
1024,7208251953 1024,71 1385408765149   2013-11-25 19:46:05

1026,5457763672 -       1385474621756   2013-11-26 14:03:41
1026,6057128906 -       1385474621790   2013-11-26 14:03:41
1026,6257324219 -       1385474621823   2013-11-26 14:03:41
1026,6057128906 -       1385474621858   2013-11-26 14:03:41
1026,6257324219 -       1385474621890   2013-11-26 14:03:41
1026,6257324219 -       1385474621921   2013-11-26 14:03:41
1026,6057128906 -       1385474621956   2013-11-26 14:03:41
1026,5457763672 -       1385474621988   2013-11-26 14:03:41
1026,6557617188 -       1385474622022   2013-11-26 14:03:42
1026,6657714844 -       1385474622057   2013-11-26 14:03:42
1026,6257324219 1026,61 1385474622090   2013-11-26 14:03:42
1026,6057128906 1026,62 1385474622123   2013-11-26 14:03:42
1026,6657714844 1026,62 1385474622159   2013-11-26 14:03:42
1026,6557617188 1026,62 1385474622193   2013-11-26 14:03:42
1026,6557617188 1026,63 1385474622227   2013-11-26 14:03:42
1026,6257324219 1026,63 1385474622260   2013-11-26 14:03:42
1026,6257324219 1026,63 1385474622298   2013-11-26 14:03:42
1026,6557617188 1026,63 1385474622330   2013-11-26 14:03:42
1026,6257324219 1026,64 1385474622365   2013-11-26 14:03:42
1026,6257324219 1026,64 1385474622401   2013-11-26 14:03:42
1026,6257324219 1026,64 1385474622431   2013-11-26 14:03:42
1026,5758056641 1026,64 1385474622466   2013-11-26 14:03:42
1026,6057128906 1026,63 1385474622501   2013-11-26 14:03:42
1026,5457763672 1026,63 1385474622533   2013-11-26 14:03:42
1026,5457763672 1026,62 1385474622565   2013-11-26 14:03:42
1026,6057128906 1026,61 1385474622599   2013-11-26 14:03:42
1026,6057128906 1026,60 1385474622631   2013-11-26 14:03:42
1026,5758056641 1026,60 1385474622665   2013-11-26 14:03:42
1026,5457763672 1026,59 1385474622702   2013-11-26 14:03:42
1026,6057128906 1026,59 1385474622734   2013-11-26 14:03:42
1026,6557617188 1026,58 1385474622766   2013-11-26 14:03:42
1026,5758056641 1026,59 1385474622800   2013-11-26 14:03:42
1026,6057128906 1026,59 1385474622836   2013-11-26 14:03:42
1026,6057128906 1026,59 1385474622868   2013-11-26 14:03:42
1026,5158691406 1026,59 1385474622901   2013-11-26 14:03:42
1026,5457763672 1026,59 1385474622935   2013-11-26 14:03:42
1026,6856689453 1026,58 1385474622966   2013-11-26 14:03:42
asked Nov 11 '22 17:11 by Gunnar Bernstein


1 Answer

First of all, unless you do everything in one thread, you should either work on a copy of the structure when calculating an average, or use a thread-safe structure that can be traversed safely while elements are being added or removed.

I assume the elements in your collection are already sorted, since you receive updates sequentially (if not, look for a sorted-collection equivalent).

My approach would be to pick the smallest interval you average over, say 10 values. Then create 50 collections of size 10, each of a class that provides a method to calculate its own average. To compute any larger average, just average the averages of the relevant collections (this is exact because every collection holds the same number of values). What is more, once a collection is full its average will never change, so you can cache it.

Note that you never have to move values from one collection to another, since the minimal interval is already handled. When 10 new elements have arrived in the buffer, you can simply reassign references.

/* initializing */
MySlicedCollection buffer = new MySlicedCollection();
MySlicedCollection[] mscArray = new MySlicedCollection[50];

/* every time 10 values have come in: shift the buckets by one */
for (int i = mscArray.length - 1; i > 0; --i) {
    mscArray[i] = mscArray[i - 1];
}
mscArray[0] = buffer;
buffer = new MySlicedCollection();

/* average over all buckets (each holds the same number of values) */
double sum = 0;
for (MySlicedCollection msc : mscArray) {
    sum += msc.getAverage();
}
sum /= 50;
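MySlicedCollection itself is never defined above; a minimal sketch is shown below. The name and the getAverage method come from the snippet, the body (fixed-size array, cached average) is only a guess at what is meant:

```java
// A fixed-size bucket of samples whose average is computed once and then
// cached, since a full bucket never changes. Name and getAverage() follow
// the snippet above; the implementation is an illustrative sketch.
class MySlicedCollection {
    private final double[] values = new double[10];
    private int count = 0;
    private double cachedAverage = Double.NaN;  // NaN = not yet computed

    void add(double value) {
        values[count++] = value;
        cachedAverage = Double.NaN;  // invalidate cache on change
    }

    boolean isFull() {
        return count == values.length;
    }

    double getAverage() {
        if (Double.isNaN(cachedAverage)) {  // compute once, then reuse
            double sum = 0;
            for (int i = 0; i < count; i++) {
                sum += values[i];
            }
            cachedAverage = sum / count;
        }
        return cachedAverage;
    }
}
```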

You should also think about reusing previous results. If you need both the 1-second and the 2-second average, you can average the remaining second on its own, add the already-computed 1-second average, and divide by 2.

/* avg for one second (5 buckets of 10 values at 50 values/s) */
double sumOneSec = 0;
for (int i = 0; i < 5; ++i) {
    sumOneSec += mscArray[i].getAverage();
}
sumOneSec /= 5;

/* avg for two seconds: average the second second, reuse the first */
double sumTwoSec = 0;
for (int i = 5; i < 10; ++i) {
    sumTwoSec += mscArray[i].getAverage();
}
sumTwoSec = ((sumTwoSec / 5) + sumOneSec) / 2;

But remember the old advice: "first measure, then act" - maybe your performance is already sufficient?


Update: By using a circular buffer and a simple trick you can save at least one iteration. Provided the buffer is full (say its capacity is 50), the current average is known, and another value comes in, you can recalculate the average directly:

avg = (avg * 50 - oldestValue + newValue) / 50;

Unfortunately this introduces a small error due to the finite representation of floating-point variables, but since you are using double values I do not think you need that much precision. A similar solution can be applied to the other averages, but that requires more thinking :)
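That trick can be sketched as a small class around a circular buffer (the window size of 3 in the test and the class name are illustrative, not from the answer):

```java
// Rolling average over a fixed-size window: O(1) work per new sample,
// using avg = (avg * n - oldest + newest) / n once the window is full.
class RollingAverage {
    private final double[] window;
    private int next = 0;         // index of the slot to overwrite next
    private boolean full = false;
    private double avg = 0;

    RollingAverage(int size) {
        window = new double[size];
    }

    double add(double value) {
        if (!full) {
            // still filling: plain incremental average over `next + 1` samples
            avg = (avg * next + value) / (next + 1);
            window[next] = value;
            if (++next == window.length) {
                full = true;
                next = 0;
            }
        } else {
            // full: drop the oldest value, adjust the average directly
            double oldest = window[next];
            avg = (avg * window.length - oldest + value) / window.length;
            window[next] = value;
            next = (next + 1) % window.length;
        }
        return avg;
    }
}
```

Note that the buffer itself is still needed, since the oldest value must be known to subtract it back out of the running average.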

answered Nov 15 '22 06:11 by Maciej Dobrowolski