
C++ accumulator library with ability to remove old samples

In Boost.Accumulators, you can add samples to an accumulator and then extract statistical quantities from it, e.g.:

acc(1.);
acc(2.);
acc(3.);
std::cout << mean(acc); // 2

The library also offers many more complicated statistical quantities, such as skewness, kurtosis, or p_square_cumulative_distribution.

What I'd like to do is something like this:

acc(1.);
acc(2.);
acc(3.);
std::cout << mean(acc); // 2
acc.pop(); // withdraw the first value (1.)
std::cout << mean(acc); // 2.5

pop() would work in a FIFO (First In, First Out) manner. What I'm trying to do is compute statistics on my data in an online (incremental) fashion within a sliding time window.

The accumulator would have to internally keep all the values.

I could write my own, but I always prefer to check for existing libraries first, and there may be algorithms I'm not aware of that smartly update the quantities as data comes in or goes out.

Asked by Arthur on Sep 26 '12

3 Answers

Since you mentioned a "sliding time window", one option is to use a rolling mean (there are also rolling sum and rolling count), which is the mean of the last N samples. Depending on your needs, you could create separate accumulators with different window sizes.

#include <iostream>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/rolling_mean.hpp>

using namespace boost::accumulators;

typedef accumulator_set<double,
                        stats<tag::rolling_mean>
                        > my_accumulator;

my_accumulator acc(tag::rolling_window::window_size = 3);
acc(1.);
acc(2.);
acc(3.);
std::cout << rolling_mean(acc); // 2
// Reset the accumulator and use a different window size
acc = my_accumulator(tag::rolling_window::window_size = 2);
acc(2.);
acc(3.);
std::cout << rolling_mean(acc); // 2.5

Also, if you look at the implementation of these, they use boost/circular_buffer.hpp.

Answered by Jesse Good


You probably want to store the data in a std::deque instead of a vector, so both your insertion and deletion can have constant complexity. If you use a vector, one will inevitably be linear instead.

Other than that, it's a pretty simple matter of applying an algorithm to the collection. Strangely, however, I don't know of a collection of such algorithms already written and tested, despite seeming like a fairly obvious set of algorithms to have available.

For what it's worth, it is fairly trivial to build an adapter to feed data from a collection into an accumulator to compute the statistic(s) you care about. In a few cases the accumulator probably has to do a bit of extra work to compute results incrementally, but I'd guess it's pretty rare to lose enough efficiency to care about.
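
Here is a minimal sketch of that idea, using only the standard library: the samples live in a std::deque, and a running sum is maintained so the mean can be updated incrementally on both insertion and removal. WindowedMean and its members are hypothetical names, not from any existing library.

#include <deque>
#include <iostream>

// Hypothetical sliding-window mean: push_back/pop_front on the deque are O(1),
// and the running sum avoids re-scanning the window on every query.
class WindowedMean {
public:
    void push(double x) {
        samples_.push_back(x);
        sum_ += x;
    }
    void pop() {                // drop the oldest sample (FIFO); assumes non-empty
        sum_ -= samples_.front();
        samples_.pop_front();
    }
    double mean() const {
        return samples_.empty() ? 0.0 : sum_ / samples_.size();
    }
private:
    std::deque<double> samples_;
    double sum_ = 0.0;
};

int main() {
    WindowedMean acc;
    acc.push(1.); acc.push(2.); acc.push(3.);
    std::cout << acc.mean() << "\n";  // 2
    acc.pop();                        // withdraw the first value (1.)
    std::cout << acc.mean() << "\n";  // 2.5
}

The same pattern extends to any statistic that can be updated from both an incoming and an outgoing sample; quantities that cannot be cheaply "downdated" are the ones where the accumulator has to do the extra work mentioned above.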

Answered by Jerry Coffin


You'll probably need to just keep all of your samples in a vector, and then accumulate over them from the vector for each calculation. Something like this: https://stackoverflow.com/a/7616783/219136
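
As a rough sketch of that approach (window_mean is a hypothetical helper, not part of Boost): keep the current window in a std::vector and run a fresh Boost.Accumulators set over it whenever you need a value.

#include <iostream>
#include <vector>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/mean.hpp>

using namespace boost::accumulators;

// Recompute the mean over whatever samples are currently in the window.
double window_mean(const std::vector<double>& samples) {
    accumulator_set<double, stats<tag::mean>> acc;
    for (double x : samples) acc(x);
    return mean(acc);
}

int main() {
    std::vector<double> samples{1., 2., 3.};
    std::cout << window_mean(samples) << "\n";  // 2
    samples.erase(samples.begin());             // "pop" the oldest value (linear in a vector)
    std::cout << window_mean(samples) << "\n";  // 2.5
}

This is linear per query, which is fine for small windows but gives up the incremental property the question asks for.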

Answered by PherricOxide