I have a data set with N samples (say, 13, 16, 17, 20), where each sample exceeds the previous one by some increment (3, 1, 3 in this case), and I want to compute various statistics of that second sequence, the increments.
The samples are timestamps that are collected incrementally (i.e. not all samples are available at once), so I want to use boost::accumulators::accumulator_set, as it looks like something that would fit the bill.
I want to be able to do something like this:
accumulator_set< double, features< tag::mean > > acc;
...
acc(13);
acc(16);
acc(17);
acc(20);
...BUT sampling the differences instead of the actual values.
How can I do that with accumulator_set without keeping track of the last value manually?
Boost.Accumulators does not provide a difference statistic. You could roll your own, though, by extending the framework:
http://www.boost.org/doc/libs/1_37_0/doc/html/accumulators/user_s_guide.html#accumulators.user_s_guide.the_accumulators_framework.extending_the_accumulators_framework
In my opinion, though, the best solution is simply to keep track of the last value added.
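A minimal sketch of that "keep track of the last value" approach follows. The names difference_feeder, running_mean and mean_of_differences are hypothetical, made up for this illustration. The running_mean sink is only a stand-in so the snippet compiles without Boost; the assumption is that an accumulator_set could be passed in its place, since it is also called with one sample at a time.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Boost): remembers the previous sample and
// forwards only the difference between consecutive samples to a sink.
template <class Sink, class Sample = double>
class difference_feeder {
    Sink *sink_;      // where the differences go; must outlive the feeder
    Sample last_;     // previous sample, meaningful only if has_last_ is true
    bool has_last_;

public:
    explicit difference_feeder(Sink &sink)
        : sink_(&sink), last_(), has_last_(false) {}

    void operator()(Sample value) {
        if (has_last_)
            (*sink_)(value - last_);  // the very first sample yields no difference
        last_ = value;
        has_last_ = true;
    }
};

// Stand-in sink, used only to keep this sketch free of Boost.
struct running_mean {
    double sum;
    std::size_t count;

    running_mean() : sum(0.0), count(0) {}
    void operator()(double x) { sum += x; ++count; }
    double mean() const { return count ? sum / count : 0.0; }
};

// Feeds the samples one at a time, as if they arrived incrementally.
inline double mean_of_differences(const std::vector<double> &samples) {
    running_mean sink;
    difference_feeder<running_mean> feed(sink);
    for (std::size_t i = 0; i < samples.size(); ++i)
        feed(samples[i]);
    return sink.mean();
}
```

With the samples 13, 16, 17, 20 the sink receives 3, 1, 3. In real code, feed(timestamp) would be called as each timestamp arrives, and the sink would be the accumulator.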
This answer may be a bit more involved than you'd like, but at least it's not as outrageous as I was afraid it might turn out. The idea would be to start by creating an iterator type that acts as an adapter from "normal" algorithms to the Boost accumulator style of algorithms. This is the part that turned out a bit simpler than I really expected:
#ifndef ACCUM_ITERATOR_H_INCLUDED
#define ACCUM_ITERATOR_H_INCLUDED

#include <iterator>

// Output iterator that forwards every assigned value to an accumulator.
template <class Accumulator>
class accum_iterator {
protected:
    // Stored as a pointer (not a reference) so the iterator stays copy-assignable.
    Accumulator *accumulator;

public:
    // Member typedefs replace std::iterator, which is deprecated since C++17.
    typedef std::output_iterator_tag iterator_category;
    typedef void value_type;
    typedef void difference_type;
    typedef void pointer;
    typedef void reference;
    typedef Accumulator accumulator_type;

    explicit accum_iterator(Accumulator &x) : accumulator(&x) {}

    // The only part that really does anything: handle assignment by
    // calling the accumulator with the value.
    accum_iterator &operator=(typename Accumulator::sample_type value) {
        (*accumulator)(value);
        return *this;
    }

    // Dereference and increment are no-ops, as in std::back_insert_iterator.
    accum_iterator &operator*() { return *this; }
    accum_iterator &operator++() { return *this; }
    accum_iterator operator++(int) { return *this; }
};

// A convenience function to create an accum_iterator for a given accumulator.
template <class Accumulator>
accum_iterator<Accumulator> to_accum(Accumulator &accum) {
    return accum_iterator<Accumulator>(accum);
}

#endif
Then comes a part that's somewhat unfortunate. The standard library has an adjacent_difference algorithm that's supposed to produce the stream you want (the differences between adjacent items in a collection). It has one serious problem though: somebody thought it would be useful for it to produce a result collection that is the same size as the input collection (even though there is obviously one more input than there are differences). To do that, adjacent_difference writes a copy of the first input item as the first item of the result, so you have to ignore that first value to get only the differences.
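To see what std::adjacent_difference actually writes, here is a small sketch (the function name differences_only is made up for this example). The algorithm writes as many outputs as there are inputs, and the first output is a copy of the first input; dropping that element leaves just the differences, at the cost of a temporary buffer.

```cpp
#include <numeric>
#include <vector>

// Returns only the true differences of `input`, computed with
// std::adjacent_difference and then dropping the leading element.
inline std::vector<double> differences_only(const std::vector<double> &input) {
    std::vector<double> out(input.size());
    // out[0] becomes a copy of input[0]; out[i] becomes input[i] - input[i-1].
    std::adjacent_difference(input.begin(), input.end(), out.begin());
    if (!out.empty())
        out.erase(out.begin());  // discard the element that is not a difference
    return out;
}
```

For the input 13, 16, 17, 20 the algorithm itself writes 13, 3, 1, 3, and differences_only returns 3, 1, 3. This needs all samples up front, so it does not fit the incremental case directly.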
To make up for that, I re-implemented an algorithm like std::adjacent_difference with one oh-so-minor difference: since there is obviously one fewer difference than there are inputs, it produces exactly one fewer result than inputs, and doesn't put a value that isn't a difference at the front of the result. Combining the two, we get:
#include "accum_iterator.h"

#include <iostream>
#include <iterator>
#include <vector>

#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/mean.hpp>

using namespace boost::accumulators;

// A re-implementation of std::adjacent_difference, but with sensible outputs.
template <class InIt, class OutIt>
void diffs(InIt in1, InIt in2, OutIt out) {
    // An empty range has no first element to read, so there is nothing to do.
    if (in1 == in2)
        return;

    // iterator_traits also works for raw pointers, unlike InIt::value_type.
    typedef typename std::iterator_traits<InIt>::value_type value_type;

    value_type prev = *in1;
    ++in1;
    while (in1 != in2) {
        value_type temp = *in1;
        *out++ = temp - prev;
        prev = temp;
        ++in1;
    }
}

int main() {
    // Create the accumulator.
    accumulator_set<double, features< tag::mean > > acc;

    // Set up the test values.
    std::vector<double> values;
    values.push_back(13);
    values.push_back(16);
    values.push_back(17);
    values.push_back(20);

    // Use diffs to compute the differences, and feed the results to the
    // accumulator via the accum_iterator:
    diffs(values.begin(), values.end(), to_accum(acc));

    // And print the result from the accumulator:
    std::cout << "Mean: " << mean(acc) << std::endl;
    return 0;
}