Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to calculate average value over disjoint subranges of STL map

Tags:

c++

map

stl

average

I'm converting an algorithm from C# to C++. A small part of the algorithm is to calculate average values for certain areas in a dictionary.

The data in the dictionary is stored in the following way:

Index     Value
1         10
3         28
290       78
1110      90

I need to calculate the average value of all values with an index smaller than a certain number and all index values larger than a certain number. In C# I do it the following way:

if (dictionary.Where(x => x.Key < areaWidth).Count() > 0)
{
    avgValue = (int) dictionary.Where(x => x.Key < areaWidth).Average(
        x => x.Value);
}

for (var i = 0; i < line.Length; i++)
{
    if (i == areaWidth)
    {
        avgValue = -1;
        i = line.Length - areaWidth;
        var rightBorder = i - areaWidth;

        if (dictionary.Where(x => x.Key > (rightBorder)).Count() > 0)
        {
            avgValue = (int) dictionary.Where(
                x => x.Key > (rightBorder)).Average(
                                x => x.Value);
        }
    }

    if (line[i] < avgValue * 0.8)
    {
        reallyImportantValue += (avgValue - line[i]);
    }
}

I know that is not very efficient and pretty crappy code, but I knew that I would have to completely rewrite this part of the algorithm in C++ anyway, so I decided to implement it quick and dirty.

Anyway I am now porting this to C++ and because it will run on a mobile platform performance is very important. With my limited C++/STL knowledge I could most likely get the job done, but the result would probably be much worse than the C# code.

So if you know a good and efficient way to accomplish this task in C++, please tell me.


EDIT: Thank you for all your answers. As I mentioned in my post my STL knowledge is limited, so it's really hard for me to pick a solution, especially since there are a lot of different opinions. It would be great if someone could help me with the decision, by comparing the solutions posted here. To give you a little more background information:

The function will be called approximately 500 times with 1000 values in the map. The most important aspect is stability, performance is the second most important.

like image 744
xsl Avatar asked Nov 03 '10 17:11

xsl


2 Answers

You can use std::accumulate to compute the sum of the values, and then divide by the number of elements. Here are some examples of how to compute the mean and other statistics using STL.

like image 195
Dima Avatar answered Nov 17 '22 12:11

Dima


EDIT: one-pass map accumulator - result2 contains the info you need:

#include <map>
#include <algorithm>
#include <numeric>

typedef map<const unsigned int, unsigned int> Values;

struct averageMap
{
    averageMap() : lowerCount(0), lowerSum(0), upperSum(0) {}
    averageMap operator()(const averageMap& input, 
           const Values::value_type& current)
    {
        if (current.first > boundary)
        {
            upperSum += current.second;
        }
        else
        {
            lowerSum += current.second;
            ++lowerCount;
        }
        return *this;
    }

    static size_t boundary;
    size_t lowerCount;
    unsigned int lowerSum;
    unsigned int upperSum;
};

size_t averageMap::boundary(0);

struct averageRange
{
    averageRange() : count(0), sum(0) {}
    averageRange operator()(const averageRange& input, 
        const Values::value_type& current)
    {
        sum += current.second;
        ++count;

        return *this;
    }

    size_t count;
    unsigned int sum;
};


int main()
{
    Values values;

    values[1] = 10;
    values[3] = 28;
    values[290] = 78;
    values[1110] = 110;

    averageMap::boundary = 100;
    averageMap result = accumulate(values.begin(), values.end(), 
        averageMap(boundary), averageMap(boundary));

averageRange result2 = accumulate(values.lower_bound(2), values.upper_bound(300), 
    averageRange(), averageRange());

    return 0;
};

OLD VERSION:

This works for me. Using accumulate on range retrieved from map::upper_bound was problematic because many STL operations require final iterators to be reachable from first in range. There is a bit of a cheat here - assuming the map values are >= 0.

#include <map>
#include <algorithm>
#include <numeric>
#include <vector>

using namespace std;

typedef map<unsigned int, unsigned int> Values;

int main()
{
    Values values;

    values[1] = 10;
    values[3] = 28;
    values[290] = 78;
    values[1110] = 110;

    size_t boundary(100);
    Values::iterator iter = values.upper_bound(boundary);

    vector<int> lowerRange(values.size(), -1);

    transform(values.begin(), iter, lowerRange.begin(), 
        [](std::pair<unsigned int, unsigned int> p) 
                -> int { return p.second; });

    vector<int>::iterator invalid(find(lowerRange.begin(), 
        lowerRange.end(), -1));
    size_t lowerCount(distance(lowerRange.begin(), invalid));
    lowerRange.resize(lowerCount);

    vector<int> upperRange(values.size() - lowerCount);
    transform(iter, values.end(), upperRange.begin(), 
        [](std::pair<unsigned int, unsigned int> p) 
                -> int { return p.second; });

    size_t lowerAverage = accumulate(lowerRange.begin(), 
        lowerRange.end(), 0) / lowerRange.size();
    size_t upperAverage = accumulate(upperRange.begin(), 
        upperRange.end(), 0) / upperRange.size();

    return 0;
};
like image 3
Steve Townsend Avatar answered Nov 17 '22 12:11

Steve Townsend