Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding "best matching key" for a given key in a sorted STL container

Problem

I have timestamped data, which I need to search based on the timestamp in order to get the one existing timestamp which matches my input timestamp the closest.
Preferably this should be solved with the STL. boost::* or stl::tr1::* (from VS9 with Featurepack) are also possible.
Example of timestamped data:

struct STimestampedData
{
 time_t m_timestamp; // Sorting criterion
 CData m_data;       // Payload
}

Approach with stl::vector, sort() and equal_range()

Since a map or set only allows me to find exact matches, I don't get any further using one of these. So now I have a vector to which I append data as it is coming in. Before searching I use <algorithm>'s sort() and supply it with a custom comparison function.
After that I use <algorithm>'s equal_range() to find the two neighbors of a specified value x. From these two values I check which one is closest to x and then I have my best match.


While this is not too complex, I wonder if there are more elegant solutions to this.
Maybe the STL already has an algorithm which does exactly that so I'm not re-inventing something here?

Update: Linear vs. binary search

I forgot to mention that I have quite a lot of data to handle so I don't want to have to search linearly.
The reason I am sorting a vector with sort() is because it has random access iterators which is not the case with a map. Using a map would not allow equal_range() to do a search with twice logarithmic complexity.
Am I correct?

like image 727
foraidt Avatar asked Oct 20 '08 13:10

foraidt


2 Answers

I would use set::lower_bound to find the matching or greater value, then decrement the iterator to check the next lower value. You should use std::set rather than std::map since your key is embedded in the object - you'll need to provide a functor that compares the timestamp members.

struct TimestampCompare
{
    bool operator()(const STimestampedData & left, const STimestampedData & right) const
    {
        return left.m_timestamp < right.m_timestamp;
    }
};
typedef std::set<STimestampedData,TimestampCompare> TimestampedDataSet;

TimestampedDataSet::iterator FindClosest(TimestampedDataSet & data, STimestampedData & searchkey)
{
    if (data.empty())
        return data.end();
    TimestampedDataSet::iterator upper = data.lower_bound(searchkey);
    if (upper == data.end())
        return --upper;
    if (upper == data.begin() || upper->m_timestamp == searchkey.m_timestamp)
        return upper;
    TimestampedDataSet::iterator lower = upper;
    --lower;
    if ((searchkey.m_timestamp - lower->m_timestamp) < (upper->m_timestamp - searchkey.m_timestamp))
        return lower;
    return upper;
}
like image 144
Mark Ransom Avatar answered Nov 08 '22 03:11

Mark Ransom


I would use equal_range too for such a thing.

If you are using sort() every time on your vector it might be better to use a map (or set), as that's always sorted automatically, and use the member equal_range

But that depends on the the amount of inserts / queries / amount of data. (although for something that always needs to be sorted when I query, a map would be my first choice, and I'd only use a vector if there was a very good reason)

like image 34
Pieter Avatar answered Nov 08 '22 04:11

Pieter