Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Map from integer ranges to arbitrary single integers

Tags:

c++

algorithm

Working in C++ in a Linux environment, I have a situation where a number of integer ranges are defined, and integer inputs map to different arbitrary integers based on which range they fall into. None of the ranges overlap, and they aren't always contiguous.

The "simplest" way to solve this problem is with a bunch of if-statements for each range, but the number of ranges, their bounds, and the target values can all vary, so if-statements aren't maintainable.

For example, the ranges might be [0, 70], called r_a, [101, 150], call it r_b, and [201, 400], call it r_c. Inputs in r_a map to 1, in r_b map to 2, and r_c map to 3. Anything not in r_a, r_b, r_c maps to 0.

I can come up with a data structure & algorithm that stores tuples of (bounds, map target) and iterates through them, so finding the target value takes linear time in the number of bounds pairs. I can also imagine a scheme that keeps the pairs ordered and uses a binary sort-ish algorithm against all the lower bounds (or upper bounds), finds the closest to the input, then compares against the opposing bound.

Is there a better way to accomplish the mapping than a binary-search based algorithm? Even better, is there some C++ library out that does this already?

like image 371
Ian Durkan Avatar asked Feb 04 '10 18:02

Ian Durkan


3 Answers

The best approach here is indeed a binary search, but any efficient order-based search will do perfectly well. You don't really have to implement the search and the data structure explicitly. You can use it indirectly by employing a standard associative container instead.

Since your ranges don't overlap, the solution is very simple. You can immediately use a std::map for this problem to solve it in just a few lines of code.

For example, this is one possible approach. Let's assume that we are mapping an [ int, int ] range to an int value. Let's represent our ranges as closed-open ranges, i.e. if the original range is [0, 70], let's consider a [0, 71) range instead. Also, let's use the value of 0 as a "reserved" value that means "no mapping" (as you requested in your question)

const int EMPTY = 0;

All you need to do is to declare a map from int to int:

typedef std::map<int, int> Map;
Map map;

and fill it with each end of your closed-open ranges. The left (closed) end should be mapped to the desired value the entire range is mapped to, while the right (open) end should be mapped to our EMPTY value. For your example, it will look as follows

map[0] = r_a;
map[71] = EMPTY;

map[101] = r_b;
map[251] = EMPTY;

map[260] = r_c; // 260 adjusted from 201
map[401] = EMPTY;

(I adjusted your last range, since in your original example it overlapped the previous range, and you said that your ranges don't overlap).

That's it for initialization.

Now, in order to determine where a given value of i maps to all you need to do is

Map::iterator it = map.upper_bound(i);

If it == map.begin(), then i is not in any range. Otherwise, do

--it;

If the it->second (for the decremented it) is EMPTY, then i is not in any range.

The combined "miss" check might look as follows

Map::iterator it = map.upper_bound(i);
if (it == map.begin() || (--it)->second == EMPTY)
  /* Missed all ranges */;

Otherwise, it->second (for the decremented it) is your mapped value

int mapped_to = it->second;

Note that if the original ranges were "touching", as in [40, 60] and [61, 100], then the closed-open ranges will look as [40, 61) and [61, 101) meaning that the value of 61 will be mapped twice during map initialization. In this case it is important to make sure that the value of 61 is mapped to the proper destination value and not to the value of EMPTY. If you map the ranges as shown above in the left-to-right (i.e. increasing) order it will work correctly by itself.

Note, that only the endpoints of the ranges are inserted into the map, meaning that the memory consumption and the performance of the search depends only on the total number of ranges and completely independent of their total length.


If you wish, you can add a "guard" element to the map during the initialization

map[INT_MIN] = EMPTY;

(it corresponds to "negative infinity") and the "miss" check will become simpler

Map::iterator it = map.upper_bound(i);

assert(it != map.begin());
if ((--it)->second == EMPTY)
  /* Missed all ranges */;

but that's just a matter of personal preference.

Of course, if you just want to return 0 for non-mapped values, you don't need to carry out any checking at all. Just take the it->second from the decremented iterator and you are done.

like image 156
danny117 Avatar answered Nov 16 '22 07:11

danny117


I would use a very simple thing: a std::map.

class Range
{
public:
  explicit Range(int item);  // [item,item]
  Range(int low, int high);  // [low,high]

  bool operator<(const Range& rhs) const
  {
    if (mLow < rhs.mLow)
    {
      assert(mHigh < rhs.mLow); // sanity check
      return true;
    }
    return false;
  } // operator<

  int low() const { return mLow; }
  int high() const { return mHigh; }

private:
  int mLow;
  int mHigh;
}; // class Range

Then, let's have a map:

typedef std::map<Range, int> ranges_type;

And write a function that search in this map:

int find(int item, const ranges_type& ranges)
{
  ranges_type::const_iterator it = ranges.lower_bound(Range(item));
  if (it != ranges.end() && it->first.low() <= item)
    return it->second;
  else
    return 0; // No mapping ?
}

Main benefits:

  • Will check that the ranges effectively don't overlap during insertion in the set (you can make it so that it's only in debug mode)
  • Supports edition of the Ranges on the fly
  • Finding is fast (binary search)

If the ranges are frozen (even if their values are not), you may wish to use Loki::AssocVector to reduce the memory overhead and improve performance a bit (basically, it's a sorted vector with the interface of a map).

like image 32
Matthieu M. Avatar answered Nov 16 '22 06:11

Matthieu M.


Wouldn't a simple array be enough? You're not saying how many items you have, but by far the fastest data structure is a simple array.

If the ranges are:

  • 0..9 --> 25
  • 10..19 --> 42

Then the array would simply be like this:

[25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42]
like image 2
Lasse V. Karlsen Avatar answered Nov 16 '22 06:11

Lasse V. Karlsen