Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

two way ranged lookup table, C#

I need to take a design decision on some data structure for extremely quick access. Here the scenario : I have to sync two variables of different growth rate. I have tabulated data of in the following format:

Range( Ai1, Ai2) ~ Range ( Bi1, Bi2) That is to say that the range Ai1 - Ai2 matches Bi1 - Bi2 for some i

Now given any Ax in the entire span of A I should be able to determine the appropriate range in (Bj1, Bj2) and vice-versa. Data type wise : A is int; while B is float.

I don't know what would be the most appropriate data type for this translation ? My primary requirement is speed. Also any help in how this data struct can be implemented in C# would be helpful.

The problem is assured to fit in memory. Span of A can be approx the range 0 - 300,000, and the size of range Ai1-Ai2 can be some where from 10 to 300 ; while the span of span of the float is 0 to 10,000.000 ( we use only 3 decimal places) and the size of range Bi1 - Bi2 can be something like 0.100 - 10.000

Another known fact is that A is assured to be continuous while B may not be. But both increase sort of simultaneously, but at varying rates.Also neither Ranges overlap. Both are monotonically increasing.

So something like this can be expected:

(Ai1, Ai2) ~ ( Bi1, Bi2)

(1,78) ~ (13.454, 19.546)

(79,114) ~ (19.712,22.335)

(115, 198) ~ ( 22.678, 24.101)

query: A = 99 , Expected Response: B range = (19.712,22.335)

query: B = 16.117 , Expected Response: A range= (1,78)

In case B is not in range forward roundoff is expected.

Thnx-Egon

like image 533
Egon Avatar asked Mar 09 '11 16:03

Egon


2 Answers

Consider this general approach:

  1. Define ARange and BRange; and point them to each other:

    class ARange
    {
        public int Low;
        public int High;
        public BRange B;
    }
    
    class BRange
    {
        public float Low;
        public float High;
        public ARange A;
    }
    
  2. Construct pairs of ARange and BRange classes by a factory method that interconnects both instances.

  3. Store ARanges and BRanges in two sorted arrays.
  4. Having a particular a or b value, use binary search to lookup the covering ARange or BRange respectively, and retrieve the interconnected opposite range.

Binary search will give you an O(log N) lookup complexity in the worst case, where N is the length of ARange and BRange arrays respectively. This particular weakly-typed Array.BinarySearch overload can give you a kickstart.

If you need a general purpose solution with good readability, you can overload comparison operations for pairs of (int, ARange) and (float, BRange).

After this algorithm is implemented, consider these optimizations:

  • define ARange and BRange as structs to reduce the amount of dynamically allocated memory, improve locality of data, and reduce overhead;
  • providing ARanges form a continuous sequence (i.e. without gaps), optimize out High, and keep pairs of Low, B and an upper bound delimiting the sequence (say, as an artificial element in the array);
  • provide a custom binary search implementation which will allow you to compare ints/floats with ARanges/BRanges;
  • another option to increase locality of data (and thus reduce CPU cache misses) is to break-up the classes into arrays of individual fields, hence in the binary search you can work with just the Low bounds, and access High and A/B for specific items only:

    int[] ALow;    // Lows of A-ranges
    int[] AHigh;   // Highs of A-ranges
    int[] AB;      // index into B-arrays from A-ranges
    
    float[] BLow;  // Lows of B-ranges
    float[] BHigh; // Highs of B-ranges
    int[] BA;      // index into A-arrays from B-ranges
    
like image 186
Ondrej Tucny Avatar answered Oct 21 '22 18:10

Ondrej Tucny


The crucial properties of your data is that for both A and the associated B, the ranges:

  1. Don't overlap.
  2. Increase monotonically.

These mean that a simple binary search should work effectively, and provide you with O(log(n)) lookup.

Store an array of the interval-pairs in ascending order.

To perform a lookup (on, say, A), run a binary search on the start property of the "key" interval (in this case, A) to find the highest interval whose start is smaller than the item to search. Then check if that interval contains the item (end >= toSearch). The "value" (in this case, the associated B interval) is trivial to extract - it's part of the same array-element.

A reverse lookup (i.e. from B to A) works in pretty much the same way.

like image 34
Ani Avatar answered Oct 21 '22 20:10

Ani