Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I search a c# dictionary using a range of keys?

I have a dictionary with data resembling this (dictionary will be have about 100k entries):

[1] -> 5
[7] -> 50
[30] -> 3
[1000] -> 1
[100000] -> 35

I also have a list of ranges (about 1000)

MyRanges
    Range
        LowerBoundInclusive -> 0
        UpperBoundExclusive -> 10
        Total
    Range
        LowerBoundInclusive -> 10
        UpperBoundExclusive -> 50
        Total
    Range
        LowerBoundInclusive -> 100
        UpperBoundExclusive -> 1000
        Total
    Range
        LowerBoundInclusive -> 1000
        UpperBoundExclusive -> 10000
        Total
    Range (the "other" range)
        LowerBoundInclusive -> null
        UpperBoundExclusive -> null
        Total

I need to calculate the total present in the dictionary for these ranges. For example, the range 0-10 would be 55. These ranges can get really large, so I know it doesn't make sense to just search the dictionary for every value between the two ranges. My hunch is that I should get a list of keys from the dictionary, sort it, then loop through my ranges and do some sort of search to find all the keys within the ranges. Is this the correct way to do this? Is there an easy way to do that?

edit: Thanks for the responses. Real clever stuff. I forgot one pretty important caveat though. There is not the guarantee that the ranges are continuous, and the final range is everything not in the other ranges.

like image 399
user3715648 Avatar asked Jan 11 '23 11:01

user3715648


1 Answers

You could do something like that:

// Associate each value with the range of its key
var lookup = dictionary.ToLookup(
    kvp => ranges.FirstOrDefault(r => r.LowerBoundInclusive <= kvp.Key
                              && r.UpperBoundExclusive > kvp.Key),
    kvp => kvp.Value);

// Compute the total of values for each range
foreach (var r in ranges)
{
    r.Total = lookup[r].Sum();
}

(note: this solution doesn't take your edit into account; it doesn't handle non-contiguous ranges and the "others" range)

However, it's not very efficient if you have many ranges, since they are enumerated for each entry in the dictionary... You could get better results if you sort the dictionary by key first.

Here's a possible implementation:

// We're going to need finer control over the enumeration than foreach,
// so we manipulate the enumerator directly instead.
using (var dictEnumerator = dictionary.OrderBy(e => e.Key).GetEnumerator())
{
    // No point in going any further if the dictionary is empty
    if (dictEnumerator.MoveNext())
    {
        long othersTotal = 0; // total for items that don't fall in any range

        // The ranges need to be in ascending order
        // We want the "others" range at the end
        foreach (var range in ranges.OrderBy(r => r.LowerBoundInclusive ?? int.MaxValue))
        {
            if (range.LowerBoundInclusive == null && range.UpperBoundExclusive == null)
            {
                // this is the "others" range: use the precalculated total
                // of previous items that didn't fall in any other range
                range.Total = othersTotal;
            }
            else
            {
                range.Total = 0;
            }

            int lower = range.LowerBoundInclusive ?? int.MinValue;
            int upper = range.UpperBoundExclusive ?? int.MaxValue;

            bool endOfDict = false;
            var entry = dictEnumerator.Current;


            // keys that are below the current range don't belong to any range
            // (or they would have been included in the previous range)
            while (!endOfDict && entry.Key < lower)
            {
                othersTotal += entry.Value;
                endOfDict = !dictEnumerator.MoveNext();
                if (!endOfDict)
                    entry = dictEnumerator.Current;
            }

            // while the key in the the range, we keep adding the values
            while (!endOfDict  && lower <= entry.Key && upper > entry.Key)
            {
                range.Total += entry.Value;
                endOfDict = !dictEnumerator.MoveNext();
                if (!endOfDict)
                    entry = dictEnumerator.Current;
            }

            if (endOfDict) // No more entries in the dictionary, no need to go further
                break;

            // the value of the current entry is now outside the range,
            // so carry on to the next range
        }
    }
}

(updated to take your edit into account; works with non-contiguous ranges, and adds items that don't fall in any range to the "others" range)

I didn't run any benchmark, but it's probably pretty fast, since the dictionary and the ranges are enumerated only once.

Obviously, if the ranges are already sorted you don't need the OrderBy on ranges.

like image 162
Thomas Levesque Avatar answered Jan 25 '23 23:01

Thomas Levesque