I have a dictionary with data resembling this (dictionary will be have about 100k entries):
[1] -> 5
[7] -> 50
[30] -> 3
[1000] -> 1
[100000] -> 35
I also have a list of ranges (about 1000)
MyRanges
Range
LowerBoundInclusive -> 0
UpperBoundExclusive -> 10
Total
Range
LowerBoundInclusive -> 10
UpperBoundExclusive -> 50
Total
Range
LowerBoundInclusive -> 100
UpperBoundExclusive -> 1000
Total
Range
LowerBoundInclusive -> 1000
UpperBoundExclusive -> 10000
Total
Range (the "other" range)
LowerBoundInclusive -> null
UpperBoundExclusive -> null
Total
I need to calculate the total present in the dictionary for these ranges. For example, the range 0-10 would be 55. These ranges can get really large, so I know it doesn't make sense to just search the dictionary for every value between the two ranges. My hunch is that I should get a list of keys from the dictionary, sort it, then loop through my ranges and do some sort of search to find all the keys within the ranges. Is this the correct way to do this? Is there an easy way to do that?
edit: Thanks for the responses. Real clever stuff. I forgot one pretty important caveat though. There is not the guarantee that the ranges are continuous, and the final range is everything not in the other ranges.
You could do something like that:
// Associate each value with the range of its key
var lookup = dictionary.ToLookup(
kvp => ranges.FirstOrDefault(r => r.LowerBoundInclusive <= kvp.Key
&& r.UpperBoundExclusive > kvp.Key),
kvp => kvp.Value);
// Compute the total of values for each range
foreach (var r in ranges)
{
r.Total = lookup[r].Sum();
}
(note: this solution doesn't take your edit into account; it doesn't handle non-contiguous ranges and the "others" range)
However, it's not very efficient if you have many ranges, since they are enumerated for each entry in the dictionary... You could get better results if you sort the dictionary by key first.
Here's a possible implementation:
// We're going to need finer control over the enumeration than foreach,
// so we manipulate the enumerator directly instead.
using (var dictEnumerator = dictionary.OrderBy(e => e.Key).GetEnumerator())
{
// No point in going any further if the dictionary is empty
if (dictEnumerator.MoveNext())
{
long othersTotal = 0; // total for items that don't fall in any range
// The ranges need to be in ascending order
// We want the "others" range at the end
foreach (var range in ranges.OrderBy(r => r.LowerBoundInclusive ?? int.MaxValue))
{
if (range.LowerBoundInclusive == null && range.UpperBoundExclusive == null)
{
// this is the "others" range: use the precalculated total
// of previous items that didn't fall in any other range
range.Total = othersTotal;
}
else
{
range.Total = 0;
}
int lower = range.LowerBoundInclusive ?? int.MinValue;
int upper = range.UpperBoundExclusive ?? int.MaxValue;
bool endOfDict = false;
var entry = dictEnumerator.Current;
// keys that are below the current range don't belong to any range
// (or they would have been included in the previous range)
while (!endOfDict && entry.Key < lower)
{
othersTotal += entry.Value;
endOfDict = !dictEnumerator.MoveNext();
if (!endOfDict)
entry = dictEnumerator.Current;
}
// while the key in the the range, we keep adding the values
while (!endOfDict && lower <= entry.Key && upper > entry.Key)
{
range.Total += entry.Value;
endOfDict = !dictEnumerator.MoveNext();
if (!endOfDict)
entry = dictEnumerator.Current;
}
if (endOfDict) // No more entries in the dictionary, no need to go further
break;
// the value of the current entry is now outside the range,
// so carry on to the next range
}
}
}
(updated to take your edit into account; works with non-contiguous ranges, and adds items that don't fall in any range to the "others" range)
I didn't run any benchmark, but it's probably pretty fast, since the dictionary and the ranges are enumerated only once.
Obviously, if the ranges are already sorted you don't need the OrderBy
on ranges
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With