Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare Two Ordered Lists in C#

The problem is that I have two lists of strings. One list is an approximation of the other list, and I need some way of measuring the accuracy of the approximation.

As a makeshift way of scoring the approximation, I have bucketed each list(the approximation and the answer) into 3 partitions (high, medium low) after sorting based on a numeric value that corresponds to the string. I then compare all of the elements in the approximation to see if the string exists in the same partition of the correct list.

I sum the number of correctly classified strings and divide it by the total number of strings. I understand that this is a very crude way of measuring the accuracy of the estimate, and was hoping that better alternatives were available. This is a very small component of a larger piece of work, and was hoping to not have to reinvent the wheel.

EDIT: I think I wasn't clear enough. I don't need the two lists to be exactly equal, I need some sort of measure that shows the lists are similar. For example, The High-Medium-Low (H-M-L) approach we have taken shows that the estimated list is sufficiently similar. The downside of this approach is that if the estimated list has an item at the bottom of the "High" bracket, and in the actual list, the item is at the top of the medium set, the score algorithm fails to deliver.

It could potentially be that in addition to the H-M-L approach, the bottom 20% of each partition is compared to the top 20% of the next partition or something along those lines.

Thanks all for your help!!

like image 364
Anthony Wood Avatar asked May 31 '26 09:05

Anthony Wood


1 Answers

So, we're taking a sequence of items and grouping it into partitions with three categories, high, medium, and low. Let's first make an object to represent those three partitions:

public class Partitions<T>
{
    public IEnumerable<T> High { get; set; }
    public IEnumerable<T> Medium { get; set; }
    public IEnumerable<T> Low { get; set; }
}

Next to make an estimate we want to take two of these objects, one for the actual and one for the estimate. For each priority level we want to see how many of the items are in both collections; this is an "Intersection"; we want to sum up the counts of the intersection of each set.

Then just divide that count by the total:

public static double EstimateAccuracy<T>(Partitions<T> actual
    , Partitions<T> estimate)
{
    int correctlyCategorized = 
        actual.High.Intersect(estimate.High).Count() +
        actual.Medium.Intersect(estimate.Medium).Count() +
        actual.Low.Intersect(estimate.Low).Count();

    double total = actual.High.Count()+
        actual.Medium.Count()+
        actual.Low.Count();

    return correctlyCategorized / total;
}

Of course, if we generalize this to not 3 priorities, but rather a sequence of sequences, in which each sequence corresponds to some bucket (i.e. there are N buckets, not just 3) the code actually gets easier:

public static double EstimateAccuracy<T>(
    IEnumerable<IEnumerable<T>> actual
    , IEnumerable<IEnumerable<T>> estimate)
{
    var query = actual.Zip(estimate, (a, b) => new
    {
        valid = a.Intersect(b).Count(),
        total = a.Count()
    }).ToList();
    return query.Sum(pair => pair.valid) /
        (double)query.Sum(pair => pair.total);
}
like image 114
Servy Avatar answered Jun 02 '26 22:06

Servy



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!