Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Faster way to do a List<T>.Contains()

I am trying to do what I think is a "de-intersect" (I'm not sure what the proper name is, but that's what Tim Sweeney of EpicGames called it in the old UnrealEd)

// foo and bar have some identical elements (given a case-insensitive match)
List‹string› foo = GetFoo();
List‹string› bar = GetBar();

// remove non matches
foo = foo.Where(x => bar.Contains(x, StringComparer.InvariantCultureIgnoreCase)).ToList();
bar = bar.Where(x => foo.Contains(x, StringComparer.InvariantCultureIgnoreCase)).ToList();

Then later on, I do another thing where I subtract the result from the original, to see which elements I removed. That's super-fast using .Except(), so no troubles there.

There must be a faster way to do this, because this one is pretty bad-performing with ~30,000 elements (of string) in either List. Preferably, a method to do this step and the one later on in one fell swoop would be nice. I tried using .Exists() instead of .Contains(), but it's slightly slower. I feel a bit thick, but I think it should be possible with some combination of .Except() and .Intersect() and/or .Union().

like image 999
J F Avatar asked Mar 12 '09 23:03

J F


People also ask

How fast is contain?

Contains() ought to be around 50 nanoseconds. So the overall operation takes 50.05 microseconds.

How to check if list Contains particular value in c#?

List<T>. Contains(T) Method is used to check whether an element is in the List<T> or not.

How to check list Contains string in c#?

if (myList. Contains(myString)) string element = myList. ElementAt(myList. IndexOf(myString));


2 Answers

This operation can be called a symmetric difference.

You need a different data structure, like a hash table. Add the intersection of both sets to it, then difference the intersection from each set.

UPDATE:

I got a bit of time to try this in code. I used HashSet<T> with a set of 50,000 strings, from 2 to 10 characters long with the following results:

Original: 79499 ms

Hashset: 33 ms

BTW, there is a method on HashSet called SymmetricExceptWith which I thought would do the work for me, but it actually adds the different elements from both sets to the set the method is called on. Maybe this is what you want, rather than leaving the initial two sets unmodified, and the code would be more elegant.

Here is the code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // foo and bar have some identical elements (given a case-insensitive match)
        var foo = getRandomStrings();
        var bar = getRandomStrings();

        var timer = new Stopwatch();
        
        timer.Start();
        // remove non matches
        var f = foo.Where(x => !bar.Contains(x)).ToList();
        var b = bar.Where(x => !foo.Contains(x)).ToList();
        timer.Stop();

        Debug.WriteLine(String.Format("Original: {0} ms", timer.ElapsedMilliseconds));

        timer.Reset();

        timer.Start();
        var intersect = new HashSet<String>(foo);
        intersect.IntersectWith(bar);

        var fSet = new HashSet<String>(foo);
        var bSet = new HashSet<String>(bar);

        fSet.ExceptWith(intersect);
        bSet.ExceptWith(intersect);
        timer.Stop();

        var fCheck = new HashSet<String>(f);
        var bCheck = new HashSet<String>(b);

        Debug.WriteLine(String.Format("Hashset: {0} ms", timer.ElapsedMilliseconds));

        Console.WriteLine("Sets equal? {0} {1}", fSet.SetEquals(fCheck), bSet.SetEquals(bCheck)); //bSet.SetEquals(set));
        Console.ReadKey();
    }

    static Random _rnd = new Random();

    private const int Count = 50000;

    private static List<string> getRandomStrings() 
    {
        var strings = new List<String>(Count);

        var chars = new Char[10];

        for (var i = 0; i < Count; i++)
        {
            var len = _rnd.Next(2, 10);

            for (var j = 0; j < len; j++)
            {
                var c = (Char)_rnd.Next('a', 'z');
                chars[j] = c;
            }

            strings.Add(new String(chars, 0, len));
        }

        return strings;
    }
}
like image 104
codekaizen Avatar answered Sep 30 '22 18:09

codekaizen


With intersect it would be done like this:

var matches = ((from f in foo 
                select f)
              .Intersect(
                  from b in bar 
                  select b, StringComparer.InvariantCultureIgnoreCase))
like image 26
gcores Avatar answered Sep 30 '22 19:09

gcores