C# Union vs Contains for lists of continuous data

I'm having trouble finding an answer to a question about some code I'm working on, and I can't find any documentation on how Union works at its core in C#. The problem is this:

I have a set of data that works similarly to this example:

     object[] someMainTypeArray = new object [n];
     List<object> objList2 = new List<object>();
     foreach ( object obj in someMainTypeArray ) {
        List<object> objList1 = new List<object>() { "1","2","3" };
        //each obj has a property that will generate a list of data
        //objList1 is the result of the data specific to obj
        //some of this data could be duplicates
        //Which is better, this:
        foreach ( object test in objList1 ) {
           if ( !objList2.Contains( test ) ) {
              objList2.Add( test );
           }
        }
        //or this:
        objList2 = objList2.Union( objList1 ).ToList();
        //Also, assume this has to happen anywhere from 0 to 60 times per second
     }

Is it more efficient to let Union do all the work? Or is it better to compare each element using Contains?

If neither, what is the best way to populate unique lists using the least processing time possible?

Efficiency is key for this. Also, this is not homework, or anything work related, just learning related.

The lists are continuous at runtime in the sense that they are eventually wiped clean and repopulated. Several result lists, all built like this example, are combined into one final list, and the decision depends on that list: if it is empty, it's a fail condition, and if it is not empty, it's a success condition.

Here's a snippet of the code in question for one of the lists created:

     Player.ClearMoves();
     List<Pair<BoardLocation, BoardLocation>> attacking = new List<Pair<BoardLocation, BoardLocation>>();
     foreach ( ChessPiece p in Board[this.Player.Opponent] ) {
        if ( p.TheoryMove( this.Location ) ) {
           foreach ( Pair<BoardLocation , BoardLocation> l in Utility.GetLocations( p.Location , this.Location ) ) {
              if ( !attacking.Contains( l ) ) {
                 attacking.Add( l );
              }
           }
        }
     }
     if ( attacking.Count < 1 ) {
        return false;
     }

Asked by G1xb17 on Nov 24 '15 at 22:11


1 Answer

You can find the Enumerable.Union implementation in the reference source.

This is how it works:

public static IEnumerable<TSource> Union<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second) {
    if (first == null) throw Error.ArgumentNull("first");
    if (second == null) throw Error.ArgumentNull("second");
    return UnionIterator<TSource>(first, second, null);
}

static IEnumerable<TSource> UnionIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
    Set<TSource> set = new Set<TSource>(comparer);
    foreach (TSource element in first)
        if (set.Add(element)) yield return element;
    foreach (TSource element in second)
        if (set.Add(element)) yield return element;
}

As you can see, Union iterates through both enumerables and yields objects from those sources. Like most LINQ sequence operators, it is lazily evaluated: it does not create a list but works as a generator function. The list is only created when you call .ToList().
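To make the deferred behaviour visible, here is a minimal sketch. The `Tracked` helper and the `Pulled` counter are hypothetical names introduced only for this illustration; they count how many elements have been pulled from the sources.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredUnionDemo
{
    // Counts how many elements have been pulled from the sources so far.
    public static int Pulled;

    // Hypothetical helper: yields the items unchanged and records each pull.
    public static IEnumerable<string> Tracked(IEnumerable<string> items)
    {
        foreach (string item in items)
        {
            Pulled++;
            yield return item;
        }
    }

    public static void Main()
    {
        var first = new List<string> { "1", "2" };
        var second = new List<string> { "2", "3" };

        // Building the query enumerates nothing yet.
        IEnumerable<string> query = Tracked(first).Union(Tracked(second));
        Console.WriteLine(Pulled); // 0

        // ToList() drives the iterator and materializes the result.
        List<string> result = query.ToList();
        Console.WriteLine(Pulled);                    // 4
        Console.WriteLine(string.Join(",", result)); // 1,2,3
    }
}
```

One consequence for the loop in the question: `objList2.Union( objList1 ).ToList()` re-enumerates all of `objList2` and builds a fresh set and a fresh list on every iteration.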

In order to avoid duplicates, it will use a Set and try to add an element before yielding it. If the addition to the set is successful, then the element was not already in there, so it can be yielded.

Note that hash-based sets are very efficient at checking whether an element is already present: they provide item lookup in amortized constant time. So this is definitely more efficient than your objList2.Contains, which has to scan the list linearly for every element it checks.
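As a rough sketch of the difference, the two helpers below produce the same order-preserving result. The names `AppendWithContains` and `AppendWithSet` are made up for this example; the second one mirrors the `set.Add` check in `UnionIterator`, using a `HashSet<T>` in place of the internal `Set<T>`.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueAppend
{
    // Approach from the question: List<T>.Contains scans the list linearly.
    public static void AppendWithContains<T>(List<T> target, IEnumerable<T> items)
    {
        foreach (T item in items)
        {
            if (!target.Contains(item))
            {
                target.Add(item);
            }
        }
    }

    // Same result, but membership is checked against a HashSet<T>.
    // Assumption: `seen` already contains every element of `target`.
    public static void AppendWithSet<T>(List<T> target, HashSet<T> seen, IEnumerable<T> items)
    {
        foreach (T item in items)
        {
            // HashSet<T>.Add returns false when the item is already present.
            if (seen.Add(item))
            {
                target.Add(item);
            }
        }
    }

    public static void Main()
    {
        var viaContains = new List<string>();
        var viaSet = new List<string>();
        var seen = new HashSet<string>();

        string[][] batches = { new[] { "1", "2", "3" }, new[] { "3", "4", "1" } };
        foreach (string[] batch in batches)
        {
            AppendWithContains(viaContains, batch);
            AppendWithSet(viaSet, seen, batch);
        }

        Console.WriteLine(string.Join(",", viaContains)); // 1,2,3,4
        Console.WriteLine(string.Join(",", viaSet));      // 1,2,3,4
    }
}
```

For very small lists the linear scan may well be fast enough in practice; the set-based version pays off as the target list grows.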

Also note that Union is built to maintain the order of the input enumerables. If you do not need that, you can skip it completely and use a HashSet<T> in the first place. This is especially good if you keep adding new items to the same target set, since it reuses the structure:

HashSet<object> set = new HashSet<object>();

foreach (…)
{
    List<object> objList1 = …

    // expand the set with the items from `objList1`
    set.UnionWith(objList1);
}
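A runnable variant of the same idea might look like the sketch below. `GenerateItems` and `Collect` are hypothetical stand-ins for whatever produces the per-object data. The set is cleared and refilled on each pass, which matches the "wiped clean and repopulated" behaviour described in the question.

```csharp
using System;
using System.Collections.Generic;

public static class ReusedSetDemo
{
    // Hypothetical stand-in for "the list each obj produces".
    public static List<string> GenerateItems(int seed)
    {
        return new List<string> { seed.ToString(), (seed + 1).ToString(), "shared" };
    }

    // Merges the items generated for every seed into one reusable set.
    public static void Collect(HashSet<string> set, IEnumerable<int> seeds)
    {
        set.Clear(); // wipe the previous pass, keep the same instance
        foreach (int seed in seeds)
        {
            set.UnionWith(GenerateItems(seed));
        }
    }

    public static void Main()
    {
        var set = new HashSet<string>();

        Collect(set, new[] { 1, 2, 3 });
        Console.WriteLine(set.Count); // 5: "1", "2", "3", "4", "shared"

        Collect(set, new int[0]);
        Console.WriteLine(set.Count == 0 ? "fail" : "success"); // fail
    }
}
```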

It would be even better if you avoided creating objList1 in the first place and added your items to the set directly, if that is possible for your use case.
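Applied to the chess snippet from the question, adding directly to a set could look roughly like this. The `Move` struct is a hypothetical stand-in for `Pair<BoardLocation, BoardLocation>`, whose definition is not shown. A `HashSet<T>` created without a comparer relies on the element type's `Equals` and `GetHashCode`, so the sketch assumes the element type implements value equality.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type standing in for Pair<BoardLocation, BoardLocation>.
public readonly struct Move : IEquatable<Move>
{
    public Move(int from, int to) { From = from; To = to; }
    public int From { get; }
    public int To { get; }

    // Two moves are equal when both endpoints match.
    public bool Equals(Move other) => From == other.From && To == other.To;
    public override bool Equals(object obj) => obj is Move other && Equals(other);
    public override int GetHashCode()
    {
        unchecked { return (From * 397) ^ To; }
    }
}

public static class AttackingMoves
{
    // Adds every move straight into the set; duplicates are ignored by Add.
    public static HashSet<Move> Collect(IEnumerable<Move[]> movesPerPiece)
    {
        var attacking = new HashSet<Move>();
        foreach (Move[] moves in movesPerPiece)
        {
            foreach (Move move in moves)
            {
                attacking.Add(move);
            }
        }
        return attacking;
    }

    public static void Main()
    {
        var movesPerPiece = new List<Move[]>
        {
            new[] { new Move(0, 8), new Move(8, 16) },
            new[] { new Move(8, 16), new Move(1, 9) },
        };

        HashSet<Move> attacking = Collect(movesPerPiece);
        Console.WriteLine(attacking.Count); // 3
    }
}
```

If the real pair type is a class that does not override `Equals` and `GetHashCode`, both `List<T>.Contains` and `HashSet<T>` fall back to reference equality, so two separately created pairs with the same locations would count as different items.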

Answered by poke on Oct 17 '22 at 04:10