HashSet performance Add vs Contains for existing elements

For some reason, it seems the Add operation on a HashSet is slower than the Contains operation when the element already exists in the HashSet.

Here is proof:

    Stopwatch watch = new Stopwatch();
    int size = 10000;
    int iterations = 10000;


    var s = new HashSet<int>();
    for (int i = 0; i < size; i++) {
        s.Add(i);
    }

    Console.WriteLine(watch.Time(() =>
    {
        for (int i = 0; i < size; i++) {
            s.Add(i);
        }
    }, iterations));

    // outputs: 47,074,764

    s = new HashSet<int>();
    for (int i = 0; i < size; i++) {
        s.Add(i);
    }

    Console.WriteLine(watch.Time(() =>
    {
        for (int i = 0; i < size; i++) {
            if (!s.Contains(i))
                s.Add(i);
        }
    }, iterations));

    // outputs: 41,125,219

Why is Contains faster than Add for already-existing elements?

Note: I'm using this Stopwatch extension from another SO question.

    public static long Time(this Stopwatch sw, Action action, int iterations) {
        sw.Reset();
        sw.Start();
        for (int i = 0; i < iterations; i++) {
            action();
        }
        sw.Stop();

        return sw.ElapsedTicks;
    }

UPDATE: Internal testing has revealed that the big performance difference only happens on the x64 version of the .NET Framework. With the 32-bit version of the framework, Contains seems to run at identical speed to Add (in fact, the version with the Contains check appears to run about a percent slower in some test runs). On x64 versions of the framework, the version with the Contains check seems to run about 15% faster.
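For anyone who wants to reproduce the comparison on both 32-bit and 64-bit builds, here is a minimal self-contained sketch of the benchmark above. The class names `StopwatchExtensions` and `AddVsContains` and the helper `Fill` are hypothetical names added for illustration; the `Time` extension is the one quoted in the question. Printing `IntPtr.Size` is just one simple way to see which bitness the process is running under. Timings will vary by machine and runtime, so no expected numbers are given.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class StopwatchExtensions
{
    // Same helper as in the question: runs the action `iterations` times
    // and returns the elapsed Stopwatch ticks.
    public static long Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Reset();
        sw.Start();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.ElapsedTicks;
    }
}

public static class AddVsContains
{
    // Hypothetical helper: builds a set containing 0..size-1.
    public static HashSet<int> Fill(int size)
    {
        var s = new HashSet<int>();
        for (int i = 0; i < size; i++)
        {
            s.Add(i);
        }
        return s;
    }

    public static void Main()
    {
        const int size = 10000;
        const int iterations = 10000;
        var watch = new Stopwatch();

        // IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit process.
        Console.WriteLine("Pointer size: {0} bytes", IntPtr.Size);

        // Case 1: call Add for elements that are already in the set.
        var s = Fill(size);
        long addOnly = watch.Time(() =>
        {
            for (int i = 0; i < size; i++)
            {
                s.Add(i);
            }
        }, iterations);

        // Case 2: check Contains first, and only Add when the element is missing.
        s = Fill(size);
        long containsThenAdd = watch.Time(() =>
        {
            for (int i = 0; i < size; i++)
            {
                if (!s.Contains(i))
                    s.Add(i);
            }
        }, iterations);

        Console.WriteLine("Add only:          {0:N0} ticks", addOnly);
        Console.WriteLine("Contains then Add: {0:N0} ticks", containsThenAdd);
    }
}
```

Build it in Release mode and run it once as x86 and once as x64 to compare the two cases under each runtime.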

asked Mar 09 '09 21:03

Sam Saffron

1 Answer

AddIfNotPresent does an additional divide that Contains doesn't perform. Take a look at the IL for Contains:

    IL_000a:  call       instance int32 class System.Collections.Generic.HashSet`1<!T>::InternalGetHashCode(!0)
    IL_000f:  stloc.0
    IL_0010:  ldarg.0
    IL_0011:  ldfld      int32[] class System.Collections.Generic.HashSet`1<!T>::m_buckets
    IL_0016:  ldloc.0
    IL_0017:  ldarg.0
    IL_0018:  ldfld      int32[] class System.Collections.Generic.HashSet`1<!T>::m_buckets
    IL_001d:  ldlen
    IL_001e:  conv.i4
    IL_001f:  rem
    IL_0020:  ldelem.i4
    IL_0021:  ldc.i4.1
    IL_0022:  sub
    IL_0023:  stloc.1

This is computing the bucket location for the hash code (the hash code modulo the length of m_buckets, a single rem) and reading the entry stored there. The result is saved at local memory location 1.

AddIfNotPresent does something similar, but it first saves the bucket index (the result of the rem) at location 1, so that it can insert the item into the hash table at that position if the item doesn't exist. It then repeats the same rem to read the bucket entry, which it saves at location 2. It does that extra save because one of the locations is modified later in the loop that goes looking for the item. Anyway, here's the relevant code for AddIfNotPresent:

    IL_0011:  call       instance int32 class System.Collections.Generic.HashSet`1<!T>::InternalGetHashCode(!0)
    IL_0016:  stloc.0
    IL_0017:  ldloc.0
    IL_0018:  ldarg.0
    IL_0019:  ldfld      int32[] class System.Collections.Generic.HashSet`1<!T>::m_buckets
    IL_001e:  ldlen
    IL_001f:  conv.i4
    IL_0020:  rem
    IL_0021:  stloc.1
    IL_0022:  ldarg.0
    IL_0023:  ldfld      int32[] class System.Collections.Generic.HashSet`1<!T>::m_buckets
    IL_0028:  ldloc.0
    IL_0029:  ldarg.0
    IL_002a:  ldfld      int32[] class System.Collections.Generic.HashSet`1<!T>::m_buckets
    IL_002f:  ldlen
    IL_0030:  conv.i4
    IL_0031:  rem
    IL_0032:  ldelem.i4
    IL_0033:  ldc.i4.1
    IL_0034:  sub
    IL_0035:  stloc.2

Anyway, I think the extra divide is what's causing Add to take more time than Contains. At first glance, it looks like that extra divide could be factored out, but I can't say for sure without spending a little more time deciphering the IL.
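To make the difference easier to see, here is a rough C# reading of the two IL fragments. This is a sketch only, not the actual HashSet<T> source: the class `BucketLookupSketch` and its method names are hypothetical, the hash code is assumed to be non-negative, and only the index computation is modeled. `AddStyleFactored` shows what reusing the first remainder might look like, assuming nothing changes the hash code or `m_buckets` between the two `rem` instructions (which holds for the fragment shown). Whether a given JIT already removes the duplicate computation on its own is not something this sketch can tell you.

```csharp
using System;

// Hypothetical stand-in for the bucket lookup only; not the real HashSet<T>.
public sealed class BucketLookupSketch
{
    private readonly int[] m_buckets;

    public BucketLookupSketch(int[] buckets)
    {
        m_buckets = buckets;
    }

    // Mirrors the Contains fragment: one remainder, result kept in one local.
    public int ContainsStyle(int hashCode)
    {
        return m_buckets[hashCode % m_buckets.Length] - 1;
    }

    // Mirrors the AddIfNotPresent fragment: the remainder is computed twice.
    public int AddStyle(int hashCode, out int bucket)
    {
        bucket = hashCode % m_buckets.Length;               // first rem, stored in local 1
        return m_buckets[hashCode % m_buckets.Length] - 1;  // second rem, result stored in local 2
    }

    // Possible factoring: reuse the bucket index instead of recomputing it.
    public int AddStyleFactored(int hashCode, out int bucket)
    {
        bucket = hashCode % m_buckets.Length;
        return m_buckets[bucket] - 1;
    }
}

public static class Program
{
    public static void Main()
    {
        // Arbitrary sample bucket contents, chosen only for illustration.
        var sketch = new BucketLookupSketch(new[] { 0, 3, 0, 1, 2 });

        int bucketA, bucketB;
        int viaContains = sketch.ContainsStyle(13);
        int viaAdd = sketch.AddStyle(13, out bucketA);
        int viaFactored = sketch.AddStyleFactored(13, out bucketB);

        // All three variants should agree on the looked-up entry and bucket index.
        Console.WriteLine(viaContains == viaAdd && viaAdd == viaFactored && bucketA == bucketB);
    }
}
```

The three methods return the same values; the only difference is how many times the remainder is evaluated in source form.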

answered Oct 08 '22 11:10

Jim Mischel

Tags: performance, c#, hashset
