Hashset vs Array, is further optimization possible?

Question

I have a code that works like the following all throughout my program. One example would be pathfinding, where I create snapshots of a dynamic environment and use the created map as basis for the aStar-algorithm.

As the execution time of the code impacts the maximum amount of units I can have utilizing the pathfinding, I'd like to optimize here as much as possible.

The (I believe) performance optimal solution would be this:

  static int runtest(bool[,] coll, int size, int globalSize)
    {
        int counter = 0;
        for (int x = -size; x < size; x++)
        {
            for (int y = -size; y < size; y++)
            {
                if (coll[x+ globalSize, y+ globalSize])
                {
                    counter++;

                    int tx = x;
                    int ty = y;
                }
            }
        }
        return counter;
    }

However I'm not happy with the amount of wasted memory when using arrays, so I am currently using this solution:

static int runtest(HashSet<int> coll, int size)
    {
        int counter = 0;
        for (int x = -size; x < size; x++)
        {
            for (int y = -size; y < size; y++)
            {
                if (coll.Contains(calculateHash(x, y)))
                {
                    counter++;

                    int tx = x;
                    int ty = y;
                }
            }
        }
        return counter;
    }

 public static int calculateHash(float x1, float y1)
        {
            int x = (int)Math.Floor(x1);
            int y = (int)Math.Floor(y1);

            return calculateHash(x, y);
        }

      //[MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int calculateHash(int x, int y) {
            return x + (y << 16);
        }
    }

The Array implementation is about 5 times faster, is this something I have to live with or can I somehow optimize the hash-version further?

Full Test Code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Diagnostics;
using System.Runtime.CompilerServices;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            int size = 240;
            HashSet<Vector2> VecHash = new HashSet<Vector2>();
            HashSet<int> intHash = new HashSet<int>();
            bool[,] boolArr = new bool[size * 2, size * 2];


        for (int x = -size; x < size; x++) {
            for (int y = -size; y < size; y++) {
                VecHash.Add(new Vector2(x, y));
                intHash.Add(calculateHash(x,y));
                boolArr[x+size, y + size] = true;
            }
        }


        Console.WriteLine("Press enter to start");
        Console.ReadLine();

        int reps = 1000;
        int testSize = 180;
        Stopwatch sw = new Stopwatch();
        while (true) {
            long counter = 0;
            sw.Start();
            for (int i = 0; i < reps; i++)
            {
                counter += runtest(VecHash, testSize);
            }
            sw.Stop();
            Console.WriteLine(sw.ElapsedTicks + "	VecHash items: 	" + counter);
            counter = 0;
            sw.Restart();
            for (int i = 0; i < reps; i++)
            {
                counter += runtest(intHash, testSize);
            }
            sw.Stop();
            Console.WriteLine(sw.ElapsedTicks + "	IntHash items: 	" + counter);
            counter = 0;
            sw.Restart();
            for (int i = 0; i < reps; i++)
            {
                counter += runtest(boolArr, testSize, size);
            }
            sw.Stop();
            Console.WriteLine(sw.ElapsedTicks + "	boolArr items: 	" + counter);
            sw.Reset();
            Console.ReadLine();
        }

    }


    static int runtest(HashSet<Vector2> coll, int size)
    {
        int counter = 0;

        var allrelevant = coll.Where(a => a.x >= -size && a.x < size && a.y >= -size && a.y < size);
        foreach (var item in allrelevant) {
            counter++;

            float tx = item.x;
            float ty = item.y;
        }

        return counter;
    }
    static int runtest(HashSet<int> coll, int size)
    {
        int counter = 0;
        for (int x = -size; x < size; x++)
        {
            for (int y = -size; y < size; y++)
            {
                if (coll.Contains(calculateHash(x, y)))
                {
                    counter++;

                    int tx = x;
                    int ty = y;
                }
            }
        }
        return counter;
    }
    static int runtest(bool[,] coll, int size, int globalSize)
    {
        int counter = 0;
        for (int x = -size; x < size; x++)
        {
            for (int y = -size; y < size; y++)
            {
                if (coll[x+ globalSize, y+ globalSize])
                {
                    counter++;

                    int tx = x;
                    int ty = y;
                }
            }
        }
        return counter;
    }

    public static int calculateHash(float x1, float y1)
    {
        int x = (int)Math.Floor(x1);
        int y = (int)Math.Floor(y1);

        return calculateHash(x, y);
    }

  //[MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int calculateHash(int x, int y) {
        return x + (y << 16);
    }
}

public struct Vector2
{
    public readonly float x;
    public readonly float y;

    public Vector2(float x, float y)
    {
        this.x = x;
        this.y = y;
    }
}

}

jamesbascle · Accepted Answer

Classic space/size tradeoff. HashSets, and other hash-style data structures, must use a hashing algorithm to provide the O(1) speed we use them for. In .NET this is done through the .GetHashCode() method that is part of every object.

That is where part of the slowdown comes from - you are executing a hash function, along with a number of other functions with the Contains and what it calls, including InternalGetHashCode and many others. So you throw a number of extra things on the stack, and their state, etc.

Thus, you're really not going to get much faster than this giant 2D array.

Hashset vs Array, is further optimization possible?

Tags:

c#

optimization

user3488765

1 Answers

jamesbascle

Recent Activity

Donate For Us

Hashset vs Array, is further optimization possible?

Tags:

c#

optimization

user3488765

1 Answers

jamesbascle

Related questions

Recent Activity

Donate For Us