Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hashset vs Array, is further optimization possible?

I have a code that works like the following all throughout my program. One example would be pathfinding, where I create snapshots of a dynamic environment and use the created map as basis for the aStar-algorithm.

As the execution time of the code impacts the maximum amount of units I can have utilizing the pathfinding, I'd like to optimize here as much as possible.

The (I believe) performance optimal solution would be this:

  static int runtest(bool[,] coll, int size, int globalSize)
    {
        int counter = 0;
        for (int x = -size; x < size; x++)
        {
            for (int y = -size; y < size; y++)
            {
                if (coll[x+ globalSize, y+ globalSize])
                {
                    counter++;

                    int tx = x;
                    int ty = y;
                }
            }
        }
        return counter;
    }

However I'm not happy with the amount of wasted memory when using arrays, so I am currently using this solution:

static int runtest(HashSet<int> coll, int size)
    {
        int counter = 0;
        for (int x = -size; x < size; x++)
        {
            for (int y = -size; y < size; y++)
            {
                if (coll.Contains(calculateHash(x, y)))
                {
                    counter++;

                    int tx = x;
                    int ty = y;
                }
            }
        }
        return counter;
    }

 public static int calculateHash(float x1, float y1)
        {
            int x = (int)Math.Floor(x1);
            int y = (int)Math.Floor(y1);

            return calculateHash(x, y);
        }

      //[MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int calculateHash(int x, int y) {
            return x + (y << 16);
        }
    }

The Array implementation is about 5 times faster, is this something I have to live with or can I somehow optimize the hash-version further?

Full Test Code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Diagnostics;
using System.Runtime.CompilerServices;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            int size = 240;
            HashSet<Vector2> VecHash = new HashSet<Vector2>();
            HashSet<int> intHash = new HashSet<int>();
            bool[,] boolArr = new bool[size * 2, size * 2];


        for (int x = -size; x < size; x++) {
            for (int y = -size; y < size; y++) {
                VecHash.Add(new Vector2(x, y));
                intHash.Add(calculateHash(x,y));
                boolArr[x+size, y + size] = true;
            }
        }


        Console.WriteLine("Press enter to start");
        Console.ReadLine();

        int reps = 1000;
        int testSize = 180;
        Stopwatch sw = new Stopwatch();
        while (true) {
            long counter = 0;
            sw.Start();
            for (int i = 0; i < reps; i++)
            {
                counter += runtest(VecHash, testSize);
            }
            sw.Stop();
            Console.WriteLine(sw.ElapsedTicks + "\tVecHash items: \t" + counter);
            counter = 0;
            sw.Restart();
            for (int i = 0; i < reps; i++)
            {
                counter += runtest(intHash, testSize);
            }
            sw.Stop();
            Console.WriteLine(sw.ElapsedTicks + "\tIntHash items: \t" + counter);
            counter = 0;
            sw.Restart();
            for (int i = 0; i < reps; i++)
            {
                counter += runtest(boolArr, testSize, size);
            }
            sw.Stop();
            Console.WriteLine(sw.ElapsedTicks + "\tboolArr items: \t" + counter);
            sw.Reset();
            Console.ReadLine();
        }

    }


    static int runtest(HashSet<Vector2> coll, int size)
    {
        int counter = 0;

        var allrelevant = coll.Where(a => a.x >= -size && a.x < size && a.y >= -size && a.y < size);
        foreach (var item in allrelevant) {
            counter++;

            float tx = item.x;
            float ty = item.y;
        }

        return counter;
    }
    static int runtest(HashSet<int> coll, int size)
    {
        int counter = 0;
        for (int x = -size; x < size; x++)
        {
            for (int y = -size; y < size; y++)
            {
                if (coll.Contains(calculateHash(x, y)))
                {
                    counter++;

                    int tx = x;
                    int ty = y;
                }
            }
        }
        return counter;
    }
    static int runtest(bool[,] coll, int size, int globalSize)
    {
        int counter = 0;
        for (int x = -size; x < size; x++)
        {
            for (int y = -size; y < size; y++)
            {
                if (coll[x+ globalSize, y+ globalSize])
                {
                    counter++;

                    int tx = x;
                    int ty = y;
                }
            }
        }
        return counter;
    }

    public static int calculateHash(float x1, float y1)
    {
        int x = (int)Math.Floor(x1);
        int y = (int)Math.Floor(y1);

        return calculateHash(x, y);
    }

  //[MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int calculateHash(int x, int y) {
        return x + (y << 16);
    }
}

public struct Vector2
{
    public readonly float x;
    public readonly float y;

    public Vector2(float x, float y)
    {
        this.x = x;
        this.y = y;
    }
}

}

like image 889
user3488765 Avatar asked Nov 17 '25 16:11

user3488765


1 Answers

Classic space/size tradeoff. HashSets, and other hash-style data structures, must use a hashing algorithm to provide the O(1) speed we use them for. In .NET this is done through the .GetHashCode() method that is part of every object.

That is where part of the slowdown comes from - you are executing a hash function, along with a number of other functions with the Contains and what it calls, including InternalGetHashCode and many others. So you throw a number of extra things on the stack, and their state, etc.

Thus, you're really not going to get much faster than this giant 2D array.

like image 117
jamesbascle Avatar answered Nov 19 '25 05:11

jamesbascle



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!