Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

For a list that must only have unique items in it, but will contain 0 or 1 items 99% of the time, does the overhead make a List better than a HashSet?

I am currently working on a project where we have a set of events. One piece of analysis we do on the events is to look through a specific type of event and check to see if it's likely that it was prompted by another event which happened shortly before (or slightly after in one odd case). Each of these events can only be effected by a single event, but one event could be the causal event for multiple events. We want this association to go both ways so that, from any particular method, you can go straight to the event which caused it, or one of the events which it caused. Based on that, I started by adding the following properties to the Event objects and adding a funct:

protected Event causalEvent;
protected List<Event> effectedEvents;

After a bit of thinking, I considered that we never want the same item added twice to the effectedEvents list. After reading the answer to Preventing Duplicate List<T> Entries, I went with a Hashset.

protected Event causalEvent;
protected HashSet<Event> effectedEvents;

A co-worker and I got to discussing the code I'd added and he pointed out that using a HashSet might confuse people since he tended to see a HashSet and assume that there's a great deal of data. In our case, due to the rules being used in the algorithms, effectedEvents is going to have 0 items in about 90% of the cases, 1 item in 9%, and 2 maybe 1% of the time. Almost never will we have more than 2 items, although it is possible. I believe the lookup cost is the same for both collections. The amount of memory used is very similar since both start assuming a small capacity (although, I will concede that List gives you the ability to set that capacity in the constructor while HashSet only allows one to trim the value down based on its contents, "rounded to an implementation-specific value").

So, long question short, is there any real penalty to using a HashSet other than possible confusion for those unfamiliar with using it to ensure uniqueness?

like image 909
Sean Duggan Avatar asked Jan 31 '26 14:01

Sean Duggan


2 Answers

The analysis performed in this answer indicates that you only see a performance advantage with HashSet over List when you get to 5 strings, or 20 objects (of course, results will vary based on what you are doing). Since you are going to have 0-2 items in almost all cases, your best bet performance-wise is probably to use the List.

I would not worry about the confusion of those unfamiliar with using a HashSet to ensure uniqueness. That is one of the primary uses of a HashSet. Pick the best tool for the job, and if you think people will be confused, a short comment can help with that.

Also, though it is good to use the best performing coding strategy, you should also beware of spending too much time on micro-optimizations that can be premature. Unless you are using a lot of these objects, you probably will never notice the difference between List and HashSet in this case.

like image 86
Yaakov Ellis Avatar answered Feb 02 '26 03:02

Yaakov Ellis


If you are after memory and performance you could use a plain object and put the event directly into the field. If you need more than one entry you can replace it on demand with a List or HashSet. Below is some code to show the concept. This gives you maximum speed with a much reduced memory footprint if most of the time the List/HashSet is empty.

This is in my opinion the most elegant solution for such sparse data structures.

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace DynamicSet
{
    class Program
    {
        object DynamicSet; // can be null, one stored Event or a List/HashSet<Event> 
                           // depending on how many elements are needed.

        bool Contains(Event ev)
        {
            if( DynamicSet == null )
            {
                return false;
            }

            var storedEvent = DynamicSet as Event;
            if (storedEvent != null)
            {
                return Object.ReferenceEquals(ev, storedEvent);
            }
            var set = (HashSet<Event>)DynamicSet;
            return set.Contains(ev);
        }

        void AddEvent(Event ev)
        {
            if( DynamicSet == null )
            {
                DynamicSet = ev;
                return;
            }

            var hash = DynamicSet as HashSet<Event>;
            if( hash != null )
            {
                hash.Add(ev);
            }
            else
            {
                hash = new HashSet<Event>();
                hash.Add((Event)DynamicSet);
                DynamicSet = hash;
            }
        }

        static void Main(string[] args)
        {
            Program p = new Program();
            Event ev1 = new Event();
            Event ev2 = new Event();

            p.AddEvent(ev1);
            Debug.Assert(p.Contains(ev1));
            Debug.Assert(!p.Contains(ev2));
            p.AddEvent(ev1);
            Debug.Assert(p.Contains(ev1));
            Debug.Assert(!p.Contains(ev2));

            p.AddEvent(ev2);
            Debug.Assert(p.Contains(ev1));
            Debug.Assert(p.Contains(ev2));

        }
    }
}
like image 41
Alois Kraus Avatar answered Feb 02 '26 04:02

Alois Kraus



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!