Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C# HashSet<T> search performance (compared to an ObservableCollection<T>)?

The C# the generic HashSet<T> search performance should be O(1), and the search performance of an ObservableCollection<T> should be O(n).

I have a large amount of unique elements, each element has a DateTime property that is not unique.

Each element calculates its HashCode by simply returning its DateTime.GetHashCode().

Now I want to get a subset of my data, e.g. all elements that have a date which is between March 2012 and June 2012.

    var result = from p in this.Elements
                 where p.Date >= new DateTime(2012, 03, 01) &&
                       p.Date <= new DateTime(2012, 30, 06
                 select p;

If I run this LINQ query on a collection of 300.000 elements, it takes ~25 ms to return 80 elements that are within the given range - it does not matter if I use a HashSet<T> or an ObservableCollection<T>.

If I loop through all elements manually and check them, it takes the same time, ~25 ms.

But I do know the HashCode of all Dates that are within the given range. Is it possible to get all elements with the given HashCodes from my HashSet<T>? I think that would be much faster...

Is it possible to speed up the LINQ query? I assume that it does not make use of the special abilities of my HashSet<T>?

like image 537
Ehssan Avatar asked May 17 '12 16:05

Ehssan


People also ask

What C is used for?

C programming language is a machine-independent programming language that is mainly used to create many types of applications and operating systems such as Windows, and other complicated programs such as the Oracle database, Git, Python interpreter, and games and is considered a programming foundation in the process of ...

What is C full form?

Full form of C is “COMPILE”. One thing which was missing in C language was further added to C++ that is 'the concept of CLASSES'.

What is C language basics?

What is C? C is a general-purpose programming language created by Dennis Ritchie at the Bell Laboratories in 1972. It is a very popular language, despite being old. C is strongly associated with UNIX, as it was developed to write the UNIX operating system.

Is C language easy?

C is a general-purpose language that most programmers learn before moving on to more complex languages. From Unix and Windows to Tic Tac Toe and Photoshop, several of the most commonly used applications today have been built on C. It is easy to learn because: A simple syntax with only 32 keywords.


2 Answers

You're not using the right data structure. You should be using something like a sorted list (sorted on the Date property) where you can then binary search for the beginning and end of the range.

like image 80
jason Avatar answered Nov 15 '22 08:11

jason


As has been pointed out a hash set is very efficient at determining if a given hash is in the set. Your query just uses the fact that the hashset implement IEnumerable to iterate over the entire set and do the date comparison. It will not use the hashes at all. This is why the manual way takes the same time as the query.

You cannot get an element based on a hash from a hashset, you can only test for existance of the element in the set. A dictionary is what you want if you need to get it by has (which it seems you don't)

Decide what it is that you need to do with your data and use a structure which is optimised for that. This may be your own class which maintains multiple internal structures each of which is efficient at one thing (like one for searching for ranges and another for checking by existence by multiple fields), or there may be an existing structure which fits your needs. But without knowing what it is you want to do with your data its difficult to advise.

The other thing to consider is whether you are optimising prematurely. If 25ms to search manually is fast enough then maybe any structure which implements IEnumerable will be good enough. In which case you can choose one based on the other criteria you need.

like image 39
Sam Holder Avatar answered Nov 15 '22 07:11

Sam Holder