Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I pop from a HashSet efficiently?

Tags:

hashset

rust

My algorithm needs to iteratively shrink a set by removing an element, and do something with the element removed and with the shrinking set in each iteration. And:

  • I need a genuine set with fast lookup, not just a vector containing unique elements.
  • The choice of element is arbitrary: the outcome of the algorithm doesn't depend on the order visited. The performance probably varies greatly with that choice, but let's say I want the simplest code and leave it up to the set itself to pick the element it can remove efficiently.
  • By the way, the algorithm is the basic form of the Bron–Kerbosch algorithm. Smarter versions of that algorithm work faster (mostly) because they don't leave the choice of element arbitrary and I'd like to learn how much that effort pays off.

Python sets have a pop member that pretty much does that. In Scala and Go, picking and removing the "first" element of a hash set seems to work fine (where "first" corresponds to the iterator). In Rust, that is something like:

// split off an arbitrary element from a (non-empty) set
pub fn pop<T>(set: &mut HashSet<T>) -> T
where
    T: Eq + Clone + std::hash::Hash,
{
    let elt = set.iter().next().cloned().unwrap();
    set.remove(&elt);
    elt
}

This seems to be a performance bottleneck compared to the other languages. I benchmarked some implementations of a pop-like function on the playground but none perform well. Apparently removing an element is not expensive, but picking one is: iter().next() costs a fortune (*). Avoiding that with retain understandably doesn't help: it always iterates the whole set. Are there any alternatives?

(*) On closer examination, iter().next() is quite cheap, as far as microbenchmarking can be trusted. Separate microbenchmarks say that picking an arbitrary element from a set costs (in nanoseconds on my system):

| Type of set      | Number of elements in set instance
|                  | 100 | 10,000 | 1,000,000
| Rust HashSet     |   2 |      2 |         2
| Rust BTreeSet    |  11 |     12 |        13
| Go map[]struct{} |  27 |     31 |        94
| Python set       | 125 |    125 |       125
like image 673
Stein Avatar asked Feb 15 '19 15:02

Stein


People also ask

Can you remove from a set while iterating?

You may not use the remove() method of a Collection while iterating it because otherwise you will get ConcurrentModificationException.

When to use a HashSet?

A HashSet is usually used for high-performance operations involving a set of unique data. Since HashSet contains only unique elements, its internal structure is optimized for faster searches. Note that you can store a single null value in a HashSet.

Whats a HashSet?

HashSet is a class that extends AbstractSet and implements the Set interface in Java. It is a very useful tool that allows you to store unique items and access them in constant time (on average). No duplicate values are stored.


1 Answers

the set I'm using has integers

Don't use a HashSet; A BTreeSet has better and more consistent performance.

For N = 100000...

BTreeSet

sequenced : 3065.098µs
pop_1     : 2941.876µs
pop_2     : 2927.429µs

HashSet

sequenced : 3091.454µs
pop_1     : 172547.080µs
pop_2     : 807182.085µs
like image 85
Shepmaster Avatar answered Oct 05 '22 06:10

Shepmaster