My algorithm needs to iteratively shrink a set by removing an element, and do something with the element removed and with the shrinking set in each iteration. And:
Python sets have a pop
member that pretty much does that. In Scala and Go, picking and removing the "first" element of a hash set seems to work fine (where "first" corresponds to the iterator). In Rust, that is something like:
// split off an arbitrary element from a (non-empty) set
pub fn pop<T>(set: &mut HashSet<T>) -> T
where
T: Eq + Clone + std::hash::Hash,
{
let elt = set.iter().next().cloned().unwrap();
set.remove(&elt);
elt
}
This seems to be a performance bottleneck compared to the other languages. I benchmarked some implementations of a pop-like function on the playground but none perform well. Apparently removing an element is not expensive, but picking one is: iter().next()
costs a fortune (*). Avoiding that with retain
understandably doesn't help: it always iterates the whole set. Are there any alternatives?
(*) On closer examination, iter().next()
is quite cheap, as far as microbenchmarking can be trusted. Separate microbenchmarks say that picking an arbitrary element from a set costs (in nanoseconds on my system):
| Type of set | Number of elements in set instance
| | 100 | 10,000 | 1,000,000
| Rust HashSet | 2 | 2 | 2
| Rust BTreeSet | 11 | 12 | 13
| Go map[]struct{} | 27 | 31 | 94
| Python set | 125 | 125 | 125
You may not use the remove() method of a Collection while iterating it because otherwise you will get ConcurrentModificationException.
A HashSet is usually used for high-performance operations involving a set of unique data. Since HashSet contains only unique elements, its internal structure is optimized for faster searches. Note that you can store a single null value in a HashSet.
HashSet is a class that extends AbstractSet and implements the Set interface in Java. It is a very useful tool that allows you to store unique items and access them in constant time (on average). No duplicate values are stored.
the set I'm using has integers
Don't use a HashSet
; A BTreeSet
has better and more consistent performance.
For N
= 100000...
BTreeSet
sequenced : 3065.098µs
pop_1 : 2941.876µs
pop_2 : 2927.429µs
HashSet
sequenced : 3091.454µs
pop_1 : 172547.080µs
pop_2 : 807182.085µs
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With