Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I randomly sample from a HashSet efficiently?

I have a std::collections::HashSet, and I want to sample and remove a uniformly random element.

Currently, what I'm doing is randomly sampling an index using rand.gen_range, then iterating over the HashSet to that index to get the element. Then I remove the selected element. This works, but it's not efficient. Is there an efficient way to do randomly sample an element?

Here's a stripped down version of what my code looks like:

use std::collections::HashSet;

extern crate rand;
use rand::thread_rng;
use rand::Rng;

let mut hash_set = HashSet::new();

// ... Fill up hash_set ...

let index = thread_rng().gen_range(0, hash_set.len());
let element = hash_set.iter().nth(index).unwrap().clone();
hash_set.remove(&element);

// ... Use element ...
like image 622
isaacg Avatar asked Dec 13 '18 04:12

isaacg


People also ask

How do you generate a random value from a HashSet?

In order to get random elements from the HashSet object, we need to generate a random number between 0 (inclusive) and the size of the HashSet (exclusive). And then iterate through the set till we reach the element located at the random number position as given below.


1 Answers

Thinking about Sven Marnach's answer, I want to use a vector, but I also need constant time insertion without duplication. Then I realized that I can maintain both a vector and a set, and ensure that they both had the same elements at all times. This will allow both constant time insertion with deduplication and constant time random removal.

Here's the implementation I ended up with:

struct VecSet<T> {
    set: HashSet<T>,
    vec: Vec<T>,
}

impl<T> VecSet<T>
where
    T: Clone + Eq + std::hash::Hash,
{
    fn new() -> Self {
        Self {
            set: HashSet::new(),
            vec: Vec::new(),
        }
    }
    fn insert(&mut self, elem: T) {
        assert_eq!(self.set.len(), self.vec.len());
        let was_new = self.set.insert(elem.clone());
        if was_new {
            self.vec.push(elem);
        }
    }
    fn remove_random(&mut self) -> T {
        assert_eq!(self.set.len(), self.vec.len());
        let index = thread_rng().gen_range(0, self.vec.len());
        let elem = self.vec.swap_remove(index);
        let was_present = self.set.remove(&elem);
        assert!(was_present);
        elem
    }
    fn is_empty(&self) -> bool {
        assert_eq!(self.set.len(), self.vec.len());
        self.vec.is_empty()
    }
}
like image 92
isaacg Avatar answered Jan 22 '23 15:01

isaacg