Set with getRandom

Question

This is an interview question: Impement a Set class with get, put, and getRandom.

I would consider the following options:

sorted/unsorted linked list: get - O(N), put - O(N), getRandom - O(N)
unsorted vector (resizable array): get - O(N), put - ?, getRandom - O(1)
sorted vector (resizable array): get - O(logN), put - ?, getRandom - O(1)
hash table: get - O(1), put - O(1), getRandom - O(table size)
balanced binary search tree: get - O(logN), put - O(logN), getRandom - O(N)

It looks like the best candidates are:

hash table if get/put are much more frequent than getRandom
sorted vector (resizable array) if getRandom is much more frequent than get/put

Now I wonder if we can combine a vector and hash table somehow to make up a better set.

btilly · Accepted Answer

You can make get, put and getRandom all be O(1) average time.

What you do is store 2 data structures. One is a hash. The other lists the elements in random order in a growable array.

When you put, you put it in the hash, add the element to the end of the array, then swap the end of the array for a random array element.

When you get, you look in the hash for the element.

When you getRandom, you take the last element of the array, and then swap that last element with another spot in the array.

If you want you can add delete as just removing the hash. Now getRandom is implemented by taking the element, checking whether it is in the hash, and if it is not then shrinking the array, and repeating. At this point getRandom is occasionally O(n) BUT the amortized average costs of all operations can be proven to be O(1).

ffao · Answer

If you keep a separate structure that tells you how many items there are in each bucket of a hash table, you can use binary search to find the n-th element, which would give you O(log n) for all three operations.

A balanced binary search tree augmented with a "count" for each node (that tells how many nodes has the subtree rooted at this node) would work as well for those bounds.

Some corrections to the above: you can't do random access in a linked list, so all operations are O(N). Also, put is O(n) in both vectors due to having to displace elements in the sorted version and checking for duplicates in the unsorted version.

Set with getRandom

Tags:

language-agnostic

algorithm

data-structures

set

Michael

2 Answers

btilly

ffao

Recent Activity

Donate For Us

Set with getRandom

Tags:

language-agnostic

algorithm

data-structures

set

Michael

2 Answers

btilly

ffao

Related questions

Recent Activity

Donate For Us