I have a collections.deque() of tuples from which I want to draw random samples.
In Python 2.7, I can use batch = random.sample(my_deque, batch_size)
.
But in Python 3.4 this raises TypeError: Population must be a sequence or set. For dicts, use list(d).
What's the best workaround, or recommended way to sample efficiently from a deque in Python 3?
In Python, you can randomly sample elements from a list with choice() , sample() , and choices() of the random module. These functions can also be applied to a string and tuple. choice() returns one random element, and sample() and choices() return a list of multiple random elements.
A deque is a double-ended queue in which elements can be both inserted and deleted from either the left or the right end of the queue. An implementation of a deque in Python is available in the collections module.
For lists, it's always O(1). So, for accessing elements, lists are always a better choice, it's not at all what deques were designed for. Second, because deques are implemented as doubly-ended arrays, they have the advantage when appending or popping from both the right and the left side of a deque (measured as O(1)).
The obvious way – convert to a list.
batch = random.sample(list(my_deque), batch_size))
But you can avoid creating an entire list.
idx_batch = set(sample(range(len(my_deque)), batch_size))
batch = [val for i, val in enumerate(my_deque) if i in idx_batch]
P.S. (Edited)
Actually, random.sample
should work fine with deques in Python >= 3.5. because the class has been updated to match the Sequence interface.
In [3]: deq = collections.deque(range(100))
In [4]: random.sample(deq, 10)
Out[4]: [12, 64, 84, 77, 99, 69, 1, 93, 82, 35]
Note! as Geoffrey Irving has correctly stated in the comment bellow, you'd better convert the queue into a list, because queues are implemented as linked lists, making each index-access O(n) in the size of the queue, therefore sampling m random values will take O(m*n) time.
sample()
on a deque
works fine in Python ≥3.5, and it's pretty fast.
In Python 3.4, you could use this instead, which runs about as fast:
sample_indices = sample(range(len(deq)), 50)
[deq[index] for index in sample_indices]
On my MacBook using Python 3.6.8, this solution is over 44 times faster than Eli Korvigo's solution. :)
I used a deque
with 1 million items, and I sampled 50 items:
from random import sample
from collections import deque
deq = deque(maxlen=1000000)
for i in range(1000000):
deq.append(i)
sample_indices = set(sample(range(len(deq)), 50))
%timeit [deq[i] for i in sample_indices]
1.68 ms ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit sample(deq, 50)
1.94 ms ± 60.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit sample(range(len(deq)), 50)
44.9 µs ± 549 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit [val for index, val in enumerate(deq) if index in sample_indices]
75.1 ms ± 410 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
That said, as others have pointed out, a deque
is not well suited for random access. If you want to implement a replay memory, you could instead use a rotating list like this:
class ReplayMemory:
def __init__(self, max_size):
self.buffer = [None] * max_size
self.max_size = max_size
self.index = 0
self.size = 0
def append(self, obj):
self.buffer[self.index] = obj
self.size = min(self.size + 1, self.max_size)
self.index = (self.index + 1) % self.max_size
def sample(self, batch_size):
indices = sample(range(self.size), batch_size)
return [self.buffer[index] for index in indices]
With a million items, sampling 50 items is blazingly fast:
%timeit mem.sample(50)
#58 µs ± 691 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With