How to get random.sample() from deque in Python 3?

2 Answers

The obvious way – convert to a list.

batch = random.sample(list(my_deque), batch_size))

But you can avoid creating an entire list.

idx_batch = set(sample(range(len(my_deque)), batch_size))
batch = [val for i, val in enumerate(my_deque) if i in idx_batch]

P.S. (Edited)

Actually, random.sample should work fine with deques in Python >= 3.5. because the class has been updated to match the Sequence interface.

In [3]: deq = collections.deque(range(100))

In [4]: random.sample(deq, 10)
Out[4]: [12, 64, 84, 77, 99, 69, 1, 93, 82, 35]

Note! as Geoffrey Irving has correctly stated in the comment bellow, you'd better convert the queue into a list, because queues are implemented as linked lists, making each index-access O(n) in the size of the queue, therefore sampling m random values will take O(m*n) time.

answered Oct 26 '22 21:10

Eli Korvigo

sample() on a deque works fine in Python ≥3.5, and it's pretty fast.

In Python 3.4, you could use this instead, which runs about as fast:

sample_indices = sample(range(len(deq)), 50)
[deq[index] for index in sample_indices]

On my MacBook using Python 3.6.8, this solution is over 44 times faster than Eli Korvigo's solution. :)

I used a deque with 1 million items, and I sampled 50 items:

from random import sample
from collections import deque

deq = deque(maxlen=1000000)
for i in range(1000000):
    deq.append(i)

sample_indices = set(sample(range(len(deq)), 50))

%timeit [deq[i] for i in sample_indices]
1.68 ms ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit sample(deq, 50)
1.94 ms ± 60.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit sample(range(len(deq)), 50)
44.9 µs ± 549 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit [val for index, val in enumerate(deq) if index in sample_indices]
75.1 ms ± 410 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

That said, as others have pointed out, a deque is not well suited for random access. If you want to implement a replay memory, you could instead use a rotating list like this:

class ReplayMemory:
    def __init__(self, max_size):
        self.buffer = [None] * max_size
        self.max_size = max_size
        self.index = 0
        self.size = 0

    def append(self, obj):
        self.buffer[self.index] = obj
        self.size = min(self.size + 1, self.max_size)
        self.index = (self.index + 1) % self.max_size

    def sample(self, batch_size):
        indices = sample(range(self.size), batch_size)
        return [self.buffer[index] for index in indices]

With a million items, sampling 50 items is blazingly fast:

%timeit mem.sample(50)
#58 µs ± 691 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

answered Oct 26 '22 21:10

MiniQuark

Related questions
                            
                                Seaborn: barplot the counting not mean value of a column
                            
                                How to set browser viewport size
                            
                                Stocking large numbers into numpy array
                            
                                How to handle elements inside Shadow DOM from Selenium
                            
                                Open scrapy output in browser tab or ipython window
                            
                                Speed Static Methods vs Class Method
                            
                                How to return an unpacked list in Python?
                            
                                pandas dataframe resample per day without date time index
                            
                                Pandas Series: Log Normalize
                            
                                reading and doing calculation from .dat file in python
                            
                                pip install libxml2 failed
                            
                                How do I add the --trusted-host in pycharm package install?
                            
                                Visual Studio code and virtualenv
                            
                                Pandas - equivalent of str.contains() in pandas query
                            
                                Python print pdf file with win32print
                            
                                Delete pandas column with no name [duplicate]
                            
                                TypeError: str object is not an iterator
                            
                                extend a pandas datetimeindex by 1 period
                            
                                Django REST Framework Swagger - Authentication Error
                            
                                More arguments in derived class __init__ than base class __init__

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to get random.sample() from deque in Python 3?

Tags:

python

random

python-3.x

python-2.7

Greg Farquhar

People also ask

2 Answers

Eli Korvigo

MiniQuark

Recent Activity

Donate For Us