Generating binary sequences without repetition

Tags: performance, python, random, python-3.x, numpy

I am trying to generate sequences containing only 0s and 1s. I have written the following code, and it works:

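import numpy as np

batch = 1000
dim = 32

while True:
    is_same = False
    seq = np.random.randint(0, 2, [batch, dim])
    # Compare every pair of rows: O(batch^2) Python-level comparisons
    for i in range(batch):
        for j in range(i + 1, batch):
            if np.array_equal(seq[i], seq[j]):
                is_same = True
    # Redraw the whole batch if any two rows matched, otherwise stop
    if not is_same:
        break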

My batch variable is in the thousands, and the loop above takes about 30 seconds to complete. It is the data-generation step inside another loop that runs for about 500 iterations, so the whole thing is extremely slow. Is there a faster way to generate this list of sequences without repetition? Thanks.

The desired result is a collection of batch sequences, each of length dim and containing only 0s and 1s, such that no two sequences in the collection are identical.
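The uniqueness requirement itself can be tested without the nested Python loops. A minimal sketch, assuming NumPy 1.13 or later (where np.unique accepts axis=0); the helper name has_duplicate_rows is hypothetical. It keeps the same redraw-the-whole-batch strategy as the code above and only replaces the pairwise check with a sort-based one:

```python
import numpy as np

def has_duplicate_rows(seq: np.ndarray) -> bool:
    """Return True if any two rows of the 2-D array seq are identical."""
    # np.unique with axis=0 treats each row as one element and drops repeats,
    # so fewer unique rows than total rows means at least one duplicate.
    return np.unique(seq, axis=0).shape[0] < seq.shape[0]

# Same rejection loop as in the question, with the O(batch^2) check replaced.
batch, dim = 1000, 32
while True:
    seq = np.random.randint(0, 2, [batch, dim])
    if not has_duplicate_rows(seq):
        break
```

The loop still discards whole batches on a collision, but with 1000 rows of 32 bits a collision happens only on the order of once in 10,000 draws (about 1000²/2 pairs against 2³² possible values), so it usually finishes after one draw.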


asked Jan 12 '21 06:01

learner


4 Answers

Generate batch distinct integers in range(2**dim), convert each one to a dim-digit binary string, then convert that string to a sequence of 0s and 1s.

from random import sample

def generate(batch, dim):
    # Distinct integers in 0 .. 2**dim - 1, each of which fits in dim bits
    my_sample = [f'{n:0{dim}b}' for n in sample(range(2**dim), batch)]
    return [[int(n) for n in item] for item in my_sample]

def generate2(batch, dim):
    return [list(map(int, f'{n:0{dim}b}')) for n in sample(range(2**dim), batch)]

The second one is a bit faster:

from timeit import timeit
print(timeit("generate(1000, 32)", setup="from __main__ import generate", number=100))
print(timeit("generate2(1000, 32)", setup="from __main__ import generate2", number=100))

Output:

1.4956848690007973
1.1187048860001596


answered Oct 22 '22 08:10

buran

For the desired result as described, you can take the numbers 0 ... batch_size-1, multiply each by (2^dim)/batch_size, use their binary representations, and shuffle them.
That approach is much more efficient: no tentatively generated numbers are ever discarded, and without nested loops the time complexity is much better.

To add a random component (not required by the stated goal, but presumably wanted), add to each number a random offset in the range 0 ... (2^dim)/batch_size - 1. This cannot produce duplicates either: because of the spacing of the base sequence, an offset never reaches into the range of the next number.

For example, with dim = 5 and batch_size = 8, the spacing is 2^5 / 8 = 4, so each random offset is 0 ... 3 (2 bits):

sequential  binary  random  total  shuffled index
0           00000   10      00010  5
4           00100   00      00100  2
8           01000   11      01011  6
12          01100   11      01111  0
16          10000   01      10001  3
20          10100   00      10100  7
24          11000   10      11010  1
28          11100   00      11100  4

What remains is shuffling the results, to break up the ascending "continuous run" of values.
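A minimal NumPy sketch of this stride-plus-offset approach. Assumptions not in the answer: the function name spaced_sequences is hypothetical, the spacing uses floor division so batch_size need not divide 2^dim, values are held in int64 (so dim <= 62), and bits are written most significant first, as in the table:

```python
import numpy as np

def spaced_sequences(batch: int, dim: int, rng=None) -> np.ndarray:
    """Return batch distinct 0/1 rows of length dim (stride + offset + shuffle)."""
    if rng is None:
        rng = np.random.default_rng()
    if dim > 62:
        raise ValueError("this sketch stores values in int64, so dim must be <= 62")
    step = (1 << dim) // batch  # spacing between base values (floor division)
    if step == 0:
        raise ValueError("batch cannot exceed 2**dim")
    base = np.arange(batch, dtype=np.int64) * step  # 0, step, 2*step, ...
    # Each offset is < step, so base + offset never reaches the next base value.
    offset = rng.integers(0, step, size=batch, dtype=np.int64)
    values = base + offset
    rng.shuffle(values)  # break up the ascending order
    # Expand each integer into its dim bits, most significant bit first.
    shifts = np.arange(dim - 1, -1, -1, dtype=np.int64)
    return ((values[:, None] >> shifts) & 1).astype(np.int8)
```

Note that the result is not a uniformly random choice among all sets of distinct sequences: exactly one value falls in each block of step consecutive integers. Whether that matters depends on how the data is used.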

answered Oct 22 '22 10:10

Yunnosch

An easy way to greatly speed up the duplicate check for long sequences is hashing. Compute a hash code for every sequence and keep a bucket (or a linked list) of all sequences with a given hash code.

When you generate a new sequence, you only need to check for duplicates within the bucket of its hash code. For example, with a 16-bit hash code there are 65536 buckets, so if the hash spreads sequences evenly, the duplicate check can be up to about 65536 times faster.
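A minimal sketch of the bucket idea, assuming rows are drawn one at a time and redrawn on collision. The function name, the hash_bits parameter, and the use of Python's built-in hash on the row bytes are illustrative choices, not part of the answer. (A plain Python set of row.tobytes() would do the same bucketing internally.)

```python
import numpy as np
from collections import defaultdict

def unique_sequences_hashed(batch: int, dim: int, hash_bits: int = 16, rng=None) -> np.ndarray:
    """Draw random 0/1 rows one at a time, rejecting duplicates via hash buckets."""
    if rng is None:
        rng = np.random.default_rng()
    if batch > 2 ** dim:
        raise ValueError("cannot draw more distinct sequences than 2**dim")
    mask = (1 << hash_bits) - 1
    buckets = defaultdict(list)  # hash code -> byte strings of rows with that code
    rows = []
    while len(rows) < batch:
        row = rng.integers(0, 2, size=dim, dtype=np.int8)
        key = row.tobytes()
        code = hash(key) & mask  # keep only the low hash_bits bits of the hash
        if key in buckets[code]:  # compare only against rows in the same bucket
            continue  # duplicate: discard it and draw again
        buckets[code].append(key)
        rows.append(row)
    return np.array(rows)
```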

answered Oct 22 '22 10:10

6502

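You can get non-repeating random bit patterns as integers using the sample function from the random module. Converting these integers to bits is a job better done by numpy (as opposed to string manipulation). Note that np.int was removed in NumPy 1.24, so explicit dtypes are used here:

import random
import numpy as np

def sequenceBatch(batch, dim):
    bits  = np.array(random.sample(range(2**dim), batch), dtype=np.int64)  # distinct ints, dim <= 63
    masks = 2**np.arange(dim, dtype=np.int64)  # masks[k] == 2**k, least significant bit first
    return (np.bitwise_and(bits[:, None], masks) > 0).astype(int)  # test every bit of every int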

This is more than 500 times faster than your function (5x faster than buran's generate2() function)

answered Oct 22 '22 10:10

Alain T.
