Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating binary sequences without repetition

I am trying to generate sequences containing only 0's and 1's. I have written the following code, and it works.

import numpy as np

batch = 1000
dim = 32

while 1:
    is_same = False
    seq = np.random.randint(0, 2, [batch, dim])
    for i in range(batch):
        for j in range(i + 1, batch):
            if np.array_equal(seq[i], seq[j]):
                is_same = True
    if is_same:
        continue
    else:
        break

My batch variable is in the thousands. This loop above takes about 30 seconds to complete. This is a data generation part of another for loop that runs for about 500 iterations and is therefore extremely slow. Is there a faster way to generate this list of sequences without repetition? Thanks.

The desired result is a collection of batch_size number of sequences each of length dim containing only 0s and 1s such that no two sequences in the collection are the same.

like image 845
learner Avatar asked Jan 12 '21 06:01

learner


People also ask

How many consecutive 0’S are there in a binary string?

Given two integers N and M, the task is to construct a binary string with the following conditions : The Binary String has at most K consecutive 1’s. The Binary String does not contain any adjacent 0’s. If it is not possible to construct such a binary string, then print -1. No consecutive 0’s are present.

How to print all binary strings of size k (given number)?

Given an integer K. Task is Print All binary string of size K (Given number). Recommended: Please try your approach on {IDE} first, before moving on to the solution. Idea behind that is IF string ends with ‘1’ then we put only ‘0’ at the end. IF string ends with ‘0’ then we put both ‘0’ and ‘1’ at the end of string for generating new string.

How to generate 10 random decimal numbers without repetition?

If you want to generate 10 random decimal numbers without repetition, you may use the following formula. Here, 10 is the number of rows, 2 is the number of columns, 1 is the minimum value, 100 is the maximum value, and lastly, FALSE is for generating decimal numbers.

How to generate string that starts with 0 and ends with 1?

IF string ends with ‘0’ then we put both ‘0’ and ‘1’ at the end of string for generating new string. K : size of string First We Generate All string starts with '0' initialize n = 1 . GenerateALLString ( K , Str , n ) a.


Video Answer


4 Answers

Generate batch number of int in range(0, 2**dim + 1) Convert these numbers to binary, then convert to sequence of 0a and 1s.

from random import sample

def generate(batch, dim):
    my_sample = [f'{n:0>32b}' for n in sample(range(2**dim+1), batch)]
    return [[int(n) for n in item] for item in my_sample]

def generate2(batch, dim):
    return [list(map(int, f'{n:0>32b}')) for n in sample(range(2**dim+1), batch)]

the second one is bit faster

from timeit import timeit
print(timeit("generate(1000, 32)", setup="from __main__ import generate", number=100))
print(timeit("generate2(1000, 32)", setup="from __main__ import generate2", number=100))

output

1.4956848690007973
1.1187048860001596
like image 176
buran Avatar answered Oct 22 '22 08:10

buran


For the described desired result you can use binary representations of the numbers 0...batch_size-1 (multiplied by (2^dim)/batch_size) and shuffle them.
That approach is much more efficient, because there is no discarding of tentatively generated numbers and the time complexity without nested loops is much better.

For getting a random component into this (not defined for the desired result, but kind of obvious) you can add a random number to each in the range 0...( (2^dim)/batch_size -1). That will not result in identicals either, because of the spacing of the original sequence generated as described above. The randoms will never reach into the range of the next generated number.

E.g.

dim 5, batch_size 8

sequential binary random total shuffled index
0 00000 10 00010 5
4 00100 00 00100 2
8 01000 11 01011 6
12 01100 11 01111 0
16 10000 01 10001 3
20 10100 00 10100 7
24 11000 10 11010 1
28 11100 00 11100 4

What remains is shuffling, to break the "continuos run" of this.

like image 28
Yunnosch Avatar answered Oct 22 '22 10:10

Yunnosch


An easy way to speed up a lot checking for long sequences is using hashing. For every sequence compute an hash code and then keep a bucket (or a linked list) for all sequences with a given hash.

When you generate a new sequence you only need to check duplicates in the hash bucket of its hash code. For example using 16 bits of hash code the duplication check will be about 65536 times faster.

like image 31
6502 Avatar answered Oct 22 '22 10:10

6502


You can get non-repeating random bit patterns as integers using the sample function from the random module. Converting these integers to bit is a job better done by numpy (as opposed to string manipulations)

def sequenceBatch(batch,dim):
    bits  = np.array(random.sample(range(2**dim),batch),dtype=np.int)
    masks = 2**np.arange(dim)
    return (np.bitwise_and(bits[:,None],masks)>0).astype(np.int)

This is more than 500 times faster than your function (5x faster than buran's generate2() function)

like image 1
Alain T. Avatar answered Oct 22 '22 10:10

Alain T.