I am trying to generate sequences containing only 0
's and 1
's. I have written the following code, and it works.
import numpy as np
batch = 1000
dim = 32
while 1:
is_same = False
seq = np.random.randint(0, 2, [batch, dim])
for i in range(batch):
for j in range(i + 1, batch):
if np.array_equal(seq[i], seq[j]):
is_same = True
if is_same:
continue
else:
break
My batch
variable is in the thousands. This loop above takes about 30 seconds to complete. This is a data generation part of another for
loop that runs for about 500 iterations and is therefore extremely slow. Is there a faster way to generate this list of sequences without repetition? Thanks.
The desired result is a collection of batch_size
number of sequences each of length dim
containing only 0
s and 1
s such that no two sequences in the collection are the same.
Given two integers N and M, the task is to construct a binary string with the following conditions : The Binary String has at most K consecutive 1’s. The Binary String does not contain any adjacent 0’s. If it is not possible to construct such a binary string, then print -1. No consecutive 0’s are present.
Given an integer K. Task is Print All binary string of size K (Given number). Recommended: Please try your approach on {IDE} first, before moving on to the solution. Idea behind that is IF string ends with ‘1’ then we put only ‘0’ at the end. IF string ends with ‘0’ then we put both ‘0’ and ‘1’ at the end of string for generating new string.
If you want to generate 10 random decimal numbers without repetition, you may use the following formula. Here, 10 is the number of rows, 2 is the number of columns, 1 is the minimum value, 100 is the maximum value, and lastly, FALSE is for generating decimal numbers.
IF string ends with ‘0’ then we put both ‘0’ and ‘1’ at the end of string for generating new string. K : size of string First We Generate All string starts with '0' initialize n = 1 . GenerateALLString ( K , Str , n ) a.
Generate batch
number of int
in range(0, 2**dim + 1)
Convert these numbers to binary, then convert to sequence of 0
a and 1
s.
from random import sample
def generate(batch, dim):
my_sample = [f'{n:0>32b}' for n in sample(range(2**dim+1), batch)]
return [[int(n) for n in item] for item in my_sample]
def generate2(batch, dim):
return [list(map(int, f'{n:0>32b}')) for n in sample(range(2**dim+1), batch)]
the second one is bit faster
from timeit import timeit
print(timeit("generate(1000, 32)", setup="from __main__ import generate", number=100))
print(timeit("generate2(1000, 32)", setup="from __main__ import generate2", number=100))
output
1.4956848690007973
1.1187048860001596
For the described desired result you can use binary representations of the numbers 0...batch_size-1
(multiplied by (2^dim)/batch_size) and shuffle them.
That approach is much more efficient, because there is no discarding of tentatively generated numbers and the time complexity without nested loops is much better.
For getting a random component into this (not defined for the desired result, but kind of obvious) you can add a random number to each in the range 0...( (2^dim)/batch_size -1). That will not result in identicals either, because of the spacing of the original sequence generated as described above. The randoms will never reach into the range of the next generated number.
E.g.
dim 5, batch_size 8
sequential | binary | random | total | shuffled index |
---|---|---|---|---|
0 | 00000 | 10 | 00010 | 5 |
4 | 00100 | 00 | 00100 | 2 |
8 | 01000 | 11 | 01011 | 6 |
12 | 01100 | 11 | 01111 | 0 |
16 | 10000 | 01 | 10001 | 3 |
20 | 10100 | 00 | 10100 | 7 |
24 | 11000 | 10 | 11010 | 1 |
28 | 11100 | 00 | 11100 | 4 |
What remains is shuffling, to break the "continuos run" of this.
An easy way to speed up a lot checking for long sequences is using hashing. For every sequence compute an hash code and then keep a bucket (or a linked list) for all sequences with a given hash.
When you generate a new sequence you only need to check duplicates in the hash bucket of its hash code. For example using 16 bits of hash code the duplication check will be about 65536 times faster.
You can get non-repeating random bit patterns as integers using the sample function from the random module. Converting these integers to bit is a job better done by numpy (as opposed to string manipulations)
def sequenceBatch(batch,dim):
bits = np.array(random.sample(range(2**dim),batch),dtype=np.int)
masks = 2**np.arange(dim)
return (np.bitwise_and(bits[:,None],masks)>0).astype(np.int)
This is more than 500 times faster than your function (5x faster than buran's generate2() function)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With