I need to generate 1D array where repeated sequences of integers are separated by a random number of zeros.
So far I am using next code for this:
from random import normalvariate
regular_sequence = np.array([1,2,3,4,5], dtype=np.int)
n_iter = 10
lag_mean = 10 # mean length of zeros sequence
lag_sd = 1 # standard deviation of zeros sequence length
# Sequence of lags lengths
lag_seq = [int(round(normalvariate(lag_mean, lag_sd))) for x in range(n_iter)]
# Generate list of concatenated zeros and regular sequences
seq = [np.concatenate((np.zeros(x, dtype=np.int), regular_sequence)) for x in lag_seq]
seq = np.concatenate(seq)
It works but looks very slow when I need a lot of long sequences. So, how can I optimize it?
To create a numpy array of specific shape with random values, use numpy. random. rand() with the shape of the array passed as argument. In this tutorial, we will learn how to create a numpy array with random values using examples.
You can use numpy. random. shuffle() . This function only shuffles the array along the first axis of a multi-dimensional array.
The numpy.random.rand() function creates an array of specified shape and fills it with random values. Syntax : numpy.random.rand(d0, d1, ..., dn) Parameters : d0, d1, ..., dn : [int, optional]Dimension of the returned array we require, If no argument is given a single Python float is returned.
You can pre-compute indices where repeated regular_sequence
elements are to be put and then set those with regular_sequence
in a vectorized manner. For pre-computing those indices, one can use np.cumsum
to get the start of each such chunk of regular_sequence
and then add a continuous set of integers extending to the size of regular_sequence
to get all indices that are to be updated. Thus, the implementation would look something like this -
# Size of regular_sequence
N = regular_sequence.size
# Use cumsum to pre-compute start of every occurance of regular_sequence
offset_arr = np.cumsum(lag_seq)
idx = np.arange(offset_arr.size)*N + offset_arr
# Setup output array
out = np.zeros(idx.max() + N,dtype=regular_sequence.dtype)
# Broadcast the start indices to include entire length of regular_sequence
# to get all positions where regular_sequence elements are to be set
np.put(out,idx[:,None] + np.arange(N),regular_sequence)
Runtime tests -
def original_app(lag_seq, regular_sequence):
seq = [np.concatenate((np.zeros(x, dtype=np.int), regular_sequence)) for x in lag_seq]
return np.concatenate(seq)
def vectorized_app(lag_seq, regular_sequence):
N = regular_sequence.size
offset_arr = np.cumsum(lag_seq)
idx = np.arange(offset_arr.size)*N + offset_arr
out = np.zeros(idx.max() + N,dtype=regular_sequence.dtype)
np.put(out,idx[:,None] + np.arange(N),regular_sequence)
return out
In [64]: # Setup inputs
...: regular_sequence = np.array([1,2,3,4,5], dtype=np.int)
...: n_iter = 1000
...: lag_mean = 10 # mean length of zeros sequence
...: lag_sd = 1 # standard deviation of zeros sequence length
...:
...: # Sequence of lags lengths
...: lag_seq = [int(round(normalvariate(lag_mean, lag_sd))) for x in range(n_iter)]
...:
In [65]: out1 = original_app(lag_seq, regular_sequence)
In [66]: out2 = vectorized_app(lag_seq, regular_sequence)
In [67]: %timeit original_app(lag_seq, regular_sequence)
100 loops, best of 3: 4.28 ms per loop
In [68]: %timeit vectorized_app(lag_seq, regular_sequence)
1000 loops, best of 3: 294 µs per loop
The best approach, I think, would be to use convolution. You can figure out the lag lengths, combine that with the length of the sequence, and use that to figure out the starting point of each regular sequence. Set those starting points to zero, then convolve with your regular sequence to fill in the values.
import numpy as np
regular_sequence = np.array([1,2,3,4,5], dtype=np.int)
n_iter = 10000000
lag_mean = 10 # mean length of zeros sequence
lag_sd = 1 # standard deviation of zeros sequence length
# Sequence of lags lengths
lag_lens = np.round(np.random.normal(lag_mean, lag_sd, n_iter)).astype(np.int)
lag_lens[1:] += len(regular_sequence)
starts_inds = lag_lens.cumsum()-1
# Generate list of convolved ones and regular sequences
seq = np.zeros(lag_lens.sum(), dtype=np.int)
seq[starts_inds] = 1
seq = np.convolve(seq, regular_sequence)
This approach takes something like 1/20th the time on large sequences, even after changing your version to use the numpy random number generator.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With