Generate 1D NumPy array with chunks of random length

I need to generate a 1D array in which repeated sequences of integers are separated by a random number of zeros.

So far I am using the following code:

import numpy as np
from random import normalvariate

# np.int was removed from recent NumPy releases; the builtin int is equivalent
regular_sequence = np.array([1,2,3,4,5], dtype=int)
n_iter = 10
lag_mean = 10 # mean length of zeros sequence
lag_sd = 1 # standard deviation of zeros sequence length

# Sequence of lag lengths
lag_seq = [int(round(normalvariate(lag_mean, lag_sd))) for x in range(n_iter)]

# Generate list of concatenated zeros and regular sequences
seq = [np.concatenate((np.zeros(x, dtype=int), regular_sequence)) for x in lag_seq]
seq = np.concatenate(seq)

It works, but it is very slow when I need many long sequences. How can I optimize it?

Istrel asked Jan 08 '16 12:01


2 Answers

You can precompute the indices where the elements of regular_sequence belong and then write them all at once in a vectorized manner. To get those indices, use np.cumsum on the lag lengths, plus the space taken by the preceding chunks, to find the start of each chunk. Then add np.arange(N), where N is the size of regular_sequence, to each start to cover every position in that chunk. The implementation would look something like this -

# Size of regular_sequence
N = regular_sequence.size

# Use cumsum to pre-compute start of every occurrence of regular_sequence
offset_arr = np.cumsum(lag_seq)
idx = np.arange(offset_arr.size)*N + offset_arr

# Setup output array
out = np.zeros(idx.max() + N,dtype=regular_sequence.dtype)

# Broadcast the start indices to include entire length of regular_sequence
# to get all positions where regular_sequence elements are to be set
np.put(out,idx[:,None] + np.arange(N),regular_sequence)
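
To make the index arithmetic concrete, here is a small worked sketch of the same steps. The three fixed lags and the three-element sequence are made up for illustration and are not part of the original question.

```python
import numpy as np

regular_sequence = np.array([1, 2, 3])
lag_seq = [2, 3, 1]  # fixed lags, chosen only for illustration

N = regular_sequence.size
offset_arr = np.cumsum(lag_seq)                    # zeros seen so far: [2, 5, 6]
idx = np.arange(offset_arr.size) * N + offset_arr  # chunk starts: [2, 8, 12]

# Broadcasting a column of starts against a row of offsets gives one row
# of target positions per chunk
positions = idx[:, None] + np.arange(N)

out = np.zeros(idx.max() + N, dtype=regular_sequence.dtype)
np.put(out, positions, regular_sequence)  # values repeat to fill all positions
print(out)  # [0 0 1 2 3 0 0 0 1 2 3 0 1 2 3]
```

Each start is the number of zeros placed so far plus the space used by the chunks before it, which is why the chunk number times N is added to the cumulative sum.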

Runtime tests -

def original_app(lag_seq, regular_sequence):
    seq = [np.concatenate((np.zeros(x, dtype=int), regular_sequence)) for x in lag_seq]
    return np.concatenate(seq)

def vectorized_app(lag_seq, regular_sequence):
    N = regular_sequence.size       
    offset_arr = np.cumsum(lag_seq)
    idx = np.arange(offset_arr.size)*N + offset_arr
    out = np.zeros(idx.max() + N,dtype=regular_sequence.dtype)
    np.put(out,idx[:,None] + np.arange(N),regular_sequence)
    return out

In [64]: # Setup inputs
    ...: regular_sequence = np.array([1,2,3,4,5], dtype=int)
    ...: n_iter = 1000
    ...: lag_mean = 10 # mean length of zeros sequence
    ...: lag_sd = 1 # standard deviation of zeros sequence length
    ...: 
    ...: # Sequence of lags lengths
    ...: lag_seq = [int(round(normalvariate(lag_mean, lag_sd))) for x in range(n_iter)]
    ...: 

In [65]: out1 = original_app(lag_seq, regular_sequence)

In [66]: out2 = vectorized_app(lag_seq, regular_sequence)

In [67]: %timeit original_app(lag_seq, regular_sequence)
100 loops, best of 3: 4.28 ms per loop

In [68]: %timeit vectorized_app(lag_seq, regular_sequence)
1000 loops, best of 3: 294 µs per loop
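
The session above computes out1 and out2 but never compares them. A minimal check along these lines would confirm that the two functions agree before the timings are trusted. The fixed seed is an assumption added here for reproducibility and is not part of the original test.

```python
import numpy as np
from random import normalvariate, seed

def original_app(lag_seq, regular_sequence):
    seq = [np.concatenate((np.zeros(x, dtype=int), regular_sequence)) for x in lag_seq]
    return np.concatenate(seq)

def vectorized_app(lag_seq, regular_sequence):
    N = regular_sequence.size
    offset_arr = np.cumsum(lag_seq)
    idx = np.arange(offset_arr.size) * N + offset_arr
    out = np.zeros(idx.max() + N, dtype=regular_sequence.dtype)
    np.put(out, idx[:, None] + np.arange(N), regular_sequence)
    return out

seed(0)  # assumption: fixed seed so the check is repeatable
regular_sequence = np.array([1, 2, 3, 4, 5], dtype=int)
lag_seq = [int(round(normalvariate(10, 1))) for _ in range(1000)]

out1 = original_app(lag_seq, regular_sequence)
out2 = vectorized_app(lag_seq, regular_sequence)
same = np.array_equal(out1, out2)
print(same)
```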
Divakar answered Nov 10 '22 00:11


The best approach, I think, would be to use convolution. Combine the lag lengths with the length of the sequence to work out the starting index of each regular sequence. Set those starting points to one in an array of zeros, then convolve with your regular sequence to fill in the values.

import numpy as np

regular_sequence = np.array([1,2,3,4,5], dtype=int)
n_iter = 10000000
lag_mean = 10 # mean length of zeros sequence
lag_sd = 1 # standard deviation of zeros sequence length

# Sequence of lag lengths
lag_lens = np.round(np.random.normal(lag_mean, lag_sd, n_iter)).astype(int)
# Distance between consecutive starts = lag + length of the previous chunk
lag_lens[1:] += len(regular_sequence)
# Index of each chunk start (the first chunk follows lag_lens[0] - 1 zeros)
starts_inds = lag_lens.cumsum()-1

# Mark each start with a 1, then convolve to stamp regular_sequence there
seq = np.zeros(lag_lens.sum(), dtype=int)
seq[starts_inds] = 1
seq = np.convolve(seq, regular_sequence)

This approach takes something like 1/20th the time on large sequences, even after changing your version to use the numpy random number generator.
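
As a small illustration of why convolution works here, the sketch below marks two start positions by hand and convolves. The start indices 2 and 7 and the three-element sequence are made up for the example.

```python
import numpy as np

regular_sequence = np.array([1, 2, 3])

# An array of zeros with a 1 at each chunk start; the last marker sits
# on the final element, as in the code above
markers = np.zeros(8, dtype=int)
markers[[2, 7]] = 1

# The default 'full' mode extends the output by len(regular_sequence) - 1,
# leaving room for the chunk that begins at the last marker
seq = np.convolve(markers, regular_sequence)
print(seq)  # [0 0 1 2 3 0 0 1 2 3]
```

Convolving with a single 1 reproduces the kernel at that position. This relies on consecutive starts being at least len(regular_sequence) apart, which the code above guarantees as long as no lag is negative. If two chunks overlapped, their values would be summed.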

TheBlackCat answered Nov 10 '22 00:11