Generate 1D NumPy array with chunks of random length

Tags: python, arrays, numpy

I need to generate a 1D array in which repeated sequences of integers are separated by a random number of zeros.

So far I am using the following code:

from random import normalvariate

import numpy as np

# np.int was removed in NumPy 1.24; the builtin int gives the same dtype
regular_sequence = np.array([1,2,3,4,5], dtype=int)
n_iter = 10
lag_mean = 10 # mean length of zeros sequence
lag_sd = 1 # standard deviation of zeros sequence length

# Sequence of lag lengths
lag_seq = [int(round(normalvariate(lag_mean, lag_sd))) for x in range(n_iter)]

# Generate list of concatenated zeros and regular sequences
seq = [np.concatenate((np.zeros(x, dtype=int), regular_sequence)) for x in lag_seq]
seq = np.concatenate(seq)

It works, but it is very slow when I need many long sequences. How can I optimize it?


asked Jan 08 '16 12:01

Istrel

2 Answers

You can pre-compute the indices where the repeated regular_sequence elements go and then assign all of them in one vectorized step. np.cumsum of the lag lengths, plus the size of regular_sequence for every chunk already placed, gives the start of each chunk. Adding a contiguous range of integers as long as regular_sequence to every start then gives all the indices to update. The implementation would look something like this -

# Size of regular_sequence
N = regular_sequence.size

# Use cumsum to pre-compute the start of every occurrence of regular_sequence
offset_arr = np.cumsum(lag_seq)
idx = np.arange(offset_arr.size)*N + offset_arr

# Setup output array
out = np.zeros(idx.max() + N,dtype=regular_sequence.dtype)

# Broadcast the start indices to include entire length of regular_sequence
# to get all positions where regular_sequence elements are to be set
np.put(out,idx[:,None] + np.arange(N),regular_sequence)
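To make the index arithmetic concrete, here is a minimal sketch on a tiny input. The three-element regular_sequence and the gap lengths [2, 1, 3] are illustrative values I picked, not the ones from the question, and `starts` and `positions` are just names for the intermediate arrays.

```python
import numpy as np

regular_sequence = np.array([1, 2, 3], dtype=int)
lag_seq = [2, 1, 3]  # illustrative gap lengths, not from the question

N = regular_sequence.size
# Start of chunk k = (sum of the first k+1 gaps) + k * N
starts = np.cumsum(lag_seq) + np.arange(len(lag_seq)) * N
# Broadcasting: each row holds the N target positions of one chunk
positions = starts[:, None] + np.arange(N)

out = np.zeros(starts[-1] + N, dtype=regular_sequence.dtype)
# np.put flattens the index array and repeats the values to fill every position
np.put(out, positions, regular_sequence)

print(starts)
print(positions)
print(out)
```

Here starts comes out as [2, 6, 12] and out as [0 0 1 2 3 0 1 2 3 0 0 0 1 2 3], i.e. gaps of 2, 1 and 3 zeros, each followed by one copy of the sequence.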

Runtime tests -

def original_app(lag_seq, regular_sequence):
    # np.int was removed in NumPy 1.24; the builtin int gives the same dtype
    seq = [np.concatenate((np.zeros(x, dtype=int), regular_sequence)) for x in lag_seq]
    return np.concatenate(seq)

def vectorized_app(lag_seq, regular_sequence):
    N = regular_sequence.size
    offset_arr = np.cumsum(lag_seq)
    idx = np.arange(offset_arr.size)*N + offset_arr
    out = np.zeros(idx.max() + N,dtype=regular_sequence.dtype)
    np.put(out,idx[:,None] + np.arange(N),regular_sequence)
    return out

In [64]: # Setup inputs
    ...: regular_sequence = np.array([1,2,3,4,5], dtype=int)
    ...: n_iter = 1000
    ...: lag_mean = 10 # mean length of zeros sequence
    ...: lag_sd = 1 # standard deviation of zeros sequence length
    ...: 
    ...: # Sequence of lags lengths
    ...: lag_seq = [int(round(normalvariate(lag_mean, lag_sd))) for x in range(n_iter)]
    ...: 

In [65]: out1 = original_app(lag_seq, regular_sequence)

In [66]: out2 = vectorized_app(lag_seq, regular_sequence)

In [67]: %timeit original_app(lag_seq, regular_sequence)
100 loops, best of 3: 4.28 ms per loop

In [68]: %timeit vectorized_app(lag_seq, regular_sequence)
1000 loops, best of 3: 294 µs per loop
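
The timings only mean something if both functions return the same array, so it is worth checking that as well. A minimal sketch of such a check follows; it restates the two functions so it runs on its own, and the seeded np.random.default_rng generator is my assumption for reproducibility (the question uses random.normalvariate).

```python
import numpy as np

def original_app(lag_seq, regular_sequence):
    # Reference: build each run of zeros plus one chunk, then join them
    parts = [np.concatenate((np.zeros(x, dtype=int), regular_sequence)) for x in lag_seq]
    return np.concatenate(parts)

def vectorized_app(lag_seq, regular_sequence):
    N = regular_sequence.size
    idx = np.cumsum(lag_seq) + np.arange(len(lag_seq)) * N
    out = np.zeros(idx.max() + N, dtype=regular_sequence.dtype)
    np.put(out, idx[:, None] + np.arange(N), regular_sequence)
    return out

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
regular_sequence = np.array([1, 2, 3, 4, 5], dtype=int)
lag_seq = np.round(rng.normal(10, 1, size=1000)).astype(int)

out1 = original_app(lag_seq, regular_sequence)
out2 = vectorized_app(lag_seq, regular_sequence)
same = np.array_equal(out1, out2)
print(same)
```

Both versions assume every lag is non-negative. With a mean of 10 and a standard deviation of 1 that holds in practice, but for wider distributions you may want to clip the lags at zero first.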


answered Nov 10 '22 00:11

Divakar

I think the best approach is to use convolution. Work out the lag lengths and combine them with the length of the sequence to get the starting index of each regular sequence. Set those starting positions to one in an array of zeros, then convolve with your regular sequence to fill in the values.

import numpy as np

# np.int was removed in NumPy 1.24; the builtin int gives the same dtype
regular_sequence = np.array([1,2,3,4,5], dtype=int)
n_iter = 10000000
lag_mean = 10 # mean length of zeros sequence
lag_sd = 1 # standard deviation of zeros sequence length

# Sequence of lag lengths
lag_lens = np.round(np.random.normal(lag_mean, lag_sd, n_iter)).astype(int)
# From the second lag on, add the sequence length so cumsum lands on chunk starts
lag_lens[1:] += len(regular_sequence)
# No "-1" here: the first chunk starts right after lag_lens[0] zeros
starts_inds = lag_lens.cumsum()

# Put a one at every chunk start, then convolve to stamp regular_sequence there
seq = np.zeros(lag_lens.sum() + 1, dtype=int)
seq[starts_inds] = 1
seq = np.convolve(seq, regular_sequence)

This approach takes something like 1/20th the time on large sequences, even after changing your version to use the numpy random number generator.
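
To see why the convolution works, note that convolving a lone 1 at index k with regular_sequence writes a copy of regular_sequence starting at index k. Below is a minimal sketch on a tiny input, checked against the plain concatenation approach. The function name convolve_app and the input values are illustrative assumptions, and it assumes a non-empty list of non-negative lags so that chunks never overlap.

```python
import numpy as np

def convolve_app(lag_seq, regular_sequence):
    # Hypothetical helper name; wraps the impulse-and-convolve idea
    lag_seq = np.asarray(lag_seq)
    N = len(regular_sequence)
    # Start of chunk k = (sum of the first k+1 gaps) + k * N
    starts = np.cumsum(lag_seq) + np.arange(lag_seq.size) * N
    impulses = np.zeros(starts[-1] + 1, dtype=regular_sequence.dtype)
    impulses[starts] = 1
    # Default 'full' mode appends the N - 1 samples needed for the last chunk
    return np.convolve(impulses, regular_sequence)

regular_sequence = np.array([1, 2, 3], dtype=int)
lag_seq = [2, 1, 3]  # illustrative gap lengths

result = convolve_app(lag_seq, regular_sequence)

# Reference built the slow way, one gap and one chunk at a time
expected = np.concatenate(
    [np.concatenate((np.zeros(x, dtype=int), regular_sequence)) for x in lag_seq]
)
print(result)
print(np.array_equal(result, expected))
```

One design note: the convolution does work proportional to the output length times the sequence length, while the index-based assignment only touches the non-zero positions, so which one is faster may depend on how long regular_sequence is relative to the gaps.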

answered Nov 10 '22 00:11

TheBlackCat
