Vectorize numpy code with operation depending on previous value

The following code models a system that can occupy 3 different states at any time, with constant transition probabilities between states given by the matrix prob_nor. Therefore, each point in trace depends on the previous state.

import numpy as np

n_states, n_frames = 3, 1000
state_val = np.linspace(0, 1, n_states)

prob = np.random.randint(1, 10, size=(n_states,)*2)
prob[np.diag_indices(n_states)] += 50

prob_nor = prob/prob.sum(1)[:,None] # transition probability matrix, 
                                    # row sum normalized to 1.0

state_idx = range(n_states) # states is a list of integers 0, 1, 2...
current_state = np.random.choice(state_idx)

trace = []      
sigma = 0.1     
for _ in range(n_frames):
    trace.append(np.random.normal(loc=state_val[current_state], scale=sigma))
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])

The loop in the above code makes it run pretty slowly, especially when I have to model millions of data points. Is there any way to vectorize/accelerate it?

Brenlla asked Oct 08 '18 15:10

2 Answers

Offload the computation of probabilities as soon as possible:

possible_paths = np.vstack([
    np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)
    for curr_state in range(n_states)
])  # np.vstack needs a sequence; passing a bare generator is deprecated

Then you can simply do a lookup to follow your path:

path_trace = [None]*n_frames
for step in range(n_frames):
    path_trace[step] = possible_paths[current_state, step]
    current_state = possible_paths[current_state, step]

Once you have your path, you can compute your trace:

sigma = 0.1
trace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)

Comparing timings:

Pure python for loop

%%timeit
trace_list = []
current_state = np.random.choice(state_idx)
for _ in range(n_frames):
    trace_list.append(np.random.normal(loc=state_val[current_state], scale=sigma))
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])

Results:

30.1 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Vectorized lookup:

%%timeit
current_state = np.random.choice(state_idx)
path_trace = [None]*n_frames
possible_paths = np.vstack([
    np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)
    for curr_state in range(n_states)
])
for step in range(n_frames):
    path_trace[step] = possible_paths[current_state, step]
    current_state = possible_paths[current_state, step]
trace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)

Results:

641 µs ± 6.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

A speedup of approximately 50x.
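For reference, the three steps above can be combined into one self-contained function. This is a sketch, not the answer's exact code: the function name and the seeded `Generator` are illustrative additions, and it records the state before transitioning, matching the question's original loop order.

```python
import numpy as np

def simulate_trace(prob_nor, state_val, n_frames, sigma=0.1, seed=0):
    """Pre-sampled-lookup simulation of the Markov-chain trace."""
    rng = np.random.default_rng(seed)
    n_states = prob_nor.shape[0]
    # Pre-sample one candidate successor per (state, frame) pair:
    # column `step` of row `s` is the next state if we sit in `s` at `step`.
    possible_paths = np.vstack([
        rng.choice(n_states, p=prob_nor[s], size=n_frames)
        for s in range(n_states)
    ])
    current_state = rng.integers(n_states)
    path_trace = np.empty(n_frames, dtype=np.intp)
    for step in range(n_frames):  # cheap loop: pure lookups, no sampling
        path_trace[step] = current_state
        current_state = possible_paths[current_state, step]
    # Vectorized Gaussian emission over the whole path at once.
    return rng.normal(loc=state_val[path_trace], scale=sigma)
```

Note this pre-samples `n_states * n_frames` transitions to use only `n_frames` of them, trading memory and extra draws for moving the expensive `choice` calls out of the loop.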

PMende answered Nov 15 '22 05:11

Maybe I'm missing something, but I think you can build the list of states first and then vectorise the remaining steps:

# Make list of states (slow part)
states = []
current_state = np.random.choice(state_idx)
for _ in range(n_frames):
    states.append(current_state)
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])

# Vectorised part
state_vals = state_val[states]   # alternatively np.array(states) / (n_states - 1)
trace = np.random.normal(loc=state_vals, scale=sigma)

I believe this method works and gives a modest speed improvement while using some extra memory (3 lists/arrays are created instead of one). @PMende's solution gives a much larger speedup.
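The "slow part" above can also be cheapened without removing the loop. This sketch (names are illustrative, not from either answer) replaces `np.random.choice(p=...)`, which validates and processes `p` on every call, with precomputed per-row CDFs, a single batch of uniforms, and one `np.searchsorted` per step (inverse-CDF sampling):

```python
import numpy as np

def sample_states(prob_nor, n_frames, seed=0):
    """Sequential state walk with cheap per-step inverse-CDF sampling."""
    rng = np.random.default_rng(seed)
    cum = np.cumsum(prob_nor, axis=1)  # per-row CDF of the transition matrix
    cum[:, -1] = 1.0                   # guard against floating-point round-off
    u = rng.random(n_frames)           # draw all uniforms in one call
    states = np.empty(n_frames, dtype=np.intp)
    s = rng.integers(prob_nor.shape[0])
    for i in range(n_frames):
        states[i] = s
        s = np.searchsorted(cum[s], u[i])  # inverse-CDF draw of the next state
    return states
```

The emission step then stays exactly as above: `trace = np.random.normal(loc=state_val[states], scale=sigma)`.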

Stuart answered Nov 15 '22 05:11