The following code models a system that can occupy one of 3 states at any time; the constant transition probabilities between those states are given by the matrix prob_nor. Each point in trace therefore depends on the previous state.
import numpy as np

n_states, n_frames = 3, 1000
state_val = np.linspace(0, 1, n_states)               # observable value of each state
prob = np.random.randint(1, 10, size=(n_states,)*2)
prob[np.diag_indices(n_states)] += 50                 # make staying in the same state more likely
prob_nor = prob/prob.sum(1)[:, None]                  # transition probability matrix,
                                                      # each row sums to 1.0
state_idx = range(n_states)                           # the states are the integers 0, 1, 2, ...
current_state = np.random.choice(state_idx)
trace = []
sigma = 0.1
for _ in range(n_frames):
    trace.append(np.random.normal(loc=state_val[current_state], scale=sigma))
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])
The loop in the above code makes it run pretty slow, especially when I have to model millions of data points. Is there any way to vectorize/accelerate it?
Vectorized operations in NumPy delegate work to optimized, pre-compiled routines that act on whole arrays at once, which is typically much faster than performing the same operations element by element in a Python loop.
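As a minimal illustration of that difference (the array and values here are arbitrary and not part of the question's code), the same element-wise computation can be written as an explicit Python loop or as a single array expression:

import numpy as np

x = np.random.rand(1_000_000)

# Non-vectorized: an explicit Python loop over elements
y_loop = np.empty_like(x)
for i in range(x.size):
    y_loop[i] = x[i] * 2.0 + 1.0

# Vectorized: one array expression; the loop runs in compiled code
y_vec = x * 2.0 + 1.0

assert np.allclose(y_loop, y_vec)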
The numpy.vectorize function takes a Python function (pyfunc) and returns a vectorized version of it. The vectorized version accepts NumPy arrays or sequences as input and evaluates the Python function over each element of the input.
Some have observed np.vectorize to be faster than a plain for loop, but even the NumPy documentation states: “The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.”
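A short sketch of what that means in practice (the function and values below are illustrative only, not from the question): wrapping a scalar function with np.vectorize lets it accept arrays, but internally it still calls the Python function once per element.

import numpy as np

def noisy_value(state, sigma):
    # a plain scalar Python function
    return np.random.normal(loc=state, scale=sigma)

noisy_value_vec = np.vectorize(noisy_value)

states = np.array([0.0, 0.5, 1.0, 0.5])
print(noisy_value_vec(states, 0.1))   # one noisy sample per element, but still a Python-level loop inside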
True NumPy vectorization, by contrast, moves the per-element loop out of Python and into compiled code that processes many items at once, which is where the real speed-up comes from.
Offload the random sampling up front: for every possible current state, draw the next-state choices for all frames in one call:
possible_paths = np.vstack([
    np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)
    for curr_state in range(n_states)
])  # shape (n_states, n_frames): row i holds pre-sampled transitions out of state i
Then you can simply do a lookup to follow your path:
path_trace = [None]*n_frames
for step in range(n_frames):
    path_trace[step] = possible_paths[current_state, step]
    current_state = possible_paths[current_state, step]
Once you have your path, you can compute your trace:
sigma = 0.1
trace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)
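The lookup state_val[path_trace] works because of NumPy fancy indexing: indexing an array with a list (or array) of integer indices returns the element at each position, so the noise for every frame can be drawn in a single call. A tiny illustration with arbitrary values, independent of the code above:

import numpy as np

state_val = np.linspace(0, 1, 3)     # array([0. , 0.5, 1. ])
path_trace = [0, 2, 2, 1, 0]         # a short example path of state indices
print(state_val[path_trace])         # [0.  1.  1.  0.5 0. ]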
Comparing timings:
Pure Python for loop:
%%timeit
trace_list = []
current_state = np.random.choice(state_idx)
for _ in range(n_frames):
    trace_list.append(np.random.normal(loc=state_val[current_state], scale=sigma))
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])
Results:
30.1 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Vectorized lookup:
%%timeit
current_state = np.random.choice(state_idx)
path_trace = [None]*n_frames
possible_paths = np.vstack([
    np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)
    for curr_state in range(n_states)
])
for step in range(n_frames):
    path_trace[step] = possible_paths[current_state, step]
    current_state = possible_paths[current_state, step]
trace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)
Results:
641 µs ± 6.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
A speedup of approximately 50x.
Maybe I'm missing something, but I think you can collect the current_state values into a list and then vectorise the remaining steps:
# Make the list of states (slow part)
states = []
current_state = np.random.choice(state_idx)
for _ in range(n_frames):
    states.append(current_state)
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])

# Vectorised part
state_vals = state_val[states]   # alternatively np.array(states) / (n_states - 1)
trace = np.random.normal(loc=state_vals, scale=sigma)
I believe this method works and will lead to a modest speed improvement while using some extra memory (3 lists/arrays are created instead of one). @PMende's solution leads to a much larger speed improvement.