Suppose I have a matrix that is 100000 x 100
import numpy as np
mat = np.random.randint(2, size=(100000,100))
I wish to go through this matrix, and if each row contains entirely either 1 or 0 I wish to change a state variable to that value. If the state is not changed, I wish to set the entire row the value of state. The initial value of state is 0. 
Naively in a for loop this can be done as follows
state = 0
for row in mat:
    if set(row) == {1}:
        state = 1
    elif set(row) == {0}:
        state = 0
    else:
        row[:] = state
However, when the size of the matrix increases this takes an impractical amount of time. Could someone point me in the direction in how to leverage numpy to vectorize this loop and speed it up?
So for a sample input
array([[0, 1, 0],
       [0, 0, 1],
       [1, 1, 1],
       [0, 0, 1],
       [0, 0, 1]])
The expected output in this case would be
array([[0, 0, 0],
       [0, 0, 0],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
                Approach #1: NumPy-Vectorized
Here's a vectorized one -
def check_all(a, state): # a is input matrix/array
    # Get zeros and ones all masks
    zm = (a==0).all(1)
    om = (a==1).all(1)
    # "Attach" boundaries with False values at the start of these masks.
    # These will be used to detect rising edges (as indices) on these masks.
    zma = np.r_[False,zm]
    oma = np.r_[False,om]
    omi = np.flatnonzero(oma[:-1] < oma[1:])
    zmi = np.flatnonzero(zma[:-1] < zma[1:])
    # Group the indices and the signatures (values as 1s and -1s)
    ai = np.r_[omi,zmi]
    av = np.r_[np.ones(len(omi),dtype=int),-np.ones(len(zmi),dtype=int)]
    # Sort the grouped-indices, thus we would know the positions
    # of these group starts. Then index into the signatures/values
    # and indices with those, giving us the information on how these signatures
    # occur through the length of the input
    sidx = ai.argsort()
    val,aidx = av[sidx],ai[sidx]
    # The identical consecutive signatures are to be removed
    mask = np.r_[True,val[:-1]!=val[1:]]
    v,i = val[mask],aidx[mask]
    # Also, note that we are assigning all 1s as +1 signature and all 0s as -1
    # So, in case the starting signature is a 0, assign a value of 0
    if v[0]==-1:
        v[0] = 0
    # Initialize 1D o/p array, which stores the signatures as +1s and -1s.
    # The bigger level idea is that performing cumsum at the end would give us the
    # desired 1D output
    out1d = np.zeros(len(a),dtype=a.dtype)
    # Assign the values at i positions
    out1d[i] = v
    # Finally cumsum to get desired output
    out1dc = out1d.cumsum()
    # Correct the starting positions based on starting state value
    out1dc[:i[0]] = state
    # Convert to 2D view for mem. and perf. efficiency
    out = np.broadcast_to(out1dc[:,None],a.shape)
    return out
Approach #2: Numba-based
Here's another numba-based one for memory and hence perf. efficiency -
@njit(parallel=True)
def func1(zm, om, out, start_state, cur_state):
    # This outputs 1D version of required output.
    # Start off with the starting given state
    newval = start_state
    # Loop through zipped zeros-all and ones-all masks and in essence do :
    # Switch between zeros and ones based on whether the other ones
    # are occuring through or not, prior to the current state
    for i,(z,o) in enumerate(zip(zm,om)):
        if z and cur_state:
            cur_state = ~cur_state
            newval = 0
        if o and ~cur_state:
            cur_state = ~cur_state
            newval = 1
        out[i] = newval
    return out
def check_all_numba(a, state):
    # Get zeros and ones all masks
    zm = (a==0).all(1)
    om = (a==1).all(1)
    # Decide the starting state
    cur_state = zm.argmax() < om.argmax()
    # Initialize 1D o/p array with given state values
    out1d = np.full(len(a), fill_value=state)
    func1(zm, om, out1d, state, cur_state)
    # Broadcast into the 2D view for memory and perf. efficiency
    return np.broadcast_to(out1d[:,None],a.shape)
                        You can do this without any loops by leveraging np.accumulate:
R = 5 # 100000
C = 3 # 100
mat   = np.random.randint(2, size=(R,C))
print(mat) # original matrix
state    = np.zeros((1,C))                        # or np.ones((1,C))
mat      = np.concatenate([state,mat])            # insert state row
zRows    = np.isin(np.sum(mat,1),[0,C])           # all zeroes or all ones
iRows    = np.arange(R+1) * zRows.astype(np.int)  # base indexes
mat      = mat[np.maximum.accumulate(iRows)][1:]  # indirection, remove state
print(mat) # modified
#original
[[0 0 1]
 [1 1 1]
 [1 0 1]
 [0 0 0]
 [1 0 1]]
# modified
[[0 0 0]
 [1 1 1]
 [1 1 1]
 [0 0 0]
 [0 0 0]]
The way it works is by preparing an indirection array for rows that need to be changed. This is done from an np.arange of row indexes in which we set to zero the the indexes that will need replacement. Accumulating the maximum index will map each replaced row to an all-zero or all-one row before it.
For example:
  [ 0, 1, 2, 3, 4, 5 ] # row indexes
  [ 0, 1, 0, 0, 1, 0 ] # rows that are all zeroes or all ones (zRows)
  [ 0, 1, 0, 0, 4, 0 ] # multiplied (iRows)
  [ 0, 1, 1, 1, 4, 4 ] # np.maximum.accumulate
This gives us a list of indexes where row content should be taken from.
The state is represented by an extra row inserted at the beginning of the matrix before performing the operation and removed afterward.
This solution will be marginally slower for very small matrices (5x3) but it can give you a 20x speed boost for larger ones (100000x100: 0.7 second vs 14 seconds).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With