Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to vectorize advanced indexing with list of lists in NumPy?

The following code runs in 45s when using pure Python.

for iteration in range(maxiter):
    for node in range(n):
        for dest in adjacency_list[node]:
            rs[iteration + 1][dest] += beta * rs[iteration][node] / len(adjacency_list[node])

But, by simply initializing rs as a numpy ndarray instead of a python list of lists, the code runs in 145s. I do not really know why numpy takes 3 times as much time with this array indexing.

My idea was to vectorize as much things as possible, but have only managed to vectorize the multiplication of beta/len(adjacency_list[node]). This code runs in 77s.

beta_over_out_degree = np.array([beta / len(al) for al in adjacency_list])
for iteration in range(1, maxiter + 1):
    r_next = np.full(shape=n, fill_value=(1 - beta) / n)
    f = beta_over_out_degree * r
    for i in range(n):
        r_next[adjacency_list[i]] += f[i]

    r = np.copy(r_next)
    rs[iteration] = np.copy(r)

The problem is that adjacency_list is a list of lists with differing column size, with 100 000 rows and 1-15 columns. A more standard approach with an adjacency matrix, at least as a normal ndarray, is not an option, since for n=100 000 its shape of (n,n) is too big to be allocated to memory.

Is there any way to vectorize using its indexes for numpy advanced indexing(maybe turning it into a numpy ndarray)?

I would also greatly appreciate any other speed tips. Thanks in advance!

EDIT: Thanks to @stevemo I managed to create adjacency_matrix with csr_matrix functionality and used it for iterative multiplication. Program now runs in only 2s!

for iteration in range(1, 101):
    rs[iteration] += rs[iteration - 1] * adjacency_matrix
like image 267
sukisule Avatar asked Dec 07 '25 08:12

sukisule


1 Answers

If I understand you correctly, this can be done with a one-liner formula using matrix powers of the adjacency matrix.

Based on your original code snippet, it seems you have some network of n nodes, with the adjacency information stored as a list of lists in adjacency, and you have a value r associated with each node such its value at iteration k+1 is beta times the sum of r of each of its neighbors at iter k. (Your loop constructs this in the opposite direction, but same thing.)

If you don't mind reforming your adjacency list-of-lists into a more standard adjacency matrix, such that A_ij = 1 if ij are neighbors, 0 otherwise, then you could accomplish the inner two loops with a simple matrix product, r[k+1] = beta * (A @ r[k]).

And following this logic, r[k+2] = beta * (A @ (beta * (A @ r[k]))) = (beta * A)**2 @ r[k] or in general,

r[k] = (beta * A)**k @ r[0]

Let's try this on a small network:

# adjacency matrix
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0]
])

# initial values
n = 5
beta = 0.5
r0 = np.ones(n)
maxiter = 10

# after one iteration
print(beta * (A @ r0))
# [1.  1.  1.5 1.  0.5]

# after 10 iterations
print(np.linalg.matrix_power((beta * A), maxiter) @ r0)
# [2.88574219 2.88574219 3.4921875  1.99414062 0.89257812]
like image 143
stevemo Avatar answered Dec 09 '25 22:12

stevemo