How to vectorize advanced indexing with list of lists in NumPy?

Question

The following code runs in 45s when using pure Python.

for iteration in range(maxiter):
    for node in range(n):
        for dest in adjacency_list[node]:
            rs[iteration + 1][dest] += beta * rs[iteration][node] / len(adjacency_list[node])

But, by simply initializing rs as a numpy ndarray instead of a python list of lists, the code runs in 145s. I do not really know why numpy takes 3 times as much time with this array indexing.

My idea was to vectorize as much things as possible, but have only managed to vectorize the multiplication of beta/len(adjacency_list[node]). This code runs in 77s.

beta_over_out_degree = np.array([beta / len(al) for al in adjacency_list])
for iteration in range(1, maxiter + 1):
    r_next = np.full(shape=n, fill_value=(1 - beta) / n)
    f = beta_over_out_degree * r
    for i in range(n):
        r_next[adjacency_list[i]] += f[i]

    r = np.copy(r_next)
    rs[iteration] = np.copy(r)

The problem is that adjacency_list is a list of lists with differing column size, with 100 000 rows and 1-15 columns. A more standard approach with an adjacency matrix, at least as a normal ndarray, is not an option, since for n=100 000 its shape of (n,n) is too big to be allocated to memory.

Is there any way to vectorize using its indexes for numpy advanced indexing(maybe turning it into a numpy ndarray)?

I would also greatly appreciate any other speed tips. Thanks in advance!

EDIT: Thanks to @stevemo I managed to create adjacency_matrix with csr_matrix functionality and used it for iterative multiplication. Program now runs in only 2s!

for iteration in range(1, 101):
    rs[iteration] += rs[iteration - 1] * adjacency_matrix

stevemo · Accepted Answer

If I understand you correctly, this can be done with a one-liner formula using matrix powers of the adjacency matrix.

Based on your original code snippet, it seems you have some network of n nodes, with the adjacency information stored as a list of lists in adjacency, and you have a value r associated with each node such its value at iteration k+1 is beta times the sum of r of each of its neighbors at iter k. (Your loop constructs this in the opposite direction, but same thing.)

If you don't mind reforming your adjacency list-of-lists into a more standard adjacency matrix, such that A_ij = 1 if ij are neighbors, 0 otherwise, then you could accomplish the inner two loops with a simple matrix product, r[k+1] = beta * (A @ r[k]).

And following this logic, r[k+2] = beta * (A @ (beta * (A @ r[k]))) = (beta * A)**2 @ r[k] or in general,

r[k] = (beta * A)**k @ r[0]

Let's try this on a small network:

# adjacency matrix
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0]
])

# initial values
n = 5
beta = 0.5
r0 = np.ones(n)
maxiter = 10

# after one iteration
print(beta * (A @ r0))
# [1.  1.  1.5 1.  0.5]

# after 10 iterations
print(np.linalg.matrix_power((beta * A), maxiter) @ r0)
# [2.88574219 2.88574219 3.4921875  1.99414062 0.89257812]

How to vectorize advanced indexing with list of lists in NumPy?

Tags:

python

numpy

pagerank

sukisule

1 Answers

stevemo

Recent Activity

Donate For Us

How to vectorize advanced indexing with list of lists in NumPy?

Tags:

python

numpy

pagerank

sukisule

1 Answers

stevemo

Related questions

Recent Activity

Donate For Us