Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to convert a list of indices to 2D numpy array of ones

I have a list of indices

a = [
  [1,2,4],
  [0,2,3],
  [1,3,4],
  [0,2]]

What's the fastest way to convert this to a numpy array of ones, where each index shows the position where 1 would occur?

I.e. what I want is:

output = array([
  [0,1,1,0,1],
  [1,0,1,1,0],
  [0,1,0,1,1],
  [1,0,1,0,0]])

I know the max size of the array beforehand. I know I could loop through each list and insert a 1 into at each index position, but is there a faster/vectorized way to do this?

My use case could have thousands of rows/cols and I need to do this thousands of times, so the faster the better.

like image 224
Spcogg the second Avatar asked Jun 19 '19 05:06

Spcogg the second


People also ask

Which is faster NumPy array or list?

As the array size increase, Numpy gets around 30 times faster than Python List. Because the Numpy array is densely packed in memory due to its homogeneous type, it also frees the memory faster.

Can you convert a list to a NumPy array?

Lists can be converted to arrays using the built-in functions in the Python numpy library. numpy provides us with two functions to use when converting a list into an array: numpy. array()

Is NumPy indexing fast?

Indexing in NumPy is a reasonably fast operation.

How can I speed up my NumPy operation?

By explicitly declaring the "ndarray" data type, your array processing can be 1250x faster. This tutorial will show you how to speed up the processing of NumPy arrays using Cython. By explicitly specifying the data types of variables in Python, Cython can give drastic speed increases at runtime.


1 Answers

How about this:

ncol = 5
nrow = len(a)
out = np.zeros((nrow, ncol), int)
out[np.arange(nrow).repeat([*map(len,a)]), np.concatenate(a)] = 1
out
# array([[0, 1, 1, 0, 1],
#        [1, 0, 1, 1, 0],
#        [0, 1, 0, 1, 1],
#        [1, 0, 1, 0, 0]])

Here are timings for a 1000x1000 binary array, note that I use an optimized version of the above, see function pp below:

pp 21.717635259992676 ms
ts 37.10938713003998 ms
u9 37.32933565042913 ms

Code to produce timings:

import itertools as it
import numpy as np

def make_data(n,m):
    I,J = np.where(np.random.random((n,m))<np.random.random((n,1)))
    return [*map(np.ndarray.tolist, np.split(J, I.searchsorted(np.arange(1,n))))]

def pp():
    sz = np.fromiter(map(len,a),int,nrow)
    out = np.zeros((nrow,ncol),int)
    out[np.arange(nrow).repeat(sz),np.fromiter(it.chain.from_iterable(a),int,sz.sum())] = 1
    return out

def ts():
    out = np.zeros((nrow,ncol),int)
    for i, ix in enumerate(a):
        out[i][ix] = 1
    return out

def u9():
    out = np.zeros((nrow,ncol),int)
    for i, (x, y) in enumerate(zip(a, out)):
        y[x] = 1
        out[i] = y
    return out

nrow,ncol = 1000,1000
a = make_data(nrow,ncol)

from timeit import timeit
assert (pp()==ts()).all()
assert (pp()==u9()).all()

print("pp", timeit(pp,number=100)*10, "ms")
print("ts", timeit(ts,number=100)*10, "ms")
print("u9", timeit(u9,number=100)*10, "ms")
like image 124
Paul Panzer Avatar answered Sep 28 '22 11:09

Paul Panzer