Building a small numpy array from individual values: Fast and readable method?

Tags: python, numpy

I found that a bottleneck in my program is the creation of numpy arrays from a list of given values, most commonly putting four values into a 2x2 array. There is an obvious, easy-to-read way to do it:

my_array = numpy.array([[1, 3], [2.4, -1]])

which takes 15 us per call, which is very slow since I'm doing it millions of times.

Then there is a far faster, hard-to-read way:

my_array = numpy.empty((2,2))
my_array[0,0] = 1
my_array[0,1] = 3
my_array[1,0] = 2.4
my_array[1,1] = -1

This is 10 times faster, at just 1 us.

Is there any method that is BOTH fast AND easy-to-read?

What I tried so far: Using asarray instead of array makes no difference; passing dtype=float into array also makes no difference. Finally, I understand that I can do it myself:

def make_array_from_list(the_list, num_rows, num_cols):
    the_array = numpy.empty((num_rows, num_cols))
    for i in range(num_rows):
        for j in range(num_cols):
            the_array[i,j] = the_list[i][j]
    return the_array

This creates the array in 4 us, which is medium readability at medium speed (compared to the two approaches above). But really, I cannot believe that there is not a better approach using built-in methods.
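
For anyone who wants to reproduce per-call figures like the ones quoted above, here is a minimal sketch using the standard timeit module. The function names (readable, unrolled, per_call_us) are illustrative only, and absolute numbers will differ by machine and NumPy version.

```python
import timeit

import numpy


def readable():
    # The easy-to-read construction from nested lists.
    return numpy.array([[1, 3], [2.4, -1]])


def unrolled():
    # The fast construction: allocate, then assign element by element.
    a = numpy.empty((2, 2))
    a[0, 0] = 1
    a[0, 1] = 3
    a[1, 0] = 2.4
    a[1, 1] = -1
    return a


def per_call_us(func, number=100000):
    """Average time of one call to func, in microseconds (hypothetical helper)."""
    return timeit.timeit(func, number=number) / number * 1e6


if __name__ == "__main__":
    for f in (readable, unrolled):
        print(f.__name__, "%.2f us per call" % per_call_us(f))
```

Both functions return the same 2x2 array, so any difference in the printed numbers comes from the construction method alone.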

Thank you in advance!!

Steve Byrnes asked Oct 29 '12 23:10



1 Answer

This is a great question. I can't find anything that approaches the speed of your completely unrolled solution (edit: @BiRico was able to come up with something close; see comments and update). Here are a bunch of different options that I (and others) came up with, along with their timings:

import numpy as np

def f1():
    "np.array + nested lists"
    my_array = np.array([[1, 3], [2.4, -1]])

def f2():
    "np.array + nested tuples"
    my_array = np.array(((1, 3), (2.4, -1)))

def f3():
    "Completely unrolled"
    my_array = np.empty((2,2),dtype=float)
    my_array[0,0] = 1
    my_array[0,1] = 3
    my_array[1,0] = 2.4
    my_array[1,1] = -1

def f4():
    "empty + ravel + list"
    my_array = np.empty((2,2),dtype=float)
    my_array.ravel()[:] = [1,3,2.4,-1]

def f5():
    "empty + ravel + tuple"
    my_array = np.empty((2,2),dtype=float)
    my_array.ravel()[:] = (1,3,2.4,-1)

def f6():
    "empty + slice assignment"
    my_array = np.empty((2,2),dtype=float)
    my_array[0,:] = (1,3)
    my_array[1,:] = (2.4,-1)

def f7():
    "empty + index assignment"
    my_array = np.empty((2,2),dtype=float)
    my_array[0] = (1,3)
    my_array[1] = (2.4,-1)

def f8():
    "np.array + flat list + reshape"
    my_array = np.array([1, 3, 2.4, -1]).reshape((2,2))

def f9():
    "np.empty + ndarray.flat  (Pierre GM)"
    my_array = np.empty((2,2), dtype=float)
    my_array.flat = (1,3,2.4,-1)

def f10():
    "np.fromiter (Bi Rico)"
    my_array = np.fromiter((1,3,2.4,-1), dtype=float).reshape((2,2))

import timeit
results = {}
for i in range(1,11):
    func_name = 'f%d'%i
    my_import = 'from __main__ import %s'%func_name
    func_doc = globals()[func_name].__doc__
    results[func_name] = (timeit.timeit(func_name+'()',
                                        my_import,
                                        number=100000),
                          '\t'.join((func_name,func_doc)))

for result in sorted(results.values()):
    print('\t'.join(map(str,result)))

And the important timings (total seconds for 100,000 calls of each function, fastest first):

On Ubuntu Linux, Core i7:

0.158674955368    f3     Completely unrolled
0.225094795227    f10    np.fromiter (Bi Rico)
0.737828969955    f8     np.array + flat list + reshape
0.782918930054    f5     empty + ravel + tuple
0.786983013153    f9     np.empty + ndarray.flat  (Pierre GM)
0.814703941345    f4     empty + ravel + list
1.2375421524      f7     empty + index assignment
1.32230591774     f2     np.array + nested tuples
1.3752617836      f6     empty + slice assignment
1.39459013939     f1     np.array + nested lists
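
If readability matters at the call site, the np.fromiter variant could be wrapped in a small helper. This is only a sketch: the name small_array and its signature are hypothetical, it was not part of the timings above, and the extra Python function call adds some overhead, so it should be timed before relying on it.

```python
import numpy as np


def small_array(values, shape):
    """Build a 2-D float array of the given shape from a flat sequence.

    Hypothetical helper: wraps np.fromiter, passing count so NumPy can
    allocate the output once instead of growing it.
    """
    count = shape[0] * shape[1]
    return np.fromiter(values, dtype=float, count=count).reshape(shape)


# Same values as in the question, given in row-major order.
my_array = small_array((1, 3, 2.4, -1), (2, 2))
```

Passing fewer values than the shape requires makes np.fromiter raise a ValueError, which gives a basic sanity check for free.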
mgilson answered Sep 16 '22 22:09