Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python & numpy: sum of an array slice

I have 1-dimensional numpy array (array_) and a Python list (list_).

The following code works, but is inefficient because slices involve an unnecessary copy (certainly for Python lists, and I believe also for numpy arrays?):

result = sum(array_[1:])
result = sum(list_[1:])

What's a good way to rewrite that?

like image 770
max Avatar asked May 05 '11 00:05

max


3 Answers

Slicing a numpy array doesn't make a copy, as it does in the case of a list.

As a basic example:

import numpy as np
x = np.arange(100)
y = x[1:5]
y[:] = 1000
print x[:10]

This yields:

[   0 1000 1000 1000 1000    5    6    7    8    9]

Even though we modified the values in y, it's just a view into the same memory as x.

Slicing an ndarray returns a view and doesn't duplicate the memory.

However, it would be much more efficient to use array_[1:].sum() rather than calling python's builtin sum on a numpy array.

As a quick comparison:

In [28]: x = np.arange(10000)

In [29]: %timeit x.sum()
100000 loops, best of 3: 10.2 us per loop

In [30]: %timeit sum(x)
100 loops, best of 3: 4.01 ms per loop

Edit:

In the case of the list, if for some reason you don't want to make a copy, you could always use itertools.islice. Instead of:

result = sum(some_list[1:])

you could do:

result = sum(itertools.islice(some_list, 1, None))

In most cases this is overkill, though. If you're dealing with lists long enough that memory management is a major issue, then you probably shouldn't be using a list to store your values. (Lists are not designed or intended to store items compactly in memory.)

Also, you wouldn't want to do this for a numpy array. Simply doing some_array[1:].sum() will be several orders of magnitude faster and won't use any more memory than islice.

like image 83
Joe Kington Avatar answered Sep 29 '22 20:09

Joe Kington


My first instinct was the same as Joe Kington's when it comes to lists, but I checked, and on my machine at least, islice is consistently slower!

>>> timeit.timeit("sum(l[50:950])", "l = range(1000)", number=10000)
1.0398731231689453
>>> timeit.timeit("sum(islice(l, 50, 950))", "from itertools import islice; l = range(1000)", number=10000)
1.2317550182342529
>>> timeit.timeit("sum(l[50:950000])", "l = range(1000000)", number=10)
7.9020509719848633
>>> timeit.timeit("sum(islice(l, 50, 950000))", "from itertools import islice; l = range(1000000)", number=10)
8.4522969722747803

I tried a custom_sum and found that it was faster, but not by much:

>>> setup = """
... def custom_sum(list, start, stop):
...     s = 0
...     for i in xrange(start, stop):
...         s += list[i]
...     return s
... 
... l = range(1000)
... """
>>> timeit.timeit("custom_sum(l, 50, 950)", setup, number=1000)
0.66767406463623047

Furthermore, at larger numbers, it was slower by far!

>>> setup = setup.replace("range(1000)", "range(1000000)")
>>> timeit.timeit("custom_sum(l, 50, 950000)", setup, number=10)
14.185815095901489

I couldn't think of anything else to test. (Thoughts, anyone?)

like image 40
senderle Avatar answered Sep 29 '22 19:09

senderle


@Joe Kington (this is temporary answer to just show my timings, I'll remove it soon):

In []: x= arange(1e4)
In []: %timeit sum(x)
100000 loops, best of 3: 18.8 us per loop
In []: %timeit x.sum()
100000 loops, best of 3: 17.5 us per loop
In []: x= arange(1e5)
In []: %timeit sum(x)
10000 loops, best of 3: 165 us per loop
In []: %timeit x.sum()
10000 loops, best of 3: 158 us per loop
In []: x= arange(1e2)
In []: %timeit sum(x)
100000 loops, best of 3: 4.44 us per loop
In []: %timeit x.sum()
100000 loops, best of 3: 3.2 us per loop

As far as my numpy(1.5.1) source tells, sum(.) is just a wrapper for x.sum(.). Thus with larger inputs execution time is same (asymptotically) for sum(.) and x.sum(.).

Edit: This answer was intended to be just a temporary one, but actually it (and its comments) may indeed be useful to someone. So I'll just leave it as it is just now, until someone really request me to delete it.

like image 36
eat Avatar answered Sep 29 '22 21:09

eat