What are the differences in performance and behavior between using Python's native sum function and NumPy's numpy.sum? sum works on NumPy's arrays and numpy.sum works on Python lists, and they both return the same effective result (I haven't tested edge cases such as overflow) but different types.
    >>> import numpy as np
    >>> np_a = np.array(range(5))
    >>> np_a
    array([0, 1, 2, 3, 4])
    >>> type(np_a)
    <class 'numpy.ndarray'>
    >>> py_a = list(range(5))
    >>> py_a
    [0, 1, 2, 3, 4]
    >>> type(py_a)
    <class 'list'>
    # The numerical answer (10) is the same for the following sums:
    >>> type(np.sum(np_a))
    <class 'numpy.int32'>
    >>> type(sum(np_a))
    <class 'numpy.int32'>
    >>> type(np.sum(py_a))
    <class 'numpy.int32'>
    >>> type(sum(py_a))
    <class 'int'>
Edit: I think my practical question here is: would using numpy.sum on a list of Python integers be any faster than using Python's own sum?
Additionally, what are the implications (including performance) of using a Python integer versus a scalar numpy.int32? For example, for a += 1, is there a behavior or performance difference if the type of a is a Python integer or a numpy.int32? I am curious whether it is faster to use a NumPy scalar datatype such as numpy.int32 for a value that is added or subtracted a lot in Python code.
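One behavioral difference between the two types can be sketched as follows. This is a minimal illustration rather than a benchmark: the variable names are hypothetical, and both operands are given as numpy.int32 explicitly because the result type of mixing a NumPy scalar with a plain Python integer has varied between NumPy versions.

```python
import numpy as np

# Python integers have arbitrary precision, so this never overflows.
py_a = 2**31 - 1
py_a += 1

# numpy.int32 is a fixed-width 32-bit integer, so the same addition wraps
# around. NumPy normally emits a RuntimeWarning for this scalar overflow;
# it is silenced here only to keep the output clean.
with np.errstate(over="ignore"):
    np_a = np.int32(2**31 - 1)
    np_a += np.int32(1)

print(py_a)  # 2147483648
print(np_a)  # -2147483648
```

If a running total could exceed the 32-bit range, a Python integer or a wider NumPy type such as numpy.int64 avoids the wraparound.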
For clarification, I am working on a bioinformatics simulation which partly consists of collapsing multidimensional numpy.ndarrays into single scalar sums which are then additionally processed. I am using Python 3.2 and NumPy 1.6.
Thanks in advance!
The numpy.sum() function is part of Python's NumPy package. It computes the sum of all elements of an array or, when given an axis, the sum of each row or each column.
The items in a NumPy array are stored next to each other in memory, which is one reason it is fast.
In short, numpy.sum takes the elements of a NumPy array (an ndarray object) and adds them together.
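As a small sketch of the three cases (all elements, each column, each row), using the axis argument of numpy.sum on a made-up 2x3 array:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

total = np.sum(a)             # all elements: 21
col_sums = np.sum(a, axis=0)  # collapse rows, one sum per column: [5 7 9]
row_sums = np.sum(a, axis=1)  # collapse columns, one sum per row: [ 6 15]

print(total, col_sums, row_sums)
```

axis names the dimension that gets collapsed, so axis=0 on a 2-D array leaves one value per column.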
When np.sum receives an array of booleans as its argument, it sums the elements (counting True as 1 and False as 0) and returns the result. For instance, np.sum([True, True, False]) returns 2. Hope this helps.
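A common use of this is counting how many elements satisfy a condition. The sketch below uses invented sample values and hypothetical variable names:

```python
import numpy as np

flags = np.array([True, True, False])
count_true = np.sum(flags)  # True counts as 1, False as 0 -> 2

# A comparison yields a boolean array, so summing it counts the matches.
readings = np.array([0.2, 1.5, 3.1, 0.7])
above_one = np.sum(readings > 1.0)  # 1.5 and 3.1 match -> 2

print(count_true, above_one)
```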
I got curious and timed it. numpy.sum seems much faster for NumPy arrays, but much slower on lists.
    import numpy as np
    import timeit

    x = range(1000)
    # or
    # x = np.random.standard_normal(1000)

    def pure_sum():
        return sum(x)

    def numpy_sum():
        return np.sum(x)

    n = 10000

    t1 = timeit.timeit(pure_sum, number=n)
    print 'Pure Python Sum:', t1
    t2 = timeit.timeit(numpy_sum, number=n)
    print 'Numpy Sum:', t2
Result when x = range(1000):

    Pure Python Sum: 0.445913167735
    Numpy Sum: 8.54926219673

Result when x = np.random.standard_normal(1000):

    Pure Python Sum: 12.1442425643
    Numpy Sum: 0.303303771848
I am using Python 2.7.2 and NumPy 1.6.1.
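The timing script above uses Python 2 print statements. For anyone repeating the comparison on Python 3, a rough equivalent might look like the sketch below. The labels, the cases dictionary, and the smaller repeat count are my own choices, and the absolute numbers will differ by machine and by NumPy version, so no expected output is shown.

```python
import timeit

import numpy as np

# In Python 3, range() is lazy, so build a real list for the list cases.
py_list = list(range(1000))
np_array = np.arange(1000)

# Each combination of summing function and container type.
cases = {
    "sum(list)": lambda: sum(py_list),
    "np.sum(list)": lambda: np.sum(py_list),
    "sum(ndarray)": lambda: sum(np_array),
    "np.sum(ndarray)": lambda: np.sum(np_array),
}

n = 1000  # repetitions per case; raise this for steadier numbers
timings = {}
for label, func in cases.items():
    timings[label] = timeit.timeit(func, number=n)
    print(f"{label:>16}: {timings[label]:.4f} s")
```

Timing all four combinations shows the cost of crossing between the two worlds: np.sum has to convert a list to an array first, and the built-in sum has to iterate over an array one NumPy scalar at a time.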