
Python's sum vs. NumPy's numpy.sum

What are the differences in performance and behavior between Python's native sum function and NumPy's numpy.sum? sum works on NumPy arrays and numpy.sum works on Python lists; both return the same numerical result (I haven't tested edge cases such as overflow), but the return types differ.

    >>> import numpy as np
    >>> np_a = np.array(range(5))
    >>> np_a
    array([0, 1, 2, 3, 4])
    >>> type(np_a)
    <class 'numpy.ndarray'>

    >>> py_a = list(range(5))
    >>> py_a
    [0, 1, 2, 3, 4]
    >>> type(py_a)
    <class 'list'>

    # The numerical answer (10) is the same for the following sums:
    >>> type(np.sum(np_a))
    <class 'numpy.int32'>
    >>> type(sum(np_a))
    <class 'numpy.int32'>
    >>> type(np.sum(py_a))
    <class 'numpy.int32'>
    >>> type(sum(py_a))
    <class 'int'>

Edit: I think my practical question here is: would using numpy.sum on a list of Python integers be any faster than using Python's own sum?

Additionally, what are the implications (including performance) of using a Python integer versus a scalar numpy.int32? For example, for a += 1, is there a behavior or performance difference if the type of a is a Python integer or a numpy.int32? I am curious if it is faster to use a NumPy scalar datatype such as numpy.int32 for a value that is added or subtracted a lot in Python code.
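One behavioral difference can be checked directly: a Python integer has arbitrary precision, while numpy.int32 is a fixed-width 32-bit value that wraps around on overflow. The following is a minimal sketch, not a benchmark. It adds np.int32(1) rather than the literal 1 because the result type of mixing a NumPy scalar with a Python integer depends on the NumPy version's promotion rules. The names INT32_MAX, py_result and np_result are made up for illustration.

```python
import numpy as np

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold

# Python int: arbitrary precision, so the value simply keeps growing.
py_result = INT32_MAX + 1

# numpy.int32: fixed width, so the addition wraps around.
# errstate only silences the overflow warning NumPy would otherwise emit.
with np.errstate(over="ignore"):
    np_result = np.int32(INT32_MAX) + np.int32(1)

print(type(py_result).__name__, py_result)
print(type(np_result).__name__, np_result)
```

If a running total could ever exceed the 32-bit range, the Python integer (or a wider NumPy type such as numpy.int64) is the safer choice.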

For clarification, I am working on a bioinformatics simulation which partly consists of collapsing multidimensional numpy.ndarrays into single scalar sums which are then additionally processed. I am using Python 3.2 and NumPy 1.6.
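As a rough illustration of that use case, the sketch below collapses a small multidimensional array into scalar sums. The array `data` is hypothetical stand-in data, not taken from the simulation, and passing a tuple of axes assumes a NumPy version newer than the 1.6 mentioned above.

```python
import numpy as np

# Hypothetical 3-D data standing in for the simulation arrays.
data = np.arange(24, dtype=np.int64).reshape(2, 3, 4)

# Collapse every axis at once; the result is a NumPy scalar.
total = data.sum()

# Collapse only the last two axes: one sum per leading index.
per_block = data.sum(axis=(1, 2))

# Convert to a plain Python int if later processing happens in pure Python.
total_py = total.item()

print(total, per_block, type(total_py).__name__)
```

Calling .item() once after the reduction is one way to keep subsequent scalar arithmetic in plain Python integers.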

Thanks in advance!

dpyro asked Jun 06 '12 21:06




1 Answer

I got curious and timed it. numpy.sum seems much faster for numpy arrays, but much slower on lists.

    import numpy as np
    import timeit

    x = range(1000)
    # or
    #x = np.random.standard_normal(1000)

    def pure_sum():
        return sum(x)

    def numpy_sum():
        return np.sum(x)

    n = 10000

    t1 = timeit.timeit(pure_sum, number = n)
    print 'Pure Python Sum:', t1
    t2 = timeit.timeit(numpy_sum, number = n)
    print 'Numpy Sum:', t2

Result when x = range(1000):

    Pure Python Sum: 0.445913167735
    Numpy Sum: 8.54926219673

Result when x = np.random.standard_normal(1000):

    Pure Python Sum: 12.1442425643
    Numpy Sum: 0.303303771848

I am using Python 2.7.2 and NumPy 1.6.1.
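For readers on Python 3, here is a sketch of the same comparison covering all four combinations of function and container. It assumes list(range(1000)) as the list input, since range no longer returns a list in Python 3, and uses np.arange for the array. The names py_list, np_array and cases are illustrative, and absolute timings will vary by machine and NumPy version.

```python
import timeit

import numpy as np

py_list = list(range(1000))   # plain Python list of ints
np_array = np.arange(1000)    # the same values stored in an ndarray

# (label, callable) pairs covering every function/container combination.
cases = [
    ("sum(list)", lambda: sum(py_list)),
    ("np.sum(list)", lambda: np.sum(py_list)),
    ("sum(ndarray)", lambda: sum(np_array)),
    ("np.sum(ndarray)", lambda: np.sum(np_array)),
]

# Check the results once, then time each case separately.
results = {label: func() for label, func in cases}
timings = {label: timeit.timeit(func, number=1000) for label, func in cases}

for label, _ in cases:
    print(f"{label:16s} {timings[label]:.4f} s")
```

A likely explanation for the pattern above is that np.sum must first convert a list into an array on every call, while the built-in sum iterates over an ndarray one element at a time in the interpreter. If the data starts as a list and is summed repeatedly, converting it once with np.asarray and reusing the array avoids paying the conversion cost each time.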

Akavall answered Sep 21 '22 22:09
