
Python numpy array vs list

I need to perform some calculations on a large list of numbers.

Do array.array or numpy.array offer a significant performance boost over ordinary Python lists?

I don't have to do complicated manipulations on the arrays, I just need to be able to access and modify values,

e.g.

import numpy
x = numpy.array([0] * 1000000)
for i in range(1,len(x)):
  x[i] = x[i-1] + i

So I will not really be needing concatenation, slicing, etc.

Also, it looks like a NumPy array raises an error if I try to store a value that doesn't fit in a C long:

import numpy
a = numpy.array([0])
a[0] += 1232234234234324353453453
print(a)

On console I get:

a[0] += 1232234234234324353453453
OverflowError: Python int too large to convert to C long

Is there a variation of array that lets me put in unbounded Python integers? Or would doing it that way take away the point of having arrays in the first place?

math4tots asked Feb 09 '12 23:02




2 Answers

You first need to understand the difference between arrays and lists.

An array is a contiguous block of memory consisting of elements of some type (e.g. integers).

You cannot change the size of an array once it is created.
It therefore follows that each integer element in an array has a fixed size, e.g. 4 bytes.

On the other hand, a list is merely an "array" of addresses (which also have a fixed size).

But then each element holds the address of something else in memory, which is the actual integer that you want to work with. Of course, the size of this integer is irrelevant to the size of the array. Thus you can always create a new (bigger) integer and "replace" the old one without affecting the size of the array, which merely holds the address of an integer.

This convenience of a list comes at a cost: performing arithmetic on the integers now requires one memory access to the list, another memory access to the integer object itself, plus the time it takes to allocate a new integer (if needed) and to free the old one (if needed). So yes, a list can be slower, and you have to be careful about what you do with each integer in it.
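To make the layout difference concrete, here is a small sketch. It assumes CPython (sys.getsizeof is implementation-specific) and an explicit np.int64 dtype; the variable names are only illustrative. It shows that a NumPy array stores fixed-width elements in one buffer, while a list stores references, so replacing an element with a much larger integer does not change the size of the list itself.

```python
import sys
import numpy as np

n = 1000

# NumPy array: one contiguous buffer of fixed-width elements.
arr = np.zeros(n, dtype=np.int64)
print(arr.itemsize)  # bytes per element
print(arr.nbytes)    # itemsize * n bytes of element data

# List: a block of references; the int objects themselves live elsewhere.
lst = [0] * n
size_before = sys.getsizeof(lst)  # counts the references, not the ints
lst[0] = 10**30                   # rebind one slot to a much bigger int
size_after = sys.getsizeof(lst)

print(size_before == size_after)                 # the list did not grow
print(sys.getsizeof(10**30) > sys.getsizeof(1))  # the int object did
```

The extra indirection is what lets a list hold arbitrarily large integers, and it is also why each element access costs more than in a packed array.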

user541686 answered Sep 24 '22 15:09


Your first example could be sped up. Python loops and access to individual items in a numpy array are slow. Use vectorized operations instead:

import numpy as np
x = np.arange(1000000).cumsum()
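As a rough check that the vectorized form computes the same values as the loop in the question, the sketch below runs both on a smaller array and compares them. The size n, the explicit np.int64 dtype (chosen so the running sum cannot overflow a 32-bit default integer), and the timing code are assumptions added for illustration; actual timings depend on the machine.

```python
import time
import numpy as np

n = 100_000  # smaller than the question's 1000000 so the loop finishes quickly

# Element-by-element loop, as in the question.
start = time.perf_counter()
x_loop = np.zeros(n, dtype=np.int64)
for i in range(1, n):
    x_loop[i] = x_loop[i - 1] + i
loop_seconds = time.perf_counter() - start

# Vectorized equivalent: running sum of 0, 1, 2, ..., n-1.
start = time.perf_counter()
x_vec = np.arange(n, dtype=np.int64).cumsum()
vec_seconds = time.perf_counter() - start

print(np.array_equal(x_loop, x_vec))  # both approaches give the same array
print(f"loop: {loop_seconds:.4f}s, cumsum: {vec_seconds:.4f}s")
```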

You can put unbounded Python integers into a numpy array by using the object dtype:

a = np.array([0], dtype=object)
a[0] += 1232234234234324353453453

Arithmetic on such an object array will be slower than arithmetic on fixed-size C integers.
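A short sketch of the trade-off, assuming an explicit np.int64 dtype for the fixed-width case: the fixed-width array rejects a value that does not fit, while the object array stores ordinary Python ints and falls back to Python arithmetic for each element.

```python
import numpy as np

big = 1232234234234324353453453  # does not fit in 64 bits

# Fixed-width array: storing the value raises OverflowError.
fixed = np.array([0], dtype=np.int64)
try:
    fixed[0] = big
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)

# Object array: each element is a reference to a regular Python int.
a = np.array([0, 1], dtype=object)
a[0] += big
print(type(a[0]) is int)  # elements are plain Python ints
print(a.sum() == big + 1)  # arithmetic works, one Python operation per element
```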

jfs answered Sep 22 '22 15:09