Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the advantages of NumPy over regular Python lists?

What are the advantages of NumPy over regular Python lists?

I have approximately 100 financial markets series, and I am going to create a cube array of 100x100x100 = 1 million cells. I will be regressing (3-variable) each x with each y and z, to fill the array with standard errors.

I have heard that for "large matrices" I should use NumPy as opposed to Python lists, for performance and scalability reasons. Thing is, I know Python lists and they seem to work for me.

What will the benefits be if I move to NumPy?

What if I had 1000 series (that is, 1 billion floating point cells in the cube)?

like image 971
Thomas Browne Avatar asked Jun 14 '09 23:06

Thomas Browne


People also ask

What are the advantages of NumPy arrays over python arrays and lists?

NumPy arrays are faster and more compact than Python lists. An array consumes less memory and is convenient to use. NumPy uses much less memory to store data and it provides a mechanism of specifying the data types. This allows the code to be optimized even further.

Why are NumPy arrays better than Python lists that are nested?

NumPy arrays can be much faster than nested lists and one good test of performance is a speed comparison. This test is going to be the total time it takes to add a number to each element of a 2D data structure.

What is the difference between using NumPy and lists?

Originally known as 'Numeric,' NumPy sets the framework for many data science libraries like SciPy, Scikit-Learn, Panda, and more. While Python lists store a collection of ordered, alterable data objects, NumPy arrays only store a single type of object.


2 Answers

NumPy's arrays are more compact than Python lists -- a list of lists as you describe, in Python, would take at least 20 MB or so, while a NumPy 3D array with single-precision floats in the cells would fit in 4 MB. Access in reading and writing items is also faster with NumPy.

Maybe you don't care that much for just a million cells, but you definitely would for a billion cells -- neither approach would fit in a 32-bit architecture, but with 64-bit builds NumPy would get away with 4 GB or so, Python alone would need at least about 12 GB (lots of pointers which double in size) -- a much costlier piece of hardware!

The difference is mostly due to "indirectness" -- a Python list is an array of pointers to Python objects, at least 4 bytes per pointer plus 16 bytes for even the smallest Python object (4 for type pointer, 4 for reference count, 4 for value -- and the memory allocators rounds up to 16). A NumPy array is an array of uniform values -- single-precision numbers takes 4 bytes each, double-precision ones, 8 bytes. Less flexible, but you pay substantially for the flexibility of standard Python lists!

like image 124
Alex Martelli Avatar answered Sep 20 '22 14:09

Alex Martelli


NumPy is not just more efficient; it is also more convenient. You get a lot of vector and matrix operations for free, which sometimes allow one to avoid unnecessary work. And they are also efficiently implemented.

For example, you could read your cube directly from a file into an array:

x = numpy.fromfile(file=open("data"), dtype=float).reshape((100, 100, 100)) 

Sum along the second dimension:

s = x.sum(axis=1) 

Find which cells are above a threshold:

(x > 0.5).nonzero() 

Remove every even-indexed slice along the third dimension:

x[:, :, ::2] 

Also, many useful libraries work with NumPy arrays. For example, statistical analysis and visualization libraries.

Even if you don't have performance problems, learning NumPy is worth the effort.

like image 43
Roberto Bonvallet Avatar answered Sep 19 '22 14:09

Roberto Bonvallet