I noticed that the de facto standard for array manipulation in Python is the excellent numpy library. However, I know that the Python Standard Library has an array module, which seems to me to have a similar use case to NumPy.
Is there any actual real-world example where array is desirable over numpy or just a plain list?
From my naive interpretation, array is just a memory-efficient container for homogeneous data, but offers no means of improving computational efficiency.
EDIT
Just out of curiosity, I searched GitHub: import array in Python code gets 186'721 hits, while import numpy gets 8'062'678 hits.
However, I could not find a popular repository using array.
NumPy arrays are faster than Python lists for the following reasons: a NumPy array is a collection of homogeneous data types stored in contiguous memory locations. A Python list, on the other hand, can hold heterogeneous data types: it stores references to objects that may be scattered across memory.
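To make the storage difference concrete, here is a minimal sketch (the variable names are made up for illustration, and it assumes NumPy is installed). It only inspects the containers and does not measure speed:

```python
import numpy as np

arr = np.arange(5, dtype=np.int64)

# Every element occupies one fixed-size slot, stored back to back.
print(arr.dtype, arr.itemsize, arr.nbytes)   # int64 8 40
print(arr.flags['C_CONTIGUOUS'])             # True

# A list can mix types; each slot is a reference to a separate object.
mixed = [1, 2.5, "three"]
print([type(v).__name__ for v in mixed])     # ['int', 'float', 'str']
```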
NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data. Typically, such operations are executed more efficiently and with less code than is possible using Python's built-in sequences.
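As a small illustration of "less code", the same element-wise computation can be written with a list comprehension or as a single array expression. This is only a sketch with made-up names. It shows the shape of the code, not a benchmark:

```python
import numpy as np

data = list(range(1, 6))

# Plain Python: loop over the elements explicitly.
squares_list = [v * v + 1 for v in data]

# NumPy: the arithmetic is applied to the whole array at once.
arr = np.array(data, dtype=np.int64)
squares_np = arr * arr + 1

print(squares_list)          # [2, 5, 10, 17, 26]
print(squares_np.tolist())   # [2, 5, 10, 17, 26]
```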
NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original. The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory.
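A short sketch of what the fixed size and the single data type mean in practice (names are hypothetical; np.append is just one of several functions that return a new array rather than resizing in place):

```python
import numpy as np

# A list grows in place: the object stays the same.
items = [1, 2, 3]
list_id_before = id(items)
items.append(4)
same_list = id(items) == list_id_before
print(same_list)                        # True

# np.append does not modify arr; it returns a new, larger array.
arr = np.array([1, 2, 3])
bigger = np.append(arr, 4)
print(arr.size, bigger.size)            # 3 4
print(np.shares_memory(arr, bigger))    # False

# Mixed input is coerced to one common element type.
mixed = np.array([1, 2.5])
print(mixed.dtype)                      # float64
```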
NumPy provides several ways to copy an array, for example a view that shares the underlying data (ndarray.view()) or an independent deep copy (ndarray.copy()).
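The difference between the copying methods shows up as soon as the original is modified. A minimal sketch, with variable names chosen only for the example:

```python
import numpy as np

original = np.arange(5)

alias = original          # plain assignment: the very same object
view = original.view()    # new array object that shares the same data
deep = original.copy()    # new array object with its own data

original[0] = 99

print(alias[0], view[0], deep[0])   # 99 99 0
print(alias is original)            # True
print(view is original)             # False
```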
To understand the differences between numpy and array, I ran a few more quantitative tests.
What I have found is that, on my system (Ubuntu 18.04, Python 3), array seems to be roughly twice as fast as numpy at generating a large array from a range object (although numpy's dedicated np.arange() seems to be much faster, actually suspiciously fast, so perhaps it is caching something during the tests), but more than twice as slow as list.
However, quite surprisingly, array objects seem to be larger than their numpy counterparts.
Instead, the list objects are roughly 8-13% larger than array objects (this will vary with the size of the individual items, obviously).
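One possible contributor to these size differences is how the containers are built and what sys.getsizeof counts. The sketch below rests on my understanding of CPython (an assumption, not something verified in the tests here): an array filled from an arbitrary iterable grows as it goes and may keep spare capacity, while one built from a list can be sized up front. Also, sys.getsizeof on a list counts only the list's own references, not the int objects they point to. The 'q' typecode is used so the item size does not depend on the platform.

```python
import array
import sys

n = 100_000

from_range = array.array('q', range(n))        # filled item by item
from_list = array.array('q', list(range(n)))   # length known up front

# Bytes strictly needed for the data itself.
payload = n * from_range.itemsize

# Overhead beyond the raw data; exact values depend on the Python version.
print(sys.getsizeof(from_range) - payload)
print(sys.getsizeof(from_list) - payload)

# For a list, getsizeof ignores the int objects the list refers to.
as_list = list(range(n))
print(sys.getsizeof(as_list))
```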
Compared to list, array offers a way to control the size of the stored numbers through its typecode.
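For instance, the typecode passed to array.array fixes the C type, and therefore the number of bytes, used per element. A minimal sketch with made-up names (the sizes shown are the ones found on common platforms; the standard library only guarantees minimum sizes):

```python
import array

small = array.array('b', [1, 2, 3])   # signed char
wide = array.array('q', [1, 2, 3])    # signed long long

print(small.itemsize, wide.itemsize)  # typically 1 and 8

# Values that do not fit the chosen type are rejected.
try:
    small.append(200)
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)                     # True
```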
So, perhaps, the only sensible use case for array is actually when numpy is not available.
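In that situation, array can still serve as compact typed storage using only the standard library. A small sketch with hypothetical sample data, showing a round trip through raw bytes with tobytes() and frombytes():

```python
import array

# Compact storage of doubles, standard library only.
samples = array.array('d', [0.5, 1.5, 2.5])

# Serialize to raw bytes (e.g. to write to a binary file or a socket).
raw = samples.tobytes()

# Rebuild an array of the same typecode from those bytes.
restored = array.array('d')
restored.frombytes(raw)

print(len(raw) == len(samples) * samples.itemsize)   # True
print(restored == samples)                           # True
print(sum(restored))                                 # 4.5
```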
For completeness, here is the code that I used for the tests:
import array
import sys

import matplotlib.pyplot as plt
import numpy as np

num = int(1e6)   # number of elements used in the timing tests
num_i = 100      # number of container sizes sampled in the memory test
# Logarithmically spaced container sizes from 10 up to num
x = np.logspace(1, int(np.log10(num)), num_i).astype(int)

# Timing tests (%timeit is an IPython/Jupyter magic, not plain Python)
%timeit list(range(num))
# 10 loops, best of 3: 32.8 ms per loop
%timeit array.array('l', range(num))
# 10 loops, best of 3: 86.3 ms per loop
%timeit np.array(range(num), dtype=np.int64)
# 10 loops, best of 3: 180 ms per loop
%timeit np.arange(num, dtype=np.int64)
# 1000 loops, best of 3: 809 µs per loop

# Memory test: size reported by sys.getsizeof for each container type
y_list = np.array([sys.getsizeof(list(range(x_i))) for x_i in x])
y_array = np.array([sys.getsizeof(array.array('l', range(x_i))) for x_i in x])
y_np = np.array([sys.getsizeof(np.array(range(x_i), dtype=np.int64)) for x_i in x])

# Plot reported size against number of elements
plt.figure(figsize=(12, 6))
plt.plot(x, y_list, label='list')
plt.plot(x, y_array, label='array')
plt.plot(x, y_np, label='numpy')
plt.legend()
plt.show()