I was reading the book 'Fluent Python' when I encountered a sentence where the author states that
If you need to store 10 million floating-point values an array is much more efficient, because an array does not actually hold full fledged objects, but only the packed bytes representing their machine values - just like array in C language.
I am not able to understand what the author is trying to convey. What is he saying about 'packed bytes' ? what does 'packed bytes storing' mean ? . How is python lists storing it ? why is it not storing it that way if that is what is making it efficient ?
Arrays can store data very compactly and are more efficient for storing large amounts of data. Arrays are great for numerical operations; lists cannot directly handle math operations. For example, you can divide each element of an array by the same number with just one line of code.
The array is faster in case of access to an element while List is faster in case of adding/deleting an element from the collection.
An array is faster and that is because ArrayList uses a fixed amount of array. However when you add an element to the ArrayList and it overflows. It creates a new Array and copies every element from the old one to the new one.
Array can only be used for specific types, whereas lists can be used for any object. Arrays can also only data of one type, whereas a list can have entries of various object types. Arrays are also more efficient for some numerical computation.
Let's say you're dealing with 8-byte floating-point numbers. "Packed bytes" in this context means that there's a dedicated chunk of allocated memory in which the first 8 bytes represent the first float, and then immediately the next 8 bytes represent the next float, and so on with no wastage. It's the most space-efficient way there is of storing the data (at least, without compression). It may also be the most time-efficient for certain operations (for example, arraywise arithmetic operations).
A Python list
doesn't store things that way. For one thing, one list element could be a float but the next one might be some other type of object. For another thing, you can remove, insert or replace items in a list. Some of these operations involve lengthening or shortening the list dynamically. All are very time- and memory- inefficient if items are stored as packed bytes. The Python list
class is designed to be as general-purpose as possible, making compromises between the efficiency of various types of operations.
Probably the most important difference is that a Python list
, in its underlying C implementation, is a container full of pointers to objects, rather than a container full of raw object content. One implication of this is that multiple references to the same Python object can appear in a list
. Another is that changing a particular item can be done very efficiently. For example, let's say the first item in your list, a[0]
, is an integer, but you want to replace it with a string that takes up more memory, e.g. a[0] = "There's a horse in aisle five."
A packed array would have to (a) make extra room, shifting all of the rest of the array content in memory and (b) separately update some sort of index of item sizes and types. But a Python list
would only have to overwrite one pointer value with another.
In the CPython implementation, the pointers themselves may still be (more or less) packed in memory. This means that inserting a new item into a list
will usually still be inefficient (relative to the way it would be if the Python list
implementation used, say, a link-list structure under the hood).
In general, there's no absolute "efficient" or "inefficient"—it's all a question of which resource you're being efficient with, what (restrictions on) content types there are in the container, and how you are intending to transform the container or its contents.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With