When is array.array more efficient than lists?

Tags: python, arrays, list

I was reading the book 'Fluent Python' when I encountered a sentence where the author states:

"If you need to store 10 million floating-point values, an array is much more efficient, because an array does not actually hold full-fledged objects, but only the packed bytes representing their machine values - just like an array in the C language."

I don't understand what the author is trying to convey. What does he mean by 'packed bytes', and what does storing values as packed bytes involve? How does a Python list store its items? If packed storage is what makes an array efficient, why doesn't a list store its items that way?

asked Dec 23 '16 20:12

Harish Kayarohanam

1 Answer

Let's say you're dealing with 8-byte floating-point numbers. "Packed bytes" in this context means that there's a dedicated chunk of allocated memory in which the first 8 bytes represent the first float, and then immediately the next 8 bytes represent the next float, and so on with no wastage. It's the most space-efficient way there is of storing the data (at least, without compression). It may also be the most time-efficient for certain operations (for example, arraywise arithmetic operations).
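As a rough illustration of the packed layout described above, here is a minimal sketch that compares the memory used by a list of floats with an array.array of C doubles. The names `floats_list`, `floats_array`, `list_total` and `array_total` are made up for this example, and the exact byte counts depend on the Python build, so none are quoted here.

```python
import sys
from array import array

n = 1_000  # small stand-in for the 10 million values in the question

floats_list = [float(i) for i in range(n)]
floats_array = array("d", floats_list)  # typecode "d" = C double

# Every array item occupies exactly itemsize bytes, one right after another.
packed = floats_array.tobytes()
print("bytes per item:", floats_array.itemsize)
print("packed length: ", len(packed))

# sys.getsizeof(list) counts only the list's own pointer table,
# so the separate float objects have to be added on top.
list_total = sys.getsizeof(floats_list) + sum(sys.getsizeof(x) for x in floats_list)
array_total = sys.getsizeof(floats_array)
print("list + float objects:", list_total)
print("array:               ", array_total)
```

On a typical 64-bit CPython build the array total should come out at several times smaller than the list total, since each list slot costs a pointer plus a full float object.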

A Python list doesn't store things that way. For one thing, one list element could be a float but the next one might be some other type of object. For another thing, you can remove, insert or replace items in a list. Some of these operations involve lengthening or shortening the list dynamically. When items of different types and sizes are stored as packed bytes, all of these operations are very time- and memory-inefficient. The Python list class is designed to be as general-purpose as possible, making compromises between the efficiency of various types of operations.
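The type restriction is easy to see in practice. The sketch below (with illustrative names `values`, `mixed` and `rejected`) shows that an array.array of doubles refuses anything that is not a number, while a list accepts any mix of objects. It also shows that an array can still insert and remove items, which works because every item has the same fixed size.

```python
from array import array

values = array("d", [1.0, 2.0, 3.0])

# An array of C doubles can only hold numbers that fit that machine type.
try:
    values.append("four")
    rejected = False
except TypeError:
    rejected = True
print("string rejected by array:", rejected)

# Fixed-size items can still be inserted and removed.
values.insert(1, 1.5)
values.remove(3.0)
print(values.tolist())

# A list takes any object, whatever its type or size.
mixed = [1.0, "two", None]
mixed.append(b"four")
print(mixed)
```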

Probably the most important difference is that a Python list, in its underlying C implementation, is a container full of pointers to objects, rather than a container full of raw object content. One implication of this is that multiple references to the same Python object can appear in a list. Another is that changing a particular item can be done very efficiently. For example, let's say the first item in your list, a[0], is an integer, but you want to replace it with a string that takes up more memory, e.g. a[0] = "There's a horse in aisle five." A packed array would have to (a) make extra room, shifting all of the rest of the array content in memory and (b) separately update some sort of index of item sizes and types. But a Python list would only have to overwrite one pointer value with another.
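Both implications of the pointer-based design can be checked from Python itself. This is a small sketch with made-up names (`shared`, `a`, `size_before`, `size_after`); it relies on `sys.getsizeof` reporting only the list's own storage, not the objects it points to.

```python
import sys

shared = [0.5]            # one mutable object, referenced twice below
a = [42, shared, shared]

# Two slots hold pointers to the very same object.
print(a[1] is a[2])
shared.append(1.5)
print(a[1], a[2])         # both slots see the change

# Replacing a small integer with a much larger string only swaps a pointer,
# so the list's own memory footprint does not change.
size_before = sys.getsizeof(a)
a[0] = "There's a horse in aisle five."
size_after = sys.getsizeof(a)
print(size_before == size_after)
```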

In the CPython implementation, the pointers themselves are still stored contiguously (more or less packed) in memory. This means that inserting a new item anywhere but at the end of a list will usually still be inefficient, because every pointer after the insertion point has to be shifted (relative to the way it would be if the Python list implementation used, say, a linked-list structure under the hood).
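To get a feel for the insertion cost, the sketch below times repeated insertion at the front of a list against `collections.deque.appendleft`. The deque is only a stand-in for a linked structure here (CPython implements it as a doubly linked list of fixed-size blocks, not a classic one-node-per-item list), and the helper names `fill_list` and `fill_deque` are invented for this example. Timings vary by machine, so no figures are quoted.

```python
from collections import deque
from timeit import timeit

n = 20_000

def fill_list():
    items = []
    for i in range(n):
        items.insert(0, i)    # shifts every existing pointer one slot along
    return items

def fill_deque():
    items = deque()
    for i in range(n):
        items.appendleft(i)   # existing items stay where they are
    return items

# Both approaches build the same sequence; only the cost differs.
assert fill_list() == list(fill_deque())

print("list.insert(0, x):   ", timeit(fill_list, number=1))
print("deque.appendleft(x): ", timeit(fill_deque, number=1))
```

Appending at the end of a list avoids the shifting altogether, so this cost only matters when items are added at or near the front.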

In general, there's no absolute "efficient" or "inefficient"—it's all a question of which resource you're being efficient with, what (restrictions on) content types there are in the container, and how you are intending to transform the container or its contents.

answered Oct 23 '22 13:10

jez
