Performance of bytearray and alternatives

Tags: python, arrays, data-structures

I am writing a parser for a protocol working over TCP.

Some messages are split between multiple packets so I need to be able to "peek" into my stream with a possibility of going back and also to append incoming data at the end. On the other hand, I would like to be able to discard the content of the packets I have successfully parsed.

The problem with bytes is that appending requires copying (not in CPython, but even then it is impossible to delete the first bytes of an immutable object).
The problem with bytearray is that flushing the already seen bytes also requires copying (or so I thought, see below).
The problem with collections.deque is the huge memory requirement, since every byte is stored as a separate element. The same goes for list.

However, I ran some tests with bytearray, and deleting the first element (del a[0], the equivalent of pop(0)) seems to be far more efficient than with lists:


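from timeit import default_timer as timer  # high-resolution clock, works on Python 2 and 3

n = 100000

for container in [bytearray, list]:
    print(container)

    # Delete from the front, one element at a time.
    a = container(b'a'*n)
    t = timer()
    for i in range(n):
        del a[0]
    print('del a[0]', timer() - t)

    # Delete from the back.
    a = container(b'a'*n)
    t = timer()
    for i in range(n):
        del a[-1]
    print('del a[-1]', timer() - t)

    # Delete the second element (just behind the front).
    a = container(b'a'*n)
    t = timer()
    for i in range(n-1):
        del a[1]
    print('del a[1]', timer() - t)

    # Delete the second-to-last element (just before the back).
    a = container(b'a'*n)
    t = timer()
    for i in range(n-1):
        del a[-2]
    print('del a[-2]', timer() - t)

    print()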

It seems that del a[0] and del a[-1] have about the same complexity for bytearray in CPython 2, CPython 3 and PyPy3.

I would like to know:

How is that possible? Is there a more efficient way than del a[:k] to delete the first k bytes?
Is there a more efficient data structure than the bytearray? (maybe using array, memoryview or ctypes)
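
For concreteness, the buffer operations described above (append at the end, peek without consuming, discard the parsed prefix with del a[:k]) could look roughly like the sketch below. StreamBuffer and the 2-byte big-endian length prefix are hypothetical, made up for illustration only; they are not part of the actual protocol. Python 3 is assumed.

```python
class StreamBuffer:
    """Hypothetical helper: a bytearray-backed receive buffer."""

    def __init__(self):
        self._buf = bytearray()

    def __len__(self):
        return len(self._buf)

    def append(self, data):
        # Append incoming data at the end of the buffer.
        self._buf += data

    def peek(self, k):
        # Return a copy of up to k leading bytes without consuming them.
        return bytes(self._buf[:k])

    def discard(self, k):
        # Drop the first k bytes once they have been parsed.
        del self._buf[:k]


buf = StreamBuffer()
buf.append(b"\x00\x05hel")    # first packet: length prefix + partial payload
buf.append(b"lo\x00\x02hi")   # second packet completes it and starts another

# Assumed framing: 2-byte big-endian length, then the payload.
length = int.from_bytes(buf.peek(2), "big")
message = None
if len(buf) >= 2 + length:
    message = buf.peek(2 + length)[2:]
    buf.discard(2 + length)

print(message, len(buf))
```

Note that peek copies the requested bytes here; whether that copy matters in practice is something only measurement can tell.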


asked Apr 08 '18 21:04

Labo

1 Answer

Python deliberately sacrifices execution speed for programmer productivity.

Use whatever is the most convenient to use.

When you have a correctly working implementation and the performance proves to be inadequate, replace critical bits only (as shown by profiling) with faster equivalents. See https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Overview:_Optimize_what_needs_optimizing for more info.
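
As a rough sketch of the "profile first" step, the standard cProfile and pstats modules can show where the time actually goes. parse_stream below is a hypothetical stand-in for the real parser, not code from the question.

```python
import cProfile
import io
import pstats


def parse_stream(chunks):
    # Hypothetical stand-in for the parser: accumulate the chunks, then
    # consume the buffer one byte at a time from the front.
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
    consumed = 0
    while buf:
        del buf[:1]
        consumed += 1
    return consumed


profiler = cProfile.Profile()
profiler.enable()
total = parse_stream([b"a" * 1000] * 10)
profiler.disable()

# Collect the report in a string and show the five most expensive entries
# by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Only the functions that dominate such a report are worth rewriting.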

That said, a prime candidate for the use case you described would be a "chunked buffer" that would return slices transparently from a series of buffers.

Extracting data from it will still require copying (since all standard Python types own their memory), and you'll have interpreter overhead if you implement the type in pure Python. So to get any significant improvement, you're likely to have to go into Cython/C or something. That's why it's so important to get the general design right first -- in pure Python, it's much easier to change things.
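
A minimal pure-Python sketch of such a chunked buffer, assuming a deque of bytes chunks plus an offset into the first one. ChunkedBuffer and its method names are made up for illustration; it is meant as a starting point for experiments, not a tuned implementation.

```python
from collections import deque


class ChunkedBuffer:
    """Hypothetical chunked buffer: a deque of chunks plus a read offset."""

    def __init__(self):
        self._chunks = deque()
        self._offset = 0   # bytes already consumed from the first chunk
        self._size = 0     # unconsumed bytes across all chunks

    def __len__(self):
        return self._size

    def append(self, data):
        # Store the incoming chunk; nothing already buffered is copied.
        if data:
            self._chunks.append(bytes(data))
            self._size += len(data)

    def peek(self, n):
        # Return up to n unconsumed bytes without consuming them.
        # The result is a copy assembled from the underlying chunks.
        need = min(n, self._size)
        offset = self._offset
        parts = []
        for chunk in self._chunks:
            if need <= 0:
                break
            part = chunk[offset:offset + need]
            parts.append(part)
            need -= len(part)
            offset = 0
        return b"".join(parts)

    def consume(self, n):
        # Discard up to n bytes from the front, dropping whole chunks
        # where possible and only moving the offset otherwise.
        n = min(n, self._size)
        self._size -= n
        while n > 0:
            available = len(self._chunks[0]) - self._offset
            if n >= available:
                self._chunks.popleft()
                self._offset = 0
                n -= available
            else:
                self._offset += n
                n = 0


buf = ChunkedBuffer()
buf.append(b"hel")
buf.append(b"lo wor")
buf.append(b"ld")
first = buf.peek(5)    # spans the first two chunks
buf.consume(6)         # drops "hello " without touching the later data
rest = buf.peek(100)
print(first, rest, len(buf))
```

Appending and consuming never move the buffered data itself; only peek copies, and only the bytes that were asked for.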


answered Sep 23 '22 03:09

ivan_pozdeev

