Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance of bytearray and alternatives

I am writing a parser for a protocol working over TCP.

Some messages are split between multiple packets so I need to be able to "peek" into my stream with a possibility of going back and also to append incoming data at the end. On the other hand, I would like to be able to discard the content of the packets I have successfully parsed.

  • The problem with bytes is that appending requires copying (not in CPython, but then it is also impossible to delete the first bytes in an immutable object).
  • The problem with bytearray is that flushing the already seen bytes also requires copying (or so I thought, see below)
  • The problem with collections.deque is the huge memory requirement. Same with list.

However, I did some tests with bytearray and it seems the pop(0) operation is far more efficient than with lists:

from time import time

n = 100000

for container in [bytearray, list]:
    print(container)

    a = container(b'a'*n)
    t = time()
    for i in range(n):
        del a[0]
    print('del a[0]', time() - t)

    a = container(b'a'*n)
    t = time()
    for i in range(n):
        del a[-1]
    print('del a[-1]', time() - t)

    a = container(b'a'*n)
    t = time()
    for i in range(n-1):
        del a[1]
    print('del a[1]', time() - t)

    a = container(b'a'*n)
    t = time()
    for i in range(n-1):
        del a[-2]
    print('del a[-2]', time() - t)

    print()

It seems that del a[0] and del a[-1] have about the same complexity for bytearray, in cpython2, cpython3 and pypy3.

I would like to know:

  1. How is that possible? Is there a more efficient way than del a[:k] to delete the first k bytes?

  2. Is there a more efficient data structure than the bytearray? (maybe using array, memoryview or ctypes)

like image 552
Labo Avatar asked Apr 08 '18 21:04

Labo


People also ask

What is Bytearray used for?

The Python bytearray() function converts strings or collections of integers into a mutable sequence of bytes. It provides developers the usual methods Python affords to both mutable and byte data types.

What is the difference between bytes and Bytearray?

The difference between bytes() and bytearray() is that bytes() returns an object that cannot be modified, and bytearray() returns an object that can be modified.

What is type Bytearray?

The bytearray type is a mutable sequence of integers in the range between 0 and 255. It allows you to work directly with binary data. It can be used to work with low-level data such as that inside of images or arriving directly from the network. Bytearray type inherits methods from both list and str types.

How do you modify Bytearray in Python?

The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. The reason you can create it with a string as each character is converted to its ASCII integer value. So when assigning 'H' you actually mean to assign 72 . If you want to be able to assign chars, then just pass each one into ord() first.


1 Answers

Python deliberately sacrifices code performance for programmer's performance.

Use whatever is the most convenient to use.

When you have a correctly working implementation and the performance proves to be inadequate, replace critical bits only (as shown by profiling) with faster equivalents. See https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Overview:_Optimize_what_needs_optimizing for more info.


That said, a prime candidate for the use case you described would be a "chunked buffer" that would return slices transparently from a series of buffers.

Extracting data from it will still require copying (since all standard Python types own their memory), and you'll have interpreter overhead if you implement the type in pure Python. So to get any significant improvement, you're likely to have to go into Cython/C or something. That's why it's so important to get the general design right first -- in pure Python, it's much easier to change things.

like image 125
ivan_pozdeev Avatar answered Sep 23 '22 03:09

ivan_pozdeev