Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient FIFO queue for arbitrarily sized chunks of bytes in Python

Tags:

python

How do I implement a FIFO buffer to which I can efficiently add arbitrarily sized chunks of bytes to the head and from which I can efficiently pop arbitrarily sized chunks of bytes from the tail?

Background:

I have a class that reads bytes from file-like objects in chunks of arbitrary size and is itself a file-like object from which clients can read the bytes in chunks of arbitrary size.

The way I have this implemented is that whenever a client wants to read a chunk of bytes, the class will repeatedly read from the underlying file-like objects (with chunk sizes appropriate to those objects) and add the bytes to the head of a FIFO queue until there are enough bytes in the queue to serve a chunk of the requested size to the client. It then pops those bytes off of the tail of the queue and returns them to the client.

I have a performance problem that occurs when the chunk size for the underlying file-like objects is much larger than the chunk size that clients use when reading from the class.

Say the chunk size for the underlying file-like objects is 1 MiB and the chunk size the client reads with is 1 KiB. The first time the client requests 1 KiB, the class has to read 1 MiB and add it to the FIFO queue. Then, for that request and the subsequent 1023 requests, the class has to pop 1 KiB from the tail of the FIFO queue, which gradually decreases in size from 1 MiB to 0 bytes, at which time the cycle starts again.

I have currently implemented this with a StringIO object. Writing new bytes to the end of the StringIO object is fast, but removing bytes from the beginning is very slow, because a new StringIO object, that holds a copy of the entire previous buffer minus the first chunk of bytes, must be created.

SO questions that deal with similar issues tend to point to the deque container. However, deque is implemented as as doubly linked list. Writing a chunk to the deque would require splitting the chunk into objects, each containing a single byte. The deque would then add two pointers to each object for storing, probably increasing the memory requirements by at least an order of magnitude as compared to the bytes. Also, it would take a long time to traverse the linked list and deal with each object both to split chunks into objects and to join objects into chunks.

like image 920
Roger Dahl Avatar asked Jun 06 '12 15:06

Roger Dahl


1 Answers

I have currently implemented this with a StringIO object. Writing new bytes to the end of the StringIO object is fast, but removing bytes from the beginning is very slow, because a new StringIO object, that holds a copy of the entire previous buffer minus the first chunk of bytes, must be created.

Actually the most typical way of implementing FIFO is two use wrap around buffer with two pointers as such:

enter image description hereimage source

Now, you can implement that with StringIO() using .seek() to read/write from appropriate location.

like image 142
vartec Avatar answered Sep 29 '22 00:09

vartec