So I have a system where a producer and a consumer are connected by a queue of unlimited size, but if the consumer repeatedly calls get until the Empty exception is thrown, it does not clear the queue.
I believe this is because the queue's internal thread on the producing side, which serialises objects into the socket, blocks once the socket buffer is full, and then waits until the buffer has space again. This makes it possible for the consumer to call get "too fast": it concludes the queue is empty when in fact the thread on the other side has much more data to send, but cannot serialise it fast enough to prevent the socket appearing empty to the consumer.
I believe this problem would be alleviated if I could change the buffer size on the underlying socket (I am on Windows). As far as I can see, what I need to do is something like:
import multiprocessing.connection as conn
conn.BUFSIZE = 2 ** 16  # typically set to 2 ** 13 on Windows
from multiprocessing import Queue as q
If I do the above, does that mean that when multiprocessing initialises a queue, it will use the new buffer size which I have set in the version of multiprocessing.connection that I have already imported? Is that correct?
Also, I believe this will only affect Windows, as BUFSIZE is not used on Linux machines, where all sockets are set to 60 kilobytes by default?
Has anyone tried this before? Would this have side effects on Windows? And what are the fundamental limits on socket buffer sizes on Windows?
===================A code sample to demonstrate===================
# import multiprocessing.connection as conn
# conn.BUFSIZE = 2 ** 19
import sys
import multiprocessing as mp
from Queue import Empty
from time import sleep

total_length = 10 ** 8

def supplier(q):
    print "Starting feeder"
    for i in xrange(total_length):  # xrange avoids building a 10**8-element list in Python 2
        q.put(i)

if __name__ == "__main__":
    queue = mp.Queue()
    p = mp.Process(target=supplier, args=(queue,))
    p.start()
    sleep(120)
    returned = []
    while True:
        try:
            returned.append(queue.get(block=False))
        except Empty:
            break
    print len(returned)
    print len(returned) == total_length
    p.terminate()
    sys.exit()
This sample, when run on Windows, will typically pull only around 160,000 items from the queue: the main thread can empty the buffer faster than the supplier refills it, so eventually it tries to pull from the queue when the buffer is empty and the queue reports that it is empty.
You can in theory ameliorate this problem by having a larger buffer size. The two commented-out lines at the top will, I believe, increase the default buffer size for the pipe on Windows systems.
If you uncomment them, this script will pull more data before it exits, since it has a much higher buffer capacity. My main questions are then: 1) Does this actually work? 2) Is there a way to make this code use the same size of underlying buffer on Windows and Linux? 3) Are there any unexpected side effects from setting large buffer sizes for pipes?
I am aware that, in general, there is no way to know whether you have pulled all of the data from the queue (given that the supplier runs permanently and produces data very unevenly), but I am looking for ways to improve this on a best-effort basis.
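One best-effort improvement (my own sketch, not from the post; the names SENTINEL and drain_until_sentinel are hypothetical, and this is Python 3 syntax) is to have the producer put a sentinel value after the real data and have the consumer block on get() until it sees the sentinel, rather than trusting the Empty exception:

```python
from multiprocessing import Queue

# Hypothetical sentinel: any value the producer never sends as real data.
SENTINEL = None

def drain_until_sentinel(q):
    """Blocking drain: trusts the sentinel, not the Empty exception."""
    items = []
    while True:
        item = q.get()  # blocks until data is actually available
        if item is SENTINEL:
            break
        items.append(item)
    return items

# Single-process demonstration; in real use the put() calls happen in the
# supplier process, with q.put(SENTINEL) as its last action.
q = Queue()
for i in range(5):
    q.put(i)
q.put(SENTINEL)
print(drain_until_sentinel(q))  # -> [0, 1, 2, 3, 4]
```

This sidesteps the buffer-size question entirely for the "did I get everything" case, though it only works when the producer can signal that it is done.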
Yes, it is. From https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes: Queues are thread and process safe.
Python provides the ability to create and manage new processes via the multiprocessing.Process class. In multiprocessing programming, we may need to change the technique used to start child processes. This is called the start method.
A queue is a data structure on which items can be added by a call to put() and from which items can be retrieved by a call to get(). multiprocessing.Queue provides a first-in, first-out (FIFO) queue, which means that items are retrieved from the queue in the order they were added.
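The FIFO behaviour can be seen in a few lines (a minimal illustration in Python 3 syntax, not from the original answer):

```python
from multiprocessing import Queue

q = Queue()
for item in ["first", "second", "third"]:
    q.put(item)

# Items come back in the order they were added (FIFO).
print(q.get())  # -> first
print(q.get())  # -> second
print(q.get())  # -> third
```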
There is no direct way of clearing a multiprocessing.Queue. I believe the closest you have is close(), but that simply states that no more data will be pushed to that queue, and will close it when all data has been flushed to the pipe.
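If a best-effort clear is all you need, one workaround (my own sketch, not part of the multiprocessing API; the helper name drain is hypothetical, and this is Python 3 syntax) is to poll get() with a small timeout instead of block=False, which gives the queue's feeder thread time to push pending data through the pipe:

```python
from multiprocessing import Queue
from queue import Empty  # Python 3; "from Queue import Empty" on Python 2

def drain(q, timeout=0.1):
    """Best-effort drain: a short timeout tolerates the feeder thread's
    lag, but still cannot guarantee the queue is truly empty."""
    items = []
    while True:
        try:
            items.append(q.get(timeout=timeout))
        except Empty:
            return items

q = Queue()
for i in range(5):
    q.put(i)
print(drain(q))  # -> [0, 1, 2, 3, 4]
```

A larger timeout makes a premature Empty less likely at the cost of waiting longer at the end of every drain.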
Update:
A useful link on Windows pipes, for people who need it in the future (the link was provided by the OP, phil_20686): https://msdn.microsoft.com/en-us/library/windows/desktop/aa365150(v=vs.85).aspx
Original:
BUFSIZE only takes effect when the platform is win32.
multiprocessing.Queue is built on top of Pipe, so if you change BUFSIZE, any Queue you create afterwards will use the updated value. See below:
class Queue(object):

    def __init__(self, maxsize=0):
        if maxsize <= 0:
            maxsize = _multiprocessing.SemLock.SEM_VALUE_MAX
        self._maxsize = maxsize
        self._reader, self._writer = Pipe(duplex=False)
When the platform is win32, Pipe() will run the following code:
def Pipe(duplex=True):
    '''
    Returns pair of connection objects at either end of a pipe
    '''
    address = arbitrary_address('AF_PIPE')
    if duplex:
        openmode = win32.PIPE_ACCESS_DUPLEX
        access = win32.GENERIC_READ | win32.GENERIC_WRITE
        obsize, ibsize = BUFSIZE, BUFSIZE
    else:
        openmode = win32.PIPE_ACCESS_INBOUND
        access = win32.GENERIC_WRITE
        obsize, ibsize = 0, BUFSIZE

    h1 = win32.CreateNamedPipe(
        address, openmode,
        win32.PIPE_TYPE_MESSAGE | win32.PIPE_READMODE_MESSAGE |
        win32.PIPE_WAIT,
        1, obsize, ibsize, win32.NMPWAIT_WAIT_FOREVER, win32.NULL
        )
You can see that when duplex is False, the output buffer size is 0 and the input buffer size is BUFSIZE.
The input buffer size is the number of bytes to reserve for the pipe's input buffer; 2**16 = 65536 would be the maximum number of bytes that can be written in one operation without blocking. However, the actual capacity of a pipe buffer varies across systems, and can even vary on the same system, so it is hard to say what the side effects are of setting the pipe to the maximum size.
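Putting this together, the OP's approach hinges on patching the module attribute before any Queue (and hence any Pipe) is created, since the Pipe code above reads the module-level BUFSIZE at call time. A sketch of the intended ordering (the 2 ** 16 value is the OP's choice; on non-win32 platforms the patched value is simply ignored):

```python
import multiprocessing.connection as connection

# Patch BEFORE any Queue/Pipe is constructed; pipes created earlier
# keep the old buffer size.
connection.BUFSIZE = 2 ** 16

from multiprocessing import Queue

q = Queue()  # on win32, its underlying named pipe now uses the new size
```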