
Python Socket Receive/Send Multi-threading

I am writing a Python program where in the main thread I am continuously (in a loop) receiving data through a TCP socket, using the recv function. In a callback function, I am sending data through the same socket, using the sendall function. What triggers the callback is irrelevant. I've set my socket to blocking.

My question is, is this safe to do? My understanding is that a callback function is called on a separate thread (not the main thread). Is the Python socket object thread-safe? From my research, I've been getting conflicting answers.

Alon asked Jun 29 '18 14:06





1 Answer

Sockets in Python are not thread safe.

You're trying to solve a few problems at once:

  1. Sockets are not thread-safe.
  2. recv is blocking and blocks the main thread.
  3. sendall is being used from a different thread.

You can solve all three either by using asyncio, or by doing what asyncio does internally: use select.select together with a socketpair to wake up the main thread, and a queue to hold the data waiting to be sent.

import select
import socket
import queue

# Any data received by this queue will be sent
send_queue = queue.Queue()

# Any data sent to ssock shows up on rsock
rsock, ssock = socket.socketpair()

main_socket = socket.socket()

# Create the connection with main_socket, fill this up with your code

# Your callback thread; data is the bytes object you want to send
def different_thread(data):
    # Put the data to send inside the queue
    send_queue.put(data)

    # Trigger the main thread by sending data to ssock which goes to rsock
    ssock.send(b"\x00")

# Run the callback thread

while True:
    # When either main_socket has data or rsock has data, select.select will return
    rlist, _, _ = select.select([main_socket, rsock], [], [])
    for ready_socket in rlist:
        if ready_socket is main_socket:
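            data = main_socket.recv(1024)
            if not data:
                # An empty result means the peer closed the connection
                raise ConnectionError("connection closed by peer")
            # Do stuff with data, fill this up with your code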
        else:
            # Ready_socket is rsock
            rsock.recv(1)  # Dump the ready mark
            # Send the data.
            main_socket.sendall(send_queue.get())

We use several constructs here, and you will have to fill in the blanks with your own code. As for the explanation:

We first create a send_queue which is a queue of data to send. Then, we create a pair of connected sockets (socketpair()). We need this later on in order to wake up the main thread as we don't wish recv() to block and prevent writing to the socket.

Then, we connect the main_socket and start the callback thread. Now here's the magic:

In the main thread, we use select.select to know if the rsock or main_socket has any data. If one of them has data, the main thread wakes up.

After adding data to the queue, the callback thread wakes up the main thread by writing a single byte to ssock. That makes rsock readable, which causes select.select to return.
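To see the wake-up on its own, here is a minimal sketch that leaves main_socket out entirely. The names callback, timer, woke_up and payload are made up for the demo, and the 5 second timeout is only there so the example cannot hang:

```python
import queue
import select
import socket
import threading

send_queue = queue.Queue()
rsock, ssock = socket.socketpair()

def callback(data):
    send_queue.put(data)   # queue the payload first...
    ssock.send(b"\x00")    # ...then wake whoever is blocked in select

# Hypothetical trigger: fire the callback from another thread after 0.1 s
timer = threading.Timer(0.1, callback, args=(b"hello",))
timer.start()

# Blocks until rsock becomes readable or the timeout expires
rlist, _, _ = select.select([rsock], [], [], 5.0)

woke_up = rsock in rlist
payload = None
if woke_up:
    rsock.recv(1)                        # discard the wake-up byte
    payload = send_queue.get_nowait()    # the data was queued before the byte

timer.join()
rsock.close()
ssock.close()
```

Because the callback puts the data in the queue before it writes the byte, the main thread is guaranteed to find something in the queue once it has read the byte.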

To fully understand this, read the documentation for select.select(), socket.socketpair() and queue.Queue.
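For the asyncio route mentioned at the top, a rough sketch could look like the one below. It is not a drop-in replacement for your program: a socketpair stands in for your real TCP connection, the "peer" simply echoes once, and callback and worker are illustrative names. The part that matters is that the foreign thread never touches the socket itself; it hands the write over to the event loop with loop.call_soon_threadsafe.

```python
import asyncio
import socket
import threading

async def main():
    # Assumption: a socketpair stands in for the real TCP connection
    ours, peer = socket.socketpair()
    peer.setblocking(False)
    reader, writer = await asyncio.open_connection(sock=ours)
    loop = asyncio.get_running_loop()

    def callback(data):
        # Runs on another thread: schedule the write on the loop's thread
        loop.call_soon_threadsafe(writer.write, data)

    worker = threading.Thread(target=callback, args=(b"ping",))
    worker.start()
    worker.join()

    # Stand-in peer: read what was sent and echo it back
    got = await asyncio.wait_for(loop.sock_recv(peer, 1024), timeout=5)
    await loop.sock_sendall(peer, got)

    # The "main loop" side: receive on the same connection
    echoed = await asyncio.wait_for(reader.readexactly(len(got)), timeout=5)

    writer.close()
    await writer.wait_closed()
    peer.close()
    return echoed

result = asyncio.run(main())
```

Here all reads and writes on the connection happen on the event loop's thread, so the thread-safety question never comes up.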


@tobias.mcnulty asked a good question in the comments: Why should we use a Queue instead of sending all the data through the socket?

You can use the socketpair to send the data as well, which has its benefits, but sending over a queue might be preferable for multiple reasons:

  1. Sending data over a socket is an expensive operation. It requires a syscall and copies the data into and out of kernel buffers, and on platforms where socketpair() is emulated over loopback TCP (Windows, for example) it goes through the TCP stack as well. With a Queue, each message costs exactly one socket send, for the single-byte signal (plus the queue's internal lock, but that one is pretty cheap). Sending large data through the socketpair will result in multiple syscalls. As a tip, you may as well use a collections.deque, whose append() and popleft() are thread-safe in CPython. That way the only syscalls are the ones on the socketpair.
  2. Architecture-wise, using a queue allows you to have finer-grained control later on. For example, the data can be sent in whichever type you wish and be decoded afterwards. This allows the main loop to be a little smarter and can help you create an easier interface.
  3. You don't have size limits, which can be a bug or a feature. A socketpair is bounded by the kernel's socket buffer size, and I believe changing that size is not exactly encouraged, so it acts as a natural throttle on how much data can be in flight. That might be a benefit, but the application may wish to control it on its own, and relying on the "natural" throttle means the calling thread hangs in send() whenever the buffer is full.
  4. Just as large data takes multiple recv syscalls on the socketpair, it also takes multiple passes through select. Stream sockets, like TCP, do not have message boundaries. You'll either have to create artificial ones, set the socket to nonblocking and deal with asynchronous sockets, or treat the data as a stream and keep passing through select calls, which might be expensive depending on your OS.
  5. Support for multiple threads on the same socketpair. Sending 1 byte for signalling over a socket from multiple threads is fine, and is exactly how asyncio works. Sending more than that may cause data from different threads to interleave or arrive in the wrong order.
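To make points 1 and 5 concrete, here is a small sketch of the deque variant with several threads signalling over the same socketpair. The names pending, callback and drained are made up for the example, and it assumes a single consumer thread, so checking the deque and then calling popleft() cannot race with another reader:

```python
import collections
import select
import socket
import threading

pending = collections.deque()
rsock, ssock = socket.socketpair()

def callback(data):
    pending.append(data)   # deque.append is thread-safe
    ssock.send(b"\x00")    # one byte per message, sent from any thread

# Five hypothetical callback threads, each queueing one message
threads = [threading.Thread(target=callback, args=(bytes([i]),)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

drained = []
while len(drained) < 5:
    rlist, _, _ = select.select([rsock], [], [], 5.0)
    if not rlist:
        break              # timed out; only here so the demo cannot hang
    rsock.recv(64)         # several wake-up bytes may arrive in one read
    while pending:         # so drain everything that is queued
        drained.append(pending.popleft())

rsock.close()
ssock.close()
```

Note that the consumer drains the whole deque on every wake-up instead of assuming one byte equals one message, since the kernel may hand over several signal bytes in a single recv.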

All in all, transferring the data back and forth between the kernel and userspace is possible and will work, but I personally do not recommend it.
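If you do go the socketpair-only route anyway, the "artificial boundaries" from point 4 could look roughly like this. The 4-byte length prefix and the helper names send_message, recv_exactly and recv_message are my own choice for the sketch, not anything the socket module provides:

```python
import socket
import struct

# Assumed framing: each message is preceded by its length as a 4-byte
# big-endian unsigned integer
HEADER = struct.Struct("!I")

def send_message(sock, payload):
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exactly(sock, n):
    # recv may return fewer bytes than requested, so loop until we have n
    chunks = []
    while n:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def recv_message(sock):
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)

rsock, ssock = socket.socketpair()
send_message(ssock, b"first")
send_message(ssock, b"second message")
messages = [recv_message(rsock), recv_message(rsock)]
rsock.close()
ssock.close()
```

This only stays correct with a single sending thread (or a lock around send_message), which is exactly the ordering problem from point 5.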

Bharel answered Sep 22 '22 03:09
