I am writing a Python program where in the main thread I am continuously (in a loop) receiving data through a TCP socket, using the recv function. In a callback function, I am sending data through the same socket, using the sendall function. What triggers the callback is irrelevant. I've set my socket to blocking.
My question is, is this safe to do? My understanding is that a callback function is called on a separate thread (not the main thread). Is the Python socket object thread-safe? From my research, I've been getting conflicting answers.
You can send and receive on the same socket at the same time (via multiple threads). But the send and receive may not actually occur simultaneously, since one operation may block the other from starting until it's done.
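To make the claim above concrete, here is a minimal sketch in which one thread calls sendall while the main thread blocks in recv on the same socket object. It assumes `socket.socketpair()` as a stand-in for a real TCP connection, and `echo_peer` and `demo_full_duplex` are hypothetical names used only for illustration.

```python
import socket
import threading

def echo_peer(sock):
    """Hypothetical remote peer: echoes everything back until it sees EOF."""
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            break
        sock.sendall(chunk)
    sock.close()

def demo_full_duplex(messages):
    # socketpair() stands in for a connected TCP socket and its remote end.
    local, remote = socket.socketpair()
    peer = threading.Thread(target=echo_peer, args=(remote,))
    peer.start()

    def sender():
        # Runs on a second thread, like the callback in the question.
        for msg in messages:
            local.sendall(msg)
        # Half-close: tell the peer we are done sending, keep receiving.
        local.shutdown(socket.SHUT_WR)

    send_thread = threading.Thread(target=sender)
    send_thread.start()

    received = bytearray()
    while True:
        chunk = local.recv(1024)  # blocks in the main thread
        if not chunk:             # empty bytes: the peer closed its side
            break
        received.extend(chunk)

    send_thread.join()
    peer.join()
    local.close()
    return bytes(received)

result = demo_full_duplex([b"ping", b"pong"])
```

This only covers one sending thread and one receiving thread. Several threads sending on the same socket is a different situation and needs extra coordination.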
Python does support multi-threading, but on the CPython interpreter threads do not execute Python bytecode on multiple cores in parallel, because of the GIL. Python does have a threading library, and the GIL does not prevent threading.
Unfortunately, a socket shared by multiple threads is not thread-safe. Think about a buffer that two threads operate on with no lock. The usual way to implement this is with two sockets, just like FTP does: a cmd socket and a msg socket.
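If a single socket is kept and more than one thread may send on it, one common way to address the "no lock" concern is to serialize the writes. The sketch below is an assumption-laden illustration, not part of the answer above: `LockedSender` and `demo_locked_sender` are hypothetical names, and `socket.socketpair()` stands in for a real connection. The lock matters because sendall may need several underlying send calls for a large payload, so two unsynchronized callers could, in principle, interleave their bytes.

```python
import socket
import threading

class LockedSender:
    """Hypothetical helper: lets several threads share one socket for sending."""

    def __init__(self, sock):
        self._sock = sock
        self._lock = threading.Lock()

    def send(self, payload):
        # Hold the lock for the whole sendall() so messages stay contiguous.
        with self._lock:
            self._sock.sendall(payload)

def demo_locked_sender(n_threads=4, size=50_000):
    local, remote = socket.socketpair()
    sender = LockedSender(local)
    # Each thread sends one message made of a single repeated byte value.
    payloads = [bytes([65 + i]) * size for i in range(n_threads)]
    threads = [threading.Thread(target=sender.send, args=(p,)) for p in payloads]
    for t in threads:
        t.start()

    # Drain the other end while the senders are running.
    total = n_threads * size
    received = bytearray()
    while len(received) < total:
        chunk = remote.recv(65536)
        if not chunk:
            break
        received.extend(chunk)

    for t in threads:
        t.join()
    local.close()
    remote.close()
    return bytes(received), size

data, block_size = demo_locked_sender()
```

With one sender and one receiver, as in the question, this lock is not needed; it only becomes relevant once a second sending thread appears.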
Python is NOT a single-threaded language. In CPython, only one thread executes Python bytecode at a time because of the GIL. Despite the GIL, libraries that perform computationally heavy tasks, like numpy, scipy and pytorch, use C-based implementations under the hood, which allows the use of multiple cores.
Sockets in Python are not thread safe.
You're trying to solve a few problems at once:
You may solve these either by using asyncio or by solving it the way asyncio solves it internally: by using select.select together with a socketpair, and using a queue for the data to be sent.
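For the asyncio route, a minimal sketch might look like the following. It assumes a hypothetical local echo server on 127.0.0.1 so that it can run on its own, and `echo_handler`, `main` and `callback` are illustrative names. Since asyncio stream objects are not thread-safe, the callback thread hands the write to the event loop with `loop.call_soon_threadsafe` instead of touching the writer directly.

```python
import asyncio
import threading

async def echo_handler(reader, writer):
    # Hypothetical local peer, present only to make the sketch runnable.
    while data := await reader.read(1024):
        writer.write(data)
        await writer.drain()
    writer.close()

async def main():
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(echo_handler, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    loop = asyncio.get_running_loop()

    def callback(payload):
        # Runs on another thread: schedule the write on the event loop thread.
        loop.call_soon_threadsafe(writer.write, payload)

    thread = threading.Thread(target=callback, args=(b"hello",))
    thread.start()

    # The receive side stays in the event loop and never blocks the sender.
    received = await reader.readexactly(5)

    thread.join()
    writer.close()
    await writer.wait_closed()
    server.close()
    return received

asyncio_result = asyncio.run(main())
```

The select-based approach, which does the same job without asyncio, follows.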
import select
import socket
import queue

# Any data put on this queue will be sent
send_queue = queue.Queue()

# Any data sent to ssock shows up on rsock
rsock, ssock = socket.socketpair()

main_socket = socket.socket()
# Create the connection with main_socket, fill this up with your code

# Your callback thread
def different_thread(data):
    # Put the data to send inside the queue
    send_queue.put(data)
    # Trigger the main thread by sending data to ssock which goes to rsock
    ssock.send(b"\x00")

# Run the callback thread, fill this up with your code

while True:
    # When either main_socket has data or rsock has data, select.select will return
    rlist, _, _ = select.select([main_socket, rsock], [], [])
    for ready_socket in rlist:
        if ready_socket is main_socket:
            data = main_socket.recv(1024)
            # Do stuff with data, fill this up with your code
        else:
            # ready_socket is rsock
            rsock.recv(1)  # Dump the ready mark
            # Send the data.
            main_socket.sendall(send_queue.get())
We use multiple constructs in here. You will have to fill up the empty spaces with your code of choice. As for the explanation:
We first create a send_queue, which is a queue of data to send. Then, we create a pair of connected sockets (socketpair()). We need this later on in order to wake up the main thread, as we don't wish recv() to block and prevent writing to the socket.
Then, we connect the main_socket and start the callback thread. Now here's the magic:
In the main thread, we use select.select to know if the rsock or main_socket has any data. If one of them has data, the main thread wakes up.
Upon adding data to the queue, we wake up the main thread by signaling ssock, which wakes up rsock and thus returns from select.select.
In order to fully understand this, you'll have to read select.select(), socketpair() and queue.Queue().
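The steps described above can be tried end to end with a small self-contained variant. This is a sketch under stated assumptions: a second `socket.socketpair()` stands in for the real TCP connection, the hypothetical `echo_peer` plays the remote side, and the loop stops after the echoed payload arrives purely so that the demo terminates.

```python
import queue
import select
import socket
import threading

def demo_select_wakeup(payload=b"hello"):
    send_queue = queue.Queue()
    rsock, ssock = socket.socketpair()        # wake-up channel
    main_socket, peer = socket.socketpair()   # stand-in for the TCP connection

    def echo_peer():
        # Hypothetical remote side: echo until the connection closes.
        while chunk := peer.recv(1024):
            peer.sendall(chunk)
        peer.close()

    def callback(data):
        send_queue.put(data)   # hand the data to the main thread
        ssock.send(b"\x00")    # wake up select() in the main thread

    peer_thread = threading.Thread(target=echo_peer)
    cb_thread = threading.Thread(target=callback, args=(payload,))
    peer_thread.start()
    cb_thread.start()

    received = bytearray()
    while len(received) < len(payload):       # demo-only stop condition
        rlist, _, _ = select.select([main_socket, rsock], [], [])
        for ready in rlist:
            if ready is main_socket:
                chunk = main_socket.recv(1024)
                if not chunk:
                    raise ConnectionError("peer closed the connection")
                received.extend(chunk)
            else:
                rsock.recv(1)                 # consume the wake-up byte
                main_socket.sendall(send_queue.get())

    cb_thread.join()
    main_socket.close()                       # the peer sees EOF and exits
    peer_thread.join()
    rsock.close()
    ssock.close()
    return bytes(received)

select_result = demo_select_wakeup()
```

Only the main thread ever touches main_socket here, which is the point of the design: the callback thread communicates through the queue and the wake-up byte alone.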
@tobias.mcnulty asked a good question in the comments: Why should we use a Queue instead of sending all the data through the socket?
You can use the socketpair to send the data as well, which has its benefits, but sending over a queue might be preferable for multiple reasons:
Using a Queue guarantees we'll have only 1 syscall - for the single-byte signal - and not more (apart from the queue's internal lock, but that one is pretty cheap). Sending large data through the socketpair will result in multiple syscalls. As a tip, you may as well use a collections.deque, whose append and pop operations are thread-safe in CPython. That way you won't need any syscall besides the ones on the socketpair.
With socketpair.recv syscalls, for large data you will pass through multiple select calls as well. Stream sockets, like TCP, do not have message boundaries. You'll either have to create artificial ones, set the socket to nonblocking and deal with asynchronous sockets, or think of it as a stream and continuously pass through select calls, which might be expensive depending on your OS.
All in all, transferring the data back and forth between the kernel and userspace is possible and will work, but I personally do not recommend it.
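As an illustration of the "artificial message boundaries" mentioned above, one common approach is a length prefix in front of every message. The sketch below assumes a 4-byte big-endian prefix; that format and the helper names `send_message`, `recv_exact` and `recv_message` are choices made for this example, not something the answer prescribes.

```python
import socket
import struct

# Assumed wire format: 4-byte unsigned big-endian length, then the payload.
HEADER = struct.Struct("!I")

def send_message(sock, payload):
    """Send one framed message."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed in the middle of a message")
        buf.extend(chunk)
    return bytes(buf)

def recv_message(sock):
    """Receive one framed message and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)

# socketpair() stands in for a stream connection in this demo.
a, b = socket.socketpair()
send_message(a, b"first")
send_message(a, b"second message")
framed = [recv_message(b), recv_message(b)]
a.close()
b.close()
```

Even though both messages sit back to back in the same byte stream, the receiver recovers them separately because each one announces its own length.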