
Twisted threading: how to avoid deepcopy

I have a Twisted server that does some "long" task for each request, so I defer each call to a thread. Each request accesses a common resource, which gets altered during processing. Each request should start with the original data, so I use deepcopy on the common resource (while holding a lock). It works, but I don't think it's fast enough. I have the feeling that deepcopy is slowing things down a bit.

What do you suggest for dealing with mutation of shared resources in a threaded Twisted server?

Catalin asked Aug 16 '11 18:08


2 Answers

Try operating with the minimum data possible in your worker threads. Pass in all the data they need as arguments, and take all of their output as the return value (the value the Deferred fires with) rather than as mutations to the inputs.

Then integrate the results into the common data structure in the reactor thread.

This lets you reason about the work in isolation and avoid any additional locking (which results in contention, slowing things down in addition to making them more confusing).
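A minimal sketch of that pattern, assuming the shared resource is a plain dict. The names long_task, merge_result, handle_request and the dict layout are made up for illustration; only deferToThread is Twisted's own API. The worker sees just the values it was handed, and the callback that mutates the shared dict runs back in the reactor thread, so neither deepcopy nor a lock is involved.

```python
def long_task(base_price, quantity):
    """Runs in a worker thread: reads only its arguments, touches nothing shared."""
    return base_price * quantity


def merge_result(shared, request_id, total):
    """Runs in the reactor thread, so the mutation needs no lock."""
    shared["totals"][request_id] = total
    return total


def handle_request(shared, request_id, quantity):
    # Imported here so the two helpers above can be tried without Twisted installed.
    from twisted.internet.threads import deferToThread

    # Hand the worker only the small value it needs, not the whole resource.
    d = deferToThread(long_task, shared["base_price"], quantity)
    # Callbacks on this Deferred fire in the reactor thread.
    d.addCallback(lambda total: merge_result(shared, request_id, total))
    return d
```

If the worker needs a larger slice of the resource, passing an immutable snapshot (a tuple, say) keeps the same property without copying the whole structure.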

Jean-Paul Calderone answered Nov 20 '22 00:11


If you like, you could just synchronize access to the shared resource with threading.Lock, as you would in any other threaded program, rather than copying it.

Regardless, I think it's worth benchmarking your code with and without the deepcopy and otherwise measuring to figure out how good/bad the performance really is before making optimizations. Perhaps the reason it is slow has nothing to do with deepcopy.
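One rough way to get a number for the deepcopy part is timeit from the standard library. This is only a sketch: sample_resource is a made-up stand-in, so substitute the real shared object, and seconds_per_deepcopy is a hypothetical helper name.

```python
import copy
import timeit


def seconds_per_deepcopy(resource, repeats=100):
    """Average wall-clock seconds taken by one copy.deepcopy(resource)."""
    total = timeit.timeit(lambda: copy.deepcopy(resource), number=repeats)
    return total / repeats


# Hypothetical stand-in for the shared resource; replace with the real one.
sample_resource = {"items": [list(range(50)) for _ in range(200)]}

if __name__ == "__main__":
    print("deepcopy: %.6f s per call" % seconds_per_deepcopy(sample_resource))
```

Comparing that figure with the total time spent per request should show whether deepcopy is a meaningful share of the cost or just noise.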

EDIT regarding using locking: What I mean is that you can use more fine-grained locking around this resource. I assume that your threads are doing more than accessing a shared resource. You can try to benefit from multiple threads doing work, and then synchronize access to just the one "critical section" that involves writing to the shared resource. You might also investigate making your shared resource thread-safe. For example, if you have a shared object, SillyExampleFriendsList:

import threading


class SillyExampleFriendsList(object):
    """Just manipulates a couple of lists."""

    def __init__(self):
        self._lock = threading.RLock()
        self._friends = []
        self._enemies = []

    def unfriend(self, x):
        # Hold the lock across both mutations so no other thread ever
        # sees a half-updated state (e.g. 'x' missing from both lists).
        # 'with' also releases the lock if remove() raises ValueError;
        # a bare acquire()/release() pair would leave it held forever.
        with self._lock:
            self._friends.remove(x)
            self._enemies.append(x)

The point here is just that the above object could potentially be shared between multiple threads without deepcopy by careful use of locks. Note that readers of the lists would need to take the same lock for this to hold. It's not trivial to identify all the cases where this might be necessary, and fine-grained locking strategies can be more difficult to debug and still introduce overhead.
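To illustrate the "critical section" idea with plain threads, here is a small sketch. All names (shared_totals, shared_lock, expensive_computation, worker, run_workers) are invented for the example. The slow part runs outside the lock, and the lock is held only for the short write to the shared dict.

```python
import threading

shared_totals = {}               # the shared, mutable resource
shared_lock = threading.Lock()   # guards writes to shared_totals


def expensive_computation(n):
    """Stand-in for the long task; uses no shared state."""
    return sum(i * i for i in range(n))


def worker(key, n):
    result = expensive_computation(n)   # slow part, done without the lock
    with shared_lock:                   # critical section: just the write
        shared_totals[key] = result


def run_workers(jobs):
    """Start one thread per (key, n) job, wait for all, return a snapshot."""
    threads = [threading.Thread(target=worker, args=(key, n))
               for key, n in jobs.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with shared_lock:
        return dict(shared_totals)
```

The less time each thread spends holding the lock, the less the threads contend with each other.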

That said, you may not need threads, locks, or deepcopy at all, and without benchmarking your code it's not clear whether you have a performance problem that needs to be solved. I'm curious: what makes you think that your code should be, or needs to be, faster?

stderr answered Nov 19 '22 22:11