
How to decorate a Python object with a mutex

I'm new to Python and am currently trying to learn threading. I'm wary of using locks to make my resources thread-safe, because they aren't inherently tied to the resource, so I'm bound to forget to acquire and/or release them every time my code interacts with the resource. Instead, I'd like to be able to "wrap" (or decorate?) an object so that all of its methods and attribute getters/setters are atomic. Something like this:

state = atomicObject(dict())

# the following is atomic/thread-safe
state["some key"] = "some value"

Is this possible? If so, what's the "best practices" way of implementing it?

EDIT: A good answer to the above question is available in How to make built-in containers (sets, dicts, lists) thread safe?. However, as abarnert and jsbueno have both demonstrated, the solution I proposed (automating locks) is not generally a good idea, because determining the proper granularity of atomic operations requires some intelligence and is likely difficult (or impossible) to automate properly.

The problem still remains that locks are not bound in any way to the resources they are meant to protect, so my new question is: What's a good way to associate a lock with an object?

Proposed solution #2: I imagine there might be a way to bind a lock to an object such that trying to access that object without first acquiring the lock throws an error, but I can see how that could get tricky.
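One hedged sketch along those lines (Guarded is an invented name, and this version doesn't raise on unlocked access; it simply makes holding the lock the only way to reach the object):

```python
import threading

class Guarded(object):
    # Guarded is a made-up name: the lock is bound to the resource because
    # the only way to reach the wrapped object is through the `with` block.
    def __init__(self, obj):
        self._lock = threading.RLock()
        self._obj = obj

    def __enter__(self):
        self._lock.acquire()
        return self._obj  # the real object, usable while the lock is held

    def __exit__(self, exc_type, exc_value, traceback):
        self._lock.release()

state = Guarded(dict())
with state as d:
    d["some key"] = "some value"  # the whole block runs under the lock
```

This doesn't make individual operations atomic automatically, but it does tie the lock to the resource, and it lets the caller choose the granularity of each critical section.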

EDIT: The following code is not very relevant to the question. I posted it to demonstrate that I had tried to solve the problem myself and gotten lost before posting this question.

For the record, I wrote the following code, but it doesn't work:

import threading    
import types
import inspect

class atomicObject(object):

    def __init__(self, obj):
        self.lock = threading.RLock()
        self.obj = obj

        # keep track of function handles for lambda functions that will be created
        self.funcs = []

        # loop through all the attributes of the passed in object
        # and create wrapped versions of each attribute
        for name in dir(self.obj):
            value = getattr(self.obj, name)
            if inspect.ismethod(value):
                # this is where things get really ugly as i try to work around the
                # limitations of lambda functions and use eval()... I'm not proud of this code
                eval("self.funcs.append(lambda self, *args, **kwargs: self.obj." + name + "(*args, **kwargs))")
                fidx = str(len(self.funcs) - 1)
                eval("self." + name + " = types.MethodType(lambda self, *args, **kwargs: self.atomize(" + fidx + ", *args, **kwargs), self)")

    def atomize(self, fidx, *args, **kwargs):
        with self.lock:
            return self.functions[fidx](*args, **kwargs)

I can create an atomicObject(dict()), but when I try to add a value to the object, I get the error: "atomicObject does not support item assignment".

arachnivore asked Apr 11 '13 at 23:04




2 Answers

It's very hard to tell from your non-running example and your mess of eval code, but there's at least one obvious error.

Try this in your interactive interpreter:

>>> d = dict()
>>> inspect.ismethod(d.__setitem__)

As the docs say, ismethod:

Return true if the object is a bound method written in Python.

A method-wrapper written in C (or .NET, Java, the next workspace down, etc. for other Python implementations) is not a bound method written in Python.

You probably just wanted callable or inspect.isroutine here.
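You can check the distinction directly; a small sketch:

```python
import inspect

d = dict()

# dict.__setitem__ is a C-level method-wrapper, not a Python bound method,
# so ismethod rejects it even though it is perfectly callable
print(inspect.ismethod(d.__setitem__))  # False
print(callable(d.__setitem__))          # True

# by contrast, a method defined in Python *is* a bound method
class C(object):
    def m(self):
        pass

print(inspect.ismethod(C().m))  # True
```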

I can't say whether this is the only problem, because if I fix the syntax errors and name errors and this bug, the second eval line generates illegal code like this:

self.__cmp__ = types.MethodType(lambda self, *args, **kwargs: self.atomize(0, *args, **kwargs) self)

… and I'm not sure what you were trying to do there.


You really shouldn't be trying to create and eval anything. To assign attributes dynamically by name, use setattr. And you don't need complicated lambdas. Just define the wrapped function with a normal def; the result is a perfectly good local value that you can pass around, exactly like a lambda except that it has a name.
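For example, a sketch of that approach (no eval, just def plus setattr; the names Wrapper and make_atomized are illustrative):

```python
import threading

def make_atomized(lock, func):
    # a plain def instead of eval'd lambdas: it closes over lock and func
    def atomized(*args, **kwargs):
        with lock:
            return func(*args, **kwargs)
    return atomized

class Wrapper(object):
    # wraps the public callables of obj at creation time
    def __init__(self, obj):
        self.lock = threading.RLock()
        self.obj = obj
        for name in dir(obj):
            value = getattr(obj, name)
            if callable(value) and not name.startswith('__'):
                setattr(self, name, make_atomized(self.lock, value))

w = Wrapper(dict())
w.update({'a': 4})
print(w.get('a'))  # 4
```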

On top of that, trying to wrap methods statically at creation time is difficult, and has some major downsides. (For example, if the class you're wrapping has any dynamically-generated methods, you won't wrap them.) Most of the time, you're better off doing it dynamically, at call time, with __getattr__. (If you're worried about the cost of creating the wrapper functions every time they're called… First, don't worry unless you actually profile and find that it's a bottleneck, because it probably won't be. But, if it is, you can easily add a cache of generated functions.)

So, here's a much simpler, and working, implementation of what I think you're trying to do:

import threading

class atomicObject(object):

    def __init__(self, obj):
        self.lock = threading.Lock()
        self.obj = obj

    def __getattr__(self, name):
        attr = getattr(self.obj, name)
        if callable(attr):
            def atomized(*args, **kwargs):
                with self.lock:
                    return attr(*args, **kwargs)
            return atomized
        return attr

However, this isn't going to actually do what you want. For example:

>>> d = atomicObject(dict())
>>> d.update({'a': 4}) # works
>>> d['b'] = 5
TypeError: 'atomicObject' object does not support item assignment

Why does this happen? You've got a __setitem__, and it works:

>>> d.__setitem__
<method-wrapper '__setitem__' of dict object at 0x100706830>
>>> d.__setitem__('b', 5) # works

The problem is that, as the docs imply, special methods are looked up on the class, not the object. And the atomicObject class doesn't have a __setitem__ method.
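A minimal demonstration of that lookup rule, using a bare forwarding proxy:

```python
class Proxy(object):
    def __init__(self, obj):
        self.obj = obj

    def __getattr__(self, name):
        # only called for attributes not found through normal lookup
        return getattr(self.obj, name)

p = Proxy(dict())
p.__setitem__('a', 1)   # works: an explicit lookup goes through __getattr__
try:
    p['b'] = 2          # fails: p['b'] = 2 looks up __setitem__ on type(p)
except TypeError as e:
    print(e)
```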

In fact, this means you can't even usefully print out your object, because you just get the default __str__ and __repr__ from object:

>>> d
<__main__.atomicObject object at 0x100714690>
>>> print(d)
<__main__.atomicObject object at 0x100714690>
>>> d.obj #cheating
{'a': 4, 'b': 5}

So, the right thing to do here is to write a function that defines a wrapper class for any class, then do:

>>> AtomicDict = make_atomic_wrapper(dict)
>>> d = AtomicDict()
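A minimal sketch of such a factory (make_atomic_wrapper is hypothetical, and it only wraps a hand-picked subset of special methods for illustration):

```python
import threading

def make_atomic_wrapper(cls):
    # builds a new class whose special methods forward to a wrapped
    # instance of cls, each call taken under the wrapper's lock
    special = ('__getitem__', '__setitem__', '__delitem__',
               '__contains__', '__len__', '__repr__', '__str__')

    def make_method(name):
        def method(self, *args, **kwargs):
            with self._lock:
                return getattr(self._obj, name)(*args, **kwargs)
        return method

    def __init__(self, *args, **kwargs):
        self._lock = threading.RLock()
        self._obj = cls(*args, **kwargs)

    namespace = {'__init__': __init__}
    for name in special:
        if hasattr(cls, name):
            namespace[name] = make_method(name)
    return type('Atomic' + cls.__name__.capitalize(), (object,), namespace)

AtomicDict = make_atomic_wrapper(dict)
d = AtomicDict()
d['a'] = 4
print(d)  # the wrapped dict, since __repr__/__str__ live on the wrapper class
```

Because the wrapped dunders now live on the wrapper class itself, item assignment and printing both work, which the instance-level approach cannot achieve.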

But, even after you do all of that… this is rarely as good an idea as it sounds.

Consider this:

d = AtomicDict()
d['abc'] = 0
d['abc'] += 1

That last line is not atomic. There's an atomic __getitem__, then a separate atomic __setitem__.

That may not sound like a big deal, but imagine that d is being used as a counter. You've got 20 threads all trying to do d['abc'] += 1 at the same time. The first one to get in on the __getitem__ will get back 0. And if it's the last one to get in on the __setitem__, it'll set it to 1.

Try running an example like that, with 20 threads each incrementing the counter 100 times. With proper locking, it should always print out 2000. But on my laptop, it's usually closer to 125.
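A reconstruction of that experiment (the locked variant wraps the whole read-modify-write in one critical section; how many updates the unlocked variant loses depends on your machine and interpreter):

```python
import threading

def count(n_threads=20, n_increments=100, locked=True):
    d = {'abc': 0}
    lock = threading.Lock()

    def worker():
        for _ in range(n_increments):
            if locked:
                # the whole read-modify-write is one atomic unit
                with lock:
                    d['abc'] += 1
            else:
                # separately "atomic" get and set: another thread can run
                # between the read and the write, losing an increment
                value = d['abc']
                d['abc'] = value + 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return d['abc']

print(count(locked=True))   # always 2000
print(count(locked=False))  # anywhere from 2000 down to far fewer, timing-dependent
```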

abarnert answered Oct 21 '22 at 05:10


I put some thought into your question, and it would be somewhat tricky. You would have to proxy not only all the object's methods with your Atomic class (which can be done properly by writing a __getattribute__ method), but also the operators. For those to work, you would have to give the proxied object a class that provides the same "magic double underscore" methods as the original object's class. That is, you would have to dynamically create a proxied class, or else operator usage itself won't be atomic.

It is doable, but since you are new to Python, try running import this at the interactive prompt; among the several guidelines that show up, you will see: "If the implementation is hard to explain, it's a bad idea." :-)

Which brings us to: using threads in Python is generally a bad idea, except for quasi-trivial code with lots of blocking I/O. You will usually prefer another approach, because threading in Python does not let ordinary Python code use more CPU cores; only a single thread of Python code runs at once (search for "Python GIL" to learn why). The exception is when a lot of your time is spent in computationally intensive native code, such as NumPy functions.

Instead, you'd rather write your program to use asynchronous calls with one of the various frameworks available for that, or, to easily take advantage of more than one core, use multiprocessing instead of threading, which basically creates one process per "thread" and requires all sharing to be done explicitly.
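A minimal sketch of that explicit sharing with multiprocessing (the counter from the other answer, but with the shared dict and lock handed to each worker through a Manager; function names are illustrative):

```python
import multiprocessing

def bump(shared, lock, n):
    # each worker increments the shared counter n times under the lock
    for _ in range(n):
        with lock:
            shared['abc'] += 1

def run(workers=4, n=1000):
    with multiprocessing.Manager() as manager:
        shared = manager.dict({'abc': 0})
        lock = manager.Lock()
        procs = [multiprocessing.Process(target=bump, args=(shared, lock, n))
                 for _ in range(workers)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return shared['abc']

if __name__ == '__main__':
    print(run())  # 4000: every increment survives because sharing is explicit
```

Nothing is shared by accident here: the workers can only touch what was explicitly passed to them, which is exactly the "locks bound to resources" discipline the question was after.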

jsbueno answered Oct 21 '22 at 03:10