
Is filter thread-safe?

Tags:

python

I have a thread which is updating a list called l. Am I right in saying that it is thread-safe to do the following from another thread?

filter(lambda x: x[0] == "in", l)

If it's not thread-safe, is this then the correct approach:

import threading
import time
import Queue

class Logger(threading.Thread):
    def __init__(self, log):
        super(Logger, self).__init__()
        self.log = log
        self.data = []
        self.finished = False
        self.data_lock = threading.Lock()

    def run(self):
        while not self.finished:
            try:
                with self.data_lock: 
                    self.data.append(self.log.get(block=True, timeout=0.1))
            except Queue.Empty:
                pass

    def get_data(self, cond):
        with self.data_lock: 
            d = filter(cond, self.data)      
        return d 

    def stop(self):
        self.finished = True
        self.join()  
        print("Logger stopped")

Here the get_data(self, cond) method is used to retrieve a small subset of self.data in a thread-safe manner.

Baz asked Jun 03 '15 12:06



1 Answer

First, to answer the question in your title: filter is just a function, so its thread safety depends on the data structure you use it with.

As already pointed out in the comments, individual list operations are thread-safe in CPython because they are protected by the GIL, but that is arguably an implementation detail of CPython that you shouldn't rely on. Even if you could rely on it, the thread safety of individual list operations is probably not the kind of thread safety you have in mind:

The problem is that iterating over a sequence with filter is in general not an atomic operation, so the sequence can be changed by another thread while the iteration is in progress. Depending on the data structure underlying your iterator, this can cause more or less weird effects. One way to overcome this problem is to iterate over a copy of the sequence that is created in one atomic action. For standard sequences such as tuple, list and string, the easiest way to do this is with the slice operator:

filter(lambda x: x[0] == "in", l[:])
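The effect can be reproduced without threads. The following is only a single-threaded sketch with made-up sample values, in which the loop body itself performs the mutation that another thread might perform. It compares iterating over the live list with iterating over a slice copy:

```python
# Iterating over the live list: a removal during iteration shifts the
# remaining elements, so one of them is skipped.
l = ["a", "b", "c", "d"]          # hypothetical sample data
seen_live = []
for x in l:
    seen_live.append(x)
    if x == "a":
        l.remove("a")             # stands in for a concurrent writer

# Iterating over a slice copy: the snapshot is unaffected by the removal.
l = ["a", "b", "c", "d"]
seen_snapshot = []
for x in l[:]:
    seen_snapshot.append(x)
    if x == "a":
        l.remove("a")

print(seen_live)                  # "b" is missing
print(seen_snapshot)              # all four elements
```

In the first loop "b" moves into the position the iterator has already passed, so it is never visited. The slice copy avoids this.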

Apart from not necessarily being thread-safe for other data types, this approach has one more problem: the slice is only a shallow copy. As the elements of your list seem to be list-like as well, another thread could in parallel do del l[1000][:] to empty one of the inner lists (which your shallow copy points to as well). This would make your filter expression fail with an IndexError.
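The shallow-copy problem can be shown deterministically. This is a minimal sketch with hypothetical sample records, where the del that another thread might run is executed inline instead:

```python
# Hypothetical sample data: each record is itself a list.
l = [["in", 1], ["out", 2], ["in", 3]]

snapshot = l[:]              # new outer list, but the same inner lists
l.append(["in", 4])          # changes to the outer list do not reach the copy
outer_is_independent = len(snapshot) == 3

del l[1][:]                  # empties an inner list shared with the snapshot
inner_is_shared = snapshot[1] == []

try:
    # list() makes the sketch behave the same on Python 2 and Python 3
    result = list(filter(lambda x: x[0] == "in", snapshot))
except IndexError:
    result = None            # x[0] failed on the emptied inner list
```

Here result ends up as None because the predicate indexes into the emptied inner list.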

All that said, there is no shame in using a lock to protect access to your list, and I'd definitely recommend it. Depending on how your data changes and how you work with the returned data, it might even be wise to deep-copy the elements while holding the lock and to return those copies. That way you can guarantee that the returned elements cannot be changed afterwards in a way that makes them stop matching the filter condition.
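A minimal sketch of that idea, assuming a module-level list and lock (the names data, data_lock and get_data are only illustrative) and assuming that every writer takes the same lock:

```python
import copy
import threading

data = [["in", 1], ["out", 2], ["in", 3]]   # hypothetical shared list
data_lock = threading.Lock()                # guards every access to data

def get_data(cond):
    """Return deep copies of the matching elements, made under the lock."""
    with data_lock:
        return [copy.deepcopy(x) for x in data if cond(x)]

matches = get_data(lambda x: x[0] == "in")

# A writer later empties one of the inner lists, using the same lock.
with data_lock:
    del data[0][:]

# matches still holds independent copies of the records as they were
# at the time of the call.
print(matches)
```

The deep copy costs time proportional to the size of the matching elements and is made while the lock is held, so it is mainly worthwhile when the returned subset is small.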

Regarding your Logger code: I'm not 100% sure how you plan to use it, or whether it's critical for you to run several threads on one queue and join them. What looks odd to me is that you never call Queue.task_done() (assuming that self.log is a Queue). Polling the queue with a timeout is also potentially wasteful, and your run() holds data_lock while it waits on the queue, so get_data can be blocked for up to the full timeout. If you don't need to join the thread, I'd suggest at least turning the lock acquisition around, so that the lock is only taken once an item has arrived:

class Logger(threading.Thread):
    def __init__(self, log):
        super(Logger, self).__init__()
        self.daemon = True
        self.log = log
        self.data = []
        self.data_lock = threading.Lock()

    def run(self):
        while True:
            item = self.log.get()  # thread will sleep here indefinitely
            with self.data_lock: 
                self.data.append(item)
            self.log.task_done()

    def get_data(self, cond):
        with self.data_lock: 
            # list() forces evaluation under the lock (filter is lazy on Python 3)
            d = list(filter(cond, self.data))
            # maybe deepcopy d here
        return d

Externally you could still do log.join() to make sure that all of the elements of the log queue are processed.
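A usage sketch of the class above together with log.join(). It is written for Python 3, where the module is called queue rather than Queue. The sample records are made up, and list(filter(...)) is used so that the filtering happens while the lock is held:

```python
import queue      # named Queue on Python 2
import threading

class Logger(threading.Thread):
    def __init__(self, log):
        super(Logger, self).__init__()
        self.daemon = True               # do not keep the process alive
        self.log = log
        self.data = []
        self.data_lock = threading.Lock()

    def run(self):
        while True:
            item = self.log.get()        # blocks without holding the lock
            with self.data_lock:
                self.data.append(item)
            self.log.task_done()         # lets log.join() make progress

    def get_data(self, cond):
        with self.data_lock:
            return list(filter(cond, self.data))

log = queue.Queue()
logger = Logger(log)
logger.start()

for record in [("in", 1), ("out", 2), ("in", 3)]:   # hypothetical records
    log.put(record)

log.join()        # returns once task_done() was called for every put item
matches = logger.get_data(lambda x: x[0] == "in")
print(matches)
```

Because task_done() is only called after the append, every record put before log.join() is already in self.data when join() returns.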

Jörn Hees answered Oct 12 '22 23:10