I have a thread which is updating a list called l
. Am I right in saying that it is thread-safe to do the following from another thread?
filter(lambda x: x[0] == "in", l)
If its not thread safe, is this then the correct approach:
import threading
import time
import Queue
class Logger(threading.Thread):
def __init__(self, log):
super(Logger, self).__init__()
self.log = log
self.data = []
self.finished = False
self.data_lock = threading.Lock()
def run(self):
while not self.finished:
try:
with self.data_lock:
self.data.append(self.log.get(block=True, timeout=0.1))
except Queue.Empty:
pass
def get_data(self, cond):
with self.data_lock:
d = filter(cond, self.data)
return d
def stop(self):
self.finished = True
self.join()
print("Logger stopped")
where the get_data(self, cond)
method is used to retrieve a small subset of the data in the self.data in a thread safe manner.
In Spring, by default, objects managed by the Dependency Injection container are singletons, and need to be thread-safe. If this bothers you, then at least you have the choice in Spring of changing the "scope" in which objects are used. In the case of servlet filters, a framework is not usually used when defining them.
The session is not thread safe and neither the get not the set methods are guaranteed to be thread safe. In general in a servlet container you should assume to be in a multi threaded environment and no provided tooling is safe. This also goes for the objects you store in the session.
To make a servlet or a block within a servlet thread-safe, do one of the following: Synchronize write access to all instance variables, as in public synchronized void method() (whole method) or synchronized(this) {...} (block only).
Using Final keywordFinal Variables are also thread-safe in java because once assigned some reference of an object It cannot point to reference of another object.
First, to answer your question in the title: filter
is just a function. Hence, its thread-safety will rely on the data-structure you use it with.
As pointed out in the comments already, list operations themselves are thread-safe in CPython and protected by the GIL, but that is arguably only an implementation detail of CPython that you shouldn't really rely on. Even if you could rely on it, thread safety of some of their operations probably does not mean the kind of thread safety you mean:
The problem is that iterating over a sequence with filter
is in general not an atomic operation. The sequence could be changed during iteration. Depending on the data-structure underlying your iterator this might cause more or less weird effects. One way to overcome this problem is by iterating over a copy of the sequence that is created with one atomic action. Easiest way to do this for standard sequences like tuple
, list
, string
is with the slice operator like this:
filter(lambda x: x[0] == "in", l[:])
Apart from this not necessarily being thread-safe for other data-types, there's one problem with this though: it's only a shallow copy. As your list's elements seem to be list-like as well, another thread could in parallel do del l[1000][:]
to empty one of the inner lists (which are pointed to in your shallow copy as well). This would make your filter expression fail with an IndexError
.
All that said, it's not a shame to use a lock to protect access to your list and I'd definitely recommend it. Depending on how your data changes and how you work with the returned data, it might even be wise to deep-copy the elements while holding the lock and to return those copies. That way you can guarantee that once returned the filter condition won't suddenly change for the returned elements.
Wrt. your Logger
code: I'm not 100 % sure how you plan to use this and if it's critical for you to run several threads on one queue and join
them. What looks weird to me is that you never use Queue.task_done()
(assuming that its self.log
is a Queue
). Also your polling of the queue is potentially wasteful. If you don't need the join
of the thread, I'd suggest to at least turn the lock acquisition around:
class Logger(threading.Thread):
def __init__(self, log):
super(Logger, self).__init__()
self.daemon = True
self.log = log
self.data = []
self.data_lock = threading.Lock()
def run(self):
while True:
l = self.log.get() # thread will sleep here indefinitely
with self.data_lock:
self.data.append(l)
self.log.task_done()
def get_data(self, cond):
with self.data_lock:
d = filter(cond, self.data)
# maybe deepcopy d here
return d
Externally you could still do log.join()
to make sure that all of the elements of the log
queue are processed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With