Multiple threads writing to the same CSV in Python

I'm new to multithreading in Python and am currently writing a script that appends to a CSV file. If I submit multiple tasks to a concurrent.futures.ThreadPoolExecutor, each of which appends lines to the same CSV file, what can I do to guarantee thread safety, given that appending is the only file-related operation these threads perform?

Simplified version of my code:

import concurrent.futures
import random
import time

downloadFutures = []
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    for count, ad_id in enumerate(advertisers):
        downloadFutures.append(executor.submit(downloadThread, arguments.....))
        time.sleep(random.randint(1, 3))

And my thread function:

def downloadThread(arguments......):
    # Some code.....
    writer.writerow(re.split(',', line.decode()))

Should I set up a separate single-threaded executor to handle writing, or is it not worth worrying about if I am just appending?

EDIT: I should elaborate that the timing of the write operations varies greatly; minutes can pass between one append and the next. I am just concerned that two threads writing at the same moment is a scenario that has not come up while testing my script, and I would prefer to be covered for it.

asked Oct 13 '15 15:10

GreenGodot

2 Answers

I am not sure whether csv.writer objects are thread-safe. The documentation doesn't specify, so to be safe, if multiple threads use the same writer object, you should protect every use of it with a threading.Lock:

import threading

# Create one lock, shared by every thread that uses the writer
csv_writer_lock = threading.Lock()

def downloadThread(arguments......):
    # pass csv_writer_lock somehow
    # Note: hold csv_writer_lock on *any* access to the writer
    # Some code.....
    with csv_writer_lock:
        writer.writerow(re.split(',', line.decode()))
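
For anyone who wants something they can run as-is, here is a minimal sketch of the same locking idea. The helper name write_rows_locked, the sample rows, and the in-memory io.StringIO buffer (standing in for a real file opened in append mode) are all made up for illustration; the download step is replaced by a placeholder.

```python
import csv
import io
import threading
from concurrent.futures import ThreadPoolExecutor

# One lock shared by every thread that touches the writer
csv_writer_lock = threading.Lock()


def write_rows_locked(writer, rows, max_workers=3):
    """Write each row from a worker thread, serialising writes with the lock."""

    def worker(row):
        # Hypothetical stand-in for the download step; only the write is locked
        with csv_writer_lock:
            writer.writerow(row)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() drains the iterator so any worker exception is re-raised here
        list(executor.map(worker, rows))


# Assumption: a StringIO buffer stands in for open(path, "a", newline="")
buffer = io.StringIO()
rows = [[str(n), f"advertiser-{n}"] for n in range(100)]
write_rows_locked(csv.writer(buffer), rows)

# Read the data back to check that every row arrived intact
written = list(csv.reader(io.StringIO(buffer.getvalue())))
```

The order of rows in the output depends on thread scheduling, so the check should compare the rows as a set (or sorted), not position by position.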

That being said, it may indeed be more elegant for the downloadThread to submit write tasks to an executor, instead of explicitly using locks like this.
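
A rough sketch of that executor-based alternative, under the assumption that a second ThreadPoolExecutor with max_workers=1 is used purely for writes. The function name download_and_write, the strip-based "processing", and the in-memory buffer are hypothetical placeholders, not part of the original script.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def download_and_write(rows, writer, max_workers=3):
    """Process rows in a worker pool; funnel every write through one thread."""
    # With max_workers=1 the write tasks run one at a time, so the writer
    # is only ever used by a single thread
    with ThreadPoolExecutor(max_workers=1) as write_executor:

        def worker(row):
            # Hypothetical stand-in for the download step
            processed = [field.strip() for field in row]
            # Hand the write over to the single writer thread
            return write_executor.submit(writer.writerow, processed)

        with ThreadPoolExecutor(max_workers=max_workers) as download_executor:
            write_futures = list(download_executor.map(worker, rows))

        # Surface any exception raised during a write
        for future in write_futures:
            future.result()


# Assumption: a StringIO buffer stands in for the real CSV file
buffer = io.StringIO()
rows = [[f" {n} ", f"advertiser-{n}"] for n in range(50)]
download_and_write(rows, csv.writer(buffer))
written = list(csv.reader(io.StringIO(buffer.getvalue())))
```

The download threads never touch the writer directly, so no explicit lock appears in user code; the executor's internal queue does the serialising.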

answered Oct 09 '22 19:10

Claudiu

Way-late-to-the-party note: You could handle this a different way, with no explicit locking in your own code, by having a single writer thread consume from a shared Queue, with rows being pushed to the Queue by the threads doing the processing.

from threading import Thread
from queue import Queue
from concurrent.futures import ThreadPoolExecutor


# CSV writer setup goes here

queue = Queue()
SENTINEL = None  # pushed once all producers are done, to stop the consumer


def consume():
    while True:
        # get() blocks until an item is available, so there is no busy-waiting
        i = queue.get()
        if i is SENTINEL:
            return

        # Row comes out of queue; CSV writing goes here

        print(i)


consumer = Thread(target=consume, daemon=True)
consumer.start()


def produce(i):
    # Data processing goes here; row goes into queue
    queue.put(i)


with ThreadPoolExecutor(max_workers=10) as executor:
    for i in range(5000):
        executor.submit(produce, i)

# The with block waits for every producer, so the sentinel is queued last
queue.put(SENTINEL)
consumer.join()
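
To show where the CSV writing would go in the queue-based approach, here is a minimal sketch with the placeholders filled in. The names run_pipeline and write_from_queue, the squared-number rows, and the in-memory buffer are assumptions made for illustration. It uses a blocking get() and a sentinel value to stop the consumer, so shutdown does not depend on the order in which rows arrive.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from threading import Thread

SENTINEL = None  # marks the end of the stream of rows


def write_from_queue(row_queue, writer):
    """Consumer: the only thread that ever touches the csv writer."""
    while True:
        row = row_queue.get()  # blocks until a row (or the sentinel) arrives
        if row is SENTINEL:
            return
        writer.writerow(row)


def run_pipeline(items, out_file, max_workers=10):
    row_queue = Queue()
    consumer = Thread(
        target=write_from_queue, args=(row_queue, csv.writer(out_file))
    )
    consumer.start()

    def produce(item):
        # Hypothetical processing step: build a two-column row
        row_queue.put([item, item * item])

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        list(executor.map(produce, items))

    # Every producer has finished, so the sentinel is the last item queued
    row_queue.put(SENTINEL)
    consumer.join()


# Assumption: a StringIO buffer stands in for open("out.csv", "a", newline="")
buffer = io.StringIO()
run_pipeline(range(200), buffer)
written = list(csv.reader(io.StringIO(buffer.getvalue())))
```

Rows are written in the order they reach the queue, which may differ from the order in which the tasks were submitted.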

answered Oct 09 '22 20:10

kungphu


Tags: python, multithreading, csv, executor
