I am writing some Python code that does some calculations and writes the results to a file. Here is my current code:
for name, group in data.groupby('Date'):
    df = lot_of_numpy_calculations(group)
    with open('result.csv', 'a') as f:
        df.to_csv(f, header=False, index=False)
Both the calculation and the write take some time. I read some articles about async in Python, but I didn't know how to implement it. Is there an easy way to optimize this loop so that it doesn't wait for the write to finish before starting the next iteration?
Since neither numpy nor pandas I/O are asyncio-aware, this might be a better use case for threads than for asyncio. (Also, asyncio-based solutions will use threads behind the scenes anyway.)
For example, this code spawns a writer thread to which you submit work using a queue:
import threading, queue

to_write = queue.Queue()

def writer():
    # call to_write.get() until it returns None
    for df in iter(to_write.get, None):
        with open('result.csv', 'a') as f:
            df.to_csv(f, header=False, index=False)

threading.Thread(target=writer).start()

for name, group in data.groupby('Date'):
    df = lot_of_numpy_calculations(group)
    to_write.put(df)

# enqueue None to instruct the writer thread to exit
to_write.put(None)
Note that, if writing turns out to be consistently slower than the calculation, the queue will keep accumulating data frames, which might end up consuming a lot of memory. In that case, be sure to provide a maximum size for the queue by passing the maxsize argument to the constructor.
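For instance (the limit of 16 here is an arbitrary choice, not a recommendation):

to_write = queue.Queue(maxsize=16)  # put() now blocks once 16 frames are pending

With a bounded queue, the producer loop automatically slows down to match the writer instead of growing the backlog without limit.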
Also, consider that re-opening the file for each write can slow down writing. If the amount of data written is small, perhaps you could get better performance by opening the file beforehand.
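A minimal variant of the writer above that does this, keeping the file open for the lifetime of the thread:

def writer():
    # open once; the handle is closed when the None sentinel arrives
    with open('result.csv', 'a') as f:
        for df in iter(to_write.get, None):
            df.to_csv(f, header=False, index=False)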
Since most operating systems don't support asynchronous file I/O, the common cross-platform approach today is to use threads. For example, the aiofiles module wraps a thread pool to provide a file I/O API for asyncio.
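A minimal sketch of that approach, reusing the data and lot_of_numpy_calculations names from the question (aiofiles is a third-party package and must be installed separately; asyncio.to_thread requires Python 3.9+):

import asyncio
import aiofiles

async def main():
    async with aiofiles.open('result.csv', 'a') as f:
        for name, group in data.groupby('Date'):
            # run the CPU-bound calculation in a worker thread
            df = await asyncio.to_thread(lot_of_numpy_calculations, group)
            # render the frame to a string; the write itself is handed
            # off to aiofiles' internal thread pool
            await f.write(df.to_csv(header=False, index=False))

asyncio.run(main())

Note that each write is still awaited before the next group is processed, so this mainly demonstrates the API: the event loop stays free for other tasks while the write runs, but to overlap calculation and writing within this loop you would still want something like the queue-based solution above.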