I mistakenly had two scripts running at the same time, both writing a pandas DataFrame in chunks to the same CSV file. Since the CSV file was meant to be appended to, neither script refuses to write to it when it already exists. I didn't catch the mistake until it was too late.
Kinda like this:
script1.py
for i, chunk in enumerate(datachunks):
    result_df = do_something(chunk)
    # write mode for the first chunk, append mode for the rest
    result_df.to_csv('csvfile.csv', mode='w' if i == 0 else 'a', header=(i == 0))
script2.py
for i, chunk in enumerate(datachunks2):
    result_df = do_something(chunk)
    # write mode for the first chunk, append mode for the rest
    result_df.to_csv('csvfile.csv', mode='w' if i == 0 else 'a', header=(i == 0))  # should have been csvfile2.csv
Each script takes around 12 hours to run because of the sheer volume of data it processes, so I think it would be faster to split the combined CSV file into two and recover the output each script should have produced, rather than rerun everything. This should work unless the file contains unintended duplicates or rows that never got written.
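If it helps, here is roughly how I picture splitting the file. This is just a sketch, assuming each script's rows can be told apart by a key column; the 'id' column and the id sets below are hypothetical and would have to be rebuilt from the original input chunks.

import pandas as pd

# Sketch only: assumes the combined file has an 'id' column (hypothetical)
# whose values indicate which script a row came from.
combined = pd.read_csv('csvfile.csv')

# Hypothetical: the ids each script was supposed to produce.
ids_script1 = {1, 2, 3}
ids_script2 = {4, 5, 6}

# Keep only the rows belonging to each script and write them back out.
combined[combined['id'].isin(ids_script1)].to_csv('csvfile_script1.csv', index=False)
combined[combined['id'].isin(ids_script2)].to_csv('csvfile2.csv', index=False)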
Both scripts finished without any errors, if that's relevant.
Is there any chance of duplicates or missing data in this csvfile.csv?
I decided to just rerun the scripts and compare the outputs. It doesn't look promising: I lost a lot of rows.
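For reference, this is the kind of comparison I ran. Again just a sketch, assuming the rerun outputs are in csvfile_rerun1.csv and csvfile_rerun2.csv (made-up names) and that rows can be compared as whole-row matches.

import pandas as pd

# Sketch: compare the damaged combined file against the rerun outputs.
combined = pd.read_csv('csvfile.csv')
expected = pd.concat([pd.read_csv('csvfile_rerun1.csv'),
                      pd.read_csv('csvfile_rerun2.csv')])

# Rows the reruns produced that never made it into the combined file.
missing = expected.merge(combined.drop_duplicates(), how='left', indicator=True)
missing = missing[missing['_merge'] == 'left_only']

# Rows that appear more than once in the combined file.
dupes = combined[combined.duplicated(keep=False)]

print(f"missing rows: {len(missing)}, duplicated rows: {len(dupes)}")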