 

Python hangs for too long on with open

I am parsing JavaScript source files in Python 3.5. A loop checks out all commits from a GitHub repository, and the script loops through all changed files. When a file is changed in two subsequent checkouts (which means it changed in the commit), the script can hang on the `with open(...)` line for seconds, even for moderate (~5-8 MB) file sizes. I have created an example script that imitates the problem:

import time

test_data = "./sample.js"
for _ in range(10):
    start1 = time.time()
    with open(file=test_data, mode="rb", buffering=1) as f:
        end1 = time.time()
        start2 = time.time()
        line_content = f.readlines()
        ## Do some processing
    end2 = time.time()
    print("Processing file {} is done.".format(test_data))
    print("Time spent on open is {0:10f}.".format(end1 - start1))
    print("Time reading is {0:10f}.".format(end2 - start2))
    with open(test_data, mode="a", encoding="utf-8") as fw:
        fw.write("test")

The sample.js file is around 7 MB. Here is the output:

Processing file ./sample.js is done.
Time spent on open is   0.000000.
Time reading is   0.017001.
Processing file ./sample.js is done.
Time spent on open is   1.683999.
Time reading is   0.013999.
Processing file ./sample.js is done.
Time spent on open is   1.651003.
Time reading is   0.012030.
Processing file ./sample.js is done.
Time spent on open is   1.638999.
Time reading is   0.014997.
Processing file ./sample.js is done.
Time spent on open is   2.282346.
Time reading is   0.013001.
Processing file ./sample.js is done.
Time spent on open is   1.701004.
Time reading is   0.011998.
Processing file ./sample.js is done.
Time spent on open is   1.689004.
Time reading is   0.012995.
Processing file ./sample.js is done.
Time spent on open is   1.707036.
Time reading is   0.012959.
Processing file ./sample.js is done.
Time spent on open is   1.701031.
Time reading is   0.012969.
Processing file ./sample.js is done.
Time spent on open is   1.653999.
Time reading is   0.019003.

I have tried using Process from multiprocessing, calling the garbage collector manually, and also ExitStack from contextlib, but nothing helped.

Any idea what could cause this behaviour?

EDIT: The problem seems to be Windows-specific (at least, it was not nearly as pronounced on Linux and macOS).
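One way to narrow this down (a diagnostic sketch, not from the original post) is to time the raw OS-level `os.open` call separately from the buffered read. If the delay shows up in `os.open` itself, the stall is happening below Python's I/O stack, at the operating-system level, which would fit the Windows-specific observation above. The file-creation step just makes the sketch self-contained:

```python
import os
import time

test_data = "./sample.js"  # same file name as in the question

# Create a ~7 MB file to test against if it does not already exist
if not os.path.exists(test_data):
    with open(test_data, "wb") as f:
        f.write(b"x" * (7 * 1024 * 1024))

# Time the raw OS-level open separately from the buffered read.
# O_BINARY only exists on Windows, hence the getattr fallback.
start = time.perf_counter()
fd = os.open(test_data, os.O_RDONLY | getattr(os, "O_BINARY", 0))
mid = time.perf_counter()
with os.fdopen(fd, "rb") as f:
    data = f.read()
end = time.perf_counter()

print("os.open took {:.6f} s".format(mid - start))
print("read took    {:.6f} s".format(end - mid))
```

If `os.open` accounts for the lost seconds, the usual suspects on Windows are real-time antivirus scanning or file-cache invalidation after the file was modified, rather than anything in the Python code.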

asked Nov 23 '19 by rabxly

1 Answer

Your OS is the culprit!

This is why the multiprocessing documentation includes a dedicated paragraph for Windows in its Programming Guidelines. I highly recommend reading the Programming Guidelines, as they already include all the information required to write portable multiprocessing code.

answered Nov 07 '22 by classicdude7