I have some questions related to setting the maximum running time of a function in Python. In fact, I would like to use pdfminer to convert the .pdf files to .txt.
The problem is that very often, some files are not possible to decode and take extremely long time. So I want to set threading.Timer() to limit the conversion time for each file to 5 seconds. In addition, I run under windows so I cannot use the signal module for this.
I succeeded in running the conversion code with pdfminer.convert_pdf_to_txt() (in my code it is "c"), but I am not sure that the in the following code, threading.Timer() works. (I don't think it properly constrains the time for each processing)
In summary, I want to:
Convert the pdf to txt
Time limit for each conversion is 5 sec, if it runs out of time, throw an exception and save an empty file
Save all the txt files under the same folder
If there are any exceptions/errors, still save the file but with empty content.
Here is the current code:
import converter as c
import os
import timeit
import time
import threading
import thread
yourpath = 'D:/hh/'
def iftimesout():
print("no")
with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
newfile.write("")
for root, dirs, files in os.walk(yourpath, topdown=False):
for name in files:
try:
timer = threading.Timer(5.0,iftimesout)
timer.start()
t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
g=str(a.split("\\")[1])
with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
print("yes")
timer.cancel()
except KeyboardInterrupt:
raise
except:
for name in files:
t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
g=str(a.split("\\")[1])
with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
newfile.write("")
I finally figured it out!
First of all, define a function to call another function with a limited timeout:
import multiprocessing
def call_timeout(timeout, func, args=(), kwargs={}):
if type(timeout) not in [int, float] or timeout <= 0.0:
print("Invalid timeout!")
elif not callable(func):
print("{} is not callable!".format(type(func)))
else:
p = multiprocessing.Process(target=func, args=args, kwargs=kwargs)
p.start()
p.join(timeout)
if p.is_alive():
p.terminate()
return False
else:
return True
What does the function do?
p.join()) and allow the function to be executed in this timeAfter the timeout expires, check if the function is still running
FalseTrueWe can test it with time.sleep():
import time
finished = call_timeout(2, time.sleep, args=(1, ))
if finished:
print("No timeout")
else:
print("Timeout")
We run a function which needs one second to finish, timeout is set to two seconds:
No timeout
If we run time.sleep(10) and set the timeout to two seconds:
finished = call_timeout(2, time.sleep, args=(10, ))
Result:
Timeout
Notice the program stops after two seconds without finishing the called function.
Your final code will look like this:
import converter as c
import os
import timeit
import time
import multiprocessing
yourpath = 'D:/hh/'
def call_timeout(timeout, func, args=(), kwargs={}):
if type(timeout) not in [int, float] or timeout <= 0.0:
print("Invalid timeout!")
elif not callable(func):
print("{} is not callable!".format(type(func)))
else:
p = multiprocessing.Process(target=func, args=args, kwargs=kwargs)
p.start()
p.join(timeout)
if p.is_alive():
p.terminate()
return False
else:
return True
def convert(root, name, g, t):
with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
for root, dirs, files in os.walk(yourpath, topdown=False):
for name in files:
try:
t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
g=str(a.split("\\")[1])
finished = call_timeout(5, convert, args=(root, name, g, t))
if finished:
print("yes")
else:
print("no")
with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
newfile.write("")
except KeyboardInterrupt:
raise
except:
for name in files:
t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
g=str(a.split("\\")[1])
with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
newfile.write("")
The code should be easy to understand, if not, feel free to ask.
I really hope this helps (as it took some time for us to get it right ;))!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With