
Break the function after certain time


In Python, for a toy example:

    for x in range(0, 3):
        A(x)  # call function A

If function A takes more than five seconds, I want to skip it and continue the for loop, so I won't get stuck or waste time.

After some searching, I learned that a subprocess or thread may help, but I have no idea how to implement it here.

user2372074 asked Jul 30 '14 00:07



2 Answers

I think creating a new process may be overkill. If you're on Mac or a Unix-based system, you should be able to use signal.SIGALRM to forcibly time out functions that take too long. This will work on functions that are idling for network or other issues that you absolutely can't handle by modifying your function. I have an example of using it in this answer:

Option for SSH to timeout after a short time? ClientAlive & ConnectTimeout don't seem to do what I need them to do

Editing my answer in here, though I'm not sure I'm supposed to do that:

    import signal

    class TimeoutException(Exception):   # Custom exception class
        pass

    def timeout_handler(signum, frame):   # Custom signal handler
        raise TimeoutException

    # Change the behavior of SIGALRM
    signal.signal(signal.SIGALRM, timeout_handler)

    for i in range(3):
        # Start the timer. Once 5 seconds are over, a SIGALRM signal is sent.
        signal.alarm(5)
        # This try/except block ensures that
        #   you'll catch TimeoutException when it's sent.
        try:
            A(i) # Whatever your function that might hang
        except TimeoutException:
            continue # continue the for loop if function A takes more than 5 seconds
        else:
            # Reset the alarm
            signal.alarm(0)

This basically sets a timer for 5 seconds, then tries to execute your code. If it fails to complete before time runs out, a SIGALRM is sent, which we catch and turn into a TimeoutException. That forces you to the except block, where your program can continue.
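If you need this in more than one place, the same idea can be wrapped in a context manager. This is only a sketch: `time_limit` is a name I made up, `A` is a stand-in for your real function, and I use `signal.setitimer` instead of `signal.alarm` so the demo can use a fractional timeout. Like the code above, it only works on Unix-like systems and only in the main thread.

```python
import signal
import time
from contextlib import contextmanager


class TimeoutException(Exception):
    """Raised when the guarded block runs past its time limit."""


@contextmanager
def time_limit(seconds):
    # Hypothetical helper: raise TimeoutException if the body takes too long.
    def handler(signum, frame):
        raise TimeoutException

    previous = signal.signal(signal.SIGALRM, handler)
    # setitimer accepts fractional seconds, unlike signal.alarm.
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Always cancel the timer and restore the old handler,
        # even if the body raised some other exception.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def A(x):
    # Stand-in for the slow function: only x == 1 "hangs".
    if x == 1:
        time.sleep(2)
    return x * 10


finished = []
for x in range(3):
    try:
        with time_limit(0.5):
            finished.append(A(x))
    except TimeoutException:
        continue  # skip this x and move on

print(finished)
```

The `finally` clause matters: it also cancels the timer when `A` raises something other than `TimeoutException`, so a stale alarm can't fire later in unrelated code.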

TheSoundDefense answered Sep 23 '22 12:09


If you can break your work up and check every so often, that's almost always the best solution. But sometimes that's not possible: for example, maybe you're reading a file off a slow file share that every once in a while just hangs for 30 seconds. To deal with that internally, you'd have to restructure your whole program around an async I/O loop.

If you don't need to be cross-platform, you can use signals on *nix (including Mac and Linux), APCs on Windows, etc. But if you need to be cross-platform, that doesn't work.

So, if you really need to do it concurrently, you can, and sometimes you have to. In that case, you probably want to use a process for this, not a thread. You can't really kill a thread safely, but you can kill a process, and it can be as safe as you want it to be. Also, if the thread is taking 5+ seconds because it's CPU-bound, you don't want to fight with it over the GIL.
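To make the thread limitation concrete, here is a small sketch (`slow_task` is a made-up stand-in for the slow function). It shows that `join()` with a timeout only gives up waiting; the thread itself keeps running, and `threading.Thread` has no method to stop it.

```python
import threading
import time


def slow_task(duration):
    # Stand-in for a function that takes too long.
    time.sleep(duration)


worker = threading.Thread(target=slow_task, args=(1.0,), daemon=True)
worker.start()

# join() with a timeout only stops *waiting*; it does not stop the thread.
worker.join(timeout=0.2)
still_running = worker.is_alive()
print("thread still running after the timeout:", still_running)

# Thread objects have no terminate() or kill() method to call here.
has_terminate = hasattr(worker, "terminate")
print("Thread has a terminate() method:", has_terminate)
```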

There are two basic options here.


First, you can put the code in another script and run it with subprocess:

    import subprocess
    import sys

    subprocess.check_call([sys.executable, 'other_script.py', arg, other_arg],
                          timeout=5)

Since this is going through normal child-process channels, the only communication you can use is some argv strings, a success/failure return value (actually a small integer, but that's not much better), and optionally a hunk of text going in and a chunk of text coming out.
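When the timeout expires, `check_call` raises `subprocess.TimeoutExpired`, so in a loop you would catch that and move on. A rough sketch, with an inline `-c` snippet standing in for `other_script.py` and made-up delays:

```python
import subprocess
import sys

# Stand-in for other_script.py: a child that sleeps for the given number of seconds.
child_code = "import sys, time; time.sleep(float(sys.argv[1]))"

outcomes = {}
for delay in ("0", "10"):
    try:
        subprocess.check_call([sys.executable, "-c", child_code, delay], timeout=2)
        outcomes[delay] = "finished"
    except subprocess.TimeoutExpired:
        # check_call kills the child before re-raising, so nothing is left running.
        outcomes[delay] = "timed out"
    except subprocess.CalledProcessError as err:
        # The child ran to completion but exited with a non-zero status.
        outcomes[delay] = "failed with exit code %d" % err.returncode

print(outcomes)
```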


Alternatively, you can use multiprocessing to spawn a thread-like child process:

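    import multiprocessing

    # Process takes keyword arguments; args must be a tuple
    p = multiprocessing.Process(target=func, args=args)
    p.start()
    p.join(5)            # wait up to 5 seconds
    if p.is_alive():     # still running, so give up on it
        p.terminate()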

As you can see, this is a little more complicated, but it's better in a few ways:

  • You can pass arbitrary Python objects (at least anything that can be pickled) rather than just strings.
  • Instead of having to put the target code in a completely independent script, you can leave it as a function in the same script.
  • It's more flexible—e.g., if you later need to, say, pass progress updates, it's very easy to add a queue in either or both directions.
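Applied to the loop in the question, that could look something like the sketch below. `run_with_timeout` is a helper name I invented, and `A` is a stand-in that hangs only for `x == 1`; the 2-second limit is just to keep the demo short.

```python
import multiprocessing
import time


def A(x):
    # Stand-in for the slow function: only x == 1 "hangs".
    if x == 1:
        time.sleep(10)


def run_with_timeout(func, args, timeout):
    """Run func(*args) in a child process; return True if it finished in time."""
    p = multiprocessing.Process(target=func, args=args)
    p.start()
    p.join(timeout)
    if p.is_alive():
        p.terminate()  # give up on the child
        p.join()       # reap the terminated child
        return False
    return True


if __name__ == "__main__":
    # The guard is needed on platforms that start children by re-importing this module.
    completed = [x for x in range(3) if run_with_timeout(A, (x,), timeout=2)]
    print(completed)
```

Note that this only tells you whether `A(x)` finished. Its return value is lost unless you send it back some other way, for example over a queue.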

The big problem with any kind of parallelism is sharing mutable data—e.g., having a background task update a global dictionary as part of its work (which your comments say you're trying to do). With threads, you can sort of get away with it, but race conditions can lead to corrupted data, so you have to be very careful with locking. With child processes, you can't get away with it at all. (Yes, you can use shared memory, as Sharing state between processes explains, but this is limited to simple types like numbers, fixed arrays, and types you know how to define as C structures, and it just gets you back to the same problems as threads.)


Ideally, you arrange things so you don't need to share any data while the process is running—you pass in a dict as a parameter and get a dict back as a result. This is usually pretty easy to arrange when you have a previously-synchronous function that you want to put in the background.

But what if, say, a partial result is better than no result? In that case, the simplest solution is to pass the results over a queue. You can do this with an explicit queue, as explained in Exchanging objects between processes, but there's an easier way.
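For reference, the explicit-queue version might be sketched like this. The names `produce`, `collect`, and the `DONE` sentinel are all mine, and the delays are made up so that the third item never arrives in time.

```python
import multiprocessing
import queue
import time

DONE = None  # sentinel the child sends once it has nothing more to report


def produce(out):
    # Stand-in for slow work: two quick partial results, then a long stall.
    for item, delay in (("ham", 0.0), ("bacon", 0.0), ("spam", 30.0)):
        time.sleep(delay)
        out.put(item)
    out.put(DONE)


def collect(timeout):
    """Return whatever results the child managed to send within `timeout` seconds."""
    out = multiprocessing.Queue()
    p = multiprocessing.Process(target=produce, args=(out,))
    p.start()
    deadline = time.monotonic() + timeout
    partial = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            item = out.get(timeout=remaining)
        except queue.Empty:
            break  # deadline passed while waiting
        if item is DONE:
            break  # child finished on its own
        partial.append(item)
    if p.is_alive():
        p.terminate()  # the queue is not reused after this point
    p.join()
    return partial


if __name__ == "__main__":
    partial_results = collect(timeout=2)
    print(partial_results)
```

The parent keeps everything that arrived before the deadline and throws the rest away along with the child.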

If you can break the monolithic process into separate tasks, one for each value (or group of values) you wanted to stick in the dictionary, you can schedule them on a Pool—or, even better, a concurrent.futures.Executor. (If you're on Python 2.x or 3.1, see the backport futures on PyPI.)

Let's say your slow function looked like this:

    def spam():
        global d
        for meat in get_all_meats():
            count = get_meat_count(meat)
            d[meat] = d.get(meat, 0) + count

Instead, you'd do this:

    import concurrent.futures

    def spam_one(meat):
        count = get_meat_count(meat)
        return meat, count

    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        results = executor.map(spam_one, get_all_meats(), timeout=5)
        for (meat, count) in results:
            d[meat] = d.get(meat, 0) + count

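As many results as you get within 5 seconds get added to the dict; if that isn't all of them, a TimeoutError is raised (which you can handle however you want: log it, do some quick fallback code, whatever) and the tasks that haven't started yet are cancelled. A task that is already running in a worker isn't interrupted, though, so leaving the with block still waits for it to finish.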
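Here is one way the error handling could look, keeping the partial results. It's a sketch with made-up data: `spam_one` fakes the count with `len(meat)`, and the item named "slow" stands in for a call that overruns the limit.

```python
import concurrent.futures
import time


def spam_one(meat):
    # Stand-in for get_meat_count(): "slow" takes longer than the time limit.
    if meat == "slow":
        time.sleep(3)
    return meat, len(meat)


def count_meats(meats, timeout):
    d = {}
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        # The timeout is measured from this call to map(), not per item.
        results = executor.map(spam_one, meats, timeout=timeout)
        try:
            for meat, count in results:
                d[meat] = d.get(meat, 0) + count
        except concurrent.futures.TimeoutError:
            pass  # keep whatever arrived before the deadline
        # Leaving the with block waits for any task a worker has already started.
    return d


if __name__ == "__main__":
    counts = count_meats(["ham", "bacon", "slow", "late"], timeout=1.5)
    print(counts)
```

Results come back in input order, so once "slow" times out, nothing after it is collected even if it would have been quick.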

And if the tasks really are independent (as they are in my stupid little example, but of course they may not be in your real code, at least not without a major redesign), you can parallelize the work for free just by removing that max_workers=1. Then, if you run it on an 8-core machine, it'll kick off 8 workers and give them each 1/8th of the work to do, and things will get done faster. (Usually not 8x as fast, but often 3-6x as fast, which is still pretty nice.)

abarnert answered Sep 26 '22 12:09