Python inside GNU Screen eventually becomes idle if Screen is detached

I have a Python script that uses multiprocessing and subprocess to launch multiple external commands in parallel with different arguments. The code can be found here.
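The original code link is not preserved on this page. As a rough, hypothetical sketch of the pattern described (a multiprocessing pool in which each worker launches one external command via subprocess), assuming the commands are given as argument lists and the machine is Linux (so the "fork" start method is available), it might look like this. The names `run_command` and `run_all` are illustrative, not taken from the original script.

```python
import subprocess
import sys
from multiprocessing import get_context


def run_command(args):
    """Run one external command to completion and return its exit code."""
    # subprocess.run blocks this worker until the child process exits.
    return subprocess.run(args).returncode


def run_all(commands, processes=12):
    """Run every command, with at most `processes` running at the same time."""
    # Assumption: Linux, where the "fork" start method is available.
    ctx = get_context("fork")
    with ctx.Pool(processes=processes) as pool:
        # chunksize=1 queues each command as its own task, so a worker that
        # finishes one command picks up the next one from the queue.
        return pool.map(run_command, commands, chunksize=1)


if __name__ == "__main__":
    # Hypothetical jobs: short Python one-liners standing in for the real commands.
    jobs = [[sys.executable, "-c", f"print({i})"] for i in range(4)]
    print(run_all(jobs, processes=2))
```

With 12 processors, `processes=12` keeps the machine fully loaded as long as the queue keeps releasing jobs, which is exactly the part that appears to stall here.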

For convenience I launch this script inside a GNU Screen session. The machine where this script is running has 12 processors which are idle until processes become active.

Each of the processes takes anywhere from a few hours to a couple of days to run, so I often disconnect from the machine and detach the screen session.

However, I've recently noticed a behavior I had never seen before. On several occasions I've returned to the machine to find it idle, with a load of zero. If I list the active processes with either ps ux or top, the script (and its subprocesses) are still there. When I then reattach the screen session to check on the program, a new batch of processes is immediately sent to the queue and the system load goes back to 12 within seconds. Note that I did absolutely nothing to the script other than reattach the screen session.

I've installed a monitoring tool on the system, and what happens is that some processes finish after a certain time but no new processes are launched. So the system stays active while subprocesses are busy and becomes idle as soon as no more jobs are released from the queue.
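To pin down when the queue stops releasing jobs, one option (a hypothetical sketch, not the original script) is to have the main script write a timestamped line with the 1-minute load average every time a job finishes. A long gap in that log while screen is detached, ending right at the moment of reattaching, would point at the dispatching parent process rather than at the jobs themselves. The function names and the log format are assumptions for illustration; `os.getloadavg()` is Unix-only.

```python
import os
import subprocess
import sys
import time
from multiprocessing import get_context


def run_command(args):
    """Run one external command and return (args, exit code)."""
    return args, subprocess.run(args).returncode


def run_and_log(commands, log_path, processes=12):
    """Run commands in a pool, appending one line per finished job to log_path."""
    ctx = get_context("fork")  # assumption: Linux, where fork is available
    results = []
    with ctx.Pool(processes=processes) as pool, open(log_path, "a") as log:
        # imap_unordered yields each result as soon as its job finishes.
        for args, code in pool.imap_unordered(run_command, commands, chunksize=1):
            load1, _, _ = os.getloadavg()
            log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} load={load1:.2f} "
                      f"exit={code} cmd={args}\n")
            log.flush()  # write each line out immediately, even while detached
            results.append(code)
    return results
```

Because the log is written by the parent, a log that stops advancing while jobs are clearly done suggests the parent itself is blocked, which narrows down what to test next.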

So my question is, does anyone know of any reason that explains this behavior?

EDIT: After a year or so, this problem is no longer reproducible, presumably thanks to some patch to either screen or Python itself. I'm accepting the answer as it provided good directions for testing.

asked May 08 '11 at 01:05 by unode


1 Answer

I can't explain the reason for what you are seeing. However, I do have an idea of what you can try next.

  1. Try piping the output of the script through tee (| tee out.txt). If that has no effect, try...
  2. Run screen on another [hop] host. From there, SSH into your worker host and run your script in a plain shell that is not running inside screen. Then feel free to disconnect from and reconnect to your hop host to check on the process. This should hide from the worker that screen is in any way involved.
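In the same spirit as the tee test (taking screen's terminal out of the I/O path), a hypothetical variation is to have the script send each child's output to a log file and give each child no terminal stdin, so nothing the children do can depend on the screen window. This only helps if the stall is caused by terminal I/O, which is an unconfirmed guess. A minimal sketch, assuming the commands are argument lists; `run_logged` is an illustrative name.

```python
import subprocess
import sys


def run_logged(args, log_path):
    """Run one command with stdout and stderr written to log_path instead of the terminal."""
    with open(log_path, "w") as log:
        # stdin=DEVNULL: the child can never block waiting for terminal input.
        # stderr=STDOUT: both streams go to the same log file.
        completed = subprocess.run(args, stdin=subprocess.DEVNULL,
                                   stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode
```

In a pool-based script, a worker would call this in place of a bare subprocess.run(args), with a distinct log path per job.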

Please comment back with the results of these tests. That will give me more to go on.

answered Sep 23 '22 at 08:09 by Bruno Bronosky