I want to run multiple shell processes, but when I try to run more than 63, they hang. When I reduce max_threads
in the thread pool to n
, it hangs after running the n
th shell command.
As you can see in the code below, the problem is not in start
blocks per se, but in start
blocks that contain the shell
command:
#!/bin/env perl6
my $*SCHEDULER = ThreadPoolScheduler.new( max_threads => 2 );
my @processes;
# The Promises generated by this loop work as expected when awaited
for @*ARGS -> $item {
@processes.append(
start { say "Planning on processing $item" }
);
}
# The nth Promise generated by the following loop hangs when awaited (where n = max_thread)
for @*ARGS -> $item {
@processes.append(
start { shell "echo 'processing $item'" }
);
}
await(@processes);
Running ./process_items foo bar baz
gives the following output, hanging after processing bar
, which is just after the n
th (here 2
nd) thread has run using shell
:
Planning on processing foo Planning on processing bar Planning on processing baz processing foo processing bar
What am I doing wrong? Or is this a bug?
Perl 6 distributions tested on CentOS 7:
Rakudo Star 2018.06
Rakudo Star 2018.10
Rakudo Star 2019.03-RC2
Rakudo Star 2019.03
With Rakudo Star 2019.03-RC2, use v6.c
versus use v6.d
did not make any difference.
The shell
and run
subs use Proc
, which is implemented in terms of Proc::Async
. This uses the thread pool internally. By filling up the pool with blocking calls to shell
, the thread pool becomes exhausted, and so cannot process events, resulting in the hang.
It would be far better to use Proc::Async
directly for this task. The approach with using shell
and a load of real threads won't scale well; every OS thread has memory overhead, GC overhead, and so forth. Since spawning a bunch of child processes is not CPU-bound, this is rather wasteful; in reality, just one or two real threads are needed. So, in this case, perhaps the implementation pushing back on you when doing something inefficient isn't the worst thing.
I notice that one of the reasons for using shell
and the thread pool is to try and limit the number of concurrent processes. But this isn't a very reliable way to do it; just because the current thread pool implementation sets a default maximum of 64 threads does not mean it always will do so.
Here's an example of a parallel test runner that runs up to 4 processes at once, collects their output, and envelopes it. It's a little more than you perhaps need, but it nicely illustrates the shape of the overall solution:
my $degree = 4;
my @tests = dir('t').grep(/\.t$/);
react {
sub run-one {
my $test = @tests.shift // return;
my $proc = Proc::Async.new('perl6', '-Ilib', $test);
my @output = "FILE: $test";
whenever $proc.stdout.lines {
push @output, "OUT: $_";
}
whenever $proc.stderr.lines {
push @output, "ERR: $_";
}
my $finished = $proc.start;
whenever $finished {
push @output, "EXIT: {.exitcode}";
say @output.join("\n");
run-one();
}
}
run-one for 1..$degree;
}
The key thing here is the call to run-one
when a process ends, which means that you always replace an exited process with a new one, maintaining - so long as there are things to do - up to 4 processes running at a time. The react
block naturally ends when all processes have completed, due to the fact that the number of events subscribed to drops to zero.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With