ForEach-Object -Parallel is great, but sometimes the script block hangs, which blocks the rest of the code. ForEach-Object does have a -TimeoutSeconds parameter, but it is global: when the timeout is reached, any objects still in the queue are dropped. In my opinion that makes it useless; the timeout should apply to each individual thread.
So I tried to write something short and efficient to get that behavior:
# Generate data
$list = 1..100
# Max processing time for 1 item = 1s
$job_timeout = 1
# Processing data
$list |
    ForEach-Object -Parallel {
        # using Start-Job to be able to cancel it if it takes too long
        $j = Start-Job -InputObject $_ -ScriptBlock {
            # Simulate processing time
            Start-Sleep -Milliseconds (Get-Random -Minimum 0 -Maximum 3000)
            # returning processed data
            $input
        }
        # building result object
        [pscustomobject]@{
            item   = $_
            # obtaining job result ($null if it's still running)
            result = $j | Wait-Job -Timeout $using:job_timeout | Receive-Job
        }
        # clean-up
        $j | Stop-Job -PassThru | Remove-Job
    }
My problem is that, on second thought, embedding Start-Job inside a ForEach-Object -Parallel block seems dumb. I could probably get the same result using only jobs, without writing more lines.
So the question is: is there a good way to get a per-thread timeout in a ForEach-Object -Parallel block?
Thanks for your help
Perhaps ForEach-Object -Parallel is not the right cmdlet to approach this problem; it might be simpler to use Start-ThreadJob in this case.
You could, in theory, implement a similar logic if using -AsJob from ForEach-Object -Parallel, then targeting the .ChildJobs property, but I would personally not try to force that; it would only make things more complicated than they should be.
$timeout = [timespan]::FromSeconds(1)
[System.Collections.Generic.List[object]] $jobs = 1..100 |
    ForEach-Object {
        Start-ThreadJob {
            # Simulate processing time
            Start-Sleep -Milliseconds (Get-Random -Minimum 0 -Maximum 3000)
            # returning processed data
            $using:_
        } -ThrottleLimit 7
    }

do {
    for ($i = 0; $i -lt $jobs.Count; $i++) {
        $job = $jobs[$i]
        # if the job is completed
        if ($job.State -eq 'Completed') {
            # get the output
            Receive-Job $job
            # and remove it from the job list; step the index back so the
            # element that shifts into this slot is not skipped
            $jobs.RemoveAt($i)
            $i--
            continue
        }
        # if the job is not running,
        # it could be queued, stopped or failed
        if ($job.State -ne 'Running') {
            # ignore this one
            continue
        }
        # if the time difference since its start is greater than the timeout
        if ([datetime]::Now - $job.PSBeginTime -gt $timeout) {
            # stop it, allowing a new job to start (this is based on `-ThrottleLimit`)
            $job.StopJob($true, 'You are out of time')
        }
    }
}
# keep polling while jobs are running or still queued behind the throttle limit
while ($jobs.State -contains 'Running' -or $jobs.State -contains 'NotStarted')

$jobs # <- Contains all jobs that ran out of time
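The question builds one object per item with a result that stays $null on timeout. A minimal sketch of getting that shape from the Start-ThreadJob approach follows. It assumes the input value can be carried in the job's -Name; the "item-" prefix, the fixed delays, and the Item / TimedOut / Result property names are hypothetical choices made for illustration, not part of the answer above.

```powershell
$timeout = [timespan]::FromSeconds(1)
$jobs    = [System.Collections.Generic.List[object]]::new()

foreach ($item in 1..6) {
    # The job name carries the input value (hypothetical "item-" prefix).
    $jobs.Add((Start-ThreadJob -Name "item-$item" -ThrottleLimit 3 {
        # Simulated work: item 4 deliberately exceeds the timeout.
        Start-Sleep -Milliseconds $(if ($using:item -eq 4) { 3000 } else { 200 })
        $using:item
    }))
}

$results = [System.Collections.Generic.List[object]]::new()
while ($jobs.Count -gt 0) {
    # Iterate over a snapshot so that removing from $jobs is safe.
    foreach ($job in @($jobs)) {
        $timedOut = $job.State -eq 'Running' -and
                    ([datetime]::Now - $job.PSBeginTime) -gt $timeout
        if ($timedOut) { $job | Stop-Job }

        if ($job.State -in 'Completed', 'Stopped', 'Failed') {
            $results.Add([pscustomobject]@{
                Item     = $job.Name -replace '^item-'
                TimedOut = $job.State -eq 'Stopped'
                # Only completed jobs have a result; otherwise this stays $null.
                Result   = if ($job.State -eq 'Completed') { Receive-Job $job }
            })
            Remove-Job $job -Force
            $null = $jobs.Remove($job)
        }
    }
    Start-Sleep -Milliseconds 50   # avoid a tight polling loop
}

$results | Sort-Object { [int] $_.Item }
```

With the delays assumed here, only the entry for item 4 should come back with TimedOut set and an empty Result. Carrying the input in the job name only works for values that round-trip through a string; for richer inputs a lookup table keyed by job Id would be the more general choice.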
Indeed, the current semantics of the -TimeoutSeconds parameter of ForEach-Object's PowerShell (Core) 7+ -Parallel feature are unfortunate (as of PowerShell 7.3.6). To spell out your observation in more detail:
The -TimeoutSeconds interval is applied to the duration of the overall, typically throttled and therefore "batched" invocation[1] rather than to the runtime of each thread.
Therefore, a timeout can occur even if all individual threads completed in less than the specified timeout; a simple example:
# This times out when the 3rd thread runs, because - due to ThrottleLimit 2 -
# it only starts after 1.5+ seconds, after the first 2 threads
# have finished.
1..4 |
    ForEach-Object -TimeoutSeconds 2 -ThrottleLimit 2 -Parallel {
        Start-Sleep -Milliseconds 1500
        $_ # Pass the input object through.
    }
When a timeout occurs, the command terminates overall.
In the example above, 1 and 2 print; input 4 never gets processed, because processing of 3 caused the timeout.
GitHub issue #20197 asks for these shortcomings to be addressed.
As a - somewhat cumbersome - workaround, you can use the -AsJob parameter to make ForEach-Object return a job whose child jobs represent the individual threads, which can be monitored separately.
Applied to a slightly modified version of the example above that provokes a timeout for the 3rd input object:
# Use -AsJob to receive a job that allows monitoring the threads individually.
# Note that -AsJob cannot be combined with -TimeoutSeconds
$job =
    1..4 |
    ForEach-Object -AsJob -ThrottleLimit 2 -Parallel {
        if ($_ -eq 3) {
            # Provoke a timeout error for this specific input.
            Start-Sleep -Milliseconds 2500; $_
        } else {
            Start-Sleep -Milliseconds 1500; $_
        }
    }

# Receive job output in a polling loop, and terminate child jobs
# that have run too long.
$timeout = 2
do {
    Start-Sleep -Milliseconds 500 # Sleep a little.
    # Get pending results.
    $job | Receive-Job
    # If any child jobs have been running for more than N seconds,
    # stop (terminate) them.
    # This will open up slots for more threads to spin up.
    foreach ($childJob in $job.ChildJobs.Where({ $_.State -eq 'Running' })) {
        if (([datetime]::Now - $childJob.PSBeginTime).TotalSeconds -ge $timeout) {
            Write-Verbose -Verbose "Stopping job with ID $($childJob.Id) due to running longer than $timeout seconds..."
            $childJob | Stop-Job
        }
    }
} while ($job.ChildJobs.Where({ $_.State -in 'NotStarted', 'Running' }))
Output:
1
2
4
VERBOSE: Stopping job with ID 4 due to running longer than 2 seconds...
Note:
Input 4 was still processed, despite the thread for input 3 having timed out.
The ID value of the child job isn't really meaningful except to distinguish it from other child jobs; if you want to know what input object caused the timeout, you'll have to echo it as part of the script block (at the start, before a timeout can occur) - the job object doesn't contain this information.
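The last point can be sketched as follows: each thread emits a marker object before doing any work and a second object when it finishes, so inputs that have a marker but no result are the ones that were stopped. The Kind and InputItem property names and the delays are assumptions made for illustration, and a final Receive-Job after the loop is added to pick up anything emitted after the last poll.

```powershell
$timeout = 2
$job = 1..4 | ForEach-Object -AsJob -ThrottleLimit 2 -Parallel {
    # Announce the input first, so it is on record before a timeout can occur.
    [pscustomobject]@{ Kind = 'Started'; InputItem = $_ }
    # Simulated work: input 3 deliberately runs past the timeout.
    Start-Sleep -Milliseconds $(if ($_ -eq 3) { 4000 } else { 500 })
    [pscustomobject]@{ Kind = 'Result'; InputItem = $_ }
}

$received = [System.Collections.Generic.List[object]]::new()
do {
    Start-Sleep -Milliseconds 250
    # Collect whatever the threads have emitted so far.
    foreach ($o in ($job | Receive-Job)) { $received.Add($o) }
    # Stop child jobs that have been running longer than the timeout.
    foreach ($childJob in $job.ChildJobs.Where({ $_.State -eq 'Running' })) {
        if (([datetime]::Now - $childJob.PSBeginTime).TotalSeconds -ge $timeout) {
            $childJob | Stop-Job
        }
    }
} while ($job.ChildJobs.Where({ $_.State -in 'NotStarted', 'Running' }))
# Pick up output emitted after the last poll.
foreach ($o in ($job | Receive-Job)) { $received.Add($o) }
$job | Remove-Job -Force

# Inputs that announced themselves but never produced a result.
$finished = @($received | Where-Object Kind -eq 'Result' | ForEach-Object InputItem)
$timedOut = @($received | Where-Object Kind -eq 'Started' | ForEach-Object InputItem |
    Where-Object { $_ -notin $finished })
$timedOut
```

With the delays assumed here, $timedOut should contain only 3. Mixing marker objects into the output stream means the caller has to filter them out again; writing the marker to another stream would avoid that, at the cost of having to read that stream from each child job.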
[1] More accurately, only a fixed number of threads are allowed to run at a time, based on the -ThrottleLimit argument, which defaults to 5. If more threads are needed, they have to wait until "slots" open up, which happens when currently executing threads finish.