First, make some example files:
2010..2015 | % { "" | Set-Content "example $_.txt" }
#example 2010.txt
#example 2011.txt
#example 2012.txt
#example 2013.txt
#example 2014.txt
#example 2015.txt
What I want to do is match the year with a regex capture group, then reference the match with $matches[1] and use it. I can write this to do both in one scriptblock, in one cmdlet, and it works fine:
gci *.txt | foreach {
if ($_ -match '(\d+)') # regex match the year
{ # on the current loop variable
$matches[1] # and use the capture group immediately
}
}
#2010
#2011
#.. etc
I can also write this to do the match in one scriptblock, and then reference $matches in another cmdlet's scriptblock later on:
gci *.txt | where {
$_ -match '(\d+)' # regex match here, in the Where scriptblock
} | foreach { # pipeline!
$matches[1] # use $matches which was set in the previous
# scriptblock, in a different cmdlet
}
This has the same output and appears to work fine. But is it guaranteed to work, or is it undefined and a coincidence?
Could 'example 2012.txt' get matched, then buffered, and 'example 2013.txt' get matched, then buffered, so that when | foreach gets to work on 'example 2012.txt', $matches has already been updated with 2013 and they're out of sync?
I can't make them fall out of sync - but I could still be relying on undefined behaviour.
(FWIW, I prefer the first approach for clarity and readability as well).
There is no synchronization going on, per se. The second example works because of the way the pipeline works. As each single object gets passed along by satisfying the condition in Where-Object, the -Process block in ForEach-Object immediately processes it, so $Matches hasn't yet been overwritten by any other -match operation.
If you were to do something that causes the pipeline to gather objects before passing them on, like sorting, you would be in trouble:
gci *.txt | where {
$_ -match '(\d+)' # regex match here, in the Where scriptblock
} | sort | foreach { # pipeline!
$matches[1] # use $matches which was set in the previous
# scriptblock, in a different cmdlet
}
For example, the above should fail: it outputs n objects, but every one of them will be the very last match.
So it's prudent not to rely on that, because it obscures the danger. Someone else (or you a few months later) may not think anything of inserting a sort and then be very confused by the result.
As TheMadTechnician pointed out in the comments, the placement changes things. Put the sort after the part where you reference $Matches (in the foreach), or before you filter with where, and it will still work as expected.
I think that drives home the point that it should be avoided, as it's fairly unclear. If the code changes in parts of the pipeline you don't control, then the behavior may end up being different, unexpectedly.
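One way to sidestep the problem is to read $Matches in the same script block that performs the -match, and pass the captured value down the pipeline on the object itself. The following is only a minimal sketch, not something from the original post: the $names and $results variables and the Name/Year property names are made up for illustration, and plain strings stand in for the example files so the snippet runs without touching the file system.

```powershell
# Stand-ins for the file names created at the top of the question (assumption:
# plain strings are used here instead of Get-ChildItem output).
$names = 2010..2015 | ForEach-Object { "example $_.txt" }

# Match and read $Matches in the SAME script block, then emit an object that
# carries the captured value with it. Name and Year are illustrative names.
$results = $names |
    ForEach-Object {
        if ($_ -match '(\d+)') {
            [pscustomobject]@{ Name = $_; Year = $Matches[1] }
        }
    } |
    Sort-Object Year -Descending   # an aggregating cmdlet is now harmless

# Each object still reports its own year, even after sorting.
$results | ForEach-Object { '{0} -> {1}' -f $_.Name, $_.Year }
```

Because the year travels with each object, inserting Sort-Object (or anything else that collects its input) between the match and the later use no longer changes what each iteration sees.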
I like to throw in some verbose output to demonstrate this sometimes:
gci *.txt | where {
"Where-Object: $_" | Write-Verbose -Verbose
$_ -match '(\d+)' # regex match here, in the Where scriptblock
} | foreach { # pipeline!
"ForEach-Object: $_" | Write-Verbose -Verbose
$matches[1] # use $matches which was set in the previous
# scriptblock, in a different cmdlet
}
gci *.txt | where {
"Where-Object: $_" | Write-Verbose -Verbose
$_ -match '(\d+)' # regex match here, in the Where scriptblock
} | sort | foreach { # pipeline!
"ForEach-Object: $_" | Write-Verbose -Verbose
$matches[1] # use $matches which was set in the previous
# scriptblock, in a different cmdlet
}
The difference you'll see is that in the original, as soon as where "clears" an object, foreach gets it right away. In the sorted version, you can see all of the where calls happening first, before foreach gets any of them.
sort doesn't have any verbose output, so I didn't bother calling it that way, but essentially its Process {} block just collects all of the objects so it can compare (sort!) them, then spits them out in the End {} block.
First, here's a function that mocks Sort-Object's collection of objects (it doesn't actually sort them or do anything):
function mocksort {
    [CmdletBinding()]
    param(
        [Parameter(
            ValueFromPipeline
        )]
        [Object]
        $O
    )
    Begin {
        Write-Verbose "Begin (mocksort)"
        $objects = @()
    }
    Process {
        Write-Verbose "Process (mocksort): $O (nothing passed, collecting...)"
        $objects += $O
    }
    End {
        Write-Verbose "End (mocksort): returning objects"
        $objects
    }
}
Then, we can use that with the previous example and some sleep at the end:
gci *.txt | where {
"Where-Object: $_" | Write-Verbose -Verbose
$_ -match '(\d+)' # regex match here, in the Where scriptblock
} | mocksort -Verbose | foreach { # pipeline!
"ForEach-Object: $_" | Write-Verbose -Verbose
$matches[1] # use $matches which was set in the previous
# scriptblock, in a different cmdlet
} | % { sleep -milli 500 ; $_ }
To complement briantist's great answer:
Aside from aggregating cmdlets such as Sort-Object (cmdlets that (must) collect all input first, before producing any output), the -OutBuffer common parameter can also break the command:
gci *.txt | where -OutBuffer 100 {
$_ -match '(\d+)' # regex match here, in the Where scriptblock
} | foreach { # pipeline!
$matches[1] # use $matches which was set in the previous
# scriptblock, in a different cmdlet
}
This causes the where (Where-Object) cmdlet to buffer its first 100 output objects until the 101st object is generated, and only then send these 101 objects on, so that $matches[1] in the foreach (ForEach-Object) block will in this case only see the 101st (matching) filename's capture-group value, in each of the (first) 101 iterations.
Generally, with an -OutBuffer value of N, the first N + 1 foreach invocations would all see the same $matches value from the (N + 1)-th input object, and so forth for subsequent batches of N + 1 objects.
From Get-Help about_CommonParameters:
When you use this parameter, Windows PowerShell does not call the next cmdlet in the pipeline until the number of objects generated equals OutBuffer + 1. Thereafter, it sends all objects as they are generated.
Note that the last sentence suggests that only the first N + 1 objects are subject to buffering, which, however, is not true, as the following example (thanks, @briantist) demonstrates:
1..5 | % { Write-Verbose -vb $_; $_ } -OutBuffer 1 | % { "[$_]" }
VERBOSE: 1
VERBOSE: 2
[1]
[2]
VERBOSE: 3
VERBOSE: 4
[3]
[4]
VERBOSE: 5
[5]
That is, -OutBuffer 1 caused all objects output by % (ForEach-Object) to be batched in groups of 2, not just the first 2.
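To see the same effect on $matches with a small buffer, something like the following could be tried. This is only a sketch under assumptions: the $names and $seen variables are made up for illustration, and plain strings stand in for the example files so the snippet runs anywhere. If batching behaves as in the 1..5 demonstration above, consecutive pairs of lines would be expected to report the same year, but the exact output is deliberately not shown here because it depends on that batching behaviour.

```powershell
# Stand-ins for the file names created at the top of the question (assumption:
# plain strings are used here instead of Get-ChildItem output).
$names = 2010..2015 | ForEach-Object { "example $_.txt" }

# -OutBuffer 1 asks Where-Object to hold back output until 2 objects are ready,
# so $Matches may already belong to a later input when ForEach-Object runs.
$seen = $names |
    Where-Object -OutBuffer 1 { $_ -match '(\d+)' } |
    ForEach-Object { '{0} -> {1}' -f $_, $Matches[1] }

# Compare the year in each name with the year read from $Matches.
$seen
```

Any line where the year in the name and the year after the arrow differ is a case where $matches was read out of step with the object being processed.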