How does begin/process/end save the need for foreach? It's still needed for the parameter, isn't it?

Tags:

powershell

I've understood that with begin/process/end the process section runs multiple times, once for each object in the pipeline. So if I have a function like this:

function Test-BeginProcessEnd {
    [cmdletbinding()]
    Param(
        [Parameter(Mandatory=$true, ValueFromPipeline=$True)]
        [string]$myName     
    )
    begin {}
    process {
        Write-Host $myName
    }
    end {}
}

I can pipe an array to it, like this, and it processes each object:

PS C:\> @('aaa','bbb') | Test-BeginProcessEnd
aaa
bbb
PS C:\>

But if I pass the value as a parameter argument on the command line instead, I can only pass it one string, so I can do:

PS C:\> Test-BeginProcessEnd -myName 'aaa'
aaa
PS C:\>

But I can't do:

PS C:\> Test-BeginProcessEnd -myName @('aaa','bbb')
Test-BeginProcessEnd : Cannot process argument transformation on parameter 'myName'. Cannot convert value to type
System.String.
At line:1 char:30
+ Test-BeginProcessEnd -myName @('aaa','bbb')
+                              ~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidData: (:) [Test-BeginProcessEnd], ParameterBindingArgumentTransformationException
    + FullyQualifiedErrorId : ParameterArgumentTransformationError,Test-BeginProcessEnd
PS C:\>

Obviously I want the parameter to accept the same input as the pipeline does, so I have to change the function to:

function Test-BeginProcessEnd
{
    [cmdletbinding()]
    Param(
        [Parameter(Mandatory=$true, ValueFromPipeline=$True)]
        [string[]]$myNames      
    )
    begin {}
    process {
        foreach ($name in $myNames) {
            Write-Host $name
        }
    }
    end {}
}

So I've had to use foreach anyway, and the looping functionality of the Process section hasn't helped me.

Have I missed something? I can't see what it's good for! Thanks for any help.

asked Mar 08 '23 by aberdeen angus

2 Answers

tl;dr:

Because of how binding pipeline input to parameters works in PowerShell (see below), defining a parameter that accepts pipeline input as well as direct parameter-value passing of arrays:

  • indeed requires looping inside the process block
  • invariably wraps individual input objects received through the pipeline in a single-element array each, which is inefficient.

Defining your pipeline-binding parameters as a scalar avoids this awkwardness, but passing multiple inputs is then limited to the pipeline - you won't be able to pass arrays as a parameter argument.[1]

This asymmetry is perhaps surprising.


When you define a parameter that accepts pipeline input, you get implicit array logic for free:

  • With pipeline input, PowerShell calls your process block once for each input object, with the current input object bound to the parameter variable.

  • By contrast, passing input as a parameter value only ever enters the process block once, with the input as a whole bound to your parameter variable.

The above applies whether or not your parameter is array-valued: each pipeline input object individually is bound / coerced to the parameter's type exactly as declared.


To put this in concrete terms with your example function that declares parameter [Parameter(Mandatory=$true, ValueFromPipeline=$True)] [string[]] $myNames:

Let's assume an input array (collection) of 'foo', 'bar' (note that the @() around array literals is normally not necessary).

  • Parameter-value input, Test-BeginProcessEnd -myNames 'foo', 'bar':

    • The process block is called once,
    • with input array 'foo', 'bar' bound to $myNames as a whole.
  • Pipeline input, 'foo', 'bar' | Test-BeginProcessEnd:

    • The process block is called twice,
    • with 'foo' and 'bar' each coerced to [string[]] - i.e., a single-element array.

To see it in action:

function Test-BeginProcessEnd
{
    [cmdletbinding()]
    Param(
      [Parameter(Mandatory, ValueFromPipeline)]
      [string[]]$myNames      
    )
    begin {}
    process {
      Write-Verbose -Verbose "in process block: `$myNames element count: $($myNames.Count)"
      foreach ($name in $myNames) { $name }
    }
    end {}
}
# Input via parameter
>  Test-BeginProcessEnd 'foo', 'bar'
VERBOSE: in process block: $myNames element count: 2
foo
bar

# Input via pipeline
> 'foo', 'bar' | Test-BeginProcessEnd
VERBOSE: in process block: $myNames element count: 1
foo
VERBOSE: in process block: $myNames element count: 1
bar

Optional reading: Various tips re functions and pipeline input

  • begin, process, end blocks may be used in a function whether or not it is an advanced function (cmdlet-like - see below).

    • If you only need the 1st or a certain number of objects from the pipeline, there is currently no way to exit the pipeline prematurely; instead, you must set a Boolean flag that tells you when to ignore subsequent process block invocations (see the first sketch after this list).
    • You can, however, use an intervening, separate call such as | Select-Object -First 1, which efficiently exits the pipeline after the desired number of objects have been received.
    • The current inability to do the same from user code is the subject of this suggestion on GitHub.
    • Alternatively, you can forgo a process block and use $Input | Select-Object -First 1 inside your function (see the second sketch after this list), but, as stated, that will collect all input in memory first; another - also imperfect - alternative can be found in this answer of mine.
  • If you do not use these blocks, you can still optionally access pipeline input via the automatic $Input variable; note, however, that your function then runs after ALL pipeline input has been collected in memory (not object by object as with a process block).

  • Generally, though, it pays to use a process block:

    • Objects can be processed one by one, as they're being produced by the source command, which has 2 benefits:
      • It makes processing more memory-efficient, because the source command's output doesn't have to be collected in full first.
      • Your function starts to produce output right away, without needing to wait for the source command to finish first.
    • Hopefully soon (see above), you'll be able to exit the pipeline once all objects of interest have been processed.
    • Cleaner syntax and structure: the process block is an implicit loop over all pipeline input, and you can selectively perform initialization and cleanup tasks in the begin and end blocks, respectively.
  • It is easy to turn a function into an advanced function, however, which offers benefits with respect to supporting common parameters such as -ErrorAction and -OutVariable, as well as detection of unrecognized parameters:

    • Use a param() block to declare the parameters and decorate that block with the [CmdletBinding()] attribute, as shown above (also, decorating an individual parameter with a [Parameter()] attribute implicitly makes a function an advanced one, but for clarity it's better to use [CmdletBinding()] explicitly).
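
As a minimal sketch of the Boolean-flag technique from the first tip above (the function and variable names here are illustrative, not part of any standard API):

function Get-FirstItem {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        $InputObject
    )
    begin   { $done = $false }    # flag: have we already emitted the object of interest?
    process {
        if ($done) { return }     # ignore all subsequent pipeline input
        $InputObject              # emit the first object ...
        $done = $true             # ... then raise the flag
    }
}

# The upstream command still runs to completion; its remaining output is simply ignored:
> 1..5 | Get-FirstItem
1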
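
And a sketch of the $Input-based alternative (again with an illustrative name) - note that, as stated, it only starts running after the upstream command has finished and all input has been collected:

function Get-FirstViaInput {
    # No begin/process/end blocks: the body runs once, as an implicit end block,
    # after ALL pipeline input has been collected in the automatic $Input variable.
    $Input | Select-Object -First 1
}

> 1..5 | Get-FirstViaInput
1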

[1] Strictly speaking, you can, but only if you type your parameter [object] (or don't specify a type at all, which is the same).
However, the input array/collection is then bound as a whole to the parameter variable, and the process block is still only entered once, where you'd need to perform your own enumeration.
Some standard cmdlets, such as Export-Csv, are defined this way, yet they do not enumerate a collection passed via the -InputObject parameter, making direct use of that parameter effectively useless - see this GitHub issue.
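
To illustrate the footnote with a hypothetical function typed [object]:

function Test-ObjectParam {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $InputObject   # [object] (or no type at all): an array binds as a whole
    )
    process {
        # With -InputObject 'aaa','bbb' this block runs ONCE, with the whole
        # 2-element array in $InputObject; enumeration is your responsibility:
        foreach ($o in $InputObject) { $o }
    }
}

> Test-ObjectParam -InputObject 'aaa', 'bbb'   # one process call, manual enumeration
aaa
bbb
> 'aaa', 'bbb' | Test-ObjectParam              # two process calls, one object each
aaa
bbb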

answered Apr 26 '23 by mklement0

The BEGIN-PROCESS-END structure is used for scripts/advanced functions where (a) you want to be able to pipe data to them, and (b) there is work you want to do before (BEGIN) and/or after (END) processing the entire set of data, as opposed to before or after each individual item that comes through the pipe.

If you pass a single value to an advanced function that uses a foreach to handle an array, it treats the single value as an array of one item; the pipe does this too, in effect, except that it doesn't need to reload the cmdlet for each item. This, ultimately, is why you can write scripts/advanced functions that can be used either in the pipeline or as 'standalone' processes.

It is not that PROCESS causes the looping; it's that it enables the efficient processing of values coming in from the pipeline. If you want to handle multiple values passed by means other than the pipeline, you need to manage the looping yourself - as you've discovered.
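
A minimal sketch of that division of labor (the function and parameter names are illustrative): BEGIN sets up state once, PROCESS handles each pipeline object, and END reports on the entire set:

function Measure-Names {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [string]$Name
    )
    begin   { $count = 0 }                    # BEGIN: runs once, before any input
    process { $count++; "processing $Name" }  # PROCESS: runs once per pipeline object
    end     { "total: $count" }               # END: runs once, after all input
}

PS C:\> 'aaa', 'bbb' | Measure-Names
processing aaa
processing bbb
total: 2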

answered Apr 26 '23 by Jeff Zeitlin