
What determines whether the PowerShell pipeline will unroll a collection?

# array
C:\> (1,2,3).count
3
C:\> (1,2,3 | measure).count
3

# hashtable
C:\> @{1=1; 2=2; 3=3}.count
3
C:\> (@{1=1; 2=2; 3=3} | measure).count
1

# array returned from function
C:\> function UnrollMe { $args }
C:\> (UnrollMe a,b,c).count
3
C:\> (UnrollMe a,b,c | measure).count
1
C:\> (1,2,3).gettype() -eq (UnrollMe a,b,c).gettype()
True

The discrepancy with HashTables is fairly well known, although the official documentation only mentions it obliquely (via example).
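
For comparison, a minimal sketch of the usual way to get a hashtable's entries onto the pipeline individually: call GetEnumerator() on it. The variable names here are illustrative and not part of the original post.

```powershell
$table = @{1=1; 2=2; 3=3}

# Piped as-is, the hashtable travels down the pipeline as a single object.
$asObject = ($table | Measure-Object).Count

# GetEnumerator() hands the pipeline one DictionaryEntry per key.
$asEntries = ($table.GetEnumerator() | Measure-Object).Count

"as object:  $asObject"    # as object:  1
"as entries: $asEntries"   # as entries: 3
```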

The issue with functions, though, is news to me. I'm kind of shocked it hasn't bitten me before now. Is there some guiding principle we scripters can follow? I know that when writing cmdlets in C# there's an overload of WriteObject where you can control enumeration explicitly, but AFAIK there's no such construct in the Posh language itself. As the final example shows, the Posh interpreter seems to believe there is no difference in the type of objects being piped. I suspect there may be some Object vs PSObject weirdness under the hood, but that's of little use when you're writing pure Posh and expect the script language to "just work."

/ EDIT /

Keith is correct to point out that in my example, I'm passing in a single string[] argument rather than 3 string arguments. In other words, Measure-Object says Count=1 because it's seeing a single array-of-arrays whose first element is @("a", "b", "c"). Fair enough. This knowledge allows you to work around the issue in several ways:

# stick to single objects
C:\> (UnrollMe a b c | measure).count
3

# rewrite the function to handle nesting
C:\> function UnrollMe2 { $args[0] }
C:\> (UnrollMe2 a,b,c | measure).count
3

# ditto
C:\> function UnrollMe3 { $args | %{ $_ } }
C:\> (UnrollMe3 a,b,c | measure).count
3

However, it doesn't explain everything...

# as seen earlier - if we're truly returning @( @("a","b","c") ) why not count=1?
C:\> (UnrollMe a,b,c).count
3

# our theory must also explain these results:
C:\> ((UnrollMe a,b,c) | measure).count
3
C:\> ( @(@("a","b","c")) | measure).count
3
C:\> ((UnrollMe a,b,c d) | measure).count
2

From what I can extrapolate there's another rule in play: if you have an array with exactly one element AND the parser is in expression mode, then the interpreter will "unwrap" said element. Any more subtleties I'm missing?

Richard Berg asked Dec 01 '09 18:12



1 Answer

$args is unrolled. Remember that function arguments are normally separated by spaces. When you pass in 1,2,3 you are passing a single argument, an array of three numbers, which gets assigned to $args[0]:

PS> function UnrollMe { $args }
PS> UnrollMe 1 2 3 | measure

Count    : 3
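
A quick way to check how many arguments a call really passes is to look at $args.Count inside the function. This is only a sketch; Get-ArgCount is a hypothetical helper, not something from the question.

```powershell
# Hypothetical helper: reports how many arguments the function received.
function Get-ArgCount { $args.Count }

$commaCount = Get-ArgCount a,b,c   # one argument: a 3-element array
$spaceCount = Get-ArgCount a b c   # three separate string arguments

"comma-separated: $commaCount"   # comma-separated: 1
"space-separated: $spaceCount"   # space-separated: 3
```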

Putting the results (an array) within a grouping expression (or a subexpression, e.g. $()) makes it eligible for unrolling again, so the following unrolls the object[] containing 1,2,3 returned by UnrollMe:

PS> ((UnrollMe 1,2,3) | measure).Count
3

which is equivalent to:

PS> ((1,2,3) | measure).Count
3
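
Side by side, the cases look roughly like this. It is a sketch using the UnrollMe function from the question; the variable names are made up for illustration, and capturing the result in a variable first is assumed to behave like grouping because the variable then holds the inner array itself.

```powershell
function UnrollMe { $args }

# Piped directly: the inner array travels down the pipeline as one object.
$direct = (UnrollMe 1,2,3 | Measure-Object).Count

# Wrapped in (...) or $(...): the collected result is enumerated again.
$grouped = ((UnrollMe 1,2,3) | Measure-Object).Count
$subexpr = ($(UnrollMe 1,2,3) | Measure-Object).Count

# Captured in a variable first, then piped.
$result = UnrollMe 1,2,3
$viaVar = ($result | Measure-Object).Count

"direct=$direct grouped=$grouped subexpr=$subexpr viaVar=$viaVar"
# direct=1 grouped=3 subexpr=3 viaVar=3
```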

BTW it doesn't just apply to an array with one element.

PS> ((1,2),3) | %{$_.GetType().Name}
Object[]
Int32

Using an array subexpression (@()) on something that is already an array has no effect, no matter how many times you apply it. :-) If you want to prevent unrolling, use the comma operator, because it always creates another outer array, and that outer array is what gets unrolled. Note that in this scenario you don't really prevent unrolling; you just work around it by introducing an outer "wrapper" array that gets unrolled instead of your original array, e.g.:

PS> (,(1,2,3) | measure).Count
1
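
The same trick works for a function's output. A minimal sketch, with Get-Plain and Get-Wrapped as hypothetical function names:

```powershell
# Hypothetical functions for illustration.
function Get-Plain   { 1,2,3 }      # the array itself is enumerated on output
function Get-Wrapped { ,(1,2,3) }   # the outer wrapper is enumerated instead

$plainCount   = (Get-Plain   | Measure-Object).Count
$wrappedCount = (Get-Wrapped | Measure-Object).Count

# The inner array arrives intact as a single pipeline object.
$innerLength = Get-Wrapped | ForEach-Object { $_.Count }

"plain=$plainCount wrapped=$wrappedCount inner=$innerLength"
# plain=3 wrapped=1 inner=3
```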

Finally, when you execute this:

PS> (UnrollMe a,b,c d) | %{$_.GetType().Name}
Object[]
String

You can see that UnrollMe returns two items: (a,b,c) as an array and d as a scalar. Those two items get sent down the pipeline separately, which is why the resulting count is 2.

Keith Hill answered Sep 29 '22 19:09
