Can the following Nested foreach loop be simplified in PowerShell?

I have created a script that loops through an array and excludes any items that are found in a second array.

While the code works, it got me wondering whether it could be simplified or piped.

    $result = @()
    $ItemArray = @("a","b","c","d")
    $exclusionArray = @("b","c")

    foreach ($Item in $ItemArray)
    {
        $matchFailover = $false
        :gohere foreach ($ExclusionItem in $exclusionArray)
        {
            if ($Item -eq $ExclusionItem)
            {
                Write-Host "Match: $Item = $ExclusionItem"
                $matchFailover = $true
                break gohere
            }
            else
            {
                Write-Host "No Match: $Item != $ExclusionItem"
            }
        }
        if (!($matchFailover))
        {
            Write-Host "Adding $Item to results"
            $result += $Item
        }
    }
    Write-Host "`nResults are"
    $result
asked Mar 04 '23 by AfroBadger

1 Answer

To give your task a name: you're looking for the relative complement, aka set difference, between two arrays:

In set-theory notation, it would be $ItemArray \ $ExclusionArray, i.e., those elements in $ItemArray that aren't also in $ExclusionArray.

This related question is looking for the symmetric difference between two sets, i.e., the set of elements that are unique to either side - at least that's what the Compare-Object-based solutions there implement, but only under the assumption that each array has no duplicates.
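
For illustration, here is a quick sketch of how Compare-Object yields the symmetric difference (assuming neither array contains duplicates; the sample arrays here are made up for the example):

# Symmetric difference via Compare-Object: elements present in only one of the two arrays.
$a = 'a','b','c','d'
$b = 'b','c','e'
(Compare-Object $a $b).InputObject   # -> 'e', 'a', 'd' (output order may differ)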


EyIM's helpful answer is conceptually simple and concise.

A potential problem is performance: a lookup in the exclusion array must be performed for each element in the input array.

With small arrays, this likely won't matter in practice.
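
For reference, that approach is along the following lines (a sketch; the exact code in EyIM's answer may differ slightly) - using the .Where() array method for arrays already in memory and the Where-Object cmdlet for streaming (pipeline) input:

$ItemArray = 'a','b','c','d'
$exclusionArray = 'b','c'

# In-memory arrays: the .Where() array method (PSv4+)
$result = $ItemArray.Where({ $exclusionArray -notcontains $_ })          # -> 'a', 'd'

# Streaming (pipeline) input: the Where-Object cmdlet
$result = $ItemArray | Where-Object { $exclusionArray -notcontains $_ }  # -> 'a', 'd'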

With larger arrays, LINQ offers a substantially faster solution:

Note: In order to benefit from the LINQ solution, your arrays should be in memory already, and the benefit is greater the larger the exclusion array is. If your input is streaming via the pipeline, the overhead from executing the pipeline may make attempts to optimize array processing pointless or even counterproductive, in which case sticking with the native PowerShell solution makes sense - see iRon's answer.

# Declare the arrays as [string[]]
# so that calling the LINQ method below works as-is.
# (You could also cast to [string[]] ad hoc.)
[string[]] $ItemArray = 'a','b','c','d'
[string[]] $exclusionArray = 'b','c'

# Return only those elements in $ItemArray that aren't also in $exclusionArray
# and convert the result (a lazy enumerable of type [IEnumerable[string]])
# back to an array to force its evaluation
# (If you directly enumerate the result in a pipeline, that step isn't needed.)
[string[]] [Linq.Enumerable]::Except($ItemArray, $exclusionArray) # -> 'a', 'd'

Note the need to use the LINQ types explicitly, via their static methods, because PowerShell, as of v7, has no support for extension methods. However, there is a proposal on GitHub to add such support; this related proposal asks for improved support for calling generic methods.

See this answer for an overview of how to currently call LINQ methods from PowerShell.
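
By way of illustration (a generic sketch, not specific to the task at hand): LINQ methods are called as static methods on [Linq.Enumerable], and script blocks must be cast to the required delegate type so that the generic method can be resolved:

# Sketch: calling a LINQ method that takes a delegate, from PowerShell.
[string[]] $array = 'a','b','c','d'

# Cast the script block to [Func[string, bool]] so that ::Where() can bind its type parameters.
$lazyResult = [Linq.Enumerable]::Where($array, [Func[string, bool]] { param($s) $s -ne 'b' })

# Convert the lazy enumerable back to an array to force its evaluation.
[string[]] $lazyResult   # -> 'a', 'c', 'd'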


Performance comparison:

Tip of the hat to iRon for his input.

The following benchmark code uses the Time-Command function to compare the two approaches, using arrays with roughly 4000 and 2000 elements, respectively, which - as in the question - differ by only 2 elements.

Note that in order to level the playing field, the .Where() array method (PSv4+) is used instead of the pipeline-based Where-Object cmdlet, as .Where() is faster with arrays already in memory.

Here are the results, averaged over 10 runs on a single-core Windows 10 VM running Windows PowerShell v5.1; note the relative performance, as shown in the Factor column:

Factor Secs (10-run avg.) Command                              TimeSpan
------ ------------------ -------                              --------
1.00   0.046              # LINQ...                            00:00:00.0455381
8.40   0.382              # Where ... -notContains...          00:00:00.3824038

The LINQ solution is substantially faster - by a factor of 8+ (though even the much slower solution only took about 0.4 seconds to run).

The performance gap seems to be even wider in PowerShell Core, where I've seen a factor of around 19 with v7.0.0-preview.4; interestingly, both tests ran faster individually than in Windows PowerShell.

Benchmark code:

# Script block to initialize the arrays.
# The filler arrays are randomized to eliminate caching effects in LINQ.
$init = {
  $fillerArray = 1..1000 | Get-Random -Count 1000
  [string[]] $ItemArray = $fillerArray + 'a' + $fillerArray + 'b' + $fillerArray + 'c' + $fillerArray + 'd'
  [string[]] $exclusionArray = $fillerArray + 'b' + $fillerArray + 'c'
}

# Compare the average of 10 runs.
Time-Command -Count 10 { # LINQ
  . $init
  $result = [string[]] [Linq.Enumerable]::Except($ItemArray, $exclusionArray)
}, { # Where ... -notContains
  . $init
  $result = $ItemArray.Where({ $exclusionArray -notcontains $_ })
}
answered May 01 '23 by mklement0