
Prevent adding pscustomobject to array if already exists

I feel silly that I cannot figure this out, but say I have an array containing pscustomobjects. At an incredibly high level take the following example:

$arr = @()
$obj1 = [pscustomobject]@{prop1="bob";prop2="dude";prop3="awesome"}
$obj2 = [pscustomobject]@{prop1="bob";prop2="dude";prop3="awesome"}

$arr += $obj1

In this case $obj1 and $obj2 have the exact same items/properties. How do I test if $arr already contains an object with the properties of $obj1, to avoid adding $obj2 to it?

Note, the above is a crude example. The pscustomobjects are being dynamically created from a dataset and added to the array, but I want to avoid duplicates from being added.

I understand the following returns true, but I fully expect duplicates for any given single property. As such I need to compare the ENTIRE pscustomobject and all properties together for uniqueness.

$arr.prop1 -contains 'bob' #returns true

Side question... Why are $obj1 and $obj2 not themselves considered equal? I assume it's because they are technically different objects that just happen to have the same values, but then I don't understand why two different variables that each hold the same string do test as equal.

$obj1 -eq $obj2  #returns false
$str1 = "test"
$str2 = "test"
$str1 -eq $str2  #returns true
Matthew McDonald asked Jan 25 '23 07:01


2 Answers

The problem is the behavior of the open-ended [pscustomobject] type with respect to equality comparison and as hashtable keys:

[pscustomobject] is a .NET reference type (that doesn't define custom equality comparisons), so comparing two instances with -eq tests for reference equality, which means that only values that reference the very same instance are considered equal.[1]

Using [pscustomobject] instances as the keys of a hashtable is similarly unhelpful, because, as iRon points out, calling .GetHashCode() on a [pscustomobject] instance always yields the same value, irrespective of the instance's set of properties and values.[2] Arguably, this is a bug, as discussed in GitHub issue #15806.
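As a minimal sketch of the two behaviors described above (the variable names are illustrative only, and the hash-code values are deliberately left without expected output, since they may differ between PowerShell versions):

```powershell
# Two distinct instances with identical properties and values.
$a = [pscustomobject]@{ foo = 1 }
$b = [pscustomobject]@{ foo = 1 }

# A second variable that references the very same instance as $a.
$sameAsA = $a

# -eq tests reference equality for [pscustomobject] instances.
$distinctEqual = $a -eq $b        # $false: two different instances
$sameEqual     = $a -eq $sameAsA  # $true: one and the same instance

# Hash codes of two obviously different objects; compare them yourself
# to see whether your PowerShell version reports the same value for both.
$hashA     = $a.GetHashCode()
$hashOther = ([pscustomobject]@{ bar = 2 }).GetHashCode()
```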


Solutions:

  • If you're willing to use (PSv5+) custom classes in lieu of [pscustomobject] instances, Santiago Squarzon's helpful answer offers a solution that relies on a custom class implementing the System.IEquatable<T> interface in order to support a custom, class-specific equality test - but note that since such a custom class compares specific, hard-coded properties, it isn't a general replacement for the open-ended [pscustomobject] type, whose instances can have arbitrary property sets.

  • iRon's helpful answer provides a generic solution via a custom class that wraps a hashtable and uses the XML-serialized form of its [pscustomobject] entries as the entry keys (using the serialization format PowerShell uses for its remoting and background-job infrastructure), relying on the fact that distinct strings with the same content report the same hash code, via .GetHashCode(). This is probably the best overall solution, because it performs reasonably well while providing a generic comparison that is reasonably robust: it works robustly for value-type property values (as are typical in [pscustomobject] instances) and tests the properties of reference-type values for value equality. However, the necessary limit on serialization depth means that deeply nested objects whose property values differ only below the serialization depth may be considered the same - see this answer for more information on PowerShell's serialization and its limitations.

  • Below is an ad-hoc solution based on iRon's answer that doesn't require defining custom classes, but it doesn't perform well.

# Available in PSv5+, to allow referencing the [System.Management.Automation.PSSerializer] type
# as just [PSSerializer]; in v4-, use the full type name.
using namespace System.Management.Automation

# Define a *list* rather than an array, because it is
# efficiently extensible
$list = [System.Collections.ArrayList] (
  [pscustomobject] @{prop1="bob";  prop2="dude";   prop3="awesome"}, 
  [pscustomobject] @{prop1="alice";prop2="dudette";prop3="awesome"}
)

# Conditionally add two objects to the list:
# One of them is a duplicate and will be ignored.
[pscustomobject]@{prop1="bob";prop2="dude";prop3="awesome"},
[pscustomobject]@{prop1="ted";prop2="dude";prop3="middling"} | ForEach-Object {
  if ($list.ForEach({ [PSSerializer]::Serialize($_) }) -cnotcontains [PSSerializer]::Serialize($_)) {
     $null = $list.Add($_)
  }
}

Note the use of the .ForEach() array method so as to (relatively) efficiently serialize each element of list $list. However, every membership test re-serializes the entire list and creates a temporary array of the same size, containing the element-specific serializations.

There are ways of optimizing the performance of this code, but if that is needed you may as well use iRon's solution.
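One such optimization, sketched here under the assumption of PSv5+ (for the ::new() syntax), is to remember the serialized form of every accepted object in a set of strings, so that each object is serialized only once. The names $seen, $candidates and $key are illustrative and not part of either answer:

```powershell
# Serialized forms of the objects accepted so far.
# The default string comparison of HashSet[string] is ordinal (case-sensitive).
$seen = [System.Collections.Generic.HashSet[string]]::new()
$list = [System.Collections.Generic.List[object]]::new()

$candidates = @(
  [pscustomobject]@{ prop1 = 'bob';   prop2 = 'dude';    prop3 = 'awesome' }
  [pscustomobject]@{ prop1 = 'alice'; prop2 = 'dudette'; prop3 = 'awesome' }
  [pscustomobject]@{ prop1 = 'bob';   prop2 = 'dude';    prop3 = 'awesome' }  # duplicate
  [pscustomobject]@{ prop1 = 'ted';   prop2 = 'dude';    prop3 = 'middling' }
)

foreach ($obj in $candidates) {
  # Serialize each candidate exactly once and use the result as its identity.
  $key = [System.Management.Automation.PSSerializer]::Serialize($obj)
  # HashSet.Add() returns $false if the key is already present.
  if ($seen.Add($key)) {
    $list.Add($obj)
  }
}
```

The same caveat about serialization depth applies here as to the solutions above.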


[1] For instance, [pscustomobject] @{ foo=1 } -eq [pscustomobject] @{ foo=1 } yields $false, because two distinct instances are being compared; that they happen to have the same set of properties and values is irrelevant.

[2] For instance, the following prints the same value twice, despite providing two obviously different objects as input: [pscustomobject] @{ foo=1 }, [pscustomobject] @{ bar=2 } | % GetHashCode

[3] For instance, ([pscustomobject]@{prop1="bob";prop2="dude";prop3="awesome"}).psbase.ToString() returns verbatim @{prop1=bob; prop2=dude; prop3=awesome}

mklement0 answered Jan 26 '23 21:01


Taking this purely from MS Docs, I'm nowhere near an expert on classes.

To create comparable classes, you need to implement System.IEquatable<T> in your class.

class CustomObjectEquatable : System.IEquatable[Object] {
    [string] $Prop1
    [string] $Prop2
    [string] $Prop3

    [bool] Equals([Object]$obj) {
        return $this.Prop1 -eq $obj.Prop1 -and
               $this.Prop2 -eq $obj.Prop2 -and
               $this.Prop3 -eq $obj.Prop3
    }

    [int] GetHashCode() {
        return [Tuple]::Create(
            [string] $this.Prop1,
            [string] $this.Prop2,
            [string] $this.Prop3
        ).GetHashCode()
    }
}
  • Testing for equality:
$x = [CustomObjectEquatable]@{
    Prop1 = "bob"
    Prop2 = "dude"
    Prop3 = "awesome"
}
$y = [CustomObjectEquatable]@{
    Prop1 = "bob"
    Prop2 = "dude"
    Prop3 = "awesome"
}

$x -eq $y # => True
$x.GetHashCode() -eq $y.GetHashCode() # => True

$x = [CustomObjectEquatable]@{
    prop1 = "john"
    prop2 = "dude"
    prop3 = "awesome"
}
$y = [CustomObjectEquatable]@{
    prop1 = "johhn"
    prop2 = "dude"
}

$x -eq $y  # => False
$x.GetHashCode() -eq $y.GetHashCode() # => False
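Applied to the original question, a class like this could be used to skip duplicates when filling a list. The following is only a sketch: it repeats the class so that it can run on its own, and $list and $inputs are illustrative names. It relies on -notcontains using the same equality test as -eq:

```powershell
class CustomObjectEquatable : System.IEquatable[Object] {
    [string] $Prop1
    [string] $Prop2
    [string] $Prop3

    [bool] Equals([Object]$obj) {
        return $this.Prop1 -eq $obj.Prop1 -and
               $this.Prop2 -eq $obj.Prop2 -and
               $this.Prop3 -eq $obj.Prop3
    }

    [int] GetHashCode() {
        return [Tuple]::Create($this.Prop1, $this.Prop2, $this.Prop3).GetHashCode()
    }
}

$list = [System.Collections.Generic.List[object]]::new()

$inputs = @(
    [CustomObjectEquatable]@{ Prop1 = 'bob'; Prop2 = 'dude'; Prop3 = 'awesome' }
    [CustomObjectEquatable]@{ Prop1 = 'bob'; Prop2 = 'dude'; Prop3 = 'awesome' }  # duplicate
    [CustomObjectEquatable]@{ Prop1 = 'ted'; Prop2 = 'dude'; Prop3 = 'middling' }
)

foreach ($item in $inputs) {
    # Only add the item if no equal element is in the list yet.
    if ($list -notcontains $item) {
        $list.Add($item)
    }
}
```

Note that the membership test scans the whole list each time, so for large datasets a hash-based collection is likely the better fit.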

- Edit 9/9/2022

I've decided to add this comparer class to this answer; the implementation is mostly inspired by this answer from mklement0. It can, theoretically, test for equality between PSCustomObject instances with any number of properties, and it allows for both case-sensitive and case-insensitive comparison. Any suggestions for improvements are welcome.

using namespace System.Collections.Generic
using namespace System.Collections

class PSCustomObjectComparer : IEqualityComparer[object] {
    [StringComparer] $Comparer = [StringComparer]::InvariantCultureIgnoreCase

    PSCustomObjectComparer() { }
    PSCustomObjectComparer([StringComparer] $Comparer) {
        $this.Comparer = $Comparer
    }

    [bool] Equals([object] $xObject, [object] $yObject) {
        $x = @($xObject.PSObject.Properties)
        $y = @($yObject.PSObject.Properties)

        if(-not $x.Count.Equals($y.Count)) {
            return $false
        }

        return ([IStructuralEquatable] $x.Name).Equals($y.Name, $this.Comparer) -and
               ([IStructuralEquatable] $x.Value).Equals($y.Value, $this.Comparer)
    }

    [int] GetHashCode([object] $xObject) {
        $x = $xObject.PSObject.Properties
        try {
            return ([IStructuralEquatable] $x.Name).GetHashCode($this.Comparer) -bxor
                   ([IStructuralEquatable] $x.Value).GetHashCode($this.Comparer)
        }
        catch {
            $values = foreach($value in $x.Value) {
                if(-not $value) { continue }
                $value
            }

            if(-not $values) {
                return ([IStructuralEquatable] $x.Name).GetHashCode($this.Comparer)
            }

            return ([IStructuralEquatable] $x.Name).GetHashCode($this.Comparer) -bxor
                   ([IStructuralEquatable] $values).GetHashCode($this.Comparer)
        }
    }
}
  • Testing for equality:
$hash = [HashSet[object]]::new([PSCustomObjectComparer]::new())
$hash.Add([pscustomobject]@{ foo = 'hello'; bar = 'World'; baz = 123 }) # true
$hash.Add([pscustomobject]@{ foo = 'hello'; bar = 'World'; baz = 123 }) # false
$hash.Add([pscustomobject]@{ foo = 'HELLO'; bar = 'World'; baz = 123 }) # false

$hash = [HashSet[object]]::new([PSCustomObjectComparer]::new([StringComparer]::InvariantCulture))
$hash.Add([pscustomobject]@{ foo = 'hello'; bar = 'World'; baz = 123 }) # true
$hash.Add([pscustomobject]@{ foo = 'hello'; bar = 'World'; baz = 123 }) # false
$hash.Add([pscustomobject]@{ foo = 'HELLO'; bar = 'World'; baz = 123 }) # true

$insensitiveComparison = [PSCustomObjectComparer]::new()
$insensitiveComparison.Equals(
    [pscustomobject]@{ foo = 'hello'; bar = 'World'; baz = 123 },
    [pscustomobject]@{ foo = 'HELLO'; bar = 'World'; baz = 123 }
) # true

$sensitiveComparison = [PSCustomObjectComparer]::new([StringComparer]::InvariantCulture)
$sensitiveComparison.Equals(
    [pscustomobject]@{ foo = 'hello'; bar = 'World'; baz = 123 },
    [pscustomobject]@{ foo = 'HELLO'; bar = 'World'; baz = 123 }
) # false

$sensitiveComparison.Equals(
    [pscustomobject]@{ foo = 'HELLO'; bar = 'World'; baz = 123 },
    [pscustomobject]@{ foo = 'HELLO'; bar = 'World'; baz = 123 }
) # true
Santiago Squarzon answered Jan 26 '23 22:01