Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Set/hash with custom hashing function in Perl 6

My question is related to user defined function in set operations, but I think I can cut to the core of the problem:

How do I choose a specific hashing function? For example, if I want to do value-based matching rather than reference-matching, and I want to see whether a certain tuple is present (or simply delete it):

my %data := SetHash.new: (1, 2), (3, 4);
%data{$(1, 2)}:delete; # False

In C++ or C#, I could give a custom hashing/comparison function to the constructor. In C#, hashing by value will automatically happen if my data type is a struct (a value-type rather than a reference-type). Perl 6 does value-type hashing to some extent for Pair (if the Pair doesn't contain any containers), but I don't see how to make it work for any other complex type.

On one hand, I get why this isn't the safest operation--it's easy to define objects whose hash code can change after they've been inserted. But that hasn't stopped .NET and the C++ STL from allowing custom hashing.

A possible API usage (with the chained hashing logic inspired by this, originally from Boost) would be:

class MyHasher does Hasher of Array[Int] {
  method get-hash-value(Int @array) {
    reduce
      -> $a, $b {$a +^ ($b + 0x9e3779b97f4a7c16 + ($a +< 6) + ($a +> 2))},
      0,
      |@array;
  }
  method equals(Int @a, Int @b) { @a eqv @b; }
}

my %data := SetHash.new(
  my Int @=[1, 2], my Int @=[3, 4],
  :hasher(MyHasher.new)
);
say %data{$[1, 2]}; # should be True

And this would be the hasher role, which should be provided by the Perl 6's core library, if it doesn't already exist:

role Hasher[::T=Any] { method equals(T $a, T $b --> Bool) { ... }; method get-hash-value(T $obj) { ... } }

Solution: At present, the most reasonable solution is to override a class's .WHICH method, which serves as a hash value and is used for equality testing. I gave an example of a hash-key class which emulates a value-type here. It's almost as versatile as a custom hash function per hash object, since the key type can be declared when the hash is created. (This can't be done for Set since Set isn't parameterized.)

like image 959
piojo Avatar asked Nov 07 '22 13:11

piojo


1 Answers

The way a Hash works is you use a key to store a value, and use the exact same key to retrieve the value.

In the case of value types like Str and Int, you can have multiple instances which act as if they are the exact same value. So 42 and 40 + 2 act as if they are the exact same instance even if they aren't.

So this works:

my %h{Any}; # defaults to Str keys

%h{'abc'} = 42;

my ($a,$b,$c) = 'a', 'b', 'c';

say %h{"$a $b $c"}; # 42

%h{42} = 'The answer';

say %h{"42"}; # (Any)
say %h{42}; # The answer

There isn't really a facility to make several different values pretend to be the same value for just a Hash.

'abc' === 'cba'; # False

'abc'.WHICH eq 'cba'.WHICH; # basically how the above works

I think that what you are asking for is a feature which should not be added.

There is a WHICH method, which should only used to make two values identical everywhere throughout the language.

say 42.WHICH.perl;       # ValueObjAt.new("Int|42")
say (40 + 2).WHICH.perl; # ValueObjAt.new("Int|42")
42 === (40 + 2);         # True

say Hash.new.WHICH.perl; # ObjAt.new("Hash|94014087733456")
say Hash.new.WHICH.perl; # ObjAt.new("Hash|94014087735232")

Note that for the Hash.new that they don't match, because they are different instances that can change over time.

As an example where this is a good thing. Let's say you have two employees named 'Bob'.

my $a = Employee.new( name => 'Bob' );
my $b = Employee.new( name => 'Bob' );

my %salary{Employee};

%salary{$a} = 1200; # arbitrary number for an example
%salary{$b} = 2000;

Note that by overriding the WHICH method you could end up accidently giving Bob $a a raise.

Basically it is probably not a good idea to mess with .WHICH unless you know exactly what you are doing, and you have a really good reason to.


So you can't/shouldn't do that. At least not the way you are trying to do it.

Instead make a new Associative class which works the way you want.

role Custom-Str-Hasher {
  method hashed ( --> Str:D ){…}
}

class Custom-SetHash is SetHash {
  multi method AT-KEY ( Custom-Str-Hasher:D $key ) is rw {
    self.AT-KEY( $key.hashed() ); # call base class's method
  }
}


class Foo does Custom-Str-Hasher {
  has Str:D $.Str is required;

  # required by the Custom-Str-Hasher role
  method hashed ( --> Str:D ){
    $!Str.comb(/\w/).unique.sort.join;
    # 'b cb a' → 'abc' and 'aaababcccba' → 'abc'
  }
}

my $a = Foo.new(:Str('b cb a'));
my $b = Foo.new(:Str('aaababcccba'));

my %h is Custom-SetHash; # use a different class than the default

%h{$a} = True;
say %h{$b}; # True;

put $a; # b cb a
put $b; # aaababcccba

Note that the above is just a quick example, there is a bunch of things I would change for a real example. For one, %h{'abc'} would also return True because of the way I implemented the AT-KEY method. It is also missing a bunch of methods like ASSIGN-KEY and DELETE-KEY.

like image 173
Brad Gilbert Avatar answered Nov 15 '22 08:11

Brad Gilbert