I have a hash that uses an array as its key. When I change the array, the hash can no longer find the corresponding key and value:
1.9.3p194 :016 > a = [1, 2]
=> [1, 2]
1.9.3p194 :017 > b = { a => 1 }
=> {[1, 2]=>1}
1.9.3p194 :018 > b[a]
=> 1
1.9.3p194 :019 > a.delete_at(1)
=> 2
1.9.3p194 :020 > a
=> [1]
1.9.3p194 :021 > b
=> {[1]=>1}
1.9.3p194 :022 > b[a]
=> nil
1.9.3p194 :023 > b.keys.include? a
=> true
What am I doing wrong?
Update: OK. Using a.clone is definitely one way to deal with this problem. But what if I want to change "a" and still use "a" to retrieve the corresponding value (since "a" is still one of the keys)?
Hash#rehash rebuilds the hash's index from the keys' current hash codes, so call it after a key changes:
b.rehash
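As a rough sketch of the effect on the example from the question (the variable names stale and fresh are mine, added only for illustration): the lookup before rehash usually misses, and the lookup after rehash finds the entry again.

```ruby
a = [1, 2]
b = { a => 1 }

a.delete_at(1)   # a is now [1], so a.hash no longer matches the stored hash code
stale = b[a]     # usually nil: the entry is still filed under the old hash code

b.rehash         # re-index every key by its current hash code
fresh = b[a]     # 1

puts "before rehash: #{stale.inspect}"
puts "after rehash:  #{fresh.inspect}"
```

The value before rehash is deliberately not asserted here, since a stale entry can occasionally still be found by accident.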
TL;DR: consider Hash#compare_by_identity
By default, arrays implement .hash and .eql? by value, which is why changing the value confuses Ruby. Consider this variant of your example:
pry(main)> a = [1, 2]
pry(main)> a1 = [1]
pry(main)> a.hash
=> 4266217476190334055
pry(main)> a1.hash
=> -2618378812721208248
pry(main)> h = {a => '12', a1 => '1'}
=> {[1, 2]=>"12", [1]=>"1"}
pry(main)> h[a]
=> "12"
pry(main)> a.delete_at(1)
pry(main)> a
=> [1]
pry(main)> a == a1
=> true
pry(main)> a.hash
=> -2618378812721208248
pry(main)> h[a]
=> "1"
See what happened there?
As you discovered, it fails to match on the a key because the .hash value under which it was stored is outdated [BTW, you can't even rely on that! A mutation might result in the same hash (rare) or a different hash that lands in the same bucket (not so rare).] But instead of failing by returning nil, it matched on the a1 key.
See, h[a] doesn't care at all about the identity of a vs a1 (the traitor!). It compared the current value you supplied, [1], with the value of a1, which is also [1], and found a match.
That's why using .rehash is just a band-aid. It will recompute the .hash values for all keys and move them to the correct buckets, but it's error-prone, and may also cause trouble:
pry(main)> h.rehash
=> {[1]=>"1"}
pry(main)> h
=> {[1]=>"1"}
Oh oh. The two entries collapsed into one, since they now have the same value (and which wins is hard to predict).
One sane approach is embracing lookup by value, which requires the value to never change. .freeze your keys. Or use .clone/.dup when building the hash, and feel free to mutate the original arrays, but accept that h[a] will look up the current value of a against the values preserved from build time.
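A minimal sketch of the by-value approach, combining .dup and .freeze (the names by_value, by_current and mutation_error are mine, for illustration only):

```ruby
a = [1, 2]
h = { a.dup.freeze => "12" }   # the hash owns a frozen copy of the key

a.delete_at(1)                 # mutating the original array is harmless now
by_value   = h[[1, 2]]         # "12": matches the value preserved at build time
by_current = h[a]              # nil: a is now [1], and no key has that value

key = h.keys.first
begin
  key << 3                     # attempting to mutate the stored key
rescue RuntimeError => e       # FrozenError on newer Rubies is a RuntimeError subclass
  mutation_error = e
end

puts by_value.inspect
puts by_current.inspect
puts mutation_error.class
```

Freezing turns a silent corruption of the hash into an immediate exception at the point of mutation.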
The other, which you seem to want, is deciding you care about identity: lookup by a should find a whatever its current value, and it shouldn't matter if many keys had or now have the same value.
How?
Object hashes by identity. (Arrays don't, because types that define .== by value tend to also override .hash and .eql? to be by value.) So one option is: don't use arrays as keys, use some custom class (which may hold an array inside).
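A custom key class could look roughly like this. ArrayKey is a hypothetical name I made up for the sketch; the point is only that it does not override hash or eql?, so it keeps the identity-based defaults inherited from Object.

```ruby
# Hypothetical wrapper class, not part of any library.
class ArrayKey
  attr_reader :items

  def initialize(items)
    @items = items
  end
  # No hash / eql? overrides here: the defaults from Object work by identity.
end

k1 = ArrayKey.new([1, 2])
k2 = ArrayKey.new([1])
h  = { k1 => "12", k2 => "1" }

k1.items.delete_at(1)   # k1 now wraps [1], the same contents as k2

puts h[k1]              # still "12"
puts h[k2]              # still "1"
```

Mutating the wrapped array never touches the wrapper's hash code, so the entries stay where they were filed.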
But what if you want it to behave directly like a hash of arrays? You could subclass Hash or Array, but it's a lot of work to make everything work consistently. Luckily, Ruby has a built-in way: h.compare_by_identity switches a hash to work by identity (with no way to undo, AFAICT). If you do this before you insert anything, you can even have distinct keys with equal values, with no confusion:
[39] pry(main)> x = [1]
=> [1]
[40] pry(main)> y = [1]
=> [1]
[41] pry(main)> h = Hash.new.compare_by_identity
=> {}
[42] pry(main)> h[x] = 'x'
=> "x"
[44] pry(main)> h[y] = 'y'
=> "y"
[45] pry(main)> h
=> {[1]=>"x", [1]=>"y"}
[46] pry(main)> x.push(7)
=> [1, 7]
[47] pry(main)> y.push(7)
=> [1, 7]
[48] pry(main)> h
=> {[1, 7]=>"x", [1, 7]=>"y"}
[49] pry(main)> h[x]
=> "x"
[50] pry(main)> h[y]
=> "y"
Beware that such hashes are counter-intuitive if you use, e.g., strings as keys, because we're really used to strings hashing by value.
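To illustrate the string surprise, here is a small sketch. I use String.new so the two strings are guaranteed to be distinct objects regardless of frozen-string-literal settings; the variable names are mine.

```ruby
h  = Hash.new.compare_by_identity
s1 = String.new("key")
s2 = String.new("key")   # equal contents, but a different object

h[s1] = 1

same_object   = h[s1]    # 1: this is the very object used as the key
equal_content = h[s2]    # nil: equal by value, but not the same object

puts (s1 == s2).inspect        # true
puts same_object.inspect
puts equal_content.inspect
```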
Hashes use their key objects' hash codes (a.hash) to group them. Hash codes often depend on the state of the object; in this case, the hash code of a changes when an element is removed from the array. Since the key was already inserted into the hash, a is still filed under its original hash code.
This means you can't retrieve the value for a in b, even though it looks alright when you print the hash.
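A short sketch of why b.keys.include? a still returns true in the question even though b[a] does not find the entry: b.keys is a plain Array, and Array#include? scans it with ==, never consulting hash codes. The names before, after and listed are mine.

```ruby
a = [1, 2]
b = { a => 1 }

before = a.hash
a.delete_at(1)
after = a.hash                # a different hash code now that a is [1]

listed = b.keys.include?(a)   # true: a linear scan using ==, no hash codes involved

puts "hash code changed: #{before != after}"
puts "listed in keys:    #{listed}"
```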
You should use a.clone as the key:
irb --> a = [1, 2]
==> [1, 2]
irb --> b = { a.clone => 1 }
==> {[1, 2]=>1}
irb --> b[a]
==> 1
irb --> a.delete_at(1)
==> 2
irb --> a
==> [1]
irb --> b
==> {[1, 2]=>1} # STILL UNCHANGED
irb --> b[a]
==> nil # Trivial, since a has changed
irb --> b.keys.include? a
==> false # Trivial, since a has changed
Using a.clone ensures that the key stays unchanged even when we change a later on.