 

Ruby hash default value behavior

Tags:

ruby

hash

I'm going through Ruby Koans, and I hit #41 which I believe is this:

```ruby
def test_default_value_is_the_same_object
  hash = Hash.new([])

  hash[:one] << "uno"
  hash[:two] << "dos"

  assert_equal ["uno","dos"], hash[:one]
  assert_equal ["uno","dos"], hash[:two]
  assert_equal ["uno","dos"], hash[:three]

  assert_equal true, hash[:one].object_id == hash[:two].object_id
end
```

I couldn't understand the behavior, so I Googled it and found "Strange ruby behavior when using Hash default value, e.g. Hash.new([])", which answered the question nicely.

So I understand how that works. My question is: why doesn't a default value such as an integer that gets incremented get changed during use? For example:

```ruby
puts "Text please: "
text = gets.chomp

words = text.split(" ")
frequencies = Hash.new(0)
words.each { |word| frequencies[word] += 1 }
```

This takes user input and counts the number of times each word is used. It works because the default value of 0 is always used.
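To run the word count without typing anything, here is a small sketch that swaps `gets.chomp` for a hard-coded string (an assumption made only so it runs non-interactively). It also checks that the hash's default is still 0 after counting and that reading a missing key does not store it:

```ruby
# Hard-coded input stands in for gets.chomp (assumption, for a non-interactive run).
text = "the cat saw the other cat"

words = text.split(" ")
frequencies = Hash.new(0)
words.each { |word| frequencies[word] += 1 }

p frequencies["the"]          # => 2
p frequencies.default         # => 0 (the default itself never changed)
p frequencies["dog"]          # => 0 (a missing key just returns the default)
p frequencies.key?("dog")     # => false (reading it did not store anything)
```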

I have a feeling it has to do with the << operator but I'd love an explanation.

Jake Sellers asked Apr 23 '13 at 01:04





2 Answers

The other answers seem to indicate that the difference in behavior is due to Integers being immutable and Arrays being mutable. But that is misleading. The difference is not that the creator of Ruby decided to make one immutable and the other mutable. The difference is that you, the programmer, decided to mutate one but not the other.

The question is not whether Arrays are mutable; the question is whether you mutate them.
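The same point holds for any mutable default, not only Arrays. Below is a sketch (not from the original answer) using a String default; `String.new` is used so the default is an ordinary mutable string regardless of frozen-string-literal settings. `<<` mutates the shared default, while `+=` builds a new String and stores it:

```ruby
shared = Hash.new(String.new)
shared[:a] << "x"   # String#<< mutates the one default object in place
shared[:b] << "y"
p shared[:c]        # => "xy" (every missing key returns the mutated default)
p shared            # => {} (nothing was ever stored)

fresh = Hash.new(String.new)
fresh[:a] += "x"    # sugar for fresh[:a] = fresh[:a] + "x": a new String is built and stored
fresh[:b] += "y"
p fresh[:c]         # => "" (the default was never touched)
p fresh.size        # => 2 (:a and :b were stored)
```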

You can get both the behaviors you see above, just by using Arrays. Observe:

One default Array with mutation

```ruby
hsh = Hash.new([])

hsh[:one] << 'one'
hsh[:two] << 'two'

hsh[:nonexistent]
# => ['one', 'two']
# Because we mutated the default value, nonexistent keys return the changed value

hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!
```
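One way to confirm that it really is the single default object being changed is to look at `Hash#default` and compare identities with `equal?`. A short sketch:

```ruby
hsh = Hash.new([])
hsh[:one] << 'one'
hsh[:two] << 'two'

# Hash#default returns the one default object, which now holds both strings.
p hsh.default                           # => ["one", "two"]
# Lookups of different missing keys hand back that very same object.
p hsh[:one].equal?(hsh[:nonexistent])   # => true
p hsh.size                              # => 0
```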

One default Array without mutation

```ruby
hsh = Hash.new([])

hsh[:one] += ['one']
hsh[:two] += ['two']
# This is syntactic sugar for hsh[:two] = hsh[:two] + ['two']

hsh[:nonexistent]
# => []
# We didn't mutate the default value; it is still an empty array

hsh
# => { :one => ['one'], :two => ['two'] }
# This time, we *did* mutate the hash.
```
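If you want Ruby to catch accidental mutation of a shared default, one defensive option (my own suggestion, not part of the answer) is to freeze it. In Ruby 2.5 and later, `<<` on the frozen default then raises `FrozenError`, while `+=` keeps working because it never mutates the default:

```ruby
hsh = Hash.new([].freeze)

# += builds a new Array and assigns it, so the frozen default is never touched.
hsh[:one] += ['one']

refused = false
begin
  hsh[:two] << 'two'   # << would mutate the shared default in place
rescue FrozenError
  refused = true
end

p refused              # => true
p hsh[:nonexistent]    # => []
```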

A new, different Array every time with mutation

```ruby
hsh = Hash.new { [] }
# This time, instead of a default *value*, we use a default *block*

hsh[:one] << 'one'
hsh[:two] << 'two'

hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.

hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!


hsh = Hash.new {|hsh, key| hsh[key] = [] }
# This time, instead of a default *value*, we use a default *block*
# And the block not only *returns* the default value, it also *assigns* it

hsh[:one] << 'one'
hsh[:two] << 'two'

hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.

hsh
# => { :one => ['one'], :two => ['two'], :nonexistent => [] }
```
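As an illustration of where the assigning block form is handy, here is a sketch that groups words by their first letter (a hypothetical task, chosen to echo the word-splitting in the question). Note that `key?` does not run the default block, whereas a plain `[]` lookup would store an empty Array:

```ruby
words = %w[apple avocado banana blueberry cherry]

# The block runs once per missing key, stores a fresh Array, and returns it.
by_letter = Hash.new { |hash, key| hash[key] = [] }
words.each { |word| by_letter[word[0]] << word }

p by_letter["a"]        # => ["apple", "avocado"]
p by_letter["b"]        # => ["banana", "blueberry"]
p by_letter.keys        # => ["a", "b", "c"]
p by_letter.key?("z")   # => false (key? does not trigger the default block)
```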
Jörg W Mittag answered Sep 25 '22 at 00:09



It is because Array in Ruby is a mutable object, so you can change its internal state, but Fixnum isn't mutable (since Ruby 2.4, Fixnum has been merged into Integer, which is likewise immutable). So when you increment a value using +=, internally it works like this (assume that i is our reference to a Fixnum object):

  1. get the object referenced by i
  2. get its internal value (let's name it raw_tmp)
  3. create a new object whose internal value is raw_tmp + 1
  4. assign a reference to the created object to i
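The rebinding in step 4 can be observed directly: keep a second reference to the original Integer, then increment. A minimal sketch (using Integer, the modern name for Fixnum):

```ruby
i = 41
j = i              # j points at the same Integer object as i

i += 1             # sugar for i = i + 1: Integer#+ returns a different object, then i is rebound
p i                # => 42
p j                # => 41 (j still refers to the original object)
p i.equal?(j)      # => false
```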

So, as you can see, we created a new object, and i now refers to something different than it did at the beginning.

On the other hand, when we use Array#<<, it works this way:

  1. get the object referenced by arr
  2. append the given element to its internal state
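The contrast with += on the same Array can be sketched the same way: `<<` changes the object every reference sees, while `+=` builds a new Array and rebinds only the variable on the left:

```ruby
arr = []
other_ref = arr           # a second reference to the same Array

arr << 'one'              # Array#<< appends in place; arr is not rebound
p other_ref               # => ["one"] (the change is visible through every reference)
p arr.equal?(other_ref)   # => true

arr += ['two']            # Array#+ builds a new Array and arr is rebound to it
p arr                     # => ["one", "two"]
p other_ref               # => ["one"]
p arr.equal?(other_ref)   # => false
```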

So, as you can see, it is much simpler, but it can cause some bugs. One of them is the one in your question; another is a race condition when two or more threads try to append elements at the same time. Sometimes you can end up with only some of the elements and with garbage in memory. If you use += on arrays as well, you will get rid of both of these problems (or at least minimise their impact).

Hauleth answered Sep 24 '22 at 00:09
