Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is a string key for a hash frozen?

Tags:

According to the specification, strings that are used as a key to a hash are duplicated and frozen. Other mutable objects do not seem to have such special consideration. For example, with an array key, the following is possible.

a = [0] h = {a => :a} h.keys.first[0] = 1 h # => {[1] => :a} h[[1]] # => nil h.rehash h[[1]] # => :a 

On the other hand, a similar thing cannot be done with a string key.

s = "a" h = {s => :s} h.keys.first.upcase! # => RuntimeError: can't modify frozen String 

Why is string designed to be different from other mutable objects when it comes to a hash key? Is there any use case where this specification becomes useful? What other consequences does this specification have?


I actually have a use case where absence of such special specification about strings may be useful. That is, I read with the yaml gem a manually written YAML file that describes a hash. the keys may be strings, and I would like to allow case insensitivity in the original YAML file. When I read a file, I might get a hash like this:
h = {"foo" => :foo, "Bar" => :bar, "BAZ" => :baz} 

And I want to normalize the keys to lower case to get this:

h = {"foo" => :foo, "bar" => :bar, "baz" => :baz} 

by doing something like this:

h.keys.each(&:downcase!) 

but that returns an error for the reason explained above.

like image 668
sawa Avatar asked Oct 24 '12 07:10

sawa


People also ask

Why can you safely use a string as a hash key even though a string is mutable?

According to the specification, strings that are used as a key to a hash are duplicated and frozen. Other mutable objects do not seem to have such special consideration. For example, with an array key, the following is possible. On the other hand, a similar thing cannot be done with a string key.

How can you tell if a key is present in a hash?

We can check if a particular hash contains a particular key by using the method has_key?(key) . It returns true or false depending on whether the key exists in the hash or not.

Can you freeze a hash in Ruby?

You can always freeze your hashes in Ruby for safety if you want to. All that's missing is some sugar for declaring immutable hash literals.

What does .freeze do in Ruby?

The freeze method in Ruby is used to ensure that an object cannot be modified. This method is a great way to create immutable objects. Any attempt to modify an object that has called the freeze method will result in the program throwing a runtime error.


2 Answers

In short it's just Ruby trying to be nice.

When a key is entered in a Hash, a special number is calculated, using the hash method of the key. The Hash object uses this number to retrieve the key. For instance, if you ask what the value of h['a'] is, the Hash calls the hash method of string 'a' and checks if it has a value stored for that number. The problem arises when someone (you) mutates the string object, so the string 'a' is now something else, let's say 'aa'. The Hash would not find a hash number for 'aa'.

The most common types of keys for hashes are strings, symbols and integers. Symbols and integers are immutable, but strings are not. Ruby tries to protect you from the confusing behaviour described above by dupping and freezing string keys. I guess it's not done for other types because there could be nasty performance side effects (think of large arrays).

like image 146
steenslag Avatar answered Sep 20 '22 19:09

steenslag


Immutable keys make sense in general because their hash codes will be stable.

This is why strings are specially-converted, in this part of MRI code:

if (RHASH(hash)->ntbl->type == &identhash || rb_obj_class(key) != rb_cString) {   st_insert(RHASH(hash)->ntbl, key, val); } else {   st_insert2(RHASH(hash)->ntbl, key, val, copy_str_key); } 

In a nutshell, in the string-key case, st_insert2 is passed a pointer to a function that will trigger the dup and freeze.

So if we theoretically wanted to support immutable lists and immutable hashes as hash keys, then we could modify that code to something like this:

VALUE key_klass; key_klass = rb_obj_class(key); if (key_klass == rb_cArray || key_klass == rb_cHash) {   st_insert2(RHASH(hash)->ntbl, key, val, freeze_obj); } else if (key_klass == rb_cString) {   st_insert2(RHASH(hash)->ntbl, key, val, copy_str_key); } else {   st_insert(RHASH(hash)->ntbl, key, val); } 

Where freeze_obj would be defined as:

static st_data_t freeze_obj(st_data_t obj) {     return (st_data_t)rb_obj_freeze((VALUE) obj); } 

So that would solve the specific inconsistency that you observed, where the array-key was mutable. However to be really consistent, more types of objects would need to be made immutable as well.

Not all types, however. For example, there'd be no point to freezing immediate objects like Fixnum because there is effectively only one instance of Fixnum corresponding to each integer value. This is why only String needs to be special-cased this way, not Fixnum and Symbol.

Strings are a special exception simply as a matter of convenience for Ruby programmers, because strings are very often used as hash keys.

Conversely, the reason that other object types are not frozen like this, which admittedly leads to inconsistent behavior, is mostly a matter of convenience for Matz & Company to not support edge cases. In practice, comparatively few people will use a container object like an array or a hash as a hash key. So if you do so, it's up to you to freeze before insertion.

Note that this is not strictly about performance, because the act of freezing a non-immediate object simply involves flipping the FL_FREEZE bit on the basic.flags bitfield that's present on every object. That's of course a cheap operation.

Also speaking of performance, note that if you are going to use string keys, and you are in a performance-critical section of code, you might want to freeze your strings before doing the insertion. If you don't, then a dup is triggered, which is a more-expensive operation.

Update @sawa pointed out that leaving your array-key simply frozen means the original array might be unexpectedly immutable outside of the key-use context, which could also be an unpleasant surprise (although otoh it would serve you right for using an array as a hash-key, really). If you therefore surmise that dup + freeze is the way out of that, then you would in fact incur possible noticeable performance cost. On the third hand, leave it unfrozen altogether, and you get the OP's original weirdness. Weirdness all around. Another reason for Matz et al to defer these edge cases to the programmer.

like image 31
manzoid Avatar answered Sep 20 '22 19:09

manzoid