Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How are Symbols faster than Strings in Hash lookups?

I understand one aspect of why Symbols should be used as opposed to Strings in Hashes. Namely that there is only one instance of a given Symbol in memory whereas there could be multiple instances of a given String with the same value.

What I don't understand is how Symbols are faster than Strings in a Hash lookup. I've looked at the answers here, but I still don't quite get it.

If :foo.hash == :foo.object_id returned true, then it would've made some sense because then it'd be able to use the object id as the value for the hash and wouldn't have to compute it every time. However this isn't the case and :foo.object_id is not equal to :foo.hash. Hence my confusion.

like image 584
Asad Moosvi Avatar asked Dec 13 '25 05:12

Asad Moosvi


1 Answers

There's no obligation for hash to be equivalent to object_id. Those two things serve entirely different purposes. The point of hash is to be as deterministic and yet random as possible so that the values you're inserting into your hash are evenly distributed. The point of object_id is to define a unique object identifier, though there's no requirement that these be random or evenly distributed. In fact, randomizing them is counter-productive, that'd just make things slower for no reason.

The reason symbols tend to be faster is because the memory for them is allocated once (garbage collection issues aside) and recycled for all instances of the same symbol. Strings are not like that. They can be constructed in a multitude of ways, and even two strings that are byte-for-byte identical are likely to be different objects. In fact, it's safer to presume they are than otherwise unless you know for certain they're the same object.

Now when it comes to computing hash, the value must be randomly different even if the string changes very little. Since the symbol can't change computing it can be optimized more. You could just compute a hash of the object_id since that won't change, for example, while the string needs to take into account the content of itself, which is presumably dynamic.

Try benchmarking things:

require 'benchmark'

count = 100000000

Benchmark.bm do |bm|
  bm.report('Symbol:') do
    count.times { :symbol.hash }
  end
  bm.report('String:') do
    count.times { "string".hash }
  end
end

This gives me results like this:

       user     system      total        real
Symbol:  6.340000   0.020000   6.360000 (  6.420563)
String: 11.380000   0.040000  11.420000 ( 11.454172)

Which in this most trivial case is easily 2x faster. Based on some basic testing the performance of the string code degrades O(N) as the strings get longer but the symbol times remain constant.

like image 128
tadman Avatar answered Dec 15 '25 15:12

tadman



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!