Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which is faster in ruby - a hash lookup or a function with a case statement?

We have a few places in a time-critical script where we convert old IDs into strings. At the moment, we use case statements inside a function, like so:

def get_name id
  case id
    when 1
      "one thing"
    when 3
      "other thing"
    else
      "default thing"
  end
end

I'm considering replacing this with a hash lookup, like so:

NAMES = {
  1 => "one thing",
  3 => "other thing",
}
NAMES.default = "default thing"

It feels like it ought to be faster to use NAMES[id] than get_name(id) - but is it?

like image 216
Simon Avatar asked Nov 14 '10 15:11

Simon


People also ask

What is a case statement in Ruby?

The case statement is a multiway branch statement just like a switch statement in other languages. It provides an easy way to forward execution to different parts of code based on the value of the expression.

Why are symbols faster than strings?

Symbols are immutable; Strings aren't. Being immutable means that the values won't change. Therefore, Symbols are faster than Strings.


2 Answers

A couple points, first. One is that low-level language constructs like this that more-or-less do the same thing are almost never the bottleneck in any real-world application, so it's (often) foolish to focus on them. Second, as has already been mentioned, if you're really concerned about it you should benchmark it. Ruby's benchmarking and profile tools are certainly not the most advanced in the programming ecosystem, but they get the job done.

My gut instinct is that hashes are going to be faster because (again, I'm guessing) the case statement must check each condition in turn (making finding the items O(n) instead of O(1)). But let's check!

Full benchmarking code is at https://gist.github.com/25 Basically, it generates a file that defines the appropriate case/hash and then uses them. I went ahead and put the hash lookup within a method call, too, so that overhead won't be a factor, but in real life there's no reason it should be stuck inside a method.

Here's what I get. In each case, I'm doing 10,000 lookups. Time is user-time in seconds

Case statement, 10 items  0.020000
Hash lookup, 10 items     0.010000

Case statement, 100 items  0.100000
Hash lookup, 100 items     0.010000

Case statement, 1000 items  0.990000
Hash lookup, 1000 items     0.010000

So, it looks very much like the case statement is O(n) (no shocker there). Also note that 10K lookups is still only a second even in the case statement, so unless you're doing a metric butload of these lookups, you're better off focusing on the rest of your code.

like image 182
Bill Dueber Avatar answered Sep 20 '22 21:09

Bill Dueber


Since this depends on a number of factors (how many different IDs you want to convert, how intelligently the compiler can compile the case when statemens), my advice would be: Measure it:

Write a small test routine and convert, say, 10.000.000 ids to strings. Do this a couple of times with either implementation and compare the results. If you have no significant difference, take whatever you like more (I think, the hash solution is a bit more elegant...)

like image 42
MartinStettner Avatar answered Sep 19 '22 21:09

MartinStettner