We have a few places in a time-critical script where we convert old IDs into strings. At the moment, we use case statements inside a function, like so:
def get_name id
case id
when 1
"one thing"
when 3
"other thing"
else
"default thing"
end
end
I'm considering replacing this with a hash lookup, like so:
NAMES = {
1 => "one thing",
3 => "other thing",
}
NAMES.default = "default thing"
It feels like it ought to be faster to use NAMES[id]
than get_name(id)
- but is it?
The case statement is a multiway branch statement just like a switch statement in other languages. It provides an easy way to forward execution to different parts of code based on the value of the expression.
Symbols are immutable; Strings aren't. Being immutable means that the values won't change. Therefore, Symbols are faster than Strings.
A couple points, first. One is that low-level language constructs like this that more-or-less do the same thing are almost never the bottleneck in any real-world application, so it's (often) foolish to focus on them. Second, as has already been mentioned, if you're really concerned about it you should benchmark it. Ruby's benchmarking and profile tools are certainly not the most advanced in the programming ecosystem, but they get the job done.
My gut instinct is that hashes are going to be faster because (again, I'm guessing) the case statement must check each condition in turn (making finding the items O(n) instead of O(1)). But let's check!
Full benchmarking code is at https://gist.github.com/25 Basically, it generates a file that defines the appropriate case/hash and then uses them. I went ahead and put the hash lookup within a method call, too, so that overhead won't be a factor, but in real life there's no reason it should be stuck inside a method.
Here's what I get. In each case, I'm doing 10,000 lookups. Time is user-time in seconds
Case statement, 10 items 0.020000
Hash lookup, 10 items 0.010000
Case statement, 100 items 0.100000
Hash lookup, 100 items 0.010000
Case statement, 1000 items 0.990000
Hash lookup, 1000 items 0.010000
So, it looks very much like the case statement is O(n) (no shocker there). Also note that 10K lookups is still only a second even in the case statement, so unless you're doing a metric butload of these lookups, you're better off focusing on the rest of your code.
Since this depends on a number of factors (how many different IDs you want to convert, how intelligently the compiler can compile the case when
statemens), my advice would be: Measure it:
Write a small test routine and convert, say, 10.000.000 ids to strings. Do this a couple of times with either implementation and compare the results. If you have no significant difference, take whatever you like more (I think, the hash solution is a bit more elegant...)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With