I'm using Rails 3.2.3 with Ruby 1.9.3p0.
I'm finding that I often have to determine whether a string occurs in a list of options. It seems I can use the Ruby array .include
method:
<% if ['todo','pending','history'].include?(params[:category]) %>
or the regular expression equals-tilde match shorthand with vertical bars separating options:
<% if params[:category] =~ /todo|pending|history/ %>
Is one better than the other in terms of performance?
Is there an even better approach?
Summary: Array#include?
with String
elements wins for both accepted and rejected inputs, for your example with only three acceptable values. For a larger set to check, it looks like Set#include?
with String
elements might win.
How to Test
We should test this is empirically.
Here are a couple of alternatives that you may want to consider as well: pre-compiled regex, a list of symbols, and a Set
with String
elements.
I would imagine that the performance may also depend on whether most of your inputs fall into the expected set, and are accepted, or whether most are outside the set, and are rejected.
Here's an empirical test script:
require 'benchmark'
require 'set'
strings = ['todo','pending','history']
string_set = Set.new(strings)
symbols = strings.map(&:to_sym)
regex_compiled = Regexp.new(strings.join("|"))
strings_avg_size = (strings.map(&:size).inject {|sum, n| sum + n}.to_f / strings.size).to_i
num_inputs = 1_000_000
accepted_inputs = (0...num_inputs).map { strings[rand(strings.size)] }
rejected_inputs = (0...num_inputs).map { (0..strings_avg_size).map { ('a'...'z').to_a[rand(26)] }.join }
Benchmark.bmbm(40) do |x|
x.report("Array#include?, Strings, accepted:") { accepted_inputs.map {|s| strings.include?(s) } }
x.report("Array#include?, Strings, rejected:") { rejected_inputs.map {|s| strings.include?(s) } }
x.report("Array#include?, Symbols, accepted:") { accepted_inputs.map {|s| symbols.include?(s.to_sym) } }
x.report("Array#include?, Symbols, rejected:") { rejected_inputs.map {|s| symbols.include?(s.to_sym) } }
x.report("Set#include?, Strings, accepted:") { accepted_inputs.map {|s| string_set.include?(s) } }
x.report("Set#include?, Strings, rejected:") { rejected_inputs.map {|s| string_set.include?(s) } }
x.report("Regexp#match, interpreted, accepted:") { accepted_inputs.map {|s| s =~ /todo|pending|history/ } }
x.report("Regexp#match, interpreted, rejected:") { rejected_inputs.map {|s| s =~ /todo|pending|history/ } }
x.report("Regexp#match, compiled, accepted:") { accepted_inputs.map {|s| regex_compiled.match(s) } }
x.report("Regexp#match, compiled, rejected:") { rejected_inputs.map {|s| regex_compiled.match(s) } }
end
Results
Rehearsal ---------------------------------------------------------------------------
Array#include?, Strings, accepted: 0.210000 0.000000 0.210000 ( 0.215099)
Array#include?, Strings, rejected: 0.530000 0.010000 0.540000 ( 0.543898)
Array#include?, Symbols, accepted: 0.330000 0.000000 0.330000 ( 0.337767)
Array#include?, Symbols, rejected: 1.870000 0.050000 1.920000 ( 1.923155)
Set#include?, Strings, accepted: 0.270000 0.000000 0.270000 ( 0.274774)
Set#include?, Strings, rejected: 0.460000 0.000000 0.460000 ( 0.463925)
Regexp#match, interpreted, accepted: 0.380000 0.000000 0.380000 ( 0.382060)
Regexp#match, interpreted, rejected: 0.650000 0.000000 0.650000 ( 0.660775)
Regexp#match, compiled, accepted: 1.130000 0.080000 1.210000 ( 1.220970)
Regexp#match, compiled, rejected: 0.630000 0.000000 0.630000 ( 0.640721)
------------------------------------------------------------------ total: 6.600000sec
user system total real
Array#include?, Strings, accepted: 0.210000 0.000000 0.210000 ( 0.219060)
Array#include?, Strings, rejected: 0.430000 0.000000 0.430000 ( 0.444911)
Array#include?, Symbols, accepted: 0.340000 0.000000 0.340000 ( 0.341970)
Array#include?, Symbols, rejected: 1.080000 0.000000 1.080000 ( 1.089961)
Set#include?, Strings, accepted: 0.270000 0.000000 0.270000 ( 0.281270)
Set#include?, Strings, rejected: 0.400000 0.000000 0.400000 ( 0.406181)
Regexp#match, interpreted, accepted: 0.370000 0.000000 0.370000 ( 0.366931)
Regexp#match, interpreted, rejected: 0.560000 0.000000 0.560000 ( 0.558652)
Regexp#match, compiled, accepted: 0.920000 0.000000 0.920000 ( 0.915914)
Regexp#match, compiled, rejected: 0.620000 0.000000 0.620000 ( 0.627620)
Conclusions
(see the Summary above)
It makes sense to me, upon reflection, that the Array of Symbols would be very slow for rejected inputs, because every single one of those random strings must be interned in the Symbol table before the check can be made.
It makes less sense to me, even after pondering, that the compiled Regexp would perform so badly, especially compared to the Regexp interpreted as a literal in the code. Can anyone explain why it does so badly?
@ms-tg's answer has good benchmarks, and answers your question well as far as I'm concerned. I just wanted to add a small note: be careful with this because these two options won't always have the same results:
params = Hash.new
keyword_array = ['todo','pending','history']
included = nil
params[:category] = "history plus other text"
start_time = Time.now
1000.times do
included = keyword_array.include?(params[:category])
end
puts "Array.include? returned #{included} in #{(Time.now - start_time)*1000}ms"
start_time = Time.now
1000.times do
included = (params[:category] =~ /todo|pending|history/).is_a?(Integer)
end
puts "Regexp returned #{included} in #{(Time.now - start_time)*1000}ms"
Returns:
Array.include? returned false in 0.477ms
Regexp returned true in 0.953ms
Notice that the regular expression returned true in this case, but the array.include? returned false. This should be considered when building your logic.
Basically, if the string isn't in the array exactly, array.include? will be false, but if one of the keywords is anywhere in the string the regular expression will be true (regardless of whether there's other text).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With