Is Regex faster than array comparison in this case?

Say I have an incoming string that I want to scan to see if it contains any of the words I have chosen to be "bad." :)

Is it faster to split the string into an array, keep the bad words in an array too, and then iterate through each bad word and each incoming word to see if there's a match, kind of like:

trigger = false  # declared outside the blocks so the assignment below is visible afterwards
badwords.each do |badword|
  incoming.each do |word|
    trigger = true if badword == word
  end
end

OR is it faster to do this:

trigger = false  # declared outside the block so the assignment below is visible afterwards
incoming.each do |word|
  trigger = true if badwords.include?(word)
end

OR is it faster to leave the string as it is and run a .match() with a regex that looks something like:

/\bbadword1\b|\bbadword2\b|\bbadword3\b/

Or is the performance difference almost completely negligible? Been wondering this for a while.

dsp_099 asked Dec 11 '22 22:12


2 Answers

You're giving the regex an advantage by not stopping your loop when it finds a match. Try:

incoming.find{|word| badwords.include? word}
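To see what the early exit buys you, here is a minimal sketch that counts how many incoming words actually get checked. The helper name `scan_with_early_exit` and the sample data are made up for illustration; they are not from the question.

```ruby
# Hypothetical helper (not from the question): returns the first bad word
# found plus the number of incoming words that had to be checked.
def scan_with_early_exit(incoming, badwords)
  checked = 0
  hit = incoming.find do |word|
    checked += 1
    badwords.include?(word) # find stops at the first truthy result
  end
  [hit, checked]
end

# Made-up sample data.
sample_words = "this has badword2 in it and then a lot more text".split
hit, checked = scan_with_early_exit(sample_words, %w[badword1 badword2 badword3])
puts "first match: #{hit.inspect}, words checked: #{checked}"
```

With this sample the scan stops at the third word, whereas the `each` versions in the question would keep going to the end of the array. If you only need a true/false answer, `any?` short-circuits the same way.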

My money is still on the regex though, which should be simplified to:

/\b(badword1|badword2|badword3)\b/

or, to make it a fair fight (testing each already-split word on its own):

/\A(badword1|badword2|badword3)\z/
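Rather than guessing, you can time the candidates with Ruby's standard `Benchmark` module. The following is only a rough sketch: the sample text, the word list, the iteration count and all the method and constant names are assumptions for illustration, and the numbers will depend heavily on your real input and Ruby version.

```ruby
require "benchmark"

# Hypothetical sample data; substitute your own word list and input.
BENCH_BADWORDS = %w[badword1 badword2 badword3].freeze
BENCH_TEXT = (("harmless " * 200) + "badword3").freeze
BENCH_WORD_RE = /\b(badword1|badword2|badword3)\b/
BENCH_EXACT_RE = /\A(badword1|badword2|badword3)\z/

# Nested loops with early exit: compare every bad word with every incoming word.
def nested_loops?(text, badwords)
  words = text.split
  badwords.any? { |bad| words.any? { |word| bad == word } }
end

# One pass over the incoming words, using Array#include? for the lookup.
def include_lookup?(text, badwords)
  text.split.any? { |word| badwords.include?(word) }
end

# Regex run on the unsplit string.
def whole_string_regex?(text, pattern)
  text.match?(pattern)
end

# The "fair fight": split first, then test each word against an anchored regex.
def per_word_regex?(text, pattern)
  text.split.any? { |word| word.match?(pattern) }
end

ITERATIONS = 1_000
Benchmark.bm(20) do |x|
  x.report("nested loops")       { ITERATIONS.times { nested_loops?(BENCH_TEXT, BENCH_BADWORDS) } }
  x.report("include?")           { ITERATIONS.times { include_lookup?(BENCH_TEXT, BENCH_BADWORDS) } }
  x.report("regex, whole string") { ITERATIONS.times { whole_string_regex?(BENCH_TEXT, BENCH_WORD_RE) } }
  x.report("regex, per word")    { ITERATIONS.times { per_word_regex?(BENCH_TEXT, BENCH_EXACT_RE) } }
end
```

Note that the array versions here include the cost of `split`, and that a plain whitespace split treats "badword1," (with punctuation attached) as a different word, while `\b` does not. Whether that matters depends on your input.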
pguardiario answered Dec 31 '22 09:12


Once it is compiled, the regex is the fastest in real life (e.g. a really long incoming string, many similar bad words, etc.), since it can run on the incoming string in situ, with no need to split it first, and it handles overlapping parts of your "bad words" really well.
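One way to get the "compile once" part is to build the pattern from the word list a single time and keep it in a constant. This is just a sketch: `BAD_WORD_LIST`, `BAD_WORD_PATTERN` and `contains_bad_word?` are hypothetical names, and the list itself is a placeholder.

```ruby
# Hypothetical word list for illustration.
BAD_WORD_LIST = %w[badword1 badword2 badword3].freeze

# Regexp.union escapes each string and joins the alternatives with "|".
# Using .source splices in just the pattern text; assigning the result to a
# constant means the regex is built once, not on every call.
BAD_WORD_PATTERN = /\b(?:#{Regexp.union(BAD_WORD_LIST).source})\b/

# Runs the precompiled pattern directly on the unsplit incoming string.
def contains_bad_word?(text)
  text.match?(BAD_WORD_PATTERN)
end

puts contains_bad_word?("this one has badword2 in the middle")
puts contains_bad_word?("this one is clean")
```

The word-boundary anchors keep "notbadword2x" from matching, the same way the hand-written `/\b(badword1|badword2|badword3)\b/` would.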

Tilo answered Dec 31 '22 07:12