Say I have an incoming string that I want to scan to see if it contains any of the words I have chosen to be "bad." :)
Is it faster to split the string into an array, as well as keep the bad words in an array, and then iterate through each bad word as well as each incoming word and see if there's a match, kind of like:
trigger = false
badwords.each do |badword|
  incoming.each do |word|
    trigger = true if badword == word
  end
end
OR is it faster to do this:
trigger = false
incoming.each do |word|
  trigger = true if badwords.include? word
end
OR is it faster to leave the string as it is and run a .match() with a regex that looks something like:
/\bbadword1\b|\bbadword2\b|\bbadword3\b/
Or is the performance difference almost completely negligible? Been wondering this for a while.
You're giving the regex an advantage by not stopping your loop when it finds a match. Try:
incoming.find{|word| badwords.include? word}
My money is still on the regex, though, which can be simplified to:
/\b(badword1|badword2|badword3)\b/
or, to make it a fair fight (testing one already-split word at a time):
/\A(badword1|badword2|badword3)\z/
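If the bad words already live in an array, the two patterns above do not have to be typed out by hand. Here is a minimal sketch, assuming a hypothetical `badwords` list and Ruby 2.4 or later (for `match?`). Note that Ruby's start-of-string anchor is the uppercase `\A`; a lowercase `\a` matches the bell character instead.

```ruby
# Hypothetical word list for illustration; substitute your own.
badwords = %w[badword1 badword2 badword3]

# Regexp.union escapes each string and joins the alternatives with "|".
alternation = Regexp.union(badwords)

# Scans a whole string, matching a bad word only at word boundaries.
in_string = /\b(?:#{alternation})\b/

# Tests a single, already-split word: \A and \z anchor the match
# to the start and end of the string.
whole_word = /\A(?:#{alternation})\z/

puts in_string.match?("this contains badword2 somewhere")
puts whole_word.match?("badword1")
```

Building the pattern once and reusing it means the compilation cost is paid a single time, not on every check.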
Once it is compiled, the regex is the fastest in real life (e.g. a really long incoming string, many similar bad words, etc.) since it can run on incoming in situ, without splitting it first, and will handle overlapping parts of your "bad words" really well.
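The surest way to settle it for your own data is to measure. Below is a rough sketch using Ruby's standard `Benchmark` module. The word list, the synthetic input, and the iteration count are all assumptions made for illustration, and the timings will vary with Ruby version, input length, and where (or whether) a bad word appears.

```ruby
require "benchmark"

# Hypothetical word list and synthetic input, for illustration only.
badwords = %w[badword1 badword2 badword3]
pattern  = /\b(?:#{Regexp.union(badwords)})\b/

# The bad word sits at the very end, so every approach has to
# look at the whole input before it finds a match.
incoming = ("harmless " * 10_000) + "badword3"

# Approach 1: nested iteration over bad words and incoming words.
nested_loops = lambda do |text|
  words = text.split
  badwords.any? { |bad| words.any? { |word| word == bad } }
end

# Approach 2: one pass over incoming words, checking Array#include?.
include_scan = lambda do |text|
  text.split.any? { |word| badwords.include?(word) }
end

# Approach 3: precompiled regex run directly on the unsplit string.
regex_scan = ->(text) { pattern.match?(text) }

Benchmark.bm(14) do |x|
  x.report("nested loops") { 50.times { nested_loops.call(incoming) } }
  x.report("include?")     { 50.times { include_scan.call(incoming) } }
  x.report("regex")        { 50.times { regex_scan.call(incoming) } }
end
```

All three versions stop at the first match (`any?` and `match?` both short-circuit), so the comparison is not skewed by one approach doing extra work after a hit.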