
What's the fastest way to check if a word from one string is in another string?

I have a string of words; let's call them bad:

bad = "foo bar baz"

I can keep this string as a whitespace-separated string, or as a list:

bad = bad.split(" ");

If I have another string, like so:

str = "This is my first foo string"

What's the fastest way to check if any word from the bad string is within my comparison string, and what's the fastest way to remove said word if it's found?

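# Find out whether any bad word is there
# (any? keeps the result; assigning inside each would overwrite it on every pass)
found = bad.split(" ").any? { |word| str.include?(word) }

# Remove each bad word
# (Regexp.escape keeps special characters in a word from being read as regex syntax)
bad.split(" ").each do |word|
  str.gsub!(/#{Regexp.escape(word)}/, "")
end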
Mike Trpcic asked Mar 31 '10 02:03





2 Answers

If the list of bad words gets huge, a hash is a lot faster:

    require 'benchmark'

    bad = ('aaa'..'zzz').to_a    # 17576 words
    str= "What's the fasted way to check if any word from the bad string is within my "
    str += "comparison string, and what's the fastest way to remove said word if it's "
    str += "found" 
    str *= 10

    badex = /\b(#{bad.join('|')})\b/i

    bad_hash = {}
    bad.each{|w| bad_hash[w] = true}

    n = 10
    Benchmark.bm(10) do |x|

      x.report('regex:') {n.times do 
        str.gsub(badex,'').squeeze(' ')
      end}

      x.report('hash:') {n.times do
        str.gsub(/\b\w+\b/){|word| bad_hash[word] ? '': word}.squeeze(' ')
      end}

    end
                    user     system      total        real
    regex:     10.485000   0.000000  10.485000 ( 13.312500)
    hash:       0.000000   0.000000   0.000000 (  0.000000)
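
The hash-lookup idea above can also cover both halves of the question (checking and removing) with two small helpers. This is only a minimal sketch, not the answerer's code: the names `BAD_WORDS`, `contains_bad_word?` and `remove_bad_words` are hypothetical, a `Set` stands in for the hash of `true` values, and downcasing each word before the lookup is an assumption made to mirror the `/i` flag of the regex version.

```ruby
require 'set'

# Hypothetical names for illustration; a Set replaces the hash of `true` values.
BAD_WORDS = Set.new(%w[foo bar baz])

# True if any whole word in str is in the bad-word set.
def contains_bad_word?(str, bad = BAD_WORDS)
  str.scan(/\w+/).any? { |word| bad.include?(word.downcase) }
end

# Returns a copy of str with bad words removed and runs of spaces collapsed.
def remove_bad_words(str, bad = BAD_WORDS)
  cleaned = str.gsub(/\w+/) { |word| bad.include?(word.downcase) ? '' : word }
  cleaned.squeeze(' ').strip
end

str = "This is my first foo string"
puts contains_bad_word?(str)   # prints: true
puts remove_bad_words(str)     # prints: This is my first string
```

Because each word is looked up once, the cost per word should stay roughly constant as the bad-word list grows, which is the effect the benchmark above is measuring.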
steenslag answered Sep 22 '22 23:09



bad = "foo bar baz"

=> "foo bar baz"

str = "This is my first foo string"

=> "This is my first foo string"

(str.split(' ') - bad.split(' ')).join(' ')

=> "This is my first string"
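
The array difference above handles the removal. For the "is any bad word present?" half of the question, the same split-into-arrays approach could use array intersection instead. A minimal sketch, assuming words are separated by single spaces as in the example (the variable name `matches` is just for illustration):

```ruby
bad = "foo bar baz"
str = "This is my first foo string"

# Array#& returns the elements common to both arrays, without duplicates.
matches = str.split(' ') & bad.split(' ')

puts matches.any?      # prints: true
puts matches.inspect   # prints: ["foo"]
```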

jeem answered Sep 23 '22 23:09
