I have a string of words; let's call them bad:
bad = "foo bar baz"
I can keep this string as a whitespace separated string, or as a list:
bad = bad.split(" ");
If I have another string, like so:
str = "This is my first foo string"
What's the fastest way to check if any word from the bad string is within my comparison string, and what's the fastest way to remove said word if it's found?
# Find if a word is there
found = bad.split(" ").any? { |word| str.include?(word) }
# Remove the word
bad.split(" ").each do |word|
  # Regexp.escape keeps words with regex metacharacters from being misread
  str.gsub!(/#{Regexp.escape(word)}/, "")
end
In Java, the standard way to check whether one string is a substring of another is the String#contains() method. It returns true if the string contains the specified string, and false otherwise.
You can use the PHP strpos() function to check whether a string contains a specific word or not. The strpos() function returns the position of the first occurrence of a substring in a string. If the substring is not found, it returns false. Also note that string positions start at 0, not 1.
You can use the contains(), indexOf() and lastIndexOf() methods to check whether one String contains another String in Java. If a String contains another String, the contained string is known as a substring. The indexOf() method accepts a String and returns its starting position if it exists; otherwise, it returns -1.
In Python, the in operator returns a Boolean (either True or False). To check if a string contains a substring using the in operator, we simply invoke it on the superstring:
fullstring = "StackAbuse"
substring = "tack"
if substring in fullstring:
    print("Found!")
else:
    print("Not found!")
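For the Ruby case in the question, the closest counterparts are String#include? for a plain substring test and Regexp.union for testing several words at once. The following is a minimal sketch reusing the question's bad and str values; the helper name contains_bad_word? is made up for illustration and is not from the original post.

```ruby
# Hypothetical helper (not from the original post): true if any bad word
# occurs in the text as a whole word.
def contains_bad_word?(text, bad_words)
  return false if bad_words.empty?
  # Regexp.union escapes each string, so regex metacharacters are safe.
  pattern = /\b(?:#{Regexp.union(bad_words).source})\b/
  text.match?(pattern)
end

bad = "foo bar baz".split
str = "This is my first foo string"

puts str.include?("foo")              # plain substring test
puts contains_bad_word?(str, bad)     # whole-word test against the list
puts contains_bad_word?("food", bad)  # "foo" inside "food" is not a whole word
```

The word-boundary anchors are a design choice: without them, a substring test would also flag "food" because it contains "foo".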
If the list of bad words gets huge, a hash is a lot faster:
require 'benchmark'
bad = ('aaa'..'zzz').to_a # 17576 words
str= "What's the fasted way to check if any word from the bad string is within my "
str += "comparison string, and what's the fastest way to remove said word if it's "
str += "found"
str *= 10
badex = /\b(#{bad.join('|')})\b/i
bad_hash = {}
bad.each{|w| bad_hash[w] = true}
n = 10
Benchmark.bm(10) do |x|
  x.report('regex:') do
    n.times { str.gsub(badex, '').squeeze(' ') }
  end
  x.report('hash:') do
    n.times { str.gsub(/\b\w+\b/) { |word| bad_hash[word] ? '' : word }.squeeze(' ') }
  end
end
                 user     system      total        real
regex:      10.485000   0.000000  10.485000 ( 13.312500)
hash:        0.000000   0.000000   0.000000 (  0.000000)
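Note that the benchmark's regex is case-insensitive while the hash lookup is not, so the two variants are not strictly equivalent. A sketch of the lookup approach that downcases each word before checking might look like this; it uses a Set from the standard library instead of a hash of true values, and remove_bad_words is an illustrative name rather than part of the original answer.

```ruby
require 'set'

# Hypothetical helper: drop every word found in bad_set, ignoring case.
# Assumes bad_set holds lowercase words.
def remove_bad_words(text, bad_set)
  # Visit each run of word characters once and look it up in the set.
  cleaned = text.gsub(/\w+/) { |word| bad_set.include?(word.downcase) ? '' : word }
  # Collapse the gaps left by removed words and trim the ends.
  cleaned.squeeze(' ').strip
end

bad_set = Set.new(%w[foo bar baz])
puts remove_bad_words("This is my first Foo string", bad_set)
```

The text is scanned once regardless of how many bad words there are, which is why this style scales better than one large alternation regex.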
bad = "foo bar baz"
=> "foo bar baz"
str = "This is my first foo string"
=> "This is my first foo string"
(str.split(' ') - bad.split(' ')).join(' ')
=> "This is my first string"
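The array difference covers removal; the matching check can be done the same way with array intersection. A short sketch, assuming words are separated by whitespace and carry no attached punctuation (the helper names are hypothetical):

```ruby
# Hypothetical helpers; both assume whitespace-separated words
# without attached punctuation.

# Bad words that occur in the text (empty array when none do).
def bad_words_in(text, bad)
  text.split & bad.split
end

# Text with the bad words removed. Note that split/join also collapses
# repeated whitespace in the original text.
def strip_bad_words(text, bad)
  (text.split - bad.split).join(' ')
end

bad = "foo bar baz"
str = "This is my first foo string"

puts bad_words_in(str, bad).any?   # whether any bad word occurs
puts strip_bad_words(str, bad)
```

Because the comparison is on whole array elements, "foo," or "Foo" would not match "foo"; normalizing case or stripping punctuation first would be needed for free-form text.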