Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check if a keyword is in a string

Tags:

java

I have a list of keywords and I want to be able to find if a string contains any of those keywords. Right now the solution I have takes O(n). Is there a quicker way of doing this search without looping through each keywords and doing a comparison/contains?

i.e. Keywords = "cat", "hat", "mat", "bat", "fat", "sat", "rat", "pat", "foo bar", "foo-bar" String = "There is a cat in the box." The result of this is true because "cat" matches one of the words in the 'keywords'

EDIT: I guess I wasn't as clear when I said O(n). I mean to say O(n) where n=number of keywords.

like image 439
user459811 Avatar asked Dec 02 '13 03:12

user459811


People also ask

How do you check if a word is present in a string Python?

We can iteratively check for every word, but Python provides us an inbuilt function find() which checks if a substring is present in the string, which is done in one line. find() function returns -1 if it is not found, else it returns the first occurrence, so using this function this problem can be solved.

How do you check if a word is in a string Java?

You can use contains(), indexOf() and lastIndexOf() method to check if one String contains another String in Java or not. If a String contains another String then it's known as a substring. The indexOf() method accepts a String and returns the starting position of the string if it exists, otherwise, it will return -1.

How can I check keywords in Python?

To check if a string is a valid keyword, import the keyword module and use the iskeyword() method. With that, you can directly display all the keywords at once and verify.

Is string is a keyword in Python?

The class , break , and continue strings are valid Python keywords, so True is returned.

How do I check if a string is a valid keyword?

To check if a string is a valid keyword, use the IsValidIdentifier method. The IsValidIdentifier method checks whether the entered value is an identifier or not.

How to get all the keywords in Python?

Get all the keywords in python using keyword.kwlist method and store it in a variable. Give the first string as static input and store it in another variable. Give the second string as static input and store it in another variable. Check whether the given first string is present in the above keyword list or not using the if conditional statement.

What are keywords in C programming language?

Keywords are reserved words which cannot be used as variable names in program. There are 32 keywords in the C programming language. Compare the string with each keyword if the string is same then string is keyword

How do you check if a string contains a specific phrase?

Check In String. To check if a certain phrase or character is present in a string, we can use the keywords in or not in.


1 Answers

You can use Boyer-Moore, which involves preprocessing the string, but you're not going to be able to beat a worst case of O(KN), where K is the sum of the lengths of the keywords, and N is the length of the string. Best case is of course sublinear, but you can't have a worst case sublinear runtime.

Note that the comparisons aren't free. It's not like you can compare two strings in O(1) to see if they're equal, you have to iterate through the characters. Hashing gets you to what you need to compare to in constant time, but doesn't help more than that since two different strings can have the same hash. That's not to say that hashing isn't good, it is, but it does not alter the worst case runtime complexity.

In the end, you need to compare the characters, and Boyer-Moore provides a very good way to do that. Of course if you're using some sort of hash based build, you may be able to rule out certain keywords in amortized constant time, but that doesn't change the fact that in the worst case (and many other cases), you're going to need to compare characters.

Also note that depending on what we assume about the data, and how we construct our indexing structure(s), it's possible to achieve a very good actual runtime. Just because the worst case complexity isn't sublinear doesn't mean that the actual runtime won't be really fast. There is no singleton simple or correct solution, the problem can be approached in myriad ways. There's never a quick and dirty answer to solve all of your problems when it comes to information retrieval.

like image 164
Steve P. Avatar answered Oct 06 '22 06:10

Steve P.