
How to recognize a set of keywords in a text

I have a huge set of keywords. Given a text, I want to recognize only the words that occur in my keyword list and ignore all other words. What is the best way to approach this?

kc3 asked Feb 24 '23 03:02



1 Answer

The Aho-Corasick algorithm is a fast algorithm for recognizing a set of pattern strings in a larger source string. It's employed by several search utilities, along with many antivirus programs, since it runs in time O(m + n + z), where n is the total size of all the pattern strings you're trying to match, m is the length of the string to search, and z is the total number of matches. Moreover, if you know in advance what strings you're searching for, you can do the O(n) work offline and reduce the search time to O(m + z).
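To make this concrete, here is a minimal Python sketch of the idea, not a tuned implementation. The function names `build_automaton` and `find_keywords` are made up for this illustration. Building the automaton is the part you could do offline; the search is then a single pass over the text. For simplicity the sketch copies output lists along failure links, which a production implementation would likely avoid.

```python
from collections import deque

def build_automaton(patterns):
    """Build a trie with failure links (the offline step)."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # fail[node] -> node for the longest proper suffix in the trie
    out = [[]]    # out[node] -> patterns that end at this node
    for pat in patterns:
        if not pat:
            continue  # skip empty patterns
        node = 0
        for ch in pat:
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[node][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            node = nxt
        out[node].append(pat)

    # Breadth-first pass: depth-1 nodes keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix node.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out

def find_keywords(text, automaton):
    """Scan text once and return (start_index, pattern) for every match."""
    goto, fail, out = automaton
    node = 0
    matches = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            matches.append((i - len(pat) + 1, pat))
    return matches

automaton = build_automaton(["he", "she", "his", "hers"])
print(find_keywords("ushers", automaton))
```

Note that this matches substrings, so "he" is reported inside "ushers". If you only want whole words, you would additionally need to check that each match starts and ends on a word boundary.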

templatetypedef answered Feb 26 '23 17:02
