efficient algorithm for searching one of several strings in a text?

Question

I need to search incoming not-very-long pieces of text for occurrences of given strings. The strings are constant for the whole session and are not many (~10). Additional simplification is that none of the strings is contained in any other.

I am currently using boost regex matching with str1 | str2 | .... The performance of this task is important, so I wonder if I can improve it. Not that I can program better than the boost guys, but perhaps a dedicated implementation is more efficient than a general one.

As the strings stay constant over long time, I can afford building a data structure, like a state transition table, upfront.

e.g., if the strings are abcx, bcy and cz, and I've read so far abc, I should be in a combined state that means you're either 3 chars into string 1, 2 chars into string 2 or 1 char into string 1. Then reading x next will move me to the string 1 matched state etc., and any char other than xyz will move to the initial state, and I will not need to retract back to b.

Any ideas or references are appreciated.

maxschlepzig · Accepted Answer

Check out the Aho–Corasick string matching algorithm!

wilhelmtell · Answer

Take a look at Suffix Tree.

efficient algorithm for searching one of several strings in a text?

Tags:

c++

string-matching

algorithm

text-search

davka

2 Answers

maxschlepzig

wilhelmtell

Recent Activity

Donate For Us

efficient algorithm for searching one of several strings in a text?

Tags:

c++

string-matching

algorithm

text-search

davka

2 Answers

maxschlepzig

wilhelmtell

Related questions

Recent Activity

Donate For Us