How to efficiently implement regular expression like .*a.*b.*?

Question

I want to matching of filenames like Colibri does. I tried to solve it by regular expressions.

Searching in Colibri works that You type characters which are in order inside of a filename and it finds all files with these characters in order in the filename. For example for "ab" it finds "cabal", "ab", and "achab".

Simple insertion of .* between letters works (so searched string "ab" becomes regular expression .*a.*b.*), but I want to make it on large amount of files.

So far I have O(N*???), where N is amount of filenames and ??? is at best linear complexity (I assume my language uses NFA). I don't care about space complexity that much. What data structures or algorithms should I choose to make it more efficient (in time complexity)?

Gumbo · Accepted Answer

If you just want to check whether the characters of a search string search are contained in another string str in the same order, you could use this simple algorithm:

pos := -1
for each character in search do
    pos := indexOf(str, character, pos+1)
    if pos is -1 then
        break
    endif
endfor
return pos

This algorithm returns the offset of the last character of search in str and -1 otherwise. Its runtime is in O(n) (you could replace indexOf by a simple while loop that compares the characters in str from pos to Length(str)-1 and returns either the offset or -1).

How to efficiently implement regular expression like .a.b.*?

Tags:

performance

regex

algorithm

user712092

1 Answers

Gumbo

Recent Activity

Donate For Us