How do I adapt a search tree to handle limited regular expressions?
Given a file name, I need to find all nodes matching that file name. Nodes may contain usual file name globs (* and ?). Since this is a search tree, speed is of the essence.
I should add that the most important case for speed is the average time to rule out a match. In most cases matching will fail.
If the tree contained the following nodes:
foo, bar, foo*, *bar, foo?bar
Python glob() Method to Search Files We need to import Python's built-in glob module to use the glob() function. Python glob. glob() method returns a list of files or folders that matches the path specified in the pathname argument. This function takes two arguments, namely pathname, and recursive flag.
We can use the function glob. glob() or glob. iglob() directly from glob module to retrieve paths recursively from inside the directories/files and subdirectories/subfiles. Note: When recursive is set True “ ** ” followed by path separator ('./**/') will match any files or directories.
glob (short for global) is used to return all file paths that match a specific pattern. We can use glob to search for a specific file pattern, or perhaps more usefully, search for files where the filename matches a certain pattern by using wildcard characters.
The glob() function searches for all the pathnames matching pattern according to the rules used by the libc glob() function, which is similar to the rules used by common shells.
An Aho-Corasick search tree would fit the bill. "Tries" is a very good article about this sort of thing and the Etrie implementation used in Evolution to replace regex searching.
To do the whole string matching, you can add beginning and ending anchor states. If scanning multi-line data, you could add the newline to the begin and end. You could also remove the part where it adds the cross linking for the partial matching starting a different match. This also allows faster exclusion.
Another algorithm for checking for membership in a string set is CritBit. This doesn't have Regex, but it is simple and tests complete strings.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With