
Speed of many regular expressions in Python

I'm writing a Python program that deals with a fair number of strings/files. My problem is that I'm going to be presented with a fairly short piece of text, and I'm going to need to search it for instances of a fairly broad range of words/phrases.

I'm thinking I'll need to compile regular expressions as a way of matching these words/phrases in the text. My concern, however, is that this will take a lot of time.

My question is how fast is the process of repeatedly compiling regular expressions, and then searching through a small body of text to find matches? Would I be better off using some string method?

Edit: So, I guess an example of my question would be: How expensive would it be to compile and search with one regular expression versus, say, iterating `if "word" in string` five times?
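The comparison in the edit can be measured directly with `timeit`. A minimal sketch, using made-up words and text purely for illustration:

```python
import re
import timeit

# Hypothetical word list and sample text (illustrative only).
words = ["apple", "banana", "cherry", "dates", "elderberry"]
text = "I had a banana and some dates for breakfast."

# One compiled alternation, built once up front.
pattern = re.compile("|".join(re.escape(w) for w in words))

def regex_search():
    # Single pass over the text with the combined pattern.
    return pattern.search(text) is not None

def substring_search():
    # One 'in' test per word, i.e. up to len(words) passes over the text.
    return any(w in text for w in words)

print("regex:    ", timeit.timeit(regex_search, number=100_000))
print("substring:", timeit.timeit(substring_search, number=100_000))
```

The key point is that compilation happens once, outside the loop; only the search cost is paid per piece of text. Actual timings depend on the words and the text, so running this on your own data is the only reliable answer.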

asked Nov 30 '22 by Wilduck

2 Answers

You should try to compile all your regexps into a single one using the alternation operator `|`. That way, the regexp engine will do most of the optimizations for you. Use the grouping operator `()` to determine which regexp matched.
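A minimal sketch of this approach, with named groups so you can tell which alternative matched (the pattern names and sample text are invented for illustration):

```python
import re

# Several patterns merged into one alternation; each alternative gets
# its own named group.
combined = re.compile(
    r"(?P<greeting>\bhello\b|\bhi\b)"
    r"|(?P<farewell>\bbye\b|\bgoodbye\b)"
    r"|(?P<number>\d+)"
)

for m in combined.finditer("hi there, call me at 555, then say goodbye"):
    # m.lastgroup names the group (alternative) that matched.
    print(m.lastgroup, m.group())
```

With a combined pattern the text is scanned once, instead of once per regexp, and `Match.lastgroup` tells you which sub-pattern fired.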

answered Dec 06 '22 by Aaron Digulla

If speed is of the essence, you are better off running some tests before you decide how to code your production application.

First of all, you said that you are searching for words, which suggests that you may be able to do this using split() to break up the string on whitespace, and then use simple string comparisons to do your search.

Definitely do compile your regular expressions, and do a timing test comparing that with the plain string functions. Check the documentation for Python's str methods for a full list.
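The split-and-compare idea can be sketched in a couple of lines; the target words and text here are made up, and a set is used so each membership test is O(1):

```python
text = "the quick brown fox jumps over the lazy dog"

# Hypothetical target words; a set gives O(1) membership tests.
targets = {"fox", "dog", "cat"}

# Break the text on whitespace and keep only the words we care about.
found = sorted(targets.intersection(text.split()))
print(found)
```

Note that plain split() does not strip punctuation ("dog." would not match "dog"), so this shortcut only works when the input is cleanly whitespace-delimited.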

answered Dec 06 '22 by Michael Dillon