If I want to analyze a string using dozens of regular expressions, could either the threading or the multiprocessing module improve performance?
In other words, would analyzing the string on multiple threads or processes be faster than:
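```python
import re

# Try each pattern in turn; only the handler for the first match runs.
match = re.search(regex1, string)
if match:
    afunction(match)
else:
    match = re.search(regex2, string)
    if match:
        bfunction(match)
    else:
        match = re.search(regex3, string)
        if match:
            cfunction(match)
        ...  # and so on for the remaining patterns
```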
No more than one regular expression would ever match, so that's not a concern.
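Because at most one pattern can match, the nested cascade above can be flattened into a loop over precompiled (pattern, handler) pairs that stops at the first hit. This is a minimal sequential sketch, useful as the baseline any threaded or multiprocess version has to beat. The patterns and the bodies of `afunction`, `bfunction`, and `cfunction` are hypothetical placeholders, and `dispatch` and `HANDLERS` are illustrative names, not part of any library.

```python
import re


# Hypothetical handlers standing in for afunction, bfunction, cfunction.
def afunction(match):
    return ("a", match.group(0))


def bfunction(match):
    return ("b", match.group(0))


def cfunction(match):
    return ("c", match.group(0))


# Hypothetical patterns. Compiling once up front avoids re's internal
# cache lookup on every call to re.search.
HANDLERS = [
    (re.compile(r"\d{3}-\d{4}"), afunction),
    (re.compile(r"[\w.]+@[\w.]+"), bfunction),
    (re.compile(r"https?://\S+"), cfunction),
]


def dispatch(string):
    """Search each pattern in order and call the first matching handler.

    Returns the handler's result, or None if no pattern matches.
    """
    for pattern, handler in HANDLERS:
        match = pattern.search(string)
        if match:
            return handler(match)
    return None


if __name__ == "__main__":
    print(dispatch("call 555-1234 today"))   # ('a', '555-1234')
    print(dispatch("mail bob@example.com"))  # ('b', 'bob@example.com')
    print(dispatch("nothing here"))          # None
```

Adding another pattern is then one more entry in the list rather than another level of nesting.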
If the answer is multiprocessing, what technique would you recommend looking into (queues, pipes)?
Python threading won't improve performance here because of the Global Interpreter Lock (GIL), which in CPython lets only one thread execute Python bytecode at a time, so CPU-bound work such as regex matching doesn't run in parallel across threads. If you have a multicore machine, multiple processes may speed things up, but only if the cost of spawning subprocesses and passing data between them is less than the time you save by running the searches in parallel.
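If you do this often, you might look into process pools (for example multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor), which keep worker processes alive between calls so the spawning cost is paid only once.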
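A minimal sketch of the multiprocess approach, assuming the standard library's `concurrent.futures.ProcessPoolExecutor`: each pattern is searched in a worker process and the first non-empty result wins. The patterns are hypothetical placeholders, and `search_one` and `find_match` are illustrative helper names. Workers return plain tuples rather than match objects so that only small, picklable data crosses the process boundary.

```python
import re
from concurrent.futures import ProcessPoolExecutor

# Hypothetical patterns standing in for regex1, regex2, regex3, ...
PATTERNS = [r"\d{3}-\d{4}", r"[\w.]+@[\w.]+", r"https?://\S+"]


def search_one(job):
    """Search a single pattern; this runs inside a worker process.

    Returns (pattern index, matched text) or None.
    """
    index, pattern, string = job
    match = re.search(pattern, string)
    return (index, match.group(0)) if match else None


def find_match(string, executor):
    """Send every pattern to the pool and return the first hit."""
    jobs = [(i, pattern, string) for i, pattern in enumerate(PATTERNS)]
    # executor.map yields results in input order; at most one is not None.
    for result in executor.map(search_one, jobs):
        if result is not None:
            return result
    return None


if __name__ == "__main__":
    # Create the pool once and reuse it for every string, so the cost of
    # spawning worker processes is not paid on each call.
    with ProcessPoolExecutor() as executor:
        for text in ["call 555-1234", "mail bob@example.com", "no match"]:
            print(text, "->", find_match(text, executor))
```

The executor manages the underlying queues and pipes itself, and `executor.map` accepts a `chunksize` argument to batch several patterns per round trip. The `if __name__ == "__main__":` guard is required on platforms that use the spawn start method. Whether this beats the sequential loop depends on how expensive each search is: for short strings and simple patterns, pickling and inter-process communication can easily outweigh the parallel speedup, so time both approaches on real data.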