 

Is there a Python equivalent for Perl's `study`?

From Perl's documentation:

study takes extra time to study SCALAR ($_ if unspecified) in anticipation of doing many pattern matches on the string before it is next modified. This may or may not save time, depending on the nature and number of patterns you are searching and the distribution of character frequencies in the string to be searched;

I'm trying to speed up some regular-expression-driven parsing that I'm doing in Python, and I remembered this trick from Perl. I realize I'll have to benchmark to determine whether there is a speedup, but I can't find an equivalent method in Python.

asked Mar 05 '12 21:03 by bonsaiviking


2 Answers

Perl’s study doesn’t really do much anymore. The regex compiler has gotten a whole, whole lot smarter than it was when study was created.

For example, it compiles alternatives into a trie structure with Aho–Corasick prediction.
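To get a feel for what factoring alternatives into a trie buys you, here is a minimal Python sketch of the well-known "trie regex" trick. It builds a trie from a list of literal words and emits a pattern with shared prefixes factored out, so `foo|foobar|food` becomes `foo(?:bar|d)?`. This is an illustration under my own assumptions, not what Perl's engine does internally (there is no Aho–Corasick-style prediction here), and `build_trie`, `trie_to_pattern` and `trie_regex` are hypothetical helper names.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the key "" marks that a word ends here."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def trie_to_pattern(node):
    """Convert a trie node into regex source with shared prefixes factored out."""
    can_end = "" in node
    branches = [
        re.escape(ch) + trie_to_pattern(node[ch])
        for ch in sorted(node)
        if ch != ""
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not can_end:
        return branches[0]
    group = "(?:" + "|".join(branches) + ")"
    # A word may stop at this node, so the continuation is optional.
    return group + "?" if can_end else group


def trie_regex(words):
    """Compile one regex matching any of the literal words."""
    return re.compile(trie_to_pattern(build_trie(words)))


if __name__ == "__main__":
    words = ["foo", "foobar", "food"]
    print(trie_to_pattern(build_trie(words)))  # foo(?:bar|d)?
    print(trie_regex(words).findall("food foobar foo fox"))  # ['food', 'foobar', 'foo']
```

One design consequence: the optional group is greedy, so at a given position the factored pattern prefers the longest word (`foobar` over `foo`), whereas a plain `foo|foobar` alternation takes the first alternative that matches. Add `\b` or anchors if you need whole-word matches.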

Run with perl -Mre=debug to see the sorts of cleverness the regex compiler and execution engine apply.
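On the Python side, the closest thing I know of is the `re.DEBUG` flag, which makes `re.compile` print the parsed form of the pattern. It is only a rough analogue: it shows the compile-time parse, not a trace of the matching engine at run time the way Perl's `re` debug mode does. The pattern below is just an example.

```python
import re

# re.DEBUG makes the compiler dump its parsed representation of the
# pattern to stdout when the pattern is compiled.
pattern = re.compile(r"foo(?:bar|d)?", re.DEBUG)

# Matching itself is unaffected by the flag.
print(pattern.match("foobar").group(0))  # prints "foobar" after the debug dump
```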

answered Nov 06 '22 14:11 by tchrist


As far as I know there's nothing like this built into Python. But according to the perldoc:

The way study works is this: a linked list of every character in the string to be searched is made, so we know, for example, where all the 'k' characters are. From each search string, the rarest character is selected, based on some static frequency tables constructed from some C programs and English text. Only those places that contain this "rarest" character are examined.

This doesn't sound very sophisticated, and you could probably hack together something equivalent yourself.
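For example, a minimal sketch of that idea for literal substrings might look like the following. `StudiedString` is a hypothetical name, and one deliberate swap from the perldoc description: instead of static frequency tables, it picks the needle character that is rarest in the indexed string itself. It also uses per-character lists of offsets rather than a linked list.

```python
from collections import defaultdict


class StudiedString:
    """Hypothetical helper imitating the idea behind Perl's old study."""

    def __init__(self, text):
        self.text = text
        # One pass over the text: character -> ascending list of its offsets.
        self.positions = defaultdict(list)
        for i, ch in enumerate(text):
            self.positions[ch].append(i)

    def find_all(self, needle):
        """Return the ascending start offsets of every (possibly overlapping) match."""
        if not needle:
            return list(range(len(self.text) + 1))
        # Choose the needle character that is rarest in *this* text; Perl used
        # static frequency tables instead. .get() avoids adding empty entries.
        offset, rare = min(
            enumerate(needle),
            key=lambda pair: len(self.positions.get(pair[1], ())),
        )
        hits = []
        # Only positions holding the rare character can anchor a match.
        for pos in self.positions.get(rare, ()):
            start = pos - offset
            if start >= 0 and self.text.startswith(needle, start):
                hits.append(start)
        return hits


if __name__ == "__main__":
    s = StudiedString("the quick brown fox jumps over the lazy dog")
    print(s.find_all("the"))  # [0, 31]
    print(s.find_all("cat"))  # []
```

This only answers literal substring searches. Using it to speed up real regexes would mean first extracting a required literal from each pattern, which is the harder part.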

esmre is kind of vaguely similar. And as @Frg noted, you'll want to use re.compile if you're reusing a single regex (to avoid re-parsing the regex itself over and over).
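A quick way to check whether precompiling helps your parser is to time both styles. This sketch uses a made-up `key = value` line format as stand-in data, so swap in your own lines and patterns. Keep in mind that the module-level `re` functions already cache recently compiled patterns, so the difference is often smaller than you might expect.

```python
import re
import timeit

# Hypothetical sample input and pattern, standing in for real parser data.
LINES = ["key%d = value%d" % (i, i) for i in range(1000)]
PATTERN = r"(\w+)\s*=\s*(\w+)"
COMPILED = re.compile(PATTERN)


def parse_uncompiled(lines):
    # Passes the pattern string each call; re looks it up in its internal cache.
    return [re.match(PATTERN, line).groups() for line in lines]


def parse_compiled(lines):
    # Reuses the pattern object compiled once at import time.
    return [COMPILED.match(line).groups() for line in lines]


if __name__ == "__main__":
    assert parse_uncompiled(LINES) == parse_compiled(LINES)
    for fn in (parse_uncompiled, parse_compiled):
        seconds = timeit.timeit(lambda: fn(LINES), number=100)
        print(f"{fn.__name__}: {seconds:.3f}s")
```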

Or you could use suffix trees (here's one implementation, or here's a C extension with unicode support) or suffix arrays (implementation).
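To show the suffix-array idea itself, here is a minimal pure-Python sketch; it is not any of the implementations mentioned above, and `build_suffix_array` and `find_occurrences` are hypothetical names. You build the array once per string, then each literal substring query is two binary searches over the sorted suffixes.

```python
def build_suffix_array(text):
    """Naive suffix array: start offsets of all suffixes in sorted order.

    Sorting full slices costs O(n^2 log n) time and O(n^2) memory in the
    worst case, so this is only for illustration on small inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, needle):
    """Return sorted start offsets of needle in text via binary search on sa."""
    m = len(needle)
    # Lower bound: first suffix whose m-character prefix is >= needle.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < needle:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > needle.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= needle:
            lo = mid + 1
        else:
            hi = mid
    # Matching suffixes are contiguous in sa; return them in text order.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)  # [5, 3, 1, 0, 4, 2]
    print(find_occurrences(text, sa, "ana"))  # [1, 3]
```

The binary search works because truncating sorted suffixes to their first m characters keeps them in non-decreasing order, so all suffixes starting with the needle sit in one contiguous run.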

answered Nov 06 '22 16:11 by Danica