Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to use re2 from Python?

Tags:

i just discovered http://code.google.com/p/re2, a promising library that uses a long-neglected way (Thompson NFA) to implement a regular expression engine that can be orders of magnitudes faster than the available engines of awk, Perl, or Python.

so i downloaded the code and did the usual sudo make install thing. however, that action had seemingly done little more than adding /usr/local/include/re2/re2.h to my system. there seemed to be some *.a file in addition, but then what is it with this *.a extension?

i would like to use re2 from Python (preferrably Python 3.1) and was excited to see files like make_unicode_groups.py in the distro (maybe just used during the build process?). those however were not deployed on my machine.

how can i use re2 from Python?


update two friendly people have pointed out that i could try to build DLLs / *.so files from the sources and then use Python’s ctypes library to access those. can anyone give useful pointers how to do just that? i’m pretty much clueless here, especially with the first part (building the *.so files).


update i have also posted this question (earlier) to the re2 developers’ group, without reply till now (it is a small group), and today to the (somewhat more populous) comp.lang.py group [—thread here—]. the hope is that people from various corners can contact each other. my guess is a skilled person can do this in a few hours during their 20% your-free-time-belongs-google-too timeslice; it would tie me for weeks. is there a tool to automatically dumb-down C++ to whatever flavor of C that Python needs to be able to connect? then maybe getting a viable result can be reduced to clever tool chaining.

(rant)why is this so difficult? to think that in 2010 we still cannot have our abundant pieces of software just talk to each other. this is such a roadblock that whenever you want to address some C code from Python you must always cruft these linking bits. this requires a lot of work, but only delivers an extension module that is specific to the version of the C code and the version of Python, so it ages fast.(/rant) would it be possible to run such things in separate processes (say if i had an re2 executable that can produce results for data that comes in on, say, subprocess/Popen/communicate())? (this should not be a pure command-line tool that necessitates the opening of a process each time it is needed, but a single processs that runs continuously; maybe there exist wrappers that sort of ‘demonize’ such C code).

like image 793
flow Avatar asked Mar 13 '10 17:03

flow


People also ask

What is RE2 regex?

RE2 is a software library for regular expressions via a finite-state machine using automata theory, in contrast to almost all other regular expression libraries, which use backtracking implementations. It provides a C++ interface.

Does go use RE2?

No. There's lots of things that RE2 does that Go's regexp package does not do.


2 Answers

David Reiss has put together a Python wrapper for re2. It doesn't have all of the functionality of Python's re module, but it's a start. It's available here: http://github.com/facebook/pyre2.

like image 85
Daniel Stutzbach Avatar answered Sep 25 '22 11:09

Daniel Stutzbach


Possible yes, easy no. Looking at the re2.h, this is a C++ library exposed as a class. There are two ways you could use it from Python.

1.) As Tuomas says, compile it as a DLL/so and use ctypes. In order to use it from python, though, you would need to wrap the object init and methods into c style externed functions. I've done this in the past with ctypes by externing functions that pass a pointer to the object around. The "init" function returns a void pointer to the object that gets passed on each subsequent method call. Very messy indeed.

2.) Wrap it into a true python module. Again those functions exposed to python would need to be extern "C". One option is to use Boost.Python, that would ease this work.

like image 45
Mark Avatar answered Sep 21 '22 11:09

Mark