
How can I use the same regular expression in different programming languages?

I've seen this question, and I know from experience that every language seems to support a different dialect of regex. I figure the problem has been around for a long time, so somebody must have wanted to do something about it.

I have a pretty big project that involves JavaScript, Ruby, and Java, and all of them have to touch the same regular expressions. We picked Java as our "official" RE interpreter, which means that any time the other two languages need to evaluate an RE, they have to somehow pass it to a Java program, and that's starting to add up to a lot of overhead.

If I could pick any RE dialect and invoke that at least semi-natively from all the languages, it'd be a huge step forward for us. Is this possible? Is it being done already? We looked at PCRE, and it's technically possible to invoke it via native bindings from Java and Ruby (though it leaves JS out in the cold), but I haven't found anybody actually doing it. Are we alone?

ETA: a wrinkle I did not mention is that this system applies user-supplied regexes. (Yes, I understand that this is a security issue, etc., but it's for in-house use by trusted, attributed users.) I can certainly suggest publishing a "don't do this" list of power features to avoid, but I kind of hope that's not the best solution.

Coderer asked Dec 21 '11 19:12



2 Answers

The dialects you implicitly mention in your post aren't that different. Each supports a few things the others don't, but that normally won't cause problems unless you write regular expressions that specifically target one dialect.

You can see the differences between the dialects in the table at the following link:

  • regular-expressions.info: Compare Regular Expression Flavors

The major differences between them are in the more "advanced" regular-expression features. If you stay away from these, you'll be in the safe zone.


Since both Python and Java have modules available for executing JavaScript, you could require that all expressions be written for JavaScript and have future developers use the module available to them, so that every regexp always behaves exactly the same way.

Though I'd just document that every regular expression used in your application needs to be supported by all three languages, and then direct developers to a table (such as the one linked above) where they can look up what's available.

...or you could compile a list/table of your own.
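
A "don't use these" list can also be checked mechanically before a user-supplied pattern is accepted. The following is only a minimal sketch in Python (used here for illustration; it is not one of the languages in the question). The deny-lists are assumptions made up for this example, not an authoritative statement of what each flavor supports, so they should be filled in from a flavor comparison table. `find_nonportable` is a hypothetical helper name.

```python
# Hypothetical deny-lists: constructs ASSUMED, for illustration only, to be
# missing or inconsistent in at least one target flavor. Verify against a
# flavor comparison table before relying on them.
DENIED_GROUPS = {
    "(?<=": "lookbehind",
    "(?<!": "negative lookbehind",
    "(?>": "atomic group",
    "(?P<": "Python-style named group",
}
DENIED_ESCAPES = {"\\A", "\\Z", "\\z", "\\G", "\\Q", "\\E"}
POSSESSIVE = {"*+", "++", "?+"}


def find_nonportable(pattern):
    """Return a list of (index, description) for each denied construct."""
    problems = []
    i = 0
    in_class = False  # True while scanning inside [...]
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash and the next character form one escape token.
            token = pattern[i:i + 2]
            if not in_class and token in DENIED_ESCAPES:
                problems.append((i, "escape " + token))
            i += 2
            continue
        if in_class:
            # Inside a character class, quantifier-like text is literal.
            if ch == "]":
                in_class = False
            i += 1
            continue
        if ch == "[":
            in_class = True
            i += 1
            continue
        for prefix, name in DENIED_GROUPS.items():
            if pattern.startswith(prefix, i):
                problems.append((i, name))
        if pattern[i:i + 2] in POSSESSIVE:
            problems.append((i, "possessive quantifier"))
        i += 1
    return problems
```

A pattern would be rejected whenever `find_nonportable(pattern)` returns a non-empty list. This scanner is deliberately simple: it does not handle every edge case (for example, a literal `]` placed first in a character class), so it complements a documented list and does not replace testing in each language.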

Filip Roséen - refp answered Sep 17 '22 12:09


The dialects are all slightly different, but they overlap in almost all major points. (The main differences are not in the regexes themselves, but in how you call them, since one language's find is another's matches, and in support for regex literals: one language's // is another's raw string is another's string of backslashes.)
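
The call-site difference shows up even within a single language. As a small sketch in Python (chosen only for illustration; it is not one of the three languages in the question), `re.search` looks for a match anywhere in the input, while `re.fullmatch` requires the whole input to match. That is roughly the distinction between `Matcher.find()` and `Matcher.matches()` in Java.

```python
import re

pattern = r"\d+"
text = "order 42"

# Succeeds if the pattern matches anywhere in the text.
found_anywhere = re.search(pattern, text) is not None

# Succeeds only if the pattern matches the entire text.
whole_input = re.fullmatch(pattern, text) is not None
```

Here the same pattern and input give a match with the first call and no match with the second, so the choice of call matters as much as the pattern itself.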

Rather than somehow getting JavaScript to support Java peculiarities and vice versa, I think it's probably better to restrict yourselves to the huge subset of regexes that is common to all three of your languages, and to use unit tests to ensure that your regexes behave the same in all three.
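
One way to set up such tests is to keep the cases in a language-neutral format that every test suite loads. The sketch below is in Python purely for illustration. The JSON layout, the sample cases, and the names `CASES_JSON`, `run_cases`, and `python_search` are all assumptions invented for this example. The JavaScript, Ruby, and Java suites would each read the same cases and plug in their own search function.

```python
import json
import re

# Hypothetical shared cases. In practice this JSON would live in a single
# file read by the test suite of every language involved.
CASES_JSON = r"""
[
  {"pattern": "^[a-z]+\\d{2}$", "input": "abc12",   "match": "abc12"},
  {"pattern": "^[a-z]+\\d{2}$", "input": "abc1",    "match": null},
  {"pattern": "b+",             "input": "aabbbcc", "match": "bbb"}
]
"""


def run_cases(cases_json, search):
    """Return the cases whose first match differs from the expected one.

    `search(pattern, text)` must return the first matched substring,
    or None when there is no match.
    """
    failures = []
    for case in json.loads(cases_json):
        actual = search(case["pattern"], case["input"])
        if actual != case["match"]:
            failures.append((case, actual))
    return failures


def python_search(pattern, text):
    """Adapter for this language's engine; each language supplies its own."""
    m = re.search(pattern, text)
    return m.group(0) if m else None
```

Calling `run_cases(CASES_JSON, python_search)` returns an empty list when the engine agrees with every expected result. Comparing the first matched substring is a design choice for this sketch; capture groups or all-match lists could be added to the case format in the same way.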

ruakh answered Sep 19 '22 12:09