Is there a need for a more declarative way of expressing regular expressions? :)

Tags: python, regex

I am trying to create a Python function that takes a plain English description of a regular expression and returns the regular expression to the caller.

Currently I am thinking of writing the description in YAML. We can store the description in a raw string variable, pass it to another function, and then pass that function's output to the 're' module. The following is a rather simplistic example:

import re

# Target pattern: a(b|c)d+e*
re1 = """
- literal: 'a'
- one_of: 'b,c'
- one_or_more_of: 'd'
- zero_or_more_of: 'e'
"""
# getRegex is the proposed translator function (not yet written)
myre = re.compile(getRegex(re1))
myre.search(...)

etc.
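A minimal sketch of what getRegex might look like for the four rule names used above. It assumes every line has the form "- key: 'value'" and parses that by hand rather than with a real YAML parser; the BUILDERS table, the LINE pattern, and the _atom helper are hypothetical names invented for this illustration, not part of any existing package.

```python
import re

def _atom(text):
    """Escape text; group it if it is longer than one character."""
    escaped = re.escape(text)
    return escaped if len(text) == 1 else "(?:" + escaped + ")"

# Hypothetical mapping from rule names to regex fragments.
BUILDERS = {
    "literal": lambda v: re.escape(v),
    "one_of": lambda v: "(?:" + "|".join(
        re.escape(p.strip()) for p in v.split(",")) + ")",
    "one_or_more_of": lambda v: _atom(v) + "+",
    "zero_or_more_of": lambda v: _atom(v) + "*",
}

# Assumed line shape: - key: 'value'
LINE = re.compile(r"^-\s*(\w+)\s*:\s*'(.*)'\s*$")

def getRegex(description):
    """Translate the line-based description into a regex string."""
    parts = []
    for line in description.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        m = LINE.match(line)
        if m is None:
            raise ValueError("cannot parse line: %r" % line)
        key, value = m.groups()
        if key not in BUILDERS:
            raise ValueError("unknown rule: %r" % key)
        parts.append(BUILDERS[key](value))
    return "".join(parts)

re1 = """
- literal: 'a'
- one_of: 'b,c'
- one_or_more_of: 'd'
- zero_or_more_of: 'e'
"""
pattern = getRegex(re1)  # "a(?:b|c)d+e*"
```

The sketch emits a non-capturing group for one_of, so the result behaves like a(b|c)d+e* for matching but does not capture the alternative.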

Does anyone think something of this sort would be of wider use? Do you know of any existing packages that can do this? What limitations do you see in this approach? Does anyone think that having the declarative string in code would make it more maintainable?

Vishal asked Aug 09 '10 11:08



2 Answers

This is actually pretty similar (identical?) to how a lexer/parser works. If you had a defined grammar, you could probably write a parser without too much trouble. For instance, you could write something like this:

<expression> ::= <rule> | <rule> <expression> | <rule> " followed by " <expression>
<rule>       ::= <val> | <qty> <val>
<qty>        ::= "literal" | "one" | "one of" | "one or more of" | "zero or more of"
<val>        ::= "a" | "b" | "c" | "d" | ... | "Z"

That's nowhere near a perfect description. For more info, take a look at this BNF of the regex language. You could then look at lexing and parsing the expression.

If you did it this way you could probably get a little closer to Natural Language/English versions of regexes.
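As a rough illustration of the grammar idea, here is a small hand-written translator for phrases joined by " followed by ". It is only a sketch under assumptions: it handles the " followed by " form but not bare juxtaposition of rules, it restricts values to a single ASCII letter, and the names QUANTIFIERS, parse_rule, and english_to_regex are made up for this example.

```python
import re

# Quantifier phrases from the grammar, longest first so that
# "one or more of" is tried before "one of" and "one".
QUANTIFIERS = [
    ("one or more of", "+"),
    ("zero or more of", "*"),
    ("one of", ""),
    ("literal", ""),
    ("one", ""),
]

def parse_rule(rule):
    """Translate '<qty> <val>' or '<val>' into a regex fragment."""
    rule = rule.strip()
    suffix = ""
    for phrase, sfx in QUANTIFIERS:
        if rule.startswith(phrase + " "):
            rule = rule[len(phrase):].strip()
            suffix = sfx
            break
    # Assumption: a value is exactly one ASCII letter.
    if not re.fullmatch(r"[A-Za-z]", rule):
        raise ValueError("not a valid rule: %r" % rule)
    return rule + suffix

def english_to_regex(expression):
    """Translate rules joined by ' followed by ' into one regex."""
    return "".join(parse_rule(r) for r in expression.split(" followed by "))

phrase = "literal a followed by one or more of d followed by zero or more of e"
pattern = english_to_regex(phrase)  # "ad+e*"
```

A real implementation would use a proper lexer and parser, as suggested above; string splitting is enough only because this toy grammar has no nesting.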


I can see a tool like this being useful, but as was previously said, mainly for beginners. The main limitation of this approach would be the amount of code you have to write to translate the language into regex (and/or vice versa). On the other hand, I think a two-way translation tool would be more useful and see more use. Being able to take a regex and turn it into English might make errors a lot easier to spot.
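To give a feel for the regex-to-English direction, here is a minimal sketch that explains a deliberately tiny subset of regex syntax: single word characters, flat groups such as (b|c), and a trailing *, + or ?. Anything else raises an error. The explain function and its output wording are assumptions made for illustration only.

```python
import re

# English phrases for the three supported quantifiers.
QUANT = {"*": "zero or more of", "+": "one or more of", "?": "an optional"}

# One token: either a flat group "(x|y|...)" or a single word character.
TOKEN = re.compile(r"\(([^()]*)\)|(\w)")

def explain(pattern):
    """Describe a very small subset of regex syntax in English."""
    steps = []
    pos = 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError("unsupported syntax at position %d" % pos)
        pos = m.end()
        if m.group(1) is not None:
            alternatives = m.group(1).split("|")
            name = "(" + " or ".join(repr(a) for a in alternatives) + ")"
            prefix = "one of"
        else:
            name = repr(m.group(2))
            prefix = "literal"
        # A quantifier directly after the token replaces the prefix.
        if pos < len(pattern) and pattern[pos] in QUANT:
            prefix = QUANT[pattern[pos]]
            pos += 1
        steps.append(prefix + " " + name)
    return " followed by ".join(steps)

description = explain("a(b|c)d+e*")
```

For the pattern a(b|c)d+e* this produces "literal 'a' followed by one of ('b' or 'c') followed by one or more of 'd' followed by zero or more of 'e'".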

Of course, it doesn't take too long to pick up regex, as the syntax is usually terse and most of the meanings are pretty self-explanatory, at least if you use | or || as OR in your language, and you think of * as multiplying by 0-N (zero or more copies) and + as adding 0-N more copies (one or more in total).
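A quick check of those readings with Python's re module; the matches helper is just a convenience defined for this example.

```python
import re

def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ("a", "ab", "abbb")

# '*' allows zero copies of 'b'; '+' requires at least one.
star = [matches("ab*", s) for s in samples]   # [True, True, True]
plus = [matches("ab+", s) for s in samples]   # [False, True, True]

# '|' reads as OR between the alternatives.
either = [matches("cat|dog", s) for s in ("cat", "dog", "cow")]  # [True, True, False]
```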

Though sometimes I wouldn't mind typing "find one or more 'a' followed by three digits or 'b' then 'c'"

Wayne Werner answered Sep 20 '22 11:09


Please take a look at pyparsing. Many of the issues that you describe with REs are the same ones that inspired me to write that package.
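For a rough idea of how the question's a(b|c)d+e* example might read in pyparsing, here is a short sketch. It assumes pyparsing is installed and uses the long-standing camelCase names (oneOf, parseString); the parses helper is made up for this example.

```python
from pyparsing import Literal, OneOrMore, ParseException, ZeroOrMore, oneOf

# Declarative equivalent of the regex a(b|c)d+e*
grammar = (
    Literal("a")
    + oneOf("b c")
    + OneOrMore(Literal("d"))
    + ZeroOrMore(Literal("e"))
)

def parses(text):
    """Return True when the whole of text matches the grammar."""
    try:
        grammar.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False

# Each matched piece comes back as its own token.
tokens = grammar.parseString("abddee", parseAll=True).asList()
```

One difference from the regex: pyparsing skips whitespace between tokens by default, so "a b d" is accepted here as well.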

Here are some specific features of pyparsing from the O'Reilly e-book chapter "What's so special about pyparsing?".

PaulMcG answered Sep 16 '22 11:09