Is there a need for a more declarative way of expressing regular expressions? :)

Tags:

python

regex

I am trying to create a Python function that takes a plain English description of a regular expression and returns the regular expression to the caller.

Currently I am thinking of writing the description in YAML format. We can store the description in a string variable, pass it to another function, and then pass that function's output to the 're' module. The following is a rather simplistic example:

import re

# Target pattern: a(b|c)d+e*
re1 = """
- literal: 'a'
- one_of: 'b,c'
- one_or_more_of: 'd'
- zero_or_more_of: 'e'
"""
# getRegex is the proposed translation function (not written yet)
myre = re.compile(getRegex(re1))
myre.search(...)

etc.
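
For illustration, here is a minimal sketch of what getRegex could look like. It is only an assumption about the behaviour described above: it hand-parses the flat "- key: 'value'" lines instead of using a real YAML library, and the helper names _group, _LINE and _BUILDERS are made up for this example.

```python
import re

def _group(value):
    # Hypothetical helper: wrap multi-character values so a quantifier
    # applies to the whole value, not just its last character.
    escaped = re.escape(value)
    return escaped if len(value) == 1 else "(?:" + escaped + ")"

# Assumed mapping from the description keys in the example to regex fragments.
_BUILDERS = {
    "literal": re.escape,
    "one_of": lambda v: "(" + "|".join(re.escape(x.strip()) for x in v.split(",")) + ")",
    "one_or_more_of": lambda v: _group(v) + "+",
    "zero_or_more_of": lambda v: _group(v) + "*",
}

# Matches one flat line of the form:  - key: 'value'
_LINE = re.compile(r"-\s*(\w+):\s*'(.*)'")

def getRegex(description):
    """Translate the flat description format into a regex string."""
    parts = []
    for line in description.splitlines():
        line = line.strip()
        if not line:
            continue
        m = _LINE.fullmatch(line)
        if m is None:
            raise ValueError(f"cannot parse line: {line!r}")
        key, value = m.groups()
        if key not in _BUILDERS:
            raise ValueError(f"unknown rule: {key!r}")
        parts.append(_BUILDERS[key](value))
    return "".join(parts)

re1 = """
- literal: 'a'
- one_of: 'b,c'
- one_or_more_of: 'd'
- zero_or_more_of: 'e'
"""
myre = re.compile(getRegex(re1))
print(myre.pattern)  # a(b|c)d+e*
```

Nested rules (for example one_or_more_of applied to a one_of) would need real YAML parsing and a recursive builder, which this sketch leaves out.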

Does anyone think something of this sort would be of wider use? Do you know of any existing packages that already do this? What limitations do you see in this approach? Does anyone think that having the declarative string in the code would make it more maintainable?

asked Aug 09 '10 11:08

Vishal

2 Answers

This is actually pretty similar (identical?) to how a lexer/parser works. If you had a defined grammar then you could probably write a parser without too much trouble. For instance, you could write something like this:

<expression> ::= <rule> | <rule> <expression> | <rule> " followed by " <expression>
<rule>       ::= <val> | <qty> <val>
<qty>        ::= "literal" | "one" | "one of" | "one or more of" | "zero or more of"
<val>        ::= "a" | "b" | "c" | "d" | ... | "Z"

That's nowhere near a perfect description. For more info, take a look at this BNF of the regex language. You could then look at lexing and parsing the expression.

If you did it this way you could probably get a little closer to Natural Language/English versions of regexes.
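
As a rough sketch of the lexing and parsing step, the grammar above can be handled with a small scanner. The function name english_to_regex, the quoting of values with single quotes, and the choice to treat "literal", "one" and "one of" as "exactly once" are all assumptions made for this example.

```python
import re

# Quantifier phrases from the grammar, mapped to the regex suffix they
# produce. Treating "literal", "one" and "one of" as "exactly once" is an
# assumption of this sketch.
_QTY = {
    "zero or more of": "*",
    "one or more of": "+",
    "one of": "",
    "literal": "",
    "one": "",
}

# <rule> ::= <val> | <qty> <val>, with the value written in single quotes.
_RULE = re.compile(r"\s*(?:(" + "|".join(_QTY) + r")\s+)?'([^']+)'\s*")
# Optional " followed by " separator between rules.
_SEP = re.compile(r"(?:followed by\s+)?")

def english_to_regex(text):
    """Translate a sentence in the toy grammar into a regex string."""
    parts = []
    pos = 0
    while pos < len(text):
        m = _RULE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse rule at: {text[pos:]!r}")
        qty, val = m.groups()
        suffix = _QTY[qty or "literal"]
        atom = re.escape(val)
        if suffix and len(val) > 1:
            # Group multi-character values so the quantifier covers all of it.
            atom = "(?:" + atom + ")"
        parts.append(atom + suffix)
        # Skip the separator if present, then continue with the next rule.
        pos = _SEP.match(text, m.end()).end()
    return "".join(parts)

print(english_to_regex("literal 'a' followed by one or more of 'd' zero or more of 'e'"))
# ad+e*
```

A fuller version would add alternation and nesting, at which point a proper parser generator or parsing library is probably less work than extending the regular expressions by hand.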

I can see a tool like this being useful, but as was previously said, mainly for beginners. The main limitation of this approach is the amount of code you have to write to translate the language into regex (and/or vice versa). On the other hand, I think a two-way translation tool would be more useful and see more use. Being able to take a regex and turn it into English could be a lot more helpful for spotting errors.
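
To give an idea of the regex-to-English direction, here is a minimal sketch for a deliberately tiny regex subset: single literal characters, groups of single-character alternatives such as (b|c), and the postfix quantifiers *, + and ?. The function name describe and the wording of the output are assumptions for this example, not an existing tool.

```python
import re

# One token: either a group of single-character alternatives like (b|c)
# or a single word character, optionally followed by a quantifier.
_TOKEN = re.compile(r"(\((?:\w\|)*\w\)|\w)([*+?]?)")

_QUANTIFIER = {
    "": "",
    "*": "zero or more of",
    "+": "one or more of",
    "?": "optionally",
}

def describe(pattern):
    """Return an English description of a pattern in the supported subset."""
    parts = []
    pos = 0
    while pos < len(pattern):
        m = _TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at position {pos}: {pattern[pos:]!r}")
        atom, quant = m.groups()
        if atom.startswith("("):
            choices = atom[1:-1].split("|")
            what = "one of " + " or ".join(repr(c) for c in choices)
        else:
            what = "literal " + repr(atom)
        prefix = _QUANTIFIER[quant]
        parts.append(f"{prefix} ({what})" if prefix else what)
        pos = m.end()
    return ", followed by ".join(parts)

print(describe("a(b|c)d+e*"))
# literal 'a', followed by one of 'b' or 'c', followed by one or more of (literal 'd'), followed by zero or more of (literal 'e')
```

Anything outside the subset (character classes, anchors, nested groups) raises ValueError, which is exactly where the "amount of code" limitation starts to bite.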

Of course it doesn't take too long to pick up regex, as the syntax is usually terse and most of the meanings are pretty self-explanatory, at least if you use | or || as OR in your language, and you think of * as repeating the preceding item 0 to N times and + as the item plus 0 to N more copies of it.

Though sometimes I wouldn't mind typing "find one or more 'a' followed by three digits or 'b' then 'c'".
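
A quick way to check that reading of |, * and + is to try a few patterns with the standard re module. The helper name matches is just for this example.

```python
import re

def matches(pattern, text):
    # True only if the whole text matches the pattern.
    return re.fullmatch(pattern, text) is not None

print(matches(r"ad*", "a"))       # True: "*" accepts zero copies of "d"
print(matches(r"ad+", "a"))       # False: "+" needs at least one "d"
print(matches(r"ad+", "addd"))    # True: one "d" plus two more
print(matches(r"a(b|c)", "ac"))   # True: "|" reads as OR
print(matches(r"a(b|c)", "ad"))   # False: "d" is neither "b" nor "c"
```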

answered Sep 20 '22 11:09

Wayne Werner

Please take a look at pyparsing. Many of the issues that you describe with REs are the same ones that inspired me to write that package.

Here are some specific features of pyparsing from the O'Reilly e-book chapter "What's so special about pyparsing?".

answered Sep 16 '22 11:09

PaulMcG
