Producing all possible matches of a regular expression

Tags: string, regex, algorithm, production

Given a regular expression, I want to produce the set of strings that it would match. Note that this set would not be infinite, because there would be a maximum length for each string. Are there any well-known algorithms for doing this? Are there any research papers I could read to gain insight into this problem?

Thanks.

P.S. Would this sort of question be more appropriate on the Theoretical Computer Science Stack Exchange?

asked Jul 10 '11 21:07

Sam

1 Answer

Are there any well known algorithms in place to do this?

In the Perl ecosystem, the Regexp::Genex CPAN module does this.

In Python, the sre_yield module generates the matching strings. A regex inverter also does this.

A recursive algorithm is described here (link1, link2), and several libraries that do this in Java are mentioned here.
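To give a rough idea of what a recursive approach can look like, here is a minimal sketch. It is not taken from the linked descriptions or from any of the libraries above. It assumes the regex has already been parsed into a small hand-made tree of tuples (the node kinds `lit`, `alt`, `cat`, `star` and the function name `enumerate_matches` are made up for this illustration), and it uses the maximum length from the question to keep the result finite.

```python
def enumerate_matches(node, max_len):
    """Return the set of strings of length <= max_len matched by `node`.

    `node` is a hypothetical mini-AST, not a real library structure:
      ("lit", "abc")        literal text
      ("alt", [n1, n2...])  alternation  n1|n2|...
      ("cat", [n1, n2...])  concatenation n1 n2 ...
      ("star", n)           Kleene star  n*
    """
    kind = node[0]
    if kind == "lit":
        text = node[1]
        return {text} if len(text) <= max_len else set()
    if kind == "alt":
        # Union of everything each branch can match.
        result = set()
        for child in node[1]:
            result |= enumerate_matches(child, max_len)
        return result
    if kind == "cat":
        # Extend every prefix built so far with every match of the next child,
        # dropping combinations that exceed the length bound.
        result = {""}
        for child in node[1]:
            tails = enumerate_matches(child, max_len)
            result = {a + b for a in result for b in tails
                      if len(a) + len(b) <= max_len}
        return result
    if kind == "star":
        # Repeat the body until no new string fits; removing "" from the body
        # guarantees that every round produces strictly longer strings.
        body = enumerate_matches(node[1], max_len) - {""}
        result = {""}
        frontier = {""}
        while frontier:
            frontier = {a + b for a in frontier for b in body
                        if len(a) + len(b) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind!r}")


# Hand-built tree for the regex (a|b)*c
EXAMPLE = ("cat", [("star", ("alt", [("lit", "a"), ("lit", "b")])),
                   ("lit", "c")])

print(sorted(enumerate_matches(EXAMPLE, 3), key=lambda s: (len(s), s)))
```

The sketch builds whole sets in memory, which is fine for small bounds; a generator-based version would be the natural next step for larger ones.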

Generation of random words/strings that match a given regex: xeger (Python)
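For random generation, the same idea can be sketched in a few lines. This is not how xeger is implemented; it is only an illustration under the assumption that the regex is given as a hand-made tuple tree (the node kinds and the `max_repeat` parameter are invented here). Unbounded repetition is capped at `max_repeat`, and the samples are not uniformly distributed over the set of matching strings.

```python
import random
import re


def random_match(node, rng, max_repeat=3):
    """Return one random string matched by `node` (a hypothetical mini-AST)."""
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "alt":
        # Pick one branch at random.
        return random_match(rng.choice(node[1]), rng, max_repeat)
    if kind == "cat":
        return "".join(random_match(child, rng, max_repeat)
                       for child in node[1])
    if kind == "star":
        # Cap the otherwise unbounded repetition count.
        repeats = rng.randint(0, max_repeat)
        return "".join(random_match(node[1], rng, max_repeat)
                       for _ in range(repeats))
    raise ValueError(f"unknown node kind: {kind!r}")


PATTERN = r"(a|b)*c"
# Hand-built tree for PATTERN
AST = ("cat", [("star", ("alt", [("lit", "a"), ("lit", "b")])),
               ("lit", "c")])

rng = random.Random(0)  # seeded so that runs are repeatable
samples = [random_match(AST, rng) for _ in range(5)]

# Sanity check: every sample is accepted by the real regex engine.
assert all(re.fullmatch(PATTERN, s) for s in samples)
print(samples)
```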

Are there any research papers I could read to gain insight into this problem?

Yes, the following papers are available for counting the strings that would match a regex (or obtaining generating functions for them):

Counting occurrences for a finite set of words: an inclusion-exclusion approach, by F. Bassino, J. Clement, J. Fayolle, and P. Nicodeme (2007) [paper, slides]
Regexpcount, a symbolic package for counting problems on regular expressions and words, by Pierre Nicodeme (2003) [paper, link, link, code]
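As a much simpler illustration of counting (this is not the method of either paper), the number of matching strings of each length can be computed by dynamic programming over a deterministic finite automaton for the regex. The sketch below assumes the DFA has been written out by hand as a dictionary; the function name `count_by_length` and the DFA encoding are made up for this example. Because the automaton is deterministic, each accepted string corresponds to exactly one path, so counting paths counts strings.

```python
from collections import Counter


def count_by_length(transitions, start, accepting, max_len):
    """Return [c_0, ..., c_max_len], where c_n is the number of accepted
    strings of length exactly n.

    `transitions` maps state -> {symbol: next_state}; a missing entry
    means the string is rejected (implicit dead state).
    """
    counts = []
    # paths[q] = number of strings of the current length that lead to q
    paths = Counter({start: 1})
    for _ in range(max_len + 1):
        counts.append(sum(n for q, n in paths.items() if q in accepting))
        extended = Counter()
        for q, n in paths.items():
            for target in transitions.get(q, {}).values():
                extended[target] += n
        paths = extended
    return counts


# Hand-written DFA for (a|b)*c over the alphabet {a, b, c}:
# state 0 = still reading a/b, state 1 = just read the final c (accepting).
DFA = {0: {"a": 0, "b": 0, "c": 1}}

counts = count_by_length(DFA, start=0, accepting={1}, max_len=3)
print(counts)       # number of matches of length 0, 1, 2, 3
print(sum(counts))  # number of matches of length at most 3
```

For this automaton the count doubles with each extra character, since the string is any sequence of a/b followed by a single c.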

answered Sep 21 '22 16:09

wsdookadr
