Producing all possible matches of a regular expression

Given a regular expression, I want to produce the set of strings that it would match. It is important to note that this set would not be infinite, because there would be a maximum length for each string. Are there any well known algorithms in place to do this? Are there any research papers I could read to gain insight into this problem?

Thanks.

P.S. Would this sort of question be more appropriate on the Theoretical Computer Science Stack Exchange?

Sam asked Jul 10 '11 21:07



1 Answer

Are there any well known algorithms in place to do this?

In the Perl ecosystem, the Regexp::Genex CPAN module does this.

In Python, sre_yield generates the matching strings. Regex inverter also does this.

A recursive algorithm is described here (link1, link2), and several Java libraries that do this are mentioned here.
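
To give a feel for the recursive idea, here is a minimal sketch in Python. It is not the algorithm from the linked descriptions or from any of the libraries above. It assumes the regex has already been parsed into a small hypothetical tuple form (`"lit"`, `"cat"`, `"alt"`, `"star"`), and the names `expand` and `EXAMPLE` are made up for illustration. The maximum length is what keeps the star case finite.

```python
def expand(node, max_len):
    """Return the set of strings of length <= max_len matched by a tiny regex AST.

    Hypothetical node shapes (an assumption of this sketch):
      ("lit", "ab")        literal text
      ("cat", r1, r2, ...) concatenation
      ("alt", r1, r2, ...) alternation
      ("star", r)          zero or more repetitions
    """
    kind = node[0]
    if kind == "lit":
        return {node[1]} if len(node[1]) <= max_len else set()
    if kind == "alt":
        out = set()
        for child in node[1:]:
            out |= expand(child, max_len)
        return out
    if kind == "cat":
        out = {""}
        for child in node[1:]:
            options = expand(child, max_len)
            # Keep only concatenations that still fit within the length bound.
            out = {a + b for a in out for b in options if len(a) + len(b) <= max_len}
        return out
    if kind == "star":
        # Dropping "" from the body guarantees every round adds at least one character,
        # so the loop stops once nothing new fits within max_len.
        body = expand(node[1], max_len) - {""}
        out = {""}
        frontier = {""}
        while frontier:
            frontier = {a + b for a in frontier for b in body
                        if len(a) + len(b) <= max_len} - out
            out |= frontier
        return out
    raise ValueError(f"unknown node kind: {kind!r}")


# AST for the regex (ab|c)*d
EXAMPLE = ("cat", ("star", ("alt", ("lit", "ab"), ("lit", "c"))), ("lit", "d"))

print(sorted(expand(EXAMPLE, 4), key=lambda s: (len(s), s)))
# ['d', 'cd', 'abd', 'ccd', 'abcd', 'cabd', 'cccd']
```

Returning sets keeps the sketch short and removes duplicates for ambiguous patterns, at the cost of holding every string in memory. A generator-based version would be the natural next step for large languages.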

For generating random strings that match a given regex, there is xeger (Python).
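
Random generation can be sketched in the same spirit. The code below is not xeger's implementation. It walks a small hypothetical tuple AST (`"lit"`, `"cat"`, `"alt"`, `"star"`), picks one branch at each alternation, and caps star repetitions with an assumed `max_repeat` parameter. The resulting distribution is not uniform over the matching strings.

```python
import random
import re


def sample(node, rng, max_repeat=3):
    """Return one random string matched by a tiny regex AST (hypothetical tuple form)."""
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "alt":
        # Pick one alternative uniformly at random.
        return sample(rng.choice(node[1:]), rng, max_repeat)
    if kind == "cat":
        return "".join(sample(child, rng, max_repeat) for child in node[1:])
    if kind == "star":
        # Repeat the body between 0 and max_repeat times (inclusive).
        repeats = rng.randint(0, max_repeat)
        return "".join(sample(node[1], rng, max_repeat) for _ in range(repeats))
    raise ValueError(f"unknown node kind: {kind!r}")


# AST for the regex (ab|c)*d
EXAMPLE = ("cat", ("star", ("alt", ("lit", "ab"), ("lit", "c"))), ("lit", "d"))

rng = random.Random(0)  # seeded so that runs are reproducible
words = [sample(EXAMPLE, rng) for _ in range(5)]
# Sanity check: every sampled word is accepted by the equivalent stdlib pattern.
assert all(re.fullmatch(r"(ab|c)*d", w) for w in words)
print(words)
```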

Are there any research papers I could read to gain insight into this problem?

Yes. The following papers deal with counting the strings that would match a regex (or obtaining generating functions for them):

  1. "Counting occurrences for a finite set of words: an inclusion-exclusion approach" by F. Bassino, J. Clement, J. Fayolle, and P. Nicodeme (2007) (paper, slides)
  2. "Regexpcount, a symbolic package for counting problems on regular expressions and words" by Pierre Nicodeme (2003) (paper, link, link, code)
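
As a small illustration of counting without enumerating, here is a sketch that is much simpler than the inclusion-exclusion and generating-function methods in the papers above: once a regex has been turned into a deterministic automaton, the number of matching strings of each length follows from a dynamic program over the states. The DFA below is written by hand for the example regex `(0|10)*1?` (binary strings without two consecutive 1s), and all names are illustrative assumptions.

```python
import re
from itertools import product

# Hand-built DFA for (0|10)*1?.
# State 0: start, or the last symbol was '0'. State 1: the last symbol was '1'.
# Missing transitions (here: '1' from state 1) lead to an implicit dead state.
TRANSITIONS = {
    (0, "0"): 0,
    (0, "1"): 1,
    (1, "0"): 0,
}
START = 0
ACCEPTING = {0, 1}


def count_by_length(max_len):
    """Return [c_0, ..., c_max_len], where c_n is the number of accepted strings of length n."""
    # paths[s] = number of distinct strings of the current length that end in state s
    paths = {START: 1}
    counts = []
    for _ in range(max_len + 1):
        counts.append(sum(n for s, n in paths.items() if s in ACCEPTING))
        nxt = {}
        for (src, _symbol), dst in TRANSITIONS.items():
            if src in paths:
                nxt[dst] = nxt.get(dst, 0) + paths[src]
        paths = nxt
    return counts


def brute_force_counts(pattern, alphabet, max_len):
    """Cross-check: count matches by testing every string over the alphabet."""
    compiled = re.compile(pattern)
    return [
        sum(1 for chars in product(alphabet, repeat=n)
            if compiled.fullmatch("".join(chars)))
        for n in range(max_len + 1)
    ]


print(count_by_length(4))  # [1, 2, 3, 5, 8]
```

Because the automaton is deterministic, each accepted string corresponds to exactly one path, so path counts equal string counts. The brute-force helper is only there to check the result on small lengths.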

wsdookadr answered Sep 21 '22 16:09