Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create set of all possible matches for a given regex

Tags:

I'm wondering how to find a set of all matches to a given regex with a finite number of matches.

For example:

All of these example you can assume they start with ^ and end with $

`hello?` -> (hell, hello)
`[1-9][0-9]{0,3}` -> (1,2,3 ..., 9998, 9999)
`My (cat|dog) is awesome!` -> (My cat is awesome!, My dog is awesome!)
`1{1,10}` -> (1,11, ..., 111111111, 1111111111)
`1*` -> //error
`1+` -> //error
`(1|11){2}` -> (1,11,111,1111) //notice how it doesn't repeat any of the possibilities

I'd also be interested if there was a way of retrieving count a unique a solutions to the regex or if there is a way to determine if the regex has a finite solutions.

It would be nice if the algorithm could parse any regex, but a powerful enough subset of the regex would be fine.

I'm interested in a PHP solution to this problem, but other languages would also be fine.

EDIT:

I've learned in my Formal Theory class about DFA which can be used to implement regex (and other regular languages). If I could transform the regex into a DFA the solution seems fairly straight forward to me, but that transformation seems rather tricky to me.

EDIT 2:

Thanks for all the suggestions, see my post about the public github project I'm working on to "answer" this question.