 

Distance between regular expressions

Can we compute some sort of distance between regular expressions?

The idea is to measure how similar two regular expressions are.

Nicolas NOEL asked Jan 25 '10 at 09:01



3 Answers

You can build a deterministic finite automaton (DFA) for each regular expression and compare their transitions. The difference between the two sets of transitions can then be used as a measure of the distance between the regular expressions.
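A minimal sketch of one way to turn this into a number, assuming the DFAs have already been built (here by hand, for `a*b` and `a+b` over the alphabet {a, b}) and are complete. It walks the product automaton of the two DFAs and reports the fraction of reachable state pairs on which one DFA accepts and the other does not. The name `product_distance` and this particular ratio are illustrative choices, not a standard metric: 0.0 means the two expressions accept the same language, but non-zero values depend on how the DFAs were constructed, so minimizing them first makes the comparison less arbitrary.

```python
from collections import deque

# Hypothetical representation: (start_state, accepting_states, transitions),
# where transitions maps (state, symbol) -> next_state and is complete.
DFA = tuple[int, frozenset[int], dict[tuple[int, str], int]]


def product_distance(d1: DFA, d2: DFA, alphabet: str) -> float:
    """Fraction of reachable product states where the DFAs disagree on acceptance."""
    start = (d1[0], d2[0])
    seen = {start}
    queue = deque([start])
    disagree = 0
    while queue:
        p, q = queue.popleft()
        # A reachable pair with different acceptance means some string
        # is accepted by one expression but not by the other.
        if (p in d1[1]) != (q in d2[1]):
            disagree += 1
        for sym in alphabet:
            nxt = (d1[2][(p, sym)], d2[2][(q, sym)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return disagree / len(seen)


# Hand-built DFA for a*b: 0 = start, 1 = accept, 2 = dead state.
A_STAR_B: DFA = (0, frozenset({1}), {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 2,
})

# Hand-built DFA for a+b: 0 = start, 1 = seen "a", 2 = accept, 3 = dead state.
A_PLUS_B: DFA = (0, frozenset({2}), {
    (0, "a"): 1, (0, "b"): 3,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 3, (2, "b"): 3,
    (3, "a"): 3, (3, "b"): 3,
})

print(product_distance(A_STAR_B, A_PLUS_B, "ab"))  # 0.2 (only "b" separates them)
print(product_distance(A_STAR_B, A_STAR_B, "ab"))  # 0.0
```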

Gumbo answered Oct 19 '22 at 08:10



There are a few metrics you could use:

  1. The length of a valid match. Some regexes have a fixed match length, some an upper bound and some a lower bound. Compare how similar their lengths or possible lengths are.

  2. The characters that match. Every regex has a set of characters that a match can contain (possibly all characters). Compare these character sets.

  3. Run both regexes over a large document and count how many matches each one makes and how many of those matches are identical.
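Metric 3 is the easiest one to try with Python's standard `re` module. The sketch below is one possible interpretation, assuming two matches count as "identical" when they have the same start and end position; `match_spans`, `match_overlap` and the tiny sample text are hypothetical stand-ins for a real large document. It scores a pair of regexes by the Jaccard similarity of their match spans.

```python
import re


def match_spans(pattern: str, text: str) -> set[tuple[int, int]]:
    """(start, end) of every non-overlapping match of pattern in text."""
    return {m.span() for m in re.finditer(pattern, text)}


def match_overlap(p1: str, p2: str, text: str) -> float:
    """Jaccard similarity of the match spans: 1.0 = same matches, 0.0 = none shared."""
    s1, s2 = match_spans(p1, text), match_spans(p2, text)
    union = s1 | s2
    if not union:
        return 1.0  # neither regex matched, so the text shows no difference
    return len(s1 & s2) / len(union)


sample = "ab aab b aaab"  # hypothetical stand-in for a large document
print(len(match_spans(r"a*b", sample)))        # 4
print(len(match_spans(r"a+b", sample)))        # 3 (the lone "b" is not matched)
print(match_overlap(r"a*b", r"a+b", sample))   # 0.75
```

The result depends heavily on the chosen document: on a text with no isolated "b", these two regexes would look identical.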

Are you looking for strict equivalence?

David Kanarek answered Oct 19 '22 at 08:10



I suppose you could compute the Levenshtein distance between the regular expression strings themselves. That's certainly one way of measuring a "distance" between two regular expressions, although it only compares the text of the patterns, not the strings they match.
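A minimal sketch of the standard dynamic-programming (Wagner-Fischer) algorithm for Levenshtein distance; the function name `levenshtein` is just for illustration. The second example shows the limitation of applying it to the pattern text: `[ab]` and `(a|b)` match exactly the same strings, yet they are 3 edits apart.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein(r"a*b", r"a+b"))    # 1
print(levenshtein(r"[ab]", r"(a|b)")) # 3, although the two patterns are equivalent
```

The same function works unchanged on the "value" strings themselves.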

Of course, it's possible that regular expressions aren't needed here at all, and computing the Levenshtein distance between the actual "value" strings that the regular expressions would otherwise be applied to may yield a better result.

CraigTP answered Oct 19 '22 at 09:10
