
Given a string, generate a regex that can parse *similar* strings

Tags:

java

regex

For example, given the string "2009/11/12", I want to get the regex "\d{4}/\d{2}/\d{2}", so that I can match "2001/01/02" too.

Is there something that does that? Something similar? Any ideas as to how to do it?

Yossale asked Apr 22 '09 09:04




2 Answers

There is text2re, a free web-based "regex by example" generator.

I don't think its source code is available, though. I dare say there is no automatic regex generator that gets it right without user intervention, since that would require the machine to know what you want.


Note that text2re uses a template-based, modularized and very generalized approach to regular expression generation. The expressions it generates work, but they are much more complex than the equivalent hand-crafted expression. It is not a good tool to learn regular expressions because it does a pretty lousy job at setting examples.

For instance, the string "2009/11/12" would be recognized as a yyyymmdd pattern, which is helpful. The tool transforms it into this 125-character monster:

((?:(?:[1]{1}\d{1}\d{1}\d{1})|(?:[2]{1}\d{3}))[-:\/.](?:[0]?[1-9]|[1][012])[-:\/.](?:(?:[0-2]?\d{1})|(?:[3][01]{1})))(?![\d]) 

The hand-made equivalent would take up merely two fifths of that (50 characters):

([12]\d{3})[-:/.](0?\d|1[0-2])[-:/.]([0-2]?\d|3[01])\b 
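
To check how the two expressions above behave in Java, here is a minimal sketch that runs both against a few sample strings. The class name, the helper `found`, and the sample inputs are illustrative assumptions, not part of text2re or the original answer. It uses `find()` rather than `matches()` because both expressions end in a lookahead or word boundary, which suggests they are meant for searching inside longer text.

```java
import java.util.List;
import java.util.regex.Pattern;

class DateRegexComparison {
    // The generated expression quoted above, escaped for a Java string literal.
    static final Pattern GENERATED = Pattern.compile(
        "((?:(?:[1]{1}\\d{1}\\d{1}\\d{1})|(?:[2]{1}\\d{3}))"
        + "[-:\\/.](?:[0]?[1-9]|[1][012])"
        + "[-:\\/.](?:(?:[0-2]?\\d{1})|(?:[3][01]{1})))(?![\\d])");

    // The hand-made expression quoted above.
    static final Pattern HAND_MADE = Pattern.compile(
        "([12]\\d{3})[-:/.](0?\\d|1[0-2])[-:/.]([0-2]?\\d|3[01])\\b");

    // True if the pattern occurs anywhere in the input.
    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
            "2009/11/12", "2001/01/02", "2009.11.12", "2009/13/12", "2009/00/12");
        for (String s : samples) {
            System.out.printf("%-12s generated=%-5b handMade=%b%n",
                s, found(GENERATED, s), found(HAND_MADE, s));
        }
    }
}
```

Both expressions accept the first three samples and reject month 13. They differ on "2009/00/12": the hand-made month branch `0?\d` also accepts 00, while the generated one requires 1 to 9 after the optional zero, so the two are close but not strictly equivalent.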
Tomalak answered Oct 06 '22 00:10



It's not possible to write a general solution for your problem. The trouble is that any generator probably wouldn't know what you want to check for, e.g. should "2312/45/67" be allowed too? What about "2009.11.12"?
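
To make the ambiguity concrete, here is a small sketch with three hand-picked candidate expressions. All three match the original example "2009/11/12", yet they disagree on the two borderline strings mentioned above. The candidate names and patterns are assumptions chosen for illustration; a generator has no way to tell which one you meant from a single example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class AmbiguousGeneralization {
    // Three plausible generalizations of the single example "2009/11/12".
    static Map<String, Pattern> candidates() {
        Map<String, Pattern> m = new LinkedHashMap<>();
        m.put("digits only", Pattern.compile("\\d{4}/\\d{2}/\\d{2}"));
        m.put("any separator", Pattern.compile("\\d{4}[-/.]\\d{2}[-/.]\\d{2}"));
        m.put("range checked",
            Pattern.compile("[12]\\d{3}/(0[1-9]|1[0-2])/(0[1-9]|[12]\\d|3[01])"));
        return m;
    }

    public static void main(String[] args) {
        String[] inputs = {"2009/11/12", "2312/45/67", "2009.11.12"};
        for (Map.Entry<String, Pattern> e : candidates().entrySet()) {
            for (String input : inputs) {
                // matches() requires the whole input to fit the pattern.
                System.out.printf("%-14s %-11s %b%n",
                    e.getKey(), input, e.getValue().matcher(input).matches());
            }
        }
    }
}
```

The "digits only" candidate accepts "2312/45/67" but rejects "2009.11.12", "any separator" accepts both, and "range checked" rejects both.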

What you could do is write such a generator yourself that is suited for your exact problem, but a general solution won't be possible.
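
As a rough sketch of such a problem-specific generator, the following Java class replaces each run of ASCII digits with `\d{n}` and each run of ASCII letters with `[A-Za-z]{n}`, and keeps every other character as a literal. The class name `RegexByExample`, the method `generalize`, and these rules are assumptions made for illustration; they encode one particular idea of "similar" and would need adjusting for any other.

```java
import java.util.regex.Pattern;

class RegexByExample {
    // Characters that must be escaped to be taken literally in a regex.
    private static final String META = "\\^$.|?*+()[]{}";

    // 0 = ASCII digit, 1 = ASCII letter, 2 = anything else (kept literal).
    static int kind(char c) {
        if (c >= '0' && c <= '9') return 0;
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return 1;
        return 2;
    }

    // Builds a regex that matches strings with the same "shape" as the example.
    static String generalize(String example) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < example.length()) {
            char c = example.charAt(i);
            int k = kind(c);
            if (k == 2) {
                // Literal character: escape it only if it is a metacharacter.
                if (META.indexOf(c) >= 0) regex.append('\\');
                regex.append(c);
                i++;
                continue;
            }
            // Consume the whole run of digits (or letters) and emit a counted class.
            int start = i;
            while (i < example.length() && kind(example.charAt(i)) == k) i++;
            regex.append(k == 0 ? "\\d" : "[A-Za-z]")
                 .append('{').append(i - start).append('}');
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        String regex = generalize("2009/11/12");
        System.out.println(regex);                                 // prints \d{4}/\d{2}/\d{2}
        System.out.println(Pattern.matches(regex, "2001/01/02"));  // prints true
        System.out.println(Pattern.matches(regex, "2009.11.12"));  // prints false
    }
}
```

This sketch fixes run lengths and separators exactly as seen in the example, so it accepts "2312/45/67" and rejects "2009.11.12". Whether that is right depends entirely on what you want to check for.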

B.E. answered Oct 05 '22 22:10
