Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it a solvable problem to generate a regular expression that matches some input set?

I provide some input set which contains known separated number of text blocks.

I want to make a program that automatically generate 1 or more regular expressions each of which matches every text block in the input set.

I see some relatively easy ways to implement a brute-force search. But I'm not an expert in compilers theory. That's why I'm curious:

1) is this problem solvable? or there are some principle impossibility to make such algorithm?

2) is it possible to achieve polynomial complexity for this algorithm and avoid brute forcing?

like image 854
Roman Avatar asked Dec 21 '10 09:12

Roman


People also ask

How do you match a regular expression?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).

What are the regular expressions and how they help in validating the inputs?

Regular Expressions are specially formatted strings for specifying patterns in text. They can be useful during information validation, to ensure that data is in a specified format. For example, a ZIP code must consist of five digits, and the last name must start with a capital letter.

Why are regular expressions so complicated?

Regular expressions are dense. This makes them hard to read, but not in proportion to the information they carry. Certainly 100 characters of regular expression syntax is harder to read than 100 consecutive characters of ordinary prose or 100 characters of C code.

Is there a regular expression to detect a valid regular expression?

No, if you are strictly speaking about regular expressions and not including some regular expression implementations that are actually context free grammars. There is one limitation of regular expressions which makes it impossible to write a regex that matches all and only regexes.


1 Answers

".*" is one-or-more regexp that will match every text block in the input set. ;-)

like image 64
Tomas Brambora Avatar answered Oct 01 '22 04:10

Tomas Brambora