How to get the first character that causes a regular expression not to match

Tags: regex, ruby

We have one quite complex regular expression that checks the structure of a string.

I wonder if there is an easy way to find out which character in the string causes the regular expression not to match.

For example,

 string.match(reg_exp).get_position_which_fails

Basically, the idea is to get the "position" of the state machine at the point where it gave up.

Here is an example of regular expression:

%q^[^\p{Cc}\p{Z}]([^\p{Cc}\p{Zl}\p{Zp}]{0,253}[^\p{Cc}\p{Z}])?$
user2196351 asked May 22 '15 16:05


1 Answer

The short answer is: No.

The long answer is that a regular expression is a complicated finite state machine that may be exploring several different possible paths at any given point, so there is no single "position" at which it gave up. There's no way of getting a partial match out of a regular expression without constructing one that allows partial matches.
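As a rough illustration of an expression that allows partial matches, the sketch below relaxes the pattern from the question so that it accepts the longest usable prefix instead of the whole string. The names `PREFIX_PATTERN` and `valid_prefix_length` are hypothetical, and the sketch assumes the original pattern was meant to cover the whole string with a limit of 255 characters.

```ruby
# Hypothetical relaxed pattern: the first character must not be a control
# character or a separator; up to 254 further characters must not be a
# control character, line separator, or paragraph separator.
# The whole group is optional, so the pattern always matches (possibly "").
PREFIX_PATTERN = /\A(?:[^\p{Cc}\p{Z}][^\p{Cc}\p{Zl}\p{Zp}]{0,254})?/

# Returns how many leading characters the relaxed pattern can consume.
# If the result is smaller than str.length, it is the index of the first
# character that could not be consumed.
def valid_prefix_length(str)
  str[PREFIX_PATTERN].length
end

valid_prefix_length("ab\tcd")  # => 2 (the tab at index 2 is rejected)
valid_prefix_length(" hello")  # => 0 (a leading space is rejected)
valid_prefix_length("hello ")  # => 6 (the whole string is consumed)
```

Note that the last call consumes the whole string even though the original pattern would reject it: the rule that the final character must not be a separator is not expressed in the relaxed pattern and would need a separate check.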

If you want to allow partial matches, either re-engineer your expression to support them, or write a parser that steps through the string using a more manual method.
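A minimal sketch of the more manual approach, applied to the expression from the question, could look like the following. The helper `first_failing_index` and the constants are hypothetical names, not part of Ruby's `Regexp` API, and the rules assume the pattern was intended to match the entire string (1 to 255 characters).

```ruby
# Rules taken from the pattern in the question (names are illustrative).
EDGE_CHAR   = /[^\p{Cc}\p{Z}]/          # allowed as first or last character
MIDDLE_CHAR = /[^\p{Cc}\p{Zl}\p{Zp}]/   # allowed in between
MAX_LENGTH  = 255                       # 1 + 253 + 1 characters

# Steps through the string one character at a time and returns the index
# of the first character that breaks a rule, or nil if the string is valid.
def first_failing_index(str)
  chars = str.chars
  return 0 if chars.empty?              # a first character was expected

  chars.each_with_index do |ch, i|
    return i if i >= MAX_LENGTH         # string is too long
    rule = i.zero? ? EDGE_CHAR : MIDDLE_CHAR
    return i unless ch.match?(rule)
  end

  # Every character passed the per-position rules; the last character
  # must additionally satisfy the stricter edge rule.
  last = chars.length - 1
  return last unless chars[last].match?(EDGE_CHAR)

  nil
end

first_failing_index("hello world")  # => nil
first_failing_index("ab\tcd")       # => 2
first_failing_index("hello ")       # => 5
```

Checking each position against its own small character class keeps the failure position explicit, at the cost of duplicating the structure of the original expression by hand.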

If your expression is particularly difficult to handle by hand, you could try generating such a parser automatically with Ragel.

tadman answered Oct 27 '22 10:10