
Is there a case where "[^xy]" is not equal to "(?!x|y)."?

I'm working on my own JavaScript library to support new metacharacters and features for regular expressions, and I'd like to find a case where [^xy] is not equivalent to (?!x|y). (or more specifically (?:(?!x|y).)).

Take the example text: "abc\n"

Say I want to emulate a Perl regex: /\A.{3}\Z/s

With the singleline flag, the JavaScript regex should be equivalent to: /^[\s\S]{3}\n*$(?!\s)/ (\A becomes ^, . becomes [\s\S], \Z becomes \n*$(?!\s))

Now, /^.{3}$/ would fail, but /^[\s\S]{3}\n*$(?!\s)/ would match "abc" followed by the trailing newline (the Perl regex matches "abc").
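
A quick way to check the two JavaScript patterns against the sample text is sketched below. The variable names are illustrative only and are not part of the library mentioned above.

```javascript
// Sample text from the question: three characters plus a trailing newline.
const text = "abc\n";

// Without the m flag, $ only matches at the very end of the input,
// so the newline after "abc" makes this pattern fail.
const plain = /^.{3}$/;

// Proposed emulation of Perl's /\A.{3}\Z/s.
const emulated = /^[\s\S]{3}\n*$(?!\s)/;

const plainResult = text.match(plain);
const emulatedResult = text.match(emulated);

console.log(plainResult);                       // null
console.log(JSON.stringify(emulatedResult[0])); // "abc\n"
```

Note that the \n* part of the emulation consumes the trailing newline, so the matched text is one character longer than "abc".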

Since \Z contains more than just a metacharacter, emulating [^\Z] would seem to be more difficult.

Take the example text: "abcabc\n"

The proposed JavaScript regex for the Perl regex /.{3}[^\Za]/g would be /.{3}(?:(?!\n*$(?!\s)|a).)/g

Both will match "bcab"
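
The JavaScript side of that claim can be checked with a short sketch (variable names are illustrative):

```javascript
// Sample text from the question.
const text = "abcabc\n";

// Proposed JavaScript translation of the Perl regex /.{3}[^\Za]/g.
const proposed = /.{3}(?:(?!\n*$(?!\s)|a).)/g;

// With the g flag, String.prototype.match returns every match as an array.
const matches = text.match(proposed);

console.log(matches); // ["bcab"]
```

The attempt at index 0 fails because the fourth character is "a", so the first and only match starts at index 1.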

So, finally, I pose the question again. Is there a case where [^xy] is not equivalent to (?:(?!x|y).) with such a scenario, perhaps in a more complex regular expression where a lookahead would change the scenario?

Joey Schooley asked Jun 27 '13 20:06



2 Answers

For input string "x\na", the 2 regexps give different outputs, because . doesn't match newlines.

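// The dot cannot match "\n", so the first match is the "a" at index 2.
console.log("x\na".match(/(?:(?!x|y).)/));
// ["a", index: 2, input: "x↵a"]

// The negated class does match "\n", so the first match is at index 1.
console.log("x\na".match(/[^xy]/));
// ["↵", index: 1, input: "x↵a"]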

If you change . to [\s\S], the output is identical in this case:

console.log("x\na".match(/(?:(?!x|y)[\s\S])/))
["↵", index: 1, input: "x↵a"]

I cannot think of any other case right now.
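
As a rough sanity check rather than a proof, the comparison can be run over every string up to length 3 drawn from a small alphabet that includes a newline. The helper names below are hypothetical, and the last line assumes an engine that supports the ES2018 s (dotAll) flag.

```javascript
// Small alphabet: both excluded characters, one ordinary character, one newline.
const alphabet = ["x", "y", "a", "\n"];

// Yield every string over the alphabet with length 0..maxLen.
function* strings(maxLen, prefix = "") {
  yield prefix;
  if (prefix.length < maxLen) {
    for (const ch of alphabet) yield* strings(maxLen, prefix + ch);
  }
}

// Describe all matches of a global regex as "index:text" entries.
function allMatches(re, s) {
  return [...s.matchAll(re)].map((m) => `${m.index}:${m[0]}`).join("|");
}

// Count the strings on which [^xy] and the lookahead form disagree.
function countDifferences(lookaheadForm) {
  let differences = 0;
  for (const s of strings(3)) {
    if (allMatches(/[^xy]/g, s) !== allMatches(lookaheadForm, s)) differences++;
  }
  return differences;
}

console.log(countDifferences(/(?:(?!x|y).)/g));      // non-zero: the dot skips "\n"
console.log(countDifferences(/(?:(?!x|y)[\s\S])/g)); // 0
console.log(countDifferences(/(?:(?!x|y).)/gs));     // 0 with the s flag
```

An exhaustive check over such a small space says nothing about larger patterns in which the lookahead is embedded, but it does confirm that the newline case is the only difference at the single-character level.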

Dogbert answered Oct 30 '22 10:10


> Is there a case where [^xy] is not equal to (?!x|y).?

Only the one you have already described: The JS dot doesn't match newlines, and needs to be replaced with [\s\S].

> \Z becomes \n$(?!\s)

That looks wrong. After the end of the string (\z/$) there will never be anything, whitespace or otherwise, so the (?!\s) is redundant. AFAIK, \Z is a zero-width assertion (it doesn't consume the newline(s)) and should be equivalent to

(?=\n*$)
//   ^ not sure whether ? or *
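
Perl's documentation describes \Z as matching at the end of the string or before a newline at the end, which suggests the \n? variant. A minimal sketch under that assumption follows; perlZ is a hypothetical helper name, not an existing API.

```javascript
// Hypothetical emulation of Perl's \Z as a lookahead:
// end of input, or just before one final newline.
const perlZ = "(?=\\n?$)";

// Emulates Perl's /\A.{3}\Z/s; [\s\S] stands in for the s-flag dot.
const re = new RegExp("^[\\s\\S]{3}" + perlZ);

const m1 = "abc\n".match(re);   // matches; the newline is not consumed
const m2 = "abc\n\n".match(re); // no match: two trailing newlines
const m3 = "abc".match(re);     // matches at the true end of input

console.log(JSON.stringify(m1[0])); // "abc"
console.log(m2);                    // null
console.log(JSON.stringify(m3[0])); // "abc"
```

Because the assertion is zero-width, the matched text is "abc" in both successful cases.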

> Since \Z contains more than just a metacharacter, emulating [^\Z] would seem to be more difficult.

What do you mean by "metacharacter"? \Z is a zero-width assertion, and doesn't make much sense in a character class. I'd guess it's either a syntax error, or will be interpreted literally (unescaped) as [^Z].
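
For comparison, JavaScript itself shows both behaviours depending on the mode. The sketch below illustrates JavaScript's handling only; it makes no claim about what Perl does with [^\Z].

```javascript
// Without the u flag, \Z is an identity escape: it simply means the letter "Z".
const legacy = /[^\Za]/;
const rejectsZ = !legacy.test("Z"); // "Z" is excluded by the class
const rejectsA = !legacy.test("a"); // "a" is excluded by the class
const acceptsB = legacy.test("b");  // anything else is accepted

// With the u flag, unknown escapes are rejected when the regex is compiled.
let unicodeModeError = null;
try {
  new RegExp("[^\\Za]", "u");
} catch (e) {
  unicodeModeError = e;
}

console.log(rejectsZ, rejectsA, acceptsB);            // true true true
console.log(unicodeModeError instanceof SyntaxError); // true
```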

Bergi answered Oct 30 '22 09:10