I'm working on my own JavaScript library to support new metacharacters and features for regular expressions, and I'd like to find a case where [^xy] is not equivalent to (?!x). (or, more specifically, (?:(?!x|y).)).
Take the example text: "abc\n"
Say I want to emulate a Perl regex: /\A.{3}\Z/s
With the singleline flag, the JavaScript regex should be equivalent to: /^[\s\S]{3}\n*$(?!\s)/
(\A becomes ^, . becomes [\s\S], \Z becomes \n*$(?!\s))
Now, /^.{3}$/ would fail, but /^[\s\S]{3}\n*$(?!\s)/ would capture "abcabc" (same as the Perl regex).
Since \Z contains more than just a metacharacter, emulating [^\Z] would seem to be more difficult.
Take the example text: "abcabc\n"
The proposed JavaScript regex for the Perl regex /.{3}[^\Za]/g would be /.{3}(?:(?!\n*$(?!\s)|a).)/g
Both will match "bcab".
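The JavaScript half of that claim can be checked directly. This is only a quick sketch; the variable names are made up for illustration, and it says nothing about how Perl itself parses [^\Za].

```javascript
// Example text from the question.
const text = "abcabc\n";

// The proposed JavaScript translation of the Perl regex /.{3}[^\Za]/g.
const proposed = /.{3}(?:(?!\n*$(?!\s)|a).)/g;

// With the g flag, match() returns every match as a plain array of strings.
const proposedMatches = text.match(proposed);
console.log(proposedMatches); // [ 'bcab' ]
```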
So, finally, I pose the question again. Is there a case where [^xy] is not equivalent to (?:(?!x|y).) in such a scenario, perhaps in a more complex regular expression where a lookahead would change the outcome?
For the input string "x\na", the two regexps give different outputs, because . doesn't match newlines.
console.log("x\na".match(/(?:(?!x|y).)/))
["a", index: 2, input: "x↵a"]
console.log("x\na".match(/[^xy]/))
["↵", index: 1, input: "x↵a"]
If you change . to [\s\S], the output is identical in this case:
console.log("x\na".match(/(?:(?!x|y)[\s\S])/))
["↵", index: 1, input: "x↵a"]
I cannot think of any other case right now.
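To look for other differences, one option is to compare the patterns character by character. The sketch below is only a spot check over a handful of sample characters, not a proof; `disagreements` and the sample list are hypothetical names introduced for illustration. It includes the other line terminators that the JavaScript dot also skips without the s flag (\r, \u2028, \u2029).

```javascript
// Hypothetical helper: return the characters on which two patterns disagree.
function disagreements(a, b, chars) {
  return chars.filter((ch) => a.test(ch) !== b.test(ch));
}

// A few ordinary characters plus the four JavaScript line terminators.
const samples = ["x", "y", "a", " ", "\n", "\r", "\u2028", "\u2029"];

// Anchored so that each pattern must match the single sample character.
const negatedClass = /^[^xy]$/;
const dotLookahead = /^(?:(?!x|y).)$/;
const anyLookahead = /^(?:(?!x|y)[\s\S])$/;

// The dot version disagrees only on the line terminators.
const dotDiff = disagreements(negatedClass, dotLookahead, samples);
// The [\s\S] version agrees on every sample.
const anyDiff = disagreements(negatedClass, anyLookahead, samples);

console.log(dotDiff.map((ch) => JSON.stringify(ch)));
console.log(anyDiff.length); // 0
```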
Is there a case where [^xy] is not equal to (?!x|y). ?
Only the one you have already described: The JS dot doesn't match newlines, and needs to be replaced with [\s\S].
\Z becomes \n$(?!\s)
That looks wrong. After the end of the string (\z / $) there never will be anything, regardless of whether it is whitespace or not. Afaik, \Z is a zero-width assertion (it doesn't consume the newline(s)) and should be equivalent to
(?=\n*$)
//   ^ not sure whether ? or *
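As far as the quantifier goes, Perl's documentation describes \Z as matching at the end of the string or before a newline at the end, so \n? looks like the closer fit. Here is a minimal sketch under that assumption; the variable names are made up for illustration, and the patterns assume the m flag is not set.

```javascript
// Assumed \Z emulation: end of string, or just before one final newline.
const endLikeZ = /^[\s\S]{3}(?=\n?$)/;

// The lookahead does not consume the trailing newline.
const withNewline = "abc\n".match(endLikeZ);
console.log(withNewline[0]); // "abc"

// Without a trailing newline it still matches.
const withoutNewline = "abc".match(endLikeZ);
console.log(withoutNewline[0]); // "abc"

// Two trailing newlines: position 3 is not before the final newline.
const twoNewlines = "abc\n\n".match(endLikeZ);
console.log(twoNewlines); // null

// The translation from the question consumes the newline instead.
const consuming = /^[\s\S]{3}\n*$(?!\s)/;
const consumed = "abc\n".match(consuming);
console.log(consumed[0].length); // 4
```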
Since \Z contains more than just a metacharacter, emulating [^\Z] would seem to be more difficult.
What do you mean by "metacharacter"? It's a zero-width assertion, and doesn't make much sense in a character class. I'd guess it's either a syntax error, or will be interpreted literally (unescaped) as [^Z].
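In JavaScript at least, both guesses turn out to apply, depending on the u flag. The sketch below only shows JavaScript's behaviour, not Perl's, and the variable names are illustrative.

```javascript
// Without the u flag, \Z inside a character class is an identity escape for "Z".
const legacy = /[^\Z]/;
console.log(legacy.test("Z")); // false: "Z" is the excluded character
console.log(legacy.test("a")); // true

// With the u flag, unknown escapes are rejected when the RegExp is built.
let unicodeModeError = null;
try {
  new RegExp("[^\\Z]", "u");
} catch (e) {
  unicodeModeError = e;
}
console.log(unicodeModeError instanceof SyntaxError); // true
```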