
$ in regular expression not matching against end of line

I have a problem with the following regular expression:

var s = "http://www.google.com/dir/file\r\nhello"
var re = new RegExp("http://([^/]+).*/([^/\r\n]+)$");
var arr = re.exec(s);
alert(arr[2]);

Above, I expect arr[2] (i.e. capture group 2) to be "file": the greedy .* runs to the end of the first line, backtracks until the / in the pattern can match, the group then captures the last 4 characters of that line, and $ anchors the match at the end of the line.

In fact, arr is null, which means the pattern did not match at all.

I can alter this slightly so it does precisely what I intend:

var s = "http://www.google.com/dir/file\r\nhello"
var re = new RegExp("http://([^/]+).*/([^/\r\n]+)[\r\n]*");
var arr = re.exec(s);
alert(arr[2]); // "file", as expected

My question is not so much HOW to grab "file" from the end of the first line in s. Instead, I'm trying to understand WHY the first regexp fails and the second succeeds. Why does $ not match against the \r\n line break in example 1? Isn't that the sole purpose of its existence? Is there something else I'm missing?

Also, consider the same first regular expression as used in sed (with extended regular expression mode enabled with -r):

$ echo -e "http://www.google.com/dir/file\r\nhello" |sed -r  -e 's#http://([^/]+).*/([^/\r\n]+)$#\2.OUTSIDE.OF.CAPTURE.GROUP#'
<<OUTPUT>>
file.OUTSIDE.OF.CAPTURE.GROUP
hello

Here, capture group 2 captures "file" and nothing else. "hello" appears in the output but is not inside the capture group, as shown by the position of the string ".OUTSIDE.OF.CAPTURE.GROUP" in the output. So the regular expression behaves as I expect in sed, but not in the built-in JavaScript regexp engine.

If I replace \r\n in the input string with just \n, the behavior is identical for all three above examples, so that should not be relevant as far as I can tell.

jrsanderson asked Oct 05 '15 22:10



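1 Answer

You need to enable multiline mode (the "m" flag). Without it, $ matches only at the very end of the input string; with it, $ also matches immediately before each line break.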

var re = new RegExp("http://([^/]+).*/([^/\r\n]+)$", "m");

http://javascript.info/tutorial/ahchors-and-multiline-mode
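To see the difference side by side, here is a minimal sketch that runs the question's pattern against the question's input with and without the "m" flag. The variable names (pattern, withoutM, withM) are only illustrative and do not come from the original post.

```javascript
// Same input as in the question: two lines separated by \r\n.
const s = "http://www.google.com/dir/file\r\nhello";
const pattern = "http://([^/]+).*/([^/\r\n]+)$";

// No flags: $ can only match at the very end of the input,
// which is after "hello", so the match fails.
const withoutM = new RegExp(pattern).exec(s);

// "m" flag: $ may also match just before a line terminator (\r or \n),
// so the match can end right after "file".
const withM = new RegExp(pattern, "m").exec(s);

console.log(withoutM);  // null
console.log(withM[1]);  // "www.google.com"
console.log(withM[2]);  // "file"
```

This also fits the sed result in the question: sed reads its input one line at a time, so there $ always sits at the end of a line, whereas exec() sees the whole two-line string at once.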

Maksym Kozlenko answered Sep 29 '22 16:09
