
Why does ^\s*$ not match "" with MULTILINE? [duplicate]

Tags:

java

regex

I'm supporting this Java application where the devs implemented some filtering based on RegEx. To be as generic as possible, they compile the patterns with the MULTILINE flag.

The other day I noticed something unexpected. In Java, the pattern "^\\s*$" does not match "" with the MULTILINE flag. It does match without that flag.

Pattern pattern = Pattern.compile("^\\s*$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher("");

System.out.println("Multiline: "+matcher.find());

pattern = Pattern.compile("^\\s*$");
matcher = pattern.matcher("");

System.out.println("No-multiline: "+matcher.find());

This produces the following output:

Multiline: false
No-multiline: true

Same results can be seen for matches():

System.out.println("Multiline: " + ("".matches("(?m)^\\s*$")));
System.out.println("No-multiline: " + ("".matches("^\\s*$")));

I would expect all cases to match.
In Python, this is the case. This:

import re

print(re.search(r'^\s*$', "", re.MULTILINE))
print(re.search(r'^\s*$', ""))

gives:

<_sre.SRE_Match object; span=(0, 0), match=''>
<_sre.SRE_Match object; span=(0, 0), match=''>

In Perl, both cases match as well and I think I remember it being the same for PHP.

I'd really appreciate it if someone could explain the reasoning behind the way Java handles this case.

shmee asked Oct 18 '22 17:10


1 Answer

You are passing an empty string to the matcher. With Pattern.MULTILINE, you would expect ^ to match at the beginning of the string, but Java's behavior is slightly different. The Pattern documentation says:

If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input.

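Since the string is empty, the beginning of input is also its end. The "except at the end of input" rule therefore applies at position 0, ^ fails to match there, and the whole pattern fails.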
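To see the rule in action, here is a small sketch that lists the offsets at which a MULTILINE ^ matches. The class name CaretOffsets and the method caretOffsets are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaretOffsets {
    // Hypothetical helper: collects every offset where a MULTILINE ^ matches.
    static List<Integer> caretOffsets(String input) {
        Matcher m = Pattern.compile("^", Pattern.MULTILINE).matcher(input);
        List<Integer> offsets = new ArrayList<>();
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Start of input and the position after the line terminator.
        System.out.println(caretOffsets("a\nb")); // [0, 2]
        // Offset 2 follows a line terminator but is the end of input.
        System.out.println(caretOffsets("a\n"));  // [0]
        // Offset 0 is the beginning of input and also its end.
        System.out.println(caretOffsets(""));     // []
    }
}
```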

Note: If the flag is always passed by default but you actually want patterns to anchor at the start and end of the whole string, use \A instead of ^ and \z instead of $. These match the string start and end even with Pattern.MULTILINE, so even an empty string passes the \\A\\s*\\z test.
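A minimal sketch of the \A and \z variant, using the empty input from the question. The class name AnchorDemo and the helper isBlank are hypothetical and only serve the example.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // \A and \z anchor to the whole input, regardless of MULTILINE.
    private static final Pattern BLANK =
            Pattern.compile("\\A\\s*\\z", Pattern.MULTILINE);

    // Hypothetical helper: true if the input is empty or whitespace only.
    static boolean isBlank(String input) {
        return BLANK.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isBlank(""));      // true
        System.out.println(isBlank(" \n\t")); // true
        System.out.println(isBlank("a\n"));   // false
    }
}
```

Keep in mind that the two patterns ask different questions on multi-line input: ^\s*$ with MULTILINE can match a single blank line anywhere in the text, while \A\s*\z requires the entire input to be whitespace.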

Wiktor Stribiżew answered Oct 21 '22 08:10