
Is regex in Java anchored by default with both a ^ and $ character?

From my understanding of regular expressions, the string "00###" should match "[0-9]" but not "^[0-9]$". However, that is not how it behaves with Java regular expressions.

After investigating this problem, I found the following information (http://www.wellho.net/solutions/java-regular-expressions-in-java.html):

It might appear that Java regular expressions are default anchored with both a ^ and $ character.

Can we be sure that this is true for all versions of the JDK? And can this mode be turned off (i.e., can the default anchoring with ^ and $ be disabled)?

DixonD asked Dec 12 '09 20:12



2 Answers

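As the article you linked to explains, it depends on the method you call. If you want the pattern to behave as if it were wrapped in ^ and $, use String#matches or Matcher#matches, which succeed only when the entire input matches. If you don't want that, use the Matcher#find method instead, which looks for a match anywhere in the input.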

    import java.util.regex.*;

    public class Example {
        public static void main(String[] args)
        {
            System.out.println("Matches: " + "abc".matches("a+"));

            Matcher matcher = Pattern.compile("a+").matcher("abc");
            System.out.println("Find: " + matcher.find());
        }
    }

Output:

    Matches: false
    Find: true

Mark Byers answered Oct 12 '22 01:10


Yes, matches() always acts as if the regex were anchored at both ends. To get the traditional behavior, which is to match any substring of the target, you have to use find() (as others have already pointed out). Very few regex tools offer anything equivalent to Java's matches() methods, so your confusion is justified. The only other one I can think of offhand is the XML Schema flavor.
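To tie this back to the strings in the question, here is a minimal sketch that runs "00###" through both methods. The class name AnchoringDemo and the helpers wholeMatch and anyMatch are hypothetical names chosen for illustration; only Pattern.compile, Matcher#matches and Matcher#find come from java.util.regex.

```java
import java.util.regex.Pattern;

class AnchoringDemo {
    // Hypothetical helper: true only if the ENTIRE input matches the regex,
    // which is the implicitly anchored behavior of Matcher#matches.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Hypothetical helper: true if ANY substring of the input matches,
    // which is the unanchored behavior of Matcher#find.
    static boolean anyMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "00###";

        // matches() needs the single-digit pattern to cover all five characters.
        System.out.println("matches [0-9]:    " + wholeMatch("[0-9]", input));    // false

        // find() is satisfied by the first digit it encounters.
        System.out.println("find    [0-9]:    " + anyMatch("[0-9]", input));      // true

        // Explicit anchors still work with find() when you do want them.
        System.out.println("find    ^[0-9]$:  " + anyMatch("^[0-9]$", input));    // false

        // A pattern that describes the whole string passes matches().
        System.out.println("matches [0-9]+#+: " + wholeMatch("[0-9]+#+", input)); // true
    }
}
```

So the anchoring is a property of the method being called rather than a mode on the pattern: calling find() gives the unanchored search, and explicit ^ and $ can be added to the pattern whenever a whole-string check is wanted.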

Alan Moore answered Oct 12 '22 02:10