 

Bug in Java regex implementation?

Tags: java, regex

I've identified some unexpected behavior in Java's regex implementation. When using java.util.regex.Pattern and java.util.regex.Matcher, the following regular expression does not match correctly on the input "Merlot" when using Matcher's find() method:

((?:White )?Zinfandel|Merlot) 

If I change the order of the expressions inside the outermost matching group, Matcher's find() method does match.

(Merlot|(?:White )?Zinfandel) 

Here is some test code that illustrates the problem.

RegexTest.java

import java.util.regex.*;

public class RegexTest {
    public static void main(String[] args) {
        Pattern pattern1 = Pattern.compile("((?:White )?Zinfandel|Merlot)");
        Matcher matcher1 = pattern1.matcher("Merlot");
        // prints "No match :("
        if (matcher1.find()) {
            System.out.println(matcher1.group(0));
        } else {
            System.out.println("No match :(");
        }

        Pattern pattern2 = Pattern.compile("(Merlot|(?:White )?Zinfandel)");
        Matcher matcher2 = pattern2.matcher("Merlot");
        // prints "Merlot"
        if (matcher2.find()) {
            System.out.println(matcher2.group(0));
        } else {
            System.out.println("No match :(");
        }
    }
}

The expected output is:

Merlot
Merlot

But the actual output is:

No match :(
Merlot

I've verified that this unexpected behavior exists in Java version 1.7.0_11 on Ubuntu Linux and also in Java version 1.6.0_37 on OS X 10.8.2. I reported it as a bug to Oracle yesterday and got back an automated email telling me my bug report has been received and has an internal review ID of 2441589. I can't find my bug report when I search for that ID in their bug database. (Can you hear the crickets?)
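To compare behavior across JDK versions, a small harness along these lines may help. It is only a sketch: the class name `RegexVersionCheck` and the method `check` are made up for illustration, and the result for the first pattern depends on the JDK it runs on, so no expected output is shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original test code.
public class RegexVersionCheck {

    // Runs find() for each regex against the input and records the result,
    // keeping the patterns in the order they were passed in.
    public static Map<String, Boolean> check(String input, String... regexes) {
        Map<String, Boolean> results = new LinkedHashMap<String, Boolean>();
        for (String regex : regexes) {
            results.put(regex, Pattern.compile(regex).matcher(input).find());
        }
        return results;
    }

    public static void main(String[] args) {
        // Print the runtime version so results from different machines can be compared.
        System.out.println("java.version = " + System.getProperty("java.version"));

        Map<String, Boolean> results = check("Merlot",
                "((?:White )?Zinfandel|Merlot)",
                "(Merlot|(?:White )?Zinfandel)");

        for (Map.Entry<String, Boolean> entry : results.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Running it on each JDK of interest prints the version next to the `find()` result for both orderings of the alternation.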

Have I uncovered a bug in Java's presumably thoroughly tested and used regex implementation (hard to believe in 2013), or am I doing something wrong?

asked Feb 05 '13 17:02 by Asaph



2 Answers

The following:

import java.util.regex.*;

public class T {
  public static void main( String args[] ) {
    System.out.println( Pattern.compile("(a)?bb|c").matcher("c").find() );
    System.out.println( Pattern.compile("(a)?b|c").matcher("c").find() );
  }
}

prints

false
true

on:

  • JDK 1.7.0_13
  • JDK 1.6.0_24

The following:

import java.util.regex.*;

public class T {
  public static void main( String args[] ) {
    System.out.println( Pattern.compile("((a)?bb)|c").matcher("c").find() );
    System.out.println( Pattern.compile("((a)?b)|c").matcher("c").find() );
  }
}

prints:

true
true
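The two programs above suggest the failure is tied to an optional group at the start of the first alternative. Assuming that is the case, one possible way to sidestep it is to drop the `?` quantifier and spell out each alternative explicitly. The sketch below does that for the pattern from the question. The names `WineMatcher` and `firstWine` are hypothetical, and the rewrite has not been checked against the affected JDK releases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical workaround sketch, not taken from the question or the answers.
public class WineMatcher {

    // Same wine names as in the question, but the optional "White " prefix is
    // expanded into two explicit alternatives instead of using (?:White )?.
    // "White Zinfandel" comes first so it is preferred over plain "Zinfandel".
    private static final Pattern WINE =
            Pattern.compile("(White Zinfandel|Zinfandel|Merlot)");

    // Returns the first wine name found in the input, or null if there is none.
    public static String firstWine(String input) {
        Matcher matcher = WINE.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstWine("Merlot"));
        System.out.println(firstWine("a glass of White Zinfandel"));
    }
}
```

The trade-off is a longer pattern, which gets unwieldy if many names share optional prefixes.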
answered Oct 02 '22 17:10 by Mikhail Vladimirov


It seems to be fixed in Java 1.8.

Welcome to Scala version 2.11.0-20130930-063927-2bba779702 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0-ea).
Type in expressions to have them evaluated.
Type :help for more information.

scala> import java.util.regex._
import java.util.regex._

scala> Pattern.compile("((?:White )?Zinfandel|Merlot)")
res0: java.util.regex.Pattern = ((?:White )?Zinfandel|Merlot)

scala> .matcher("Merlot")
res1: java.util.regex.Matcher = java.util.regex.Matcher[pattern=((?:White )?Zinfandel|Merlot) region=0,6 lastmatch=]

scala> .find()
res2: Boolean = true
answered Oct 02 '22 18:10 by som-snytt