
Does Pattern.compile cache?

Tags:

java

regex

It is likely an implementation detail, but for the Oracle and IBM JDKs at least, is the compiled pattern cached, or do we as application developers need to cache compiled patterns ourselves?

Archimedes Trajano asked Nov 16 '12 at 16:11



People also ask

What does Pattern.compile do?

From the Pattern method summary:
compile(String regex, int flags): Compiles the given regular expression into a pattern with the given flags.
flags(): Returns this pattern's match flags.
matcher(CharSequence input): Creates a matcher that will match the given input against this pattern.
matches(String regex, CharSequence input): Compiles the given regular expression and attempts to match the given input against it.
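A minimal sketch exercising those four methods. The class name PatternMethodsDemo, the constant HELLO and the helper matchesHello are hypothetical names chosen for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternMethodsDemo {
    // compile(String regex, int flags): compile once, case-insensitively, for reuse
    static final Pattern HELLO = Pattern.compile("hello", Pattern.CASE_INSENSITIVE);

    // matcher(CharSequence input): a fresh, stateful Matcher for this one input
    static boolean matchesHello(String input) {
        Matcher m = HELLO.matcher(input);
        return m.matches();
    }

    public static void main(String[] args) {
        // flags(): the flags the pattern was compiled with
        System.out.println(HELLO.flags() == Pattern.CASE_INSENSITIVE); // true
        System.out.println(matchesHello("HeLLo"));                     // true
        // Pattern.matches(String regex, CharSequence input): one-shot convenience
        // that compiles the regex again on every call
        System.out.println(Pattern.matches("h.llo", "hallo"));         // true
    }
}
```

The static Pattern.matches is handy for a single check, but because it compiles the regex each time, the compile-then-matcher route is the one to use for repeated matching.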

How does Java's Pattern.compile work?

The compile(String) method of the Pattern class in Java creates a pattern from the regular expression passed to it as a parameter. Whenever you need to match text against a regular expression more than once, create a Pattern instance with Pattern.compile and reuse it.
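As a rough sketch of "compile once, reuse many times": the regex is parsed when the class initializes, and every later call only creates a cheap Matcher. Because compile parses eagerly, a malformed regex fails at compile time with a PatternSyntaxException rather than when input is matched. The class CompileOnceDemo, the DATE pattern and both helpers are hypothetical:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CompileOnceDemo {
    // Hypothetical example regex (e.g. 2012-11-16), compiled once at class initialization
    static final Pattern DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    // Each call reuses DATE; only a new Matcher is created per input
    static boolean isDate(String s) {
        return DATE.matcher(s).matches();
    }

    // compile parses the regex eagerly, so syntax errors surface here,
    // before any input is matched
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }
}
```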

Is Pattern.compile thread-safe?

Java Pattern objects are thread-safe and immutable (it is the matchers that are not thread-safe). As such, there is no reason not to make them static if they are going to be used by each instance of the class (or again in another method in the class).
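A minimal sketch of that advice, assuming a hypothetical SharedPatternDemo class: one static Pattern is shared by every thread, while each call creates its own Matcher. A parallel stream is used here simply as an easy way to get several threads calling the same method:

```java
import java.util.List;
import java.util.regex.Pattern;

public class SharedPatternDemo {
    // One immutable Pattern, safe to share across threads
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // A new Matcher per call; Matchers hold match state and must not be shared
    static boolean isNumber(String s) {
        return DIGITS.matcher(s).matches();
    }

    // The parallel stream may evaluate isNumber on several threads at once,
    // all reusing the same DIGITS instance
    static long countNumbers(List<String> inputs) {
        return inputs.parallelStream().filter(SharedPatternDemo::isNumber).count();
    }
}
```

Storing a single Matcher in a static field instead would be the unsafe variant, since its find/match position would be shared between threads.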

What are Pattern and Matcher?

A pattern is a compiled representation of a regular expression. Patterns are used by matchers to perform match operations on a character string. A regular expression is a string that is used to match another string, using a specific syntax.
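To illustrate the split: the Pattern below is the compiled regex, and the Matcher walks one input string, remembering where the last match ended so that find() can continue from there. FindAllDemo and words are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllDemo {
    // The compiled representation of the regular expression
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // The Matcher performs the match operations and tracks its position in text
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }
}
```

For example, words("Does Pattern.compile cache?") yields [Does, Pattern, compile, cache].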


1 Answer

According to Joshua Bloch's Effective Java:

Some object creations are much more expensive than others. If you’re going to need such an “expensive object” repeatedly, it may be advisable to cache it for reuse. Unfortunately, it’s not always obvious when you’re creating such an object. Suppose you want to write a method to determine whether a string is a valid Roman numeral. Here’s the easiest way to do this using a regular expression:

// Performance can be greatly improved!
static boolean isRomanNumeral(String s) {
    return s.matches("^(?=.)M*(C[MD]|D?C{0,3})"
            + "(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$");
}

The problem with this implementation is that it relies on the String.matches method. While String.matches is the easiest way to check if a string matches a regular expression, it’s not suitable for repeated use in performance-critical situations. The problem is that it internally creates a Pattern instance for the regular expression and uses it only once, after which it becomes eligible for garbage collection. Creating a Pattern instance is expensive because it requires compiling the regular expression into a finite state machine. To improve the performance, explicitly compile the regular expression into a Pattern instance (which is immutable) as part of class initialization, cache it, and reuse the same instance for every invocation of the isRomanNumeral method:

import java.util.regex.Pattern;

// Reusing expensive object for improved performance
public class RomanNumerals {
    private static final Pattern ROMAN = Pattern.compile(
            "^(?=.)M*(C[MD]|D?C{0,3})"
            + "(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$");

    static boolean isRomanNumeral(String s) {
        return ROMAN.matcher(s).matches();
    }
}

The improved version of isRomanNumeral provides significant performance gains if invoked frequently. On my machine, the original version takes 1.1 μs on an 8-character input string, while the improved version takes 0.17 μs, which is 6.5 times faster.
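To answer the original question directly: as far as I can tell from OpenJDK's source, Pattern.compile constructs a new Pattern on every call, and the Pattern Javadoc describes no cache, so any caching is the application's job. Where regexes are not known up front (so a static final field is not an option), one possible approach is a small map-based cache. PatternCache is a hypothetical helper, not a JDK API; it assumes the set of distinct regexes is small and bounded (the map never evicts) and keys on the regex string alone, ignoring flags:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical application-level cache; unbounded, so only suitable for a
// small, fixed set of regexes compiled without flags
public final class PatternCache {
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    private PatternCache() {}

    // Returns the cached Pattern for regex, compiling it only on first use
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    public static void main(String[] args) {
        // Pattern.compile builds a fresh object on each call
        System.out.println(Pattern.compile("a+") == Pattern.compile("a+")); // false on OpenJDK-based JDKs
        // The cache hands back the same instance for the same regex string
        System.out.println(get("a+") == get("a+")); // true
    }
}
```

If the regex set can grow without limit (for example, regexes supplied by users), a size-bounded cache would be the safer design choice.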

ikarayel answered Sep 19 '22 at 06:09
