It is likely an implementation detail, but for the Oracle and IBM JDKs at least is the compiled pattern cached or do we as application developers need to perform the caching of compiled patterns ourselves?
Method SummaryCompiles the given regular expression into a pattern with the given flags. Returns this pattern's match flags. Creates a matcher that will match the given input against this pattern. Compiles the given regular expression and attempts to match the given input against it.
The compile(String) method of the Pattern class in Java is used to create a pattern from the regular expression passed as parameter to method. Whenever you need to match a text against a regular expression pattern more than one time, create a Pattern instance using the Pattern.
Java Pattern objects are thread safe and immutable (its the matchers that are not thread safe). As such, there is no reason not to make them static if they are going to be used by each instance of the class (or again in another method in the class).
A pattern is a compiled representation of a regular expression. Patterns are used by matchers to perform match operations on a character string. A regular expression is a string that is used to match another string, using a specific syntax.
According to [Joshua_Bloch] Effective_Java:
Some object creations are much more expensive than others. If you’re going to need such an “expensive object” repeatedly, it may be advisable to cache it for reuse. Unfortunately, it’s not always obvious when you’re creating such an object. Suppose you want to write a method to determine whether a string is a valid Roman numeral. Here’s the easiest way to do this using a regular expression:
// Performance can be greatly improved!
static boolean isRomanNumeral(String s) {
return s.matches("^(?=.)M*(C[MD]|D?C{0,3})"
+ "(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$");
}
The problem with this implementation is that it relies on the String.matches method. While String.matches is the easiest way to check if a string matches a regular expression, it’s not suitable for repeated use in performance-critical situations. The problem is that it internally creates a Pattern instance for the regular expression and uses it only once, after which it becomes eligible for garbage collection. Creating a Pattern instance is expensive because it requires compiling the regular expression into a finite state machine. To improve the performance, explicitly compile the regular expression into a Pattern instance (which is immutable) as part of class initialization, cache it, and reuse the same instance for every invocation of the isRomanNumeral method:
// Reusing expensive object for improved performance
public class RomanNumerals {
private static final Pattern ROMAN = Pattern.compile(
"^(?=.)M*(C[MD]|D?C{0,3})"
+ "(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$");
static boolean isRomanNumeral(String s) {
return ROMAN.matcher(s).matches();
}}
The improved version of isRomanNumeral provides significant performance gains if invoked frequently. On my machine, the original version takes 1.1 μs on an 8-character input string, while the improved version takes 0.17 μs, which is 6.5 times faster
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With