Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

java.util.regex - importance of Pattern.compile()?

Tags:

java

regex

People also ask

What is the use of compile () method in regex?

compile() method is used to compile a regular expression pattern provided as a string into a regex pattern object ( re. Pattern ). Later we can use this pattern object to search for a match inside different target strings using regex methods such as a re. match() or re.search() .

What is the use of pattern compile in Java?

The compile(String) method of the Pattern class in Java is used to create a pattern from the regular expression passed as parameter to method. Whenever you need to match a text against a regular expression pattern more than one time, create a Pattern instance using the Pattern. compile() method.

What is Java util regex pattern?

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. They can be used to search, edit, or manipulate text and data. The java.util.regex package primarily consists of the following three classes −

Is regex important in Java?

Regular Expressions or Regex (in short) in Java is an API for defining String patterns that can be used for searching, manipulating, and editing a string in Java. Email validation and passwords are a few areas of strings where Regex is widely used to define the constraints. Regular Expressions are provided under java.


The compile() method is always called at some point; it's the only way to create a Pattern object. So the question is really, why should you call it explicitly? One reason is that you need a reference to the Matcher object so you can use its methods, like group(int) to retrieve the contents of capturing groups. The only way to get ahold of the Matcher object is through the Pattern object's matcher() method, and the only way to get ahold of the Pattern object is through the compile() method. Then there's the find() method which, unlike matches(), is not duplicated in the String or Pattern classes.

The other reason is to avoid creating the same Pattern object over and over. Every time you use one of the regex-powered methods in String (or the static matches() method in Pattern), it creates a new Pattern and a new Matcher. So this code snippet:

for (String s : myStringList) {
    if ( s.matches("\\d+") ) {
        doSomething();
    }
}

...is exactly equivalent to this:

for (String s : myStringList) {
    if ( Pattern.compile("\\d+").matcher(s).matches() ) {
        doSomething();
    }
}

Obviously, that's doing a lot of unnecessary work. In fact, it can easily take longer to compile the regex and instantiate the Pattern object, than it does to perform an actual match. So it usually makes sense to pull that step out of the loop. You can create the Matcher ahead of time as well, though they're not nearly so expensive:

Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("");
for (String s : myStringList) {
    if ( m.reset(s).matches() ) {
        doSomething();
    }
}

If you're familiar with .NET regexes, you may be wondering if Java's compile() method is related to .NET's RegexOptions.Compiled modifier; the answer is no. Java's Pattern.compile() method is merely equivalent to .NET's Regex constructor. When you specify the Compiled option:

Regex r = new Regex(@"\d+", RegexOptions.Compiled); 

...it compiles the regex directly to CIL byte code, allowing it to perform much faster, but at a significant cost in up-front processing and memory use--think of it as steroids for regexes. Java has no equivalent; there's no difference between a Pattern that's created behind the scenes by String#matches(String) and one you create explicitly with Pattern#compile(String).

(EDIT: I originally said that all .NET Regex objects are cached, which is incorrect. Since .NET 2.0, automatic caching occurs only with static methods like Regex.Matches(), not when you call a Regex constructor directly. ref)


Compile parses the regular expression and builds an in-memory representation. The overhead to compile is significant compared to a match. If you're using a pattern repeatedly it will gain some performance to cache the compiled pattern.


When you compile the Pattern Java does some computation to make finding matches in Strings faster. (Builds an in-memory representation of the regex)

If you are going to reuse the Pattern multiple times you would see a vast performance increase over creating a new Pattern every time.

In the case of only using the Pattern once, the compiling step just seems like an extra line of code, but, in fact, it can be very helpful in the general case.


It is matter of performance and memory usage, compile and keep the complied pattern if you need to use it a lot. A typical usage of regex is to validated user input (format), and also format output data for users, in these classes, saving the complied pattern, seems quite logical as they usually called a lot.

Below is a sample validator, which is really called a lot :)

public class AmountValidator {
    //Accept 123 - 123,456 - 123,345.34
    private static final String AMOUNT_REGEX="\\d{1,3}(,\\d{3})*(\\.\\d{1,4})?|\\.\\d{1,4}";
    //Compile and save the pattern  
    private static final Pattern AMOUNT_PATTERN = Pattern.compile(AMOUNT_REGEX);


    public boolean validate(String amount){

         if (!AMOUNT_PATTERN.matcher(amount).matches()) {
            return false;
         }    
        return true;
    }    
}

As mentioned by @Alan Moore, if you have reusable regex in your code, (before a loop for example), you must compile and save pattern for reuse.