Combine Multiple Regexp Patterns

Tags:

groovy

I've been a regex practitioner for years, mainly in Perl, where you can do handy things like:

my $delim  = qr#[-\:/]#;                       # basic enough
my $field1 = qr/(\d{8})/;                      # basic enough
my $field2 = qr/(?:one|two|three)(\d{8,10})/;  # basic enough
...
my $re = qr/$field1${delim}$field2/;      # beautiful magicks
while (<>) { 
  /$re/ and print "$1\n";
}

The point is not that you can precompile them, it's that you can use one regex inside the other as a variable to build a bigger composite regex that is actually readable. The individual pieces are testable w/ simple test data and the composite can be dynamic ($delim might be passed as an argument to a sub, for example).

The question is, how does one approach this in Java, where the Pattern/Matcher approach rules the day?

Here's my stab:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

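// Backslashes must be doubled inside Java string literals; ':' needs no escape.
Pattern delim  = Pattern.compile("[-:/]");
Pattern field1 = Pattern.compile("(\\d{8})");
Pattern field2 = Pattern.compile("(?:one|two|three)(\\d{8,10})");
// pattern() returns the source string each Pattern was compiled from.
Pattern re_pat = Pattern.compile(
  field1.pattern() + delim.pattern() + field2.pattern()
);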
...
Matcher re = re_pat.matcher(input);

Is this reliable (any gotchas?) and otherwise the best Java equivalent? Also feel free to answer this relative to Groovy, since that is my ultimate destination for this code (but it seems Groovy more or less relies on the underlying Java regex implementations). Thanks.


asked Apr 03 '13 03:04

Joe Atzberger

1 Answer

In your example, I don't see any reason to precompile the regexes at all. If I were doing it, I'd just define delim, field1, and field2 as Strings, and combine them.
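
In plain Java, that String-based approach might look roughly like the sketch below. The class name ComposedRegex and the helpers composite and fields are made up for illustration; only Pattern and Matcher come from java.util.regex. The delimiter is taken as a parameter to mirror the question's idea of passing $delim into a sub.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
final class ComposedRegex {
    // Plain String fragments; backslashes are doubled because these are Java string literals.
    static final String DELIM  = "[-:/]";
    static final String FIELD1 = "(\\d{8})";
    static final String FIELD2 = "(?:one|two|three)(\\d{8,10})";

    // Builds the composite from the fragments; the delimiter can vary per call.
    static Pattern composite(String delim) {
        return Pattern.compile(FIELD1 + delim + FIELD2);
    }

    // Returns the two captured fields, or null when the input does not match.
    static String[] fields(String input) {
        Matcher m = composite(DELIM).matcher(input);
        if (!m.find()) {
            return null;
        }
        // Capture groups are numbered left to right across the whole composite.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] f = fields("12345678-two123456789");
        System.out.println(f[0] + " " + f[1]);
    }
}
```

Note that group numbers belong to the composite, not to the individual fragments, so adding a capturing group to an earlier fragment shifts the numbers of every group after it.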

Adding to that, Groovy does a good job of hiding the ugliness of Java's verbose regexes. An example would look something like this:

def delim  = /[-:\/]/
def field1 = /(\d{8})/
def field2 = /(?:one|two|three)(\d{8,10})/
def re_pat = /$field1${delim}$field2/

// optionally import Matcher and explicitly declare re
def re = input =~ re_pat

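You usually don't have to worry about compiling the regexes beforehand. Note, though, that java.util.regex.Pattern does not cache compiled patterns: Pattern.compile builds a new Pattern on every call, so precompiling pays off when the same regex is applied many times. If you want to precompile the pattern, use this: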

def re_pat = ~/$field1${delim}$field2/
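
For comparison, a rough Java counterpart to precompiling is to hold the composite in a static final Pattern and reuse it for every line, much like the Perl loop in the question. This is only a sketch: PrecompiledScan and firstFields are illustrative names, not an existing API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
final class PrecompiledScan {
    // Compiled once when the class is initialized, then shared by every call.
    private static final Pattern RE =
        Pattern.compile("(\\d{8})" + "[-:/]" + "(?:one|two|three)(\\d{8,10})");

    // Collects the first captured field of each matching line,
    // roughly what the Perl loop prints as $1.
    static List<String> firstFields(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = RE.matcher(line); // a fresh Matcher per line; the Pattern is reused
            if (m.find()) {
                out.add(m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("12345678-one87654321", "nope");
        System.out.println(firstFields(lines));
    }
}
```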

One thing to note here: text between / / delimiters in Groovy (a "slashy string") is really just a String (or a GString if it contains variable references). It isn't a regular expression object, but it has the convenience of not needing doubled backslashes.
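
One consequence of the pieces being plain strings is that combining them is textual concatenation. Perl wraps an interpolated qr// object in its own group, but string concatenation in Java or Groovy does not, so a fragment containing a top-level alternation can change meaning once it is embedded. The Java sketch below illustrates this; FragmentGrouping, group, and found are hypothetical names, and wrapping fragments in (?:...) is a suggested precaution rather than something the example in the question requires.

```java
import java.util.regex.Pattern;

// Hypothetical example class.
final class FragmentGrouping {
    // Wraps a fragment in a non-capturing group so it acts as one unit when concatenated.
    static String group(String fragment) {
        return "(?:" + fragment + ")";
    }

    // True when the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String animal = "cat|dog";
        // "^cat|dog$" parses as "^cat" OR "dog$", so "catfish" is accepted.
        System.out.println(found("^" + animal + "$", "catfish"));        // true
        // "^(?:cat|dog)$" keeps the alternation together and rejects "catfish".
        System.out.println(found("^" + group(animal) + "$", "catfish")); // false
    }
}
```

Similarly, Pattern.pattern() returns only the source string, so flags passed to Pattern.compile on a fragment are not carried into a composite built from pattern().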

If you want to avoid escaping even /, then you can use dollar-slashy-strings in Groovy 1.8 and newer:

def delim = $/[-:/]/$

I don't think that's necessary in your example, though.


answered Nov 15 '22 04:11

OverZealous
