 

Combine Multiple Regexp Patterns

Tags:

regex

groovy

I've been a regex practitioner for years, mainly in Perl, where you can do handy things like:

my $delim  = qr#[-\:/]#;                       # basic enough
my $field1 = qr/(\d{8})/;                      # basic enough
my $field2 = qr/(?:one|two|three)(\d{8,10})/;  # basic enough
...
my $re = qr/$field1${delim}$field2/;      # beautiful magicks
while (<>) { 
  /$re/ and print "$1\n";
}

The point is not that you can precompile them; it's that you can use one regex inside another as a variable, building a bigger composite regex that is actually readable. The individual pieces are testable with simple test data, and the composite can be dynamic ($delim might be passed as an argument to a sub, for example).

The question is: how does one approach this in Java, where the Pattern/Matcher approach rules the day?

Here's my stab:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

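// Regex backslashes must be doubled inside Java string literals.
Pattern delim  = Pattern.compile("[-:/]");
Pattern field1 = Pattern.compile("(\\d{8})");
Pattern field2 = Pattern.compile("(?:one|two|three)(\\d{8,10})");
// Recover each piece's source with pattern() and compile the concatenation.
Pattern re_pat = Pattern.compile(
  field1.pattern() + delim.pattern() + field2.pattern()
);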
...
Matcher re = re_pat.matcher(input);

Is this reliable (any gotchas?), and is it otherwise the best Java equivalent? Also feel free to answer this relative to Groovy, since that is my ultimate destination for this code (but Groovy seems to rely more or less on the underlying Java regex implementation). Thanks.

asked Apr 03 '13 at 03:04 by Joe Atzberger





1 Answer

In your example, I don't see any reason to precompile the regexes at all. If I were doing it, I'd just define delim, field1, and field2 as Strings, and combine them.
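One gotcha worth knowing when combining plain Strings: Perl's qr// objects interpolate wrapped in a non-capturing group (for example (?^:...)), so a top-level | inside one piece cannot leak into its neighbours. Plain Java string concatenation adds no such wrapper. The sketch below shows the problem and one possible fix; the class and the join helper are hypothetical illustrations, not part of any library.

```java
import java.util.regex.Pattern;

final class GroupingGotcha {
    // Hypothetical helper: wrap each piece in a non-capturing group before joining,
    // roughly what Perl does for you when it interpolates a qr// object.
    static String join(String... pieces) {
        StringBuilder sb = new StringBuilder();
        for (String p : pieces) {
            sb.append("(?:").append(p).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String alt = "cat|dog";   // a piece with a top-level alternation
        String num = "\\d+";      // regex source \d+

        Pattern naive = Pattern.compile(alt + num);        // parses as  cat | dog\d+
        Pattern safe  = Pattern.compile(join(alt, num));   // (?:cat|dog)(?:\d+)

        System.out.println(naive.matcher("cat").matches());   // true: the digits were never required
        System.out.println(safe.matcher("cat").matches());    // false
        System.out.println(safe.matcher("cat42").matches());  // true
    }
}
```

The pieces in the question already use (?:one|two|three), so they happen to be safe, but wrapping by default protects pieces you don't control.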

Beyond that, Groovy does a good job of hiding the ugliness of Java's verbose regex handling. An example would look something like this:

def delim  = /[-:\/]/
def field1 = /(\d{8})/
def field2 = /(?:one|two|three)(\d{8,10})/
def re_pat = /$field1${delim}$field2/

// optionally import Matcher and explicitly declare re
def re = input =~ re_pat
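For comparison, a minimal plain-Java sketch of the same String-based composition, with the delimiter regex passed in as a parameter the way the question's $delim might be passed to a sub. The class name ComposeDemo and the composite method are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ComposeDemo {
    // Reusable pieces kept as plain Strings; backslashes are doubled for the Java compiler.
    static final String FIELD1 = "(\\d{8})";
    static final String FIELD2 = "(?:one|two|three)(\\d{8,10})";

    // Hypothetical helper: the caller supplies the delimiter regex,
    // much as $delim might be passed as an argument to a Perl sub.
    static Pattern composite(String delimRegex) {
        return Pattern.compile(FIELD1 + delimRegex + FIELD2);
    }

    public static void main(String[] args) {
        Matcher m = composite("[-:/]").matcher("20130403:two1234567890");
        if (m.find()) {
            // Groups are numbered across the whole composite: FIELD2's digits are group 2 here.
            System.out.println(m.group(1) + " " + m.group(2)); // 20130403 1234567890
        }
    }
}
```

If the caller's delimiter is meant as literal text rather than a regex, Pattern.quote(delim) escapes it before it is concatenated.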

For a one-off match you shouldn't have to worry about compiling the regexes beforehand. Be aware, though, that java.util.regex.Pattern itself does not cache compiled regexes, so if the match runs inside a loop it's worth precompiling the pattern once, like this:

def re_pat = ~/$field1${delim}$field2/
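The Java counterpart of precompiling, as a minimal sketch: compile the composite once (here in a static final field) and reuse it for every input line, roughly mirroring the Perl while (<>) loop in the question. The class and method names are hypothetical; Pattern instances are documented as immutable and thread-safe, while a Matcher is not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrecompiledLoop {
    // Compiled once and shared; Pattern instances are immutable.
    static final Pattern RE =
        Pattern.compile("(\\d{8})[-:/](?:one|two|three)(\\d{8,10})");

    // Rough analogue of:  while (<>) { /$re/ and print "$1\n"; }
    static List<String> firstFields(List<String> lines) {
        List<String> found = new ArrayList<>();
        Matcher m = RE.matcher("");           // Matcher is not thread-safe, so keep it local
        for (String line : lines) {
            if (m.reset(line).find()) {       // re-point the same Matcher at each line
                found.add(m.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "20130403-one12345678", "no match here", "20130405/three0123456789");
        System.out.println(firstFields(lines)); // [20130403, 20130405]
    }
}
```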

One thing to note here: the /.../ delimiters in Groovy really just produce Strings (or GStrings if they contain variable references). They aren't regular expression objects, but they have the convenience of not needing to double-escape backslashes.

If you want to avoid escaping even the /, you can use dollar-slashy strings in Groovy 1.8 and newer:

def delim = $/[-:/]/$

I don't think that's necessary in your example, though.

answered Nov 15 '22 at 04:11 by OverZealous
