I've been a regex practitioner for years, mainly in Perl, where you can do handy things like:
my $delim = qr#[-\:/]#; # basic enough
my $field1 = qr/(\d{8})/; # basic enough
my $field2 = qr/(?:one|two|three)(\d{8,10})/; # basic enough
...
my $re = qr/$field1${delim}$field2/; # beautiful magicks
while (<>) {
/$re/ and print "$1\n";
}
The point is not that you can precompile them; it's that you can use one regex inside another as a variable to build a bigger composite regex that is actually readable. The individual pieces are testable with simple test data, and the composite can be dynamic ($delim might be passed as an argument to a sub, for example).
The question is: how does one approach this in Java, where the Pattern/Matcher approach rules the day?
Here's my stab:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern delim = Pattern.compile("[-:/]");
Pattern field1 = Pattern.compile("(\\d{8})");
Pattern field2 = Pattern.compile("(?:one|two|three)(\\d{8,10})");
Pattern re_pat = Pattern.compile(
    field1.pattern() + delim.pattern() + field2.pattern()
);
...
Matcher re = re_pat.matcher(input);
Is this reliable (any gotchas?) and otherwise the best Java equivalent? Also feel free to answer this relative to Groovy, since that is my ultimate destination for this code (but it seems Groovy more or less relies on the underlying Java regex implementations). Thanks.
In your example, I don't see any reason to precompile the regexes at all. If I were doing it, I'd just define delim, field1, and field2 as Strings, and combine them.
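In plain Java, that String-based approach might look like the sketch below. The class name ComposedRegex and the extract helper are made up for illustration; the fragments mirror the patterns from the question, and find() is used because the Perl match in the question is unanchored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; the fragments mirror the question's patterns.
final class ComposedRegex {
    // Plain String fragments: each one can be compiled and tested on its own.
    static final String DELIM  = "[-:/]";
    static final String FIELD1 = "(\\d{8})";                      // capture group 1
    static final String FIELD2 = "(?:one|two|three)(\\d{8,10})";  // capture group 2

    // Compile the composite once and reuse it for every input.
    static final Pattern RE = Pattern.compile(FIELD1 + DELIM + FIELD2);

    // Returns "field1,field2" for the first match in the input, or null if there is none.
    static String extract(String input) {
        Matcher m = RE.matcher(input);
        return m.find() ? m.group(1) + "," + m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("20240131-two123456789"));
        System.out.println(extract("no match here"));
    }
}
```

One gotcha with this kind of concatenation: capture groups are numbered across the whole composite, so adding a capturing group to an earlier fragment shifts the numbers of the later ones. Named groups such as (?<field1>\d{8}) are one way to reduce that risk.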
Adding to that, Groovy does a good job of hiding the ugliness of Java's verbose regexes. An example would look something like this:
def delim = /[-:\/]/
def field1 = /(\d{8})/
def field2 = /(?:one|two|three)(\d{8,10})/
def re_pat = /$field1${delim}$field2/
// optionally import Matcher and explicitly declare re
def re = input =~ re_pat
For a one-off match you shouldn't have to worry about compiling the regexes beforehand. As far as I know, though, java.util.regex.Pattern does not cache regexes it has already compiled, so if the match runs in a loop it is worth compiling once. If you wanted to precompile the pattern, use this:
def re_pat = ~/$field1${delim}$field2/
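For comparison, here is a rough Java sketch of the same compile-once idea, with the delimiter passed in as an argument as the question suggests. The names PrecompiledRegex, build, and firstFields are hypothetical, and treating the delimiter as a literal string (escaped with Pattern.quote) is an assumption; if the caller passes a regex fragment such as a character class, drop the quoting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
final class PrecompiledRegex {
    static final String FIELD1 = "(\\d{8})";
    static final String FIELD2 = "(?:one|two|three)(\\d{8,10})";

    // Build the composite for a caller-supplied literal delimiter.
    // Pattern.quote keeps characters such as "." or "|" from being read as regex syntax.
    static Pattern build(String literalDelim) {
        return Pattern.compile(FIELD1 + Pattern.quote(literalDelim) + FIELD2);
    }

    // Compile once, then reuse the same Pattern for every line.
    static List<String> firstFields(List<String> lines, String literalDelim) {
        Pattern re = build(literalDelim);
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = re.matcher(line);
            if (m.find()) {
                out.add(m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("12345678|one87654321", "12345678-one87654321");
        System.out.println(firstFields(lines, "|"));
    }
}
```

A Pattern is immutable and safe to share, so the compiled object can be kept in a field or passed around; a Matcher is not, so create a fresh one per input as shown.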
One thing to note here: the / / delimiters in Groovy are really just Strings (or GStrings if they contain variable references). They aren't really regular expressions, but they have the convenience of not needing to double escape everything.
If you want to avoid escaping even /, then you can use dollar-slashy-strings in Groovy 1.8 and newer:
def delim = $/[-:/]/$
I don't think that's necessary in your example, though.