I have some complex regular expressions which I need to comment for readability and maintenance. The Java spec is rather terse and I struggled for a long time to get this working. I finally caught my bug and will post it as an answer, but I'd be grateful for any other advice on maintaining regexes.
As an example I want to comment the subcomponents (of patternS) in a simple name parser:
String testTarget = "Waldorf T. Flywheel";
String patternS = "([A-Za-z]+)\\s+([A-Z]\\.)?\\s+([A-Za-z]+)";
Pattern pattern = Pattern.compile(patternS, Pattern.COMMENTS);
Assert.assertTrue(pattern.matcher(testTarget).matches());
EDIT: I would be grateful for examples of the (?x) format as well.
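One way the embedded-comment form might look, as a minimal sketch rather than a definitive answer. It reuses patternS unchanged; the class and method names (CommentedNameRegex, matchesWithFlag, matchesWithInlineFlag, initialOf) are made up for illustration. In comments mode, unescaped whitespace in the pattern is ignored and # starts a comment that runs to the end of the line, so each comment needs a \n inside the string, and whitespace that should be matched has to stay written as \s.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentedNameRegex {
    // patternS from the question, one subcomponent per line.
    // Each line ends in \n so that the # comment stops there.
    static final String COMMENTED_PATTERN =
          "([A-Za-z]+)  # first name\n"
        + "\\s+         # whitespace\n"
        + "([A-Z]\\.)?  # optional initial\n"
        + "\\s+         # whitespace\n"
        + "([A-Za-z]+)  # last name\n";

    // Variant 1: switch comments mode on with the Pattern.COMMENTS flag.
    static boolean matchesWithFlag(String target) {
        Pattern p = Pattern.compile(COMMENTED_PATTERN, Pattern.COMMENTS);
        return p.matcher(target).matches();
    }

    // Variant 2: switch it on inside the regex with the embedded (?x) flag.
    static boolean matchesWithInlineFlag(String target) {
        Pattern p = Pattern.compile("(?x)" + COMMENTED_PATTERN);
        return p.matcher(target).matches();
    }

    // Returns the optional initial (group 2), or null if nothing matched.
    static String initialOf(String target) {
        Matcher m = Pattern.compile(COMMENTED_PATTERN, Pattern.COMMENTS).matcher(target);
        return m.matches() ? m.group(2) : null;
    }
}
```

Both variants compile to the same regex as the uncommented patternS; the only difference is where comments mode is switched on.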
EDIT: @geowa4 has a good suggestion which avoids embedded comments. Since Java and others have provided for embedded comments, what are the cases where they are useful? (I think I have a case but I'd be interested to see others.)
EDIT: As @mikej notes below, the regex does not handle the optional initial well and would be better as:
String patternS = "([A-Za-z]+)\\s+([A-Z]\\.\\s+)?([A-Za-z]+)";
but that would end up capturing the trailing whitespace as part of the initial.
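A possible way around the captured whitespace, offered as a sketch: wrap the initial and its trailing whitespace in a non-capturing group (?:...) and keep the capturing group around the initial alone. The class and method names here (NameParserSketch, parse) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameParserSketch {
    // (?: ... )? makes "initial + whitespace" optional as a unit,
    // while only the initial itself is captured as group 2.
    static final Pattern NAME =
        Pattern.compile("([A-Za-z]+)\\s+(?:([A-Z]\\.)\\s+)?([A-Za-z]+)");

    // Returns {first, initial, last}; initial is null when absent.
    // Returns null when the input does not match at all.
    static String[] parse(String target) {
        Matcher m = NAME.matcher(target);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }
}
```

With this shape, a name without an initial still matches, and group 2 simply comes back as null.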
See the post by Martin Fowler on ComposedRegex for some more ideas on improving regexp readability. In summary, he advocates breaking down a complex regexp into smaller parts which can be given meaningful variable names. e.g.
String mandatoryName = "([A-Za-z]+)";
String mandatoryWhiteSpace = "\\s+";
String optionalInitial = "([A-Z]\\.)?";
String pattern = mandatoryName + mandatoryWhiteSpace + optionalInitial +
mandatoryWhiteSpace + mandatoryName;
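To show that the composed string is just an ordinary regex, here is a small runnable sketch of the same idea. The class name and the constant-style names are my own; the parts are the ones listed above.

```java
import java.util.regex.Pattern;

class ComposedNameRegex {
    static final String MANDATORY_NAME = "([A-Za-z]+)";
    static final String MANDATORY_WHITESPACE = "\\s+";
    static final String OPTIONAL_INITIAL = "([A-Z]\\.)?";

    // The variable names do the documenting; the concatenated string
    // is a plain regex, so no COMMENTS flag is needed.
    static final String COMPOSED =
        MANDATORY_NAME + MANDATORY_WHITESPACE + OPTIONAL_INITIAL
        + MANDATORY_WHITESPACE + MANDATORY_NAME;

    static boolean matches(String target) {
        return Pattern.compile(COMPOSED).matcher(target).matches();
    }
}
```

The composed string is character-for-character the same as patternS in the question, so it behaves identically, including the way it treats the optional initial.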
Why don't you just do this:
String pattern2S =
"([A-Za-z]+)" + // mandatory firstName
"\\s+" + // mandatory whitespace
...;
CONTINUATION:
If you want to keep the comments with the pattern and you need to read it in from a properties file, use this (the trailing backslash continues the line, and \n is loaded as the newline that ends each comment):
pattern=\
#comment1\n\
(A-z)\
#comment2\n\
(0-9)
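A rough sketch of how such a property could be loaded and compiled. Assumptions on my part: the key is called pattern, the file contents are simulated with a string, the pattern body is a made-up letters-then-digits example, and the line break after each comment is written as a single \n escape, because Properties.load turns that into a real newline and a real newline is what ends a comment in Pattern.COMMENTS mode.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Pattern;

class PropertiesRegexSketch {
    // Stand-in for the contents of a .properties file (hypothetical key "pattern").
    // A trailing backslash continues the logical line; \n becomes a real newline,
    // which is what ends each regex comment.
    static final String FILE_CONTENTS = String.join("\n",
        "pattern=\\",
        "    [A-Za-z]+  # letters\\n\\",
        "    [0-9]+     # digits");

    static Pattern loadPattern() {
        Properties props = new Properties();
        try {
            props.load(new StringReader(FILE_CONTENTS));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Comments mode strips the whitespace and the # comments at compile time.
        return Pattern.compile(props.getProperty("pattern"), Pattern.COMMENTS);
    }
}
```

A continuation line that begins with # is still part of the value, since the properties format only treats # as a comment marker at the start of a logical line.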