I'm working with data of the following form (four examples given, each separated by a new line):
some publication, issue no. 3
another publication, issue no. 23
yet another publication
here is another publication
I need to extract the publication name and - in case it exists - the issue number. This has to be done with a regex.
Given the data above, I am looking for the following results:
some publication 3
another publication 23
yet another publication <null>
here is another publication <null>
The following pattern works only for data that has the ", issue no. xyz" part:
String underTest = "some publication, issue no. 3";
String pattern = "(.*?), issue no. (\\d+)";
Matcher matcher = Pattern.compile(pattern).matcher(underTest);
boolean found = matcher.find();
if (found) {
    log.info("something found");
    String group1 = matcher.group(1);
    log.info("group1: {}", group1);
    String group2 = matcher.group(2);
    log.info("group2: {}", group2);
}
Any ideas for a regex string which will work for both cases (with and without issue number)?
Use an optional non-capturing group around the optional part:
(.*?)(?:, issue no\. (\d+))?
     ^^^                  ^^
In your code:
String pattern = "(.*?)(?:, issue no\\. (\\d+))?";
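Use the pattern with Matcher#matches() rather than Matcher#find(). matches() requires the whole string to match, so group 1 expands up to the optional issue part or to the end of the string. With find(), the first match would be the empty string at index 0, because the lazy .*? matches nothing and the optional group is skipped; group 1 would then be empty and group 2 null.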
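As a rough illustration, the pattern can be wrapped in a small helper and run against the four sample lines from the question. This is only a sketch: the class name IssueExtractor and the method extract are made up for the example, and returning a two-element array is just one way to hand back both groups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
final class IssueExtractor {
    // Group 1: publication name. Group 2: issue number (absent when there is no issue part).
    private static final Pattern PATTERN =
            Pattern.compile("(.*?)(?:, issue no\\. (\\d+))?");

    // Returns {name, issue}; issue is null when the input has no issue part.
    // Returns null when the input does not match at all (for example, when it
    // contains a line break, which '.' does not match by default).
    static String[] extract(String input) {
        Matcher matcher = PATTERN.matcher(input);
        // matches() forces the pattern to consume the whole input.
        if (!matcher.matches()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "some publication, issue no. 3",
            "another publication, issue no. 23",
            "yet another publication",
            "here is another publication"
        };
        for (String sample : samples) {
            String[] result = extract(sample);
            System.out.println("name=" + result[0] + ", issue=" + result[1]);
        }
    }
}
```

For the lines without an issue number, matcher.group(2) returns null, so callers should check for null before parsing the issue as a number.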