
capture group with optional substring

Tags: java, regex

I'm working with data of the following form (four examples given, each separated by a new line):

some publication, issue no. 3
another publication, issue no. 23
yet another publication
here is another publication

I need to extract the publication name and, if it exists, the issue number. This has to be done with a regex.

Given the data above, I am looking for the following results:

some publication            3
another publication         23
yet another publication     <null>
here is another publication <null>

The following pattern works only for data that has the , issue no. xyz part:

    String underTest = "some publication, issue no. 3";

    String pattern = "(.*?), issue no. (\\d+)";
    Matcher matcher = Pattern.compile(pattern).matcher(underTest);

    boolean found = matcher.find();
    if (found) {
        log.info("something found");
        String group1 = matcher.group(1);
        log.info("group1: {}", group1);

        String group2 = matcher.group(2);
        log.info("group2: {}", group2);
    }

Any ideas for a regex string which will work for both cases (with and without issue number)?

Abdull asked Mar 07 '17 13:03


1 Answer

Use an optional non-capturing group around the optional part:

(.*?)(?:, issue no\. (\d+))?
     ^^^                  ^^ 


In your code:

String pattern = "(.*?)(?:, issue no\\. (\\d+))?";

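Use it with Matcher#matches() rather than Matcher#find(), so that the pattern has to match the whole string. With find(), the lazy (.*?) and the optional group can both match nothing, so the first match is an empty string at the start of the input. For input without an issue number, matcher.group(2) returns null.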
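As a minimal sketch, here is the pattern used with Matcher#matches() on the four sample lines from the question. The class name PublicationParser and the parse helper are hypothetical names chosen for illustration; only the pattern string comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class PublicationParser {
    // Group 1: publication name. Group 2: issue number, only set when
    // the ", issue no. N" suffix is present.
    private static final Pattern PATTERN =
            Pattern.compile("(.*?)(?:, issue no\\. (\\d+))?");

    // Returns {name, issue}; issue is null when there is no issue number.
    // Returns null if the input does not match at all (for example, if it
    // contains a line break, which '.' does not match by default).
    static String[] parse(String input) {
        Matcher matcher = PATTERN.matcher(input);
        // matches() requires the pattern to cover the entire input.
        if (!matcher.matches()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "some publication, issue no. 3",
            "another publication, issue no. 23",
            "yet another publication",
            "here is another publication"
        };
        for (String sample : samples) {
            String[] result = parse(sample);
            System.out.println(result[0] + " -> " + result[1]);
        }
    }
}
```

Because the name group is lazy, it stops as soon as the optional suffix can match up to the end of the input; otherwise it grows to cover the whole string and group 2 stays null.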

Wiktor Stribiżew answered Nov 05 '22 06:11