
Regex optional group capturing in Java

Tags: java, regex

I have a pattern where a user specifies:

1998-2010:Make:model:trim:engine

trim and engine are optional. If present, I should capture them; if not, the matcher should at least validate the year, make, and model (YMM).

([0-9]+-*[0-9]+):(.*):(.*):(.*):(.*)

This matches if all five fields are there, but how do I make the last two, and only the last two, fields optional?

Asked by rad07 on Jan 21 '14 at 20:01



1 Answer

Using a regular expression and ?, the “zero or one quantifier”

You can use ? to match zero or one of something, which is what you want to do with the last two fields. However, your pattern needs a bit of modification: each field should be [^:]* rather than .*, because .* can also match colons and so lets a single group swallow several fields. Some sample code and its output follow. The regular expression I ended up with was:

([^:]*):([^:]*):([^:]*)(?::([^:]*))?(?::([^:]*))?
|-----| |-----| |-----|    |-----|      |-----|
   a       a       a          a            a

                       |-----------||-----------|
                             b            b

Each a matches a sequence of non-colon characters (although you'd want to modify the first one to match years), and each b is a non-capturing group (so it starts with ?:) that matches zero or one time (because of the trailing ? quantifier). This makes the fourth and fifth fields optional; when one is absent, its capturing group returns null. The sample code shows that this pattern matches when three, four, or five fields are present, and does not match when there are more than five fields or fewer than three, because Matcher.matches() requires the entire input to match.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionMarkQuantifier {
    public static void main(String[] args) {
        final String input = "a:b:c:d:e:f:g:h";
        // Three required fields, then two optional ":field" groups.
        final Pattern p = Pattern.compile( "([^:]*):([^:]*):([^:]*)(?::([^:]*))?(?::([^:]*))?" );
        // Stepping by two yields the prefixes "a", "a:b", "a:b:c", and so on.
        for ( int i = 1; i <= input.length(); i += 2 ) {
            final String string = input.substring( 0, i );
            final Matcher m = p.matcher( string );
            // matches() succeeds only if the whole string fits the pattern.
            if ( m.matches() ) {
                System.out.println( "\n=== Matches for: "+string+" ===" );
                final int count = m.groupCount();
                // Group 0 is the whole match; groups 1 to 5 are the fields.
                for ( int j = 0; j <= count; j++ ) {
                    System.out.println( j + ": "+ m.group( j ));
                }
            }
            else {
                System.out.println( "\n=== No matches for: "+string+" ===" );
            }
        }
    }
}

Output:
=== No matches for: a ===

=== No matches for: a:b ===

=== Matches for: a:b:c ===
0: a:b:c
1: a
2: b
3: c
4: null
5: null

=== Matches for: a:b:c:d ===
0: a:b:c:d
1: a
2: b
3: c
4: d
5: null

=== Matches for: a:b:c:d:e ===
0: a:b:c:d:e
1: a
2: b
3: c
4: d
5: e

=== No matches for: a:b:c:d:e:f ===

=== No matches for: a:b:c:d:e:f:g ===

=== No matches for: a:b:c:d:e:f:g:h ===
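Applied to the input from the question, the same idea might look like the sketch below. The year part of the pattern is an assumption (four digits, optionally followed by a hyphen and four more digits), as is requiring make and model to be non-empty; the class name VehicleSpecParser and the parse method are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VehicleSpecParser {
    // Assumption: a year is four digits, optionally followed by "-" and a
    // second four-digit year. Make and model must be non-empty; the trim and
    // engine fields are optional, exactly as in the pattern above.
    static final Pattern SPEC = Pattern.compile(
        "(\\d{4}(?:-\\d{4})?):([^:]+):([^:]+)(?::([^:]*))?(?::([^:]*))?");

    // Returns {years, make, model, trim, engine}. Trim and engine are null
    // when absent. Returns null when the input does not match at all.
    static String[] parse(String input) {
        Matcher m = SPEC.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // group 0 is the whole match
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] samples = {
            "1998-2010:Make:model:trim:engine",
            "1998-2010:Make:model:trim",
            "1998-2010:Make:model",
            "1998-2010:Make"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + Arrays.toString(parse(s)));
        }
    }
}
```

The last sample has only two fields, so parse returns null for it; the other three return a five-element array with null in the positions of any missing optional field.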

While it's certainly possible to match this kind of string with a regular expression, it might be easier to just split the string on : and check how many values you get back. Splitting doesn't do other kinds of checking (e.g., which characters appear in each field), so it may be less useful in whatever non-minimal situation is motivating this.

Using String.split and a limit parameter

I noticed your comment on another post that recommended using String.split(String):

Yes I know this function, but it work for me cause I have a string which is a:b:c:d:e:f:g:h.. but I just want to group the data as a:b:c:d:e if any as one and the rest of the string as another group

It's worth noting that there's a version of split that takes one more parameter, String.split(String,int). The second parameter is a limit, described as:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
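The three cases in that description can be seen on a short input. This is only an illustrative sketch; the class name SplitLimitDemo and the helper show are hypothetical, and the input "a:b::" was chosen because it ends in empty fields.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Hypothetical helper: splits on ":" with the given limit and renders
    // the resulting array as a string.
    static String show(String input, int limit) {
        return Arrays.toString(input.split(":", limit));
    }

    public static void main(String[] args) {
        final String input = "a:b::";
        // Positive limit: at most 2 elements, the last holds the unsplit rest.
        System.out.println(show(input, 2));   // [a, b::]
        // Zero: split everywhere, trailing empty strings are discarded.
        System.out.println(show(input, 0));   // [a, b]
        // Negative: split everywhere, trailing empty strings are kept.
        System.out.println(show(input, -1));  // [a, b, , ]
    }
}
```

The positive limit is the case that matters here, since it leaves everything past the last applied split in the final array element.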

This means that you could use split and the limit 6 to get up to five fields from your input, and you'd have the remaining input as the last string. You'd still have to check whether you had at least 3 elements, to make sure that there was enough input, but all in all, this seems like it might be a bit simpler.

import java.util.Arrays;

public class SplitWithLimit {
    public static void main(String[] args) {
        final String input = "a:b:c:d:e:f:g:h";
        // Stepping by two yields the prefixes "a", "a:b", "a:b:c", and so on.
        for ( int i = 1; i <= input.length(); i += 2 ) {
            final String string = input.substring( 0, i );
            System.out.println( "\n== Splits for "+string+" ===" );
            // Limit 6: at most five fields plus one element for the rest.
            System.out.println( Arrays.toString( string.split( ":", 6 )));
        }
    }
}

Output:
== Splits for a ===
[a]

== Splits for a:b ===
[a, b]

== Splits for a:b:c ===
[a, b, c]

== Splits for a:b:c:d ===
[a, b, c, d]

== Splits for a:b:c:d:e ===
[a, b, c, d, e]

== Splits for a:b:c:d:e:f ===
[a, b, c, d, e, f]

== Splits for a:b:c:d:e:f:g ===
[a, b, c, d, e, f:g]

== Splits for a:b:c:d:e:f:g:h ===
[a, b, c, d, e, f:g:h]
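The check for at least three elements, and the separation of the first five fields from the rest, could be sketched as follows. The class name SplitAndValidate and the methods firstFields and remainder are hypothetical names introduced for this example, not part of any library.

```java
import java.util.Arrays;

public class SplitAndValidate {
    // Hypothetical helper: returns the first three to five fields, or null
    // when the input has fewer than three fields.
    static String[] firstFields(String input) {
        String[] parts = input.split(":", 6);
        if (parts.length < 3) {
            return null;
        }
        // Drop the sixth element (the unsplit remainder) if it exists.
        return Arrays.copyOf(parts, Math.min(parts.length, 5));
    }

    // Hypothetical helper: returns whatever followed the fifth field, or
    // null when the input has five fields or fewer.
    static String remainder(String input) {
        String[] parts = input.split(":", 6);
        return parts.length == 6 ? parts[5] : null;
    }

    public static void main(String[] args) {
        final String input = "a:b:c:d:e:f:g:h";
        System.out.println(Arrays.toString(firstFields(input))); // [a, b, c, d, e]
        System.out.println(remainder(input));                    // f:g:h
        System.out.println(Arrays.toString(firstFields("a:b"))); // null
    }
}
```

Unlike the regular expression, this does not reject inputs with more than five fields; it hands the extra input back through remainder, which is closer to what the quoted comment asks for.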
Answered by Joshua Taylor on Sep 22 '22 at 13:09