
Java regex pattern matching (Irish car registration)

Tags: java, regex

Sorry if this is a dumb question, but it's been driving me mental for the past 5 days.

I'm trying to write a regex pattern to match Irish car registrations, for example '12-W-1234'. So far this is what I have:

import java.util.ArrayList;
import java.util.List;

public class ValidateDemo {
    public static void main(String[] args) {
        List<String> input = new ArrayList<String>();
        input.add("12-WW-1");
        input.add("12-W-223");
        input.add("02-WX-431");
        input.add("98-zd-4134");
        input.add("99-c-7465");


        for (String car : input) {
            if (car.matches("^(\\d{2}-?\\w*([KK|kk|ww|WW|c|C|ce|CE|cn|CN|cw|CW|d|D|dl|DL|g|G|ke|KE|ky|KY|l|L|ld|LD|lh|LH|lk|LK|lm|LM|ls|LS|mh|MH|mn|MN|mo|MO|oy|OY|so|SO|rn|RN|tn|TN|ts|TS|w|W|wd|WD|wh|WH|wx|WX])-?\\d{1,4})$")) {
                System.out.println("Car Template " + car);
            }
        }
    }
}

My problem comes up when checking registrations whose county code contains a single letter that is in my pattern, e.g. '12-ZD-1234': ZD isn't a valid county ID, but since D is valid the registration is still accepted and displayed.

Any help would be great.

I've already done research on a few websites including this and this.

These websites helped, but I'm still having my problems.

By the way, I'm going to convert all inputs to uppercase so that the pattern, and my code, can be shorter. Thanks for the help.

user3007858 asked Nov 19 '13 11:11


2 Answers

Besides the \\w* that others have pointed out, you're misusing character classes ([...]). To actually use alternation (|), take out the square brackets as well:

^(\\d{2}-?(KK|kk|ww|WW|c|C|ce|CE|cn|CN|cw|CW|d|D|dl|DL|g|G|ke|KE|ky|KY|l|L|ld|LD|lh|LH|lk|LK|lm|LM|ls|LS|mh|MH|mn|MN|mo|MO|oy|OY|so|SO|rn|RN|tn|TN|ts|TS|w|W|wd|WD|wh|WH|wx|WX)-?\\d{1,4})$

Here are some examples to show you how character classes actually work:

  1. [abc] matches a single character, either a, b, or c.
  2. [aabbcc] is equivalent to [abc] (duplicates are disregarded).
  3. [|] matches a literal pipe character: inside a character class, | is an ordinary symbol, not the alternation operator.
  4. [KK|kk|ww|WW|c|C|ce|CE ... ] ends up being equivalent to [K|kwWcCeE ... ] because, again, duplicates are disregarded.

You were correct to use the alternation operator (|) to do what you desired, but you didn't need to use character classes.
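As a minimal sketch of the difference (the class and method names here are made up for illustration, and the two tiny patterns are stand-ins rather than the full county list), compare a character class with a grouped alternation:

```java
import java.util.regex.Pattern;

public class ClassVsAlternation {
    // Character class: matches exactly ONE character from the set {K, E, |}.
    // The duplicate K is ignored and the pipe is just another member of the set.
    static final Pattern CHAR_CLASS = Pattern.compile("[KE|KK]");

    // Grouped alternation: matches the whole string "KE" or the whole string "KK".
    static final Pattern ALTERNATION = Pattern.compile("(KE|KK)");

    // Hypothetical helper: true if the entire input matches the character class.
    static boolean classMatches(String s) {
        return CHAR_CLASS.matcher(s).matches();
    }

    // Hypothetical helper: true if the entire input matches the alternation.
    static boolean alternationMatches(String s) {
        return ALTERNATION.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"K", "|", "KE", "KK", "EK"}) {
            System.out.println(s + " class=" + classMatches(s)
                    + " alternation=" + alternationMatches(s));
        }
    }
}
```

The character class accepts the single characters "K" and "|" but rejects "KE" and "KK", while the alternation does the opposite, which is why two-letter county codes need the parenthesised form.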

slackwing answered Oct 15 '22 05:10


You can improve your pattern like this:

^[0-9]{2}-?(?>c[enw]?|C[ENW]?|dl?|DL?|g|G|k[eky]|K[EKY]|l[dhkms]?|L[DHKMS]?|m[hno]|M[HNO]|oy|OY|rn|RN|so|SO|t[ns]|T[NS]|w[dhwx]?|W[DHWX]?)-?[0-9]{1,4}$

And if you don't care about the case of letters:

^(?i)[0-9]{2}-?(?>c[enw]?|dl?|g|k[eky]|l[dhkms]?|m[hno]|oy|rn|so|t[ns]|w[dhwx]?)-?[0-9]{1,4}$
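In Java, one way to use a case-insensitive pattern like this is to compile it once with Pattern.CASE_INSENSITIVE instead of the inline (?i) flag. The sketch below is only an illustration: the class name IrishRegCheck and the helper isValid are hypothetical, and the county codes are assumed to be exactly the ones listed in the question (with a separate oy branch and WW included).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class IrishRegCheck {
    // County codes assumed from the question's list; the atomic group (?>...) mirrors
    // the pattern above. No ^ or $ here because matches() checks the whole input.
    static final Pattern REG = Pattern.compile(
            "[0-9]{2}-?(?>c[enw]?|dl?|g|k[eky]|l[dhkms]?|m[hno]|oy|rn|so|t[ns]|w[dhwx]?)-?[0-9]{1,4}",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the entire string is a registration in this format.
    static boolean isValid(String reg) {
        return REG.matcher(reg).matches();
    }

    public static void main(String[] args) {
        // Same sample inputs as in the question.
        List<String> input = Arrays.asList(
                "12-WW-1", "12-W-223", "02-WX-431", "98-zd-4134", "99-c-7465");
        for (String car : input) {
            if (isValid(car)) {
                System.out.println("Car Template " + car);
            }
        }
    }
}
```

With these inputs, "98-zd-4134" is the only one not printed, since no branch of the alternation starts with z.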

Note that anchors (^ and $) are useful if your string must only contain the car registration number.
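To make the point about anchors concrete, here is a small sketch. It uses a deliberately simplified pattern (any one or two uppercase letters as the county code) and made-up class and method names, purely to show anchoring. In Java, String.matches() and Matcher.matches() already require the whole input to match, so ^ and $ only change the result when you search with Matcher.find().

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // Simplified stand-in for the registration pattern, for illustration only.
    static final String CORE = "[0-9]{2}-[A-Z]{1,2}-[0-9]{1,4}";

    static final Pattern UNANCHORED = Pattern.compile(CORE);
    static final Pattern ANCHORED = Pattern.compile("^" + CORE + "$");

    // find() looks for the pattern anywhere inside the input.
    static boolean findsUnanchored(String s) {
        return UNANCHORED.matcher(s).find();
    }

    // With ^ and $, find() succeeds only if nothing surrounds the registration.
    static boolean findsAnchored(String s) {
        return ANCHORED.matcher(s).find();
    }

    // String.matches() must consume the entire input, even without anchors.
    static boolean matchesWhole(String s) {
        return s.matches(CORE);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12-W-1234", "reg: 12-W-1234!"}) {
            System.out.println(s + " -> find=" + findsUnanchored(s)
                    + ", anchored find=" + findsAnchored(s)
                    + ", matches=" + matchesWhole(s));
        }
    }
}
```

For the second input only the unanchored find() succeeds, because the registration is embedded in extra text.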

Note 2: You can speed it up a little more by putting the most frequent county codes first in the alternation.

Casimir et Hippolyte answered Oct 15 '22 07:10