
Regex validating csv strings

I have a TextField in JavaFX whose background color changes depending on whether its content is valid.

Valid:

987654321 1
987654321 21
0101 9 1
1701 91 1 2
4101 917 1 0 43
0801 9 178 2 0
0111 9 1 084 0

Invalid:

0101 9 1 0 1 0
3124
0314 9

Basically:

  • Only digits
  • First group 4 or 9 digits
  • If first group 9 digits -> only two groups in total
  • If first group 4 digits -> three, four or five groups in total
  • Group two and three digits 1-9999
  • Group four and five digits 0-9999

Now think of one of these (valid) lines as one "ident".

The current regex is:

final String base = "(\\d+\\s+\\d+)|(\\d+\\s+\\d+\\s+\\d+(\\s+\\d+)?(\\s+\\d+)?)|(\\d+\\s+\\d+\\s+\\d+)|(\\d+\\s+\\d+\\s+\\d+\\s+\\d+)|(\\d+\\s+\\d+\\s+\\d+\\s+\\d+\\s+\\d+)";

This works well so far, but now I want to support CSV input: either a single ident as before, or multiple idents separated by commas (,), with no more than five idents in total.

My attempt:

final String pattern = String.format("(%s,?\\s*){1,5}",base);

This enables me to input this:

  • All the valid lines above
  • 0101 9 1, 0101 9 2, 0101 9 3
  • 0101 9 1, 987654321 21, 0101 9 3, 0101 9 4

And if I input more than 5 idents it turns invalid correctly. But if I input the invalid ident 0101 9 1 1 1 1 1 1 1 1 1 it still turns valid.

Any suggestions?

EDIT: This is the matching logic:

private final Predicate<String> typingPredicate = new Predicate<String>() {
    @Override
    public boolean apply(String input) {
        return input.matches(pattern);
    }
};

textField.textProperty().addListener(new ChangeListener<String>() {
    @Override
    public void changed(ObservableValue<? extends String> observableValue, String previous, String current) {
        if (current != null) {
            if (StringUtils.isEmpty(current) || typingPredicate.apply(current.trim())) {
                textField.getStyleClass().removeAll("invalid");
            } else {
                textField.getStyleClass().add("invalid");
            }
        }
    }
});

Olav Gulbrandsen Blaaflat asked Nov 17 '15 12:11


3 Answers

The comma in your regexp is optional, which allows "0101 9 1 1 1 1 1 1 1 1 1" to be parsed as two or more idents separated only by whitespace.

To fix this, require the input to be either exactly one ident or several comma-separated ones:

final String pattern = String.format("(%s\\s*,\\s*){0,4}%s",base,base);

I would also recommend making base itself stricter with respect to your input rules, although that doesn't seem to be directly relevant to the issue.
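As a minimal sketch of the difference, the snippet below compares an optional comma with a mandatory one. The names IdentCsvCheck, BASE, LOOSE and STRICT are hypothetical, and BASE is a condensed stand-in for the question's base (two to five whitespace-separated digit groups), wrapped in a non-capturing group so that the separator applies to the whole ident rather than to one alternative.

```java
import java.util.regex.Pattern;

class IdentCsvCheck {
    // Assumption: condensed stand-in for the question's loose base,
    // i.e. two to five digit groups separated by whitespace.
    static final String BASE = "(?:\\d+(?:\\s+\\d+){1,4})";

    // Optional comma, as in the question's attempt.
    static final Pattern LOOSE =
        Pattern.compile("(?:" + BASE + ",?\\s*){1,5}");

    // Mandatory comma: zero to four "ident," prefixes, then one final ident.
    static final Pattern STRICT =
        Pattern.compile("(?:" + BASE + "\\s*,\\s*){0,4}" + BASE);

    static boolean loose(String s)  { return LOOSE.matcher(s).matches(); }
    static boolean strict(String s) { return STRICT.matcher(s).matches(); }

    public static void main(String[] args) {
        String bad = "0101 9 1 1 1 1 1 1 1 1 1";
        // Without a required comma the eleven groups can be split
        // into several "idents" (for example 5 + 4 + 2 groups).
        System.out.println("loose:  " + loose(bad));
        // With a required comma the same input is a single, too long ident.
        System.out.println("strict: " + strict(bad));
        System.out.println("csv:    " + strict("0101 9 1, 987654321 21, 0101 9 3"));
    }
}
```

The same structure should carry over to a stricter base, as long as that base is grouped before the separator is appended.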

AndreyS Scherbakov answered Nov 03 '22 17:11


Let's break things down:

  • Only digits:

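    The regex may only match digits and whitespace; anchor it with ^ and $ so that nothing else is allowed (String.matches in Java already requires a full match, so the anchors are redundant there but harmless)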

  • First group 4 or 9 digits:

    Straightforward: \d{4}|\d{9}

  • If first group 9 digits -> only two groups in total

    \d{9}\s\d: the 9-digit group followed by the second group (\d is a placeholder here, refined below)

  • If first group 4 digits -> three, four or five groups in total

    \d{4}(\s\d){2,4}: the 4-digit group followed by 2 to 4 more groups (\d is again a placeholder)

  • Group two and three digits 1-9999

    1-9999 -> [1-9]\d{0,3}

  • Group four and five digits 0-9999

    Easy one: \d{1,4} (this also admits leading zeros such as 084)

Then combining everything:

^( # match start of string, open the alternation
  (\d{4} # group start with 4 digits
    (\s[1-9]\d{0,3}){2} # group of 1-9999 twice
    (\s\d{1,4}){0,2} # group of 0-9999 zero to two times
  )|(\d{9} # group start with 9 digits
    \s[1-9]\d{0,3} # group of 1-9999
  )
)$ # close the alternation, end of string match

Which gives:

^((\d{4}(\s[1-9]\d{0,3}){2}(\s\d{1,4}){0,2})|(\d{9}\s[1-9]\d{0,3}))$
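A hedged sketch of how the commented form could be kept as-is in Java: the Pattern.COMMENTS flag ignores whitespace in the pattern and treats # as the start of a comment running to the end of the line. The class name IdentPattern and the method isIdent are hypothetical; the sample inputs are the valid and invalid lines from the question.

```java
import java.util.regex.Pattern;

class IdentPattern {
    // Pattern.COMMENTS: whitespace in the pattern is ignored and '#'
    // starts a comment that runs to the end of the line.
    static final Pattern IDENT = Pattern.compile(
          "^(                       # start of string, open the alternation\n"
        + "  \\d{4}                 # first group: 4 digits\n"
        + "  (\\s[1-9]\\d{0,3}){2}  # groups two and three: 1-9999\n"
        + "  (\\s\\d{1,4}){0,2}     # optional groups four and five: 0-9999\n"
        + " |                       # or\n"
        + "  \\d{9}                 # first group: 9 digits\n"
        + "  \\s[1-9]\\d{0,3}       # second group: 1-9999\n"
        + ")$                       # close the alternation, end of string\n",
        Pattern.COMMENTS);

    static boolean isIdent(String s) {
        return IDENT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "987654321 1", "987654321 21", "0101 9 1", "1701 91 1 2",
            "4101 917 1 0 43", "0801 9 178 2 0", "0111 9 1 084 0",
            "0101 9 1 0 1 0", "3124", "0314 9"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isIdent(s));
        }
    }
}
```

Note that \s here matches exactly one whitespace character between groups; use \s+ if runs of spaces should be accepted.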

You can try it live here

Cyrbil answered Nov 03 '22 16:11


Here's a solution to your problem. I modified the regex a little. Your pattern also accepted the last of the invalid examples, at least for me. The basic problem you are running into is that your base regex isn't wrapped in parentheses, so the ,?\\s* is appended only to the last alternative and not to the complete expression.

Here's the modified solution I came up with, which seems to validate everything as it should.

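import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is only a placeholder to make the snippet compile.
public class IdentCsvDemo {
    public static void main(String[] args) {
        String[] inputs  = {"987654321 1",
                            "987654321 21",
                            "0101 9 1",
                            "1701 91 1 2",
                            "4101 917 1 0 43",
                            "0801 9 178 2 0",
                            "0111 9 1 084 0",
                            "0101 9 1 0 1 0",
                            "3124",
                            "0314 9"};
        // 9-digit branch: second group restricted to 1-9999 (was \\d*, which also accepted an empty group).
        String regex = "((\\d{9}\\s[1-9]\\d{0,3})|(\\d{4}(\\s[1-9]\\d{0,3}){2}(\\s\\d{1,4}){0,2}))";
        // Up to four "ident, " prefixes followed by one final ident.
        String csvRegex = "(" + regex + ",\\s){0,4}" + regex;
        // Compile once and reuse the pattern for every input.
        Pattern csvPattern = Pattern.compile(csvRegex);
        for (String s : inputs) {
            Matcher m = csvPattern.matcher(s);
            System.out.println(m.matches()); // true for the first seven inputs, false for the last three
        }

        String falseCSVString = "987654321 1, 987654321 21, 1701 91 1 2, 0111 9 1 084 0, 0101 9 1 1 1 1 1 1 1 1 1";
        System.out.println(csvPattern.matcher(falseCSVString).matches()); // false

        String rightCSVString = "987654321 1, 987654321 21, 1701 91 1 2, 0111 9 1 084 0, 0101 9 1";
        System.out.println(csvPattern.matcher(rightCSVString).matches()); // true
    }
}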
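
The precedence point made in this answer can be seen in isolation with a toy example. This is only an illustrative sketch: the literals ab and cd stand in for two alternatives of base, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class AlternationPrecedence {
    // The separator is appended to an unparenthesized alternation,
    // so ",?\\s*" belongs to the "cd" branch only.
    static boolean ungrouped(String s) {
        return Pattern.matches("(ab|cd,?\\s*){1,5}", s);
    }

    // The alternation is wrapped in a non-capturing group first,
    // so the separator applies to whichever branch matched.
    static boolean grouped(String s) {
        return Pattern.matches("((?:ab|cd),?\\s*){1,5}", s);
    }

    public static void main(String[] args) {
        System.out.println(ungrouped("cd, ab")); // separator follows the last branch
        System.out.println(ungrouped("ab, cd")); // separator follows the first branch
        System.out.println(grouped("ab, cd"));
    }
}
```

With the ungrouped pattern, a comma is accepted only after the last alternative; the grouped pattern accepts it after either one.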

SomeJavaGuy answered Nov 03 '22 18:11