 

Is there an elegant way to do partial regex matches in Java?

Tags: java, regex

What I need is to check whether a given string partially matches a given regex. For example, for the regex ab[0-9]c, the strings "a", "ab", "ab3", and "b3c" would "match", but not the strings "d", "abc", or "a3c". What I've been doing is the clunky a(?:b(?:[0-9](?:c)?)?)? (which only works for some of the partial matches, specifically those which "begin" to match), but since this is part of an API, I'd rather give the users a more intuitive way of entering their matching regexps.

In case the description's not very clear (and I realize it might not be!), this will be used for validating text input on text boxes. I want to prevent any editing that would result in an invalid string, but I can't just match the string against a regular regex, since until it's fully entered, it would not match. For example, using the regex above (ab[0-9]c), when I attempt to enter 'a', it's disallowed, since the string "a" does not match the regex.

Basically, it's a sort of reverse startsWith() that works on regexps. (A hypothetical Pattern.compile("ab[0-9]c").startsWith("ab3") should return true.)

Any ideas?

asked Sep 29 '09 at 17:09 by Tonio


3 Answers

Is Matcher.hitEnd() what you're looking for?

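import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartialMatch {
    static boolean isPartialMatch(String theRegexString, String theStringToTest) {
        Matcher m = Pattern.compile(theRegexString).matcher(theStringToTest);
        // Accept a full match; otherwise hitEnd() reports whether the matcher
        // reached the end of the input, i.e. more input might still match.
        return m.matches() || m.hitEnd();
    }
}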
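To see how this behaves on the examples from the question, here is a small self-contained demo. The isPartialMatch helper name is hypothetical. Note that this check only accepts strings that could be extended into a full match, so it implements the "reverse startsWith" from the question: "b3c" is rejected because the matcher fails on its first character, before ever reaching the end of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartialMatchDemo {
    // Hypothetical helper: true if `input` fully matches `pattern`, or if the
    // matcher ran out of input while still matching (so more typing could succeed).
    static boolean isPartialMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.matches() || m.hitEnd();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("ab[0-9]c");
        String[] inputs = {"", "a", "ab", "ab3", "ab3c", "b3c", "d", "abc", "a3c", "ab3cd"};
        for (String s : inputs) {
            System.out.println("\"" + s + "\" -> " + isPartialMatch(p, s));
        }
    }
}
```

The empty string is accepted, which is usually what you want for an empty text box.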
answered Nov 14 '22 at 22:11 by Éric Malenfant


Although there may be some trickery available, your way is probably the best semantically. It accurately describes what you are looking for.

However, the bigger issue is whether you really need to validate every single time a character is typed into the text box. Why can't you just validate it once at the end and save yourself some headaches?
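If you stick with the nested-optional regex from the question, it can at least be generated from a list of atoms rather than written by hand. This is a minimal sketch, assuming the pattern is a plain sequence of self-contained atoms (a literal, a character class, or a group); the prefixRegex helper is hypothetical. Unlike the question's version, it also wraps the first atom in an optional group, so the empty string is accepted.

```java
import java.util.regex.Pattern;

public class PrefixRegex {
    // Hypothetical helper: nests each atom inside the previous one, giving
    // (?:a(?:b(?:[0-9](?:c)?)?)?)? for the atoms a, b, [0-9], c.
    // Assumes each atom is a self-contained unit (literal, character class, or group).
    static String prefixRegex(String... atoms) {
        String inner = "";
        // Build from the last atom outward so every later atom is optional.
        for (int i = atoms.length - 1; i >= 0; i--) {
            inner = "(?:" + atoms[i] + inner + ")?";
        }
        return inner;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(prefixRegex("a", "b", "[0-9]", "c"));
        for (String s : new String[] {"", "ab3", "b3c", "a3c"}) {
            System.out.println("\"" + s + "\" -> " + p.matcher(s).matches());
        }
    }
}
```

Like the hand-written version, this only accepts prefixes, so "b3c" is still rejected.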

answered Nov 14 '22 at 23:11 by Pesto


Here is a regex that can solve your particular example:

^(?:a|b|[0-9]|c|ab|b[0-9]|[0-9]c|ab[0-9]|b[0-9]c|ab[0-9]c)?$

Generally speaking, if you can break the regex down into atomic parts, you can OR together every contiguous run of them, but the result is big and ugly. In this case there were 4 parts (a, b, [0-9], and c), so you had to OR together 4+3+2+1 = 10 possibilities; for n parts it is n(n+1)/2 possibilities. You might be able to generate this algorithmically, but it would be a huge pain to test, and anything complex (like a subgroup) would be very difficult to get right.
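A rough sketch of that generation step, assuming the regex is a plain sequence of self-contained atoms (a literal, a character class, or a group); the substringRegex helper is hypothetical. It produces the same 10 alternatives as the hand-written regex above, in a different order (grouped by starting atom rather than by length), which does not change which strings match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SubstringRegex {
    // Hypothetical helper: ORs together every contiguous run of atoms
    // (n*(n+1)/2 alternatives for n atoms); the trailing '?' also admits "".
    // Assumes each atom is a self-contained unit (literal, character class, or group).
    static String substringRegex(String... atoms) {
        List<String> alternatives = new ArrayList<>();
        for (int start = 0; start < atoms.length; start++) {
            StringBuilder run = new StringBuilder();
            for (int end = start; end < atoms.length; end++) {
                run.append(atoms[end]);
                alternatives.add(run.toString());
            }
        }
        return "^(?:" + String.join("|", alternatives) + ")?$";
    }

    public static void main(String[] args) {
        String regex = substringRegex("a", "b", "[0-9]", "c");
        System.out.println(regex);
        Pattern p = Pattern.compile(regex);
        for (String s : new String[] {"b3c", "ab3", "d", "abc", "a3c"}) {
            System.out.println(s + " -> " + p.matcher(s).matches());
        }
    }
}
```

As noted above, atoms containing top-level alternation or nested groups would need extra care before being concatenated like this.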

A better solution is probably just to have a message beside the input field telling the user something like "not enough info", and to change it to a green checkmark once the input is valid. Here's a recent article from A List Apart that weighs the pros and cons of different approaches to this problem: Inline Validation in Web Forms.
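The status-message idea pairs naturally with Matcher.hitEnd() from the first answer: a full match is valid, input that ran out while still matching is incomplete, and anything else is invalid. A minimal sketch; the InputStatus class and its Status enum are hypothetical names, and wiring the result to an actual text field is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InputStatus {
    // Hypothetical states for a status message shown beside the input field.
    enum Status { VALID, INCOMPLETE, INVALID }

    static Status classify(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        if (m.matches()) {
            return Status.VALID;       // e.g. show the green checkmark
        }
        if (m.hitEnd()) {
            return Status.INCOMPLETE;  // e.g. show "not enough info"
        }
        return Status.INVALID;         // hitEnd() is false: more input would not help
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("ab[0-9]c");
        for (String s : new String[] {"ab", "ab3c", "abc"}) {
            System.out.println(s + " -> " + classify(p, s));
        }
    }
}
```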

answered Nov 14 '22 at 23:11 by Kip