
Regex to match if string *only* contains *all* characters from a character set, plus an optional one

Tags: java, string, regex

I ran into a wee problem with Java regex. (I must say in advance, I'm not very experienced in either Java or regex.)

I have a string, and a set of three characters. I want to find out if the string is built from only these characters. Additionally (just to make it even more complicated), two of the characters must be in the string, while the third one is *optional*.

I do have a solution; my question is rather whether anyone can offer anything better/nicer/more elegant, because this one makes me cry blood when I look at it...

The set-up

  • The mandatory characters are: | (pipe) and - (dash).

    The string in question should be built from a combination of these. They can be in any order, but both have to be in it.

  • The optional character is: : (colon).

    The string can contain colons, but it does not have to. This is the only other character allowed, apart from the above two.

  • Any other characters are forbidden.

Expected results

The following strings should match (true) or not match (false):

"------" = false
"||||" = false
"---|---" = true
"|||-|||" = true
"--|-|--|---|||-" = true

...and...

"----:|--|:::|---::|" = true
":::------:::---:---" = false
"|||:|:::::|" = false
"--:::---|:|---G---n" = false

...etc.
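
The rules above can also be pinned down without any regex, which gives a baseline to compare candidate patterns against. This is only a minimal sketch: the class name RuleCheck and the helper isValid are hypothetical names chosen for illustration, not part of the original post.

```java
class RuleCheck {
    // Hypothetical helper: true if s is built only from '|', '-' and ':'
    // and contains at least one '|' and at least one '-'.
    static boolean isValid(String s) {
        boolean hasPipe = false;
        boolean hasDash = false;
        for (char c : s.toCharArray()) {
            if (c == '|') {
                hasPipe = true;
            } else if (c == '-') {
                hasDash = true;
            } else if (c != ':') {
                return false; // any other character is forbidden
            }
        }
        return hasPipe && hasDash;
    }

    public static void main(String[] args) {
        String[] samples = {
            "------",               // false: no pipe
            "---|---",              // true
            "----:|--|:::|---::|",  // true
            "|||:|:::::|",          // false: no dash
            "--:::---|:|---G---n"   // false: forbidden characters
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" = " + isValid(s));
        }
    }
}
```

A loop like this is longer than a regex, but each of the three rules is visible as its own branch.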

The "ugly" solution

Now, I have a solution that seems to work, based on this Stack Overflow answer. The reason I'd like a better one will become obvious when you've recovered from seeing this:

if (string.matches("^[(?\\:)?\\|\\-]*(([\\|\\-][(?:\\:)?])|([(?:\\:)?][\\|\\-]))[(?\\:)?\\|\\-]*$") || string.matches("^[(?\\|)?\\-]*(([\\-][(?:\\|)?])|([(?:\\|)?][\\-]))[(?\\|)?\\-]*$")) {
    // do funny stuff with a meaningless string
} else {
    // don't do funny stuff with a meaningless string
}

Breaking it down

The first regex

 "^[(?\\:)?\\|\\-]*(([\\|\\-][(?:\\:)?])|([(?:\\:)?][\\|\\-]))[(?\\:)?\\|\\-]*$"

checks for all three characters.

The next one

"^[(?\\|)?\\-]*(([\\-][(?:\\|)?])|([(?:\\|)?][\\-]))[(?\\|)?\\-]*$"

checks for the two mandatory ones only.

...Yea, I know...

But believe me, I tried. Nothing else gave the desired result; every other attempt let through strings without the mandatory characters, etc.

The question is...

Does anyone know how to do it a simpler / more elegant way?

Bonus question: There is one thing I don't quite get in the regexes above (more than one, but this one bugs me the most):

As far as I understand(?) regular expressions, (?\\|)? should mean that the character | is either present or not (unless I'm very much mistaken). Still, in the above setup it seems to enforce that character. This of course suits my purpose, but I cannot understand why it works that way.

So if anyone can explain what I'm missing there, that'd be really great. Besides, I suspect this holds the key to a simpler solution (checking for both mandatory and optional characters in one regex would be ideal).
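
A likely explanation for the bonus question: inside square brackets, ( ? and ) lose their special meaning and are ordinary members of the set. So [(?\\|)?\\-] is not "an optional pipe, or a dash" but a character class that matches exactly one of ( ? | ) or -. The pipe only appears to be enforced because the middle group demands a dash next to some other character from such a class, and in a string of dashes and pipes that can only be a pipe. The sketch below checks this; the class name CharClassDemo is a hypothetical name for illustration, and the two long patterns are copied unchanged from the code above.

```java
class CharClassDemo {
    // The character class used in the second regex of the question.
    static final String CLASS = "[(?\\|)?\\-]";

    // The two regexes from the question, copied unchanged.
    static final String FIRST =
        "^[(?\\:)?\\|\\-]*(([\\|\\-][(?:\\:)?])|([(?:\\:)?][\\|\\-]))[(?\\:)?\\|\\-]*$";
    static final String SECOND =
        "^[(?\\|)?\\-]*(([\\-][(?:\\|)?])|([(?:\\|)?][\\-]))[(?\\|)?\\-]*$";

    public static void main(String[] args) {
        // Parentheses and question marks are literal members of the class.
        for (String s : new String[] {"(", "?", ")", "|", "-"}) {
            System.out.println(s + " in class: " + s.matches(CLASS)); // true each time
        }
        System.out.println(": in class: " + ":".matches(CLASS));      // false

        System.out.println("------".matches(SECOND));   // false: no neighbour for a dash
        System.out.println("---|---".matches(SECOND));  // true: "-|" satisfies the middle group
        System.out.println("-?".matches(SECOND));       // true: '?' stands in for the pipe

        // A colon next to a dash satisfies the first regex even without a pipe.
        System.out.println(":::------:::---:---".matches(FIRST)); // true
    }
}
```

The last two lines suggest the combined condition is looser than intended: it accepts stray ( ? ) characters, and a string listed above as false passes the first regex.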

Thank you all for reading (and suffering) through my question, and even bigger thanks to those who reply. :)

PS

I did try stuff like ^[\\|\\-(?:\\:)?)]$, but that would not enforce all mandatory characters.

Attila Orosz asked Mar 22 '16 17:03




1 Answer

Use a lookahead-based regex.

^(?=.*\\|)(?=.*-)[-:|]+$

or

^(?=.*\\|)[-:|]*-[-:|]*$

or

^[-:|]*(?:-:*\\||\\|:*-)[-:|]*$


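  • (?=.*\\|) expects at least one pipe (the backslash is doubled because the pattern is written as a Java string literal).
  • (?=.*-) expects at least one hyphen.
  • [-:|]+ matches any character from the list, one or more times.
  • $ marks the end of the line.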
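
To see the three patterns above side by side, here is a small sketch that runs each of them against the strings from the question. The class name AnswerPatternsDemo and the helper isValid are hypothetical names for illustration; only the patterns themselves come from this answer.

```java
class AnswerPatternsDemo {
    // The three patterns from the answer, as Java string literals.
    static final String[] PATTERNS = {
        "^(?=.*\\|)(?=.*-)[-:|]+$",
        "^(?=.*\\|)[-:|]*-[-:|]*$",
        "^[-:|]*(?:-:*\\||\\|:*-)[-:|]*$"
    };

    // Hypothetical helper: true if the whole string matches the given pattern.
    static boolean isValid(String regex, String s) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        String[] samples = {
            "------", "||||", "---|---", "|||-|||", "--|-|--|---|||-",
            "----:|--|:::|---::|", ":::------:::---:---",
            "|||:|:::::|", "--:::---|:|---G---n"
        };
        for (String s : samples) {
            StringBuilder line = new StringBuilder("\"" + s + "\" =");
            for (String regex : PATTERNS) {
                line.append(' ').append(isValid(regex, s));
            }
            System.out.println(line); // one result per pattern
        }
    }
}
```

Since String.matches already requires the whole input to match, the ^ and $ anchors are redundant in this usage, though harmless.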
Avinash Raj answered Nov 06 '22 05:11
