 

OR condition in Regex

Tags:

regex

Let's say I have

1 ABC Street
1 A ABC Street

With \d, it matches 1 (as expected), and with \d \w, it matches 1 A (also as expected). But when I combine the two patterns as \d|\d \w, only the first alternative ever matches and the second is ignored.

My question is: how do I use an "or" condition correctly in this particular case?

PS: The rule is to match only the number when it is not followed by a single letter; otherwise, match the number together with that single letter.

Example: for 1 ABC Street, match only the number 1, but for 1 A ABC Street, match 1 A.

Hoan Dang asked Apr 13 '13 09:04




2 Answers

Try

\d \w |\d 

or add a positive lookahead if you don't want to include the trailing space in the match

\d \w(?= )|\d 

When you have two alternatives where one is an extension of the other, put the longer one first, otherwise it will have no opportunity to be matched.
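As a minimal sketch of why the order matters, the snippet below runs both orderings against the two sample addresses from the question using Python's `re` module. The variable names are illustrative only, and the lookahead variant is used so the trailing space is not part of the match.

```python
import re

addresses = ["1 ABC Street", "1 A ABC Street"]

# Shorter alternative first: \d succeeds immediately at the digit,
# so the engine never needs to try the longer "\d \w" alternative.
short_first = re.compile(r"\d|\d \w")

# Longer alternative first; the lookahead requires a space after the
# single letter without consuming it.
long_first = re.compile(r"\d \w(?= )|\d")

short_results = [short_first.search(a).group() for a in addresses]
long_results = [long_first.search(a).group() for a in addresses]

print(short_results)  # ['1', '1']
print(long_results)   # ['1', '1 A']
```

In the first address the letter A is followed by B rather than a space, so the lookahead fails and the engine falls back to the plain \d alternative.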

MikeM answered Sep 21 '22 03:09


A classic "or" would be |. For example, ab|de would match either side of the expression.

However, for something like your case you might want to use the ? quantifier, which matches the preceding expression either zero times or once (once is preferred, i.e. it is a "greedy" match). Another (probably more reliable) alternative would be to use a custom character class:

\d+\s+[A-Z\s]+\s+[A-Z][A-Za-z\s]+ 

This pattern will match:

  • \d+: One or more digits.
  • \s+: One or more whitespace characters.
  • [A-Z\s]+: One or more uppercase letters or whitespace characters.
  • \s+: One or more whitespace characters.
  • [A-Z][A-Za-z\s]+: An uppercase letter followed by one or more letters (uppercase or lowercase) or whitespace characters.

If you'd like a stricter check, e.g. to match only ABC and A ABC, you can use a non-capturing group and define the alternatives inside it (to limit the scope of the |):

\d (?:ABC|A ABC) Street 

Or another alternative using a quantifier:

\d (?:A )?ABC Street 
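
As a rough check that the two patterns above accept the same inputs, here is a short sketch using Python's `re` module. The third sample string and the variable names are made up for illustration and are not from the question.

```python
import re

# Explicit alternatives scoped inside a non-capturing group.
alternation = re.compile(r"\d (?:ABC|A ABC) Street")

# The same idea with an optional non-capturing group.
optional = re.compile(r"\d (?:A )?ABC Street")

# The last sample is a hypothetical negative case.
samples = ["1 ABC Street", "1 A ABC Street", "1 B ABC Street"]

results = {
    s: (bool(alternation.fullmatch(s)), bool(optional.fullmatch(s)))
    for s in samples
}

for text, (alt_ok, opt_ok) in results.items():
    print(f"{text!r}: alternation={alt_ok}, optional={opt_ok}")
```

Both patterns accept the two addresses from the question and reject the third, so the choice between them is mostly a matter of readability.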
Mario answered Sep 21 '22 03:09