Regex matching on word boundary OR non-digit

I'm trying to use a regex pattern (in Java) to find a run of exactly 3 digits in a row. A run of 4 digits shouldn't match, and neither should a run of 2.

The obvious pattern to me was:

"\b(\d{3})\b"

That matches against many source string cases, such as:

">123<"
" 123-"
"123"

But it won't match against a source string of "abc123def", because letters and digits are both word characters, so neither the c/1 boundary nor the 3/d boundary counts as the "word boundary" that \b is expecting.
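To make that concrete, here is a minimal Java sketch of the behavior described above. The class name WordBoundaryDemo and the helper findAll are made up for illustration; only java.util.regex is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // The pattern from the question; in Java source each backslash is doubled.
    static final Pattern THREE_DIGITS = Pattern.compile("\\b(\\d{3})\\b");

    // Collects capture group 1 of every match found in the input.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = THREE_DIGITS.matcher(input);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] samples = {">123<", " 123-", "123", "abc123def"};
        for (String s : samples) {
            // The first three samples yield [123]; "abc123def" yields [].
            System.out.println("\"" + s + "\" -> " + findAll(s));
        }
    }
}
```

The last sample comes back empty because 'c', '1', '3' and 'd' are all word characters, so there is no position around "123" where \b can succeed.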

I would have expected the solution to be a character class that includes both non-digit (\D) and the word boundary (\b), but that appears to be illegal syntax:

"[\b\D](\d{3})[\b\D]"

Does anybody know what I could use as an expression that would extract "123" for a source string situation like:

"abc123def"

I'd appreciate any help. And yes, I realize that in Java one must double-escape codes like \b to \\b, but that's not my issue and I didn't want to limit this to Java folks.

Michael Oryl asked Dec 08 '22 08:12



2 Answers

You should use lookarounds for those cases:

(?<!\d)(\d{3})(?!\d)

This matches 3 digits that are neither preceded nor followed by another digit.
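A minimal Java sketch of this lookaround pattern, assuming you want every isolated 3-digit run in the input. The class and method names are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // (?<!\d) : no digit immediately before; (?!\d) : no digit immediately after.
    // Both are zero-width, so they never consume the neighbouring characters.
    static final Pattern EXACTLY_THREE = Pattern.compile("(?<!\\d)(\\d{3})(?!\\d)");

    // Collects every isolated run of exactly three digits.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = EXACTLY_THREE.matcher(input);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("abc123def")); // [123]
        System.out.println(findAll("1234"));      // []
        System.out.println(findAll("12"));        // []
        System.out.println(findAll("123a456"));   // [123, 456]
    }
}
```

Because the lookarounds do not consume anything, two runs separated by a single character (as in "123a456") are both found.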


anubhava answered Dec 12 '22 21:12



Lookarounds can solve this problem, but I personally try to avoid them because not all regex engines fully support them. Additionally, I wouldn't say this issue is complicated enough to merit the use of lookarounds in the first place.

You could match this: (?:\b|\D)(\d{3})(?:\b|\D)

Then return: \1

Or if you're performing a replacement and need to match the entire string: (?:\b|\D)+(\d{3})(?:\b|\D)+

Then replace with: \1
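A rough Java sketch of the first pattern, with made-up class and method names. In Java the captured digits are read with group(1), and replacement strings refer to the group as $1 rather than \1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Either a word boundary (zero-width) or one non-digit on each side.
    static final Pattern BOUNDED = Pattern.compile("(?:\\b|\\D)(\\d{3})(?:\\b|\\D)");

    // Returns only capture group 1, dropping any separator the match consumed.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = BOUNDED.matcher(input);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("abc123def")); // [123]
        System.out.println(findAll("1234"));      // []
        System.out.println(findAll("123-456"));   // [123, 456]
        System.out.println(findAll("123a456"));   // [123]
    }
}
```

One thing to be aware of: unlike a lookaround, \D consumes the neighbouring character. In "123a456" the 'a' is used up by the first match, so the second run has no separator left in front of it and is not reported.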

As a side note, the reason \b wasn't working as part of a character class is that within brackets it has a completely different meaning: in most regex flavors, [\b] refers to a backspace character, not a word boundary.
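In Java specifically, as far as I can tell, Pattern does not fall back to the backspace meaning and instead rejects \b inside brackets, which would explain the "illegal syntax" error in the question. Here is a small sketch for checking this on your own JDK; the compiles helper and the class name are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CharClassBoundaryCheck {
    // Reports whether java.util.regex accepts the given pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The character-class attempt from the question.
        System.out.println(compiles("[\\b\\D](\\d{3})[\\b\\D]"));
        // The alternation form, where \b sits outside any brackets.
        System.out.println(compiles("(?:\\b|\\D)(\\d{3})(?:\\b|\\D)"));
    }
}
```

Moving \b out of the brackets and into an alternation is what makes the second pattern acceptable.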


CAustin answered Dec 12 '22 22:12
