I am new to regex and am having a hard time detecting a pattern.
I want to match a number between 4000 and 4999 that is NOT preceded or followed by another digit, even when a single space or hyphen "-" separates them.
For example: 4567 (match)
I have 4999 roses (match)
1234567 days are gone (no match)
My water supply account is 123 4567 89 (no match)
Howdy, my cell number is 123-4567-89 (no match)
I tried the pattern below:
(?<!(\d))\b4\d{3}\b(?!(\d))
but it still gives me a match for 123 4567 - I guess there is something special about \b?
Any advice will be highly appreciated.
Thanks, Eric
You may use
(?<!\d[\s-]|\d)4\d{3}(?![\s-]?\d)
In .NET, ECMAScript 2018 compliant JavaScript environments, or the PyPI regex module, where lookbehind patterns can contain the ?, *, + and {min,} quantifiers, you may shorten it to
(?<!\d[\s-]?)4\d{3}(?![\s-]?\d)
Or, if lookbehinds cannot contain alternatives of different lengths (as in Boost or Python's built-in re module), use
(?<!\d[\s-])(?<!\d)4\d{3}(?![\s-]?\d)
See the regex demo and regex demo 2 (and a .NET regex demo).
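As a rough check, the fixed-width variant can be run against the examples from the question with Python's built-in re module. This is only a sketch: the names ORIGINAL, FIXED, SAMPLES, find_numbers and compiles are made up for illustration. It also shows why the pattern from the question matches in "123 4567 89": its lookarounds inspect only the single character next to the number, and that character is a space, so \b is not the culprit.

```python
import re

# Pattern from the question: the lookarounds only inspect the single
# character next to the number, which is a space in "123 4567 89".
ORIGINAL = re.compile(r"(?<!(\d))\b4\d{3}\b(?!(\d))")

# Fixed-width variant from the answer; each lookbehind has one fixed length,
# so it compiles with the built-in re module.
FIXED = re.compile(r"(?<!\d[\s-])(?<!\d)4\d{3}(?![\s-]?\d)")

# Example strings taken from the question.
SAMPLES = [
    "4567",
    "I have 4999 roses",
    "1234567 days are gone",
    "My water supply account is 123 4567 89",
    "Howdy, my cell number is 123-4567-89",
]


def find_numbers(text, pattern=FIXED):
    """Return every substring of text matched by pattern."""
    return [m.group(0) for m in pattern.finditer(text)]


def compiles(pattern_text):
    """Report whether the built-in re module accepts pattern_text."""
    try:
        re.compile(pattern_text)
    except re.error:
        return False
    return True


if __name__ == "__main__":
    for sample in SAMPLES:
        print(f"{sample!r}: {find_numbers(sample)}")
```

Only the first two samples should produce a match. The alternation form `(?<!\d[\s-]|\d)` is rejected by re at compile time, which `compiles` can confirm; the third-party regex module accepts it.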
Details
(?<!\d[\s-]|\d) / (?<!\d[\s-]?) / (?<!\d[\s-])(?<!\d) - the current position must not be immediately preceded by a digit, or by a digit followed by a whitespace character or a hyphen
4\d{3} - 4 and any 3 digits
(?![\s-]?\d) - immediately to the right, there must be no digit, optionally preceded by a single whitespace character or hyphen.
NOTE: The solutions above do not rely on word boundaries, so they can also match between underscores or when glued to letters. If you want to avoid that, add word boundaries, e.g. (?<!\d[\s-]|\d)\b4\d{3}\b(?![\s-]?\d).
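To illustrate the note about word boundaries, here is a small Python sketch comparing the pattern with and without \b. The names LOOSE, STRICT and matches are hypothetical, and the fixed-width lookbehind form is used so that the built-in re module accepts both patterns.

```python
import re

# Without word boundaries: underscores and letters around the number are allowed.
LOOSE = re.compile(r"(?<!\d[\s-])(?<!\d)4\d{3}(?![\s-]?\d)")

# With word boundaries: the number must not touch letters, digits or underscores.
STRICT = re.compile(r"(?<!\d[\s-])(?<!\d)\b4\d{3}\b(?![\s-]?\d)")


def matches(pattern, text):
    """Return every substring of text matched by pattern."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for text in ["id_4567_x", "abc4567", "I have 4999 roses"]:
        print(f"{text!r}: loose={matches(LOOSE, text)} strict={matches(STRICT, text)}")
```

The underscore counts as a word character, so \b finds no boundary between _ and 4, and the strict pattern skips "id_4567_x" while the loose one matches it.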