
Regexp: how to match a word that is not preceded or followed by other characters

Tags:

regex

I want to replace mm units with cm units in my code. Since there are a lot of such replacements, I am using a regexp.

I came up with this expression:

(?!a-zA-Z)mm(?!a-zA-Z)

But it still matches words like summa, gamma and dummy.

How do I write the regexp correctly?

Kenenbek Arzymatov asked Jan 30 '23 22:01


2 Answers

Use character classes and change the first (?!...) lookahead into a lookbehind:

(?<![a-zA-Z])mm(?![a-zA-Z])
^^^^^^^^^^^^^   ^^^^^^^^^^^ 


The pattern matches:

  • (?<![a-zA-Z]) - a negative lookbehind that fails the match if there is an ASCII letter immediately to the left of the current location
  • mm - a literal substring
  • (?![a-zA-Z]) - a negative lookahead that fails the match if there is an ASCII letter immediately to the right of the current location

NOTE: If you need to make your pattern Unicode-aware, replace [a-zA-Z] with [^\W\d_] (and use re.U flag if you are using Python 2.x).
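To see the difference between the two patterns, here is a minimal Python sketch. The question does not name a language, so Python is an assumption here, and the sample string is made up for illustration. It also shows why the original attempt fails: (?!a-zA-Z) is a lookahead for the literal text "a-zA-Z", not a character class, so it rejects almost nothing.

```python
import re

# The asker's original pattern: without square brackets, (?!a-zA-Z) only
# refuses positions followed by the literal text "a-zA-Z".
broken = re.compile(r"(?!a-zA-Z)mm(?!a-zA-Z)")

# Corrected pattern from this answer: lookbehind on the left,
# lookahead on the right, both using a real character class.
fixed = re.compile(r"(?<![a-zA-Z])mm(?![a-zA-Z])")

# Hypothetical sample text, invented for this example.
sample = "width = 10mm; gap: 5 mm; summa gamma dummy"

broken_result = broken.sub("cm", sample)
fixed_result = fixed.sub("cm", sample)

print(broken_result)  # width = 10cm; gap: 5 cm; sucma gacma ducmy
print(fixed_result)   # width = 10cm; gap: 5 cm; summa gamma dummy
```

Note that this only renames the unit; the numbers themselves are left unchanged.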

Wiktor Stribiżew answered Feb 05 '23 14:02


There's no need to use lookaheads and lookbehinds, so if you wish to simplify your pattern you can try something like this:

\d+\s?(mm)\b

This does assume that your millimetre symbol will always follow a number, with an optional space in between, which I think is a reasonable assumption in this case.

The \b checks for a word boundary to make sure the mm is not part of a word such as dummy etc.
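If you use this pattern for the replacement itself, keep in mind that the match includes the number, so the number has to be put back. A rough Python sketch (Python and the sample string are assumptions, not part of the question) moves the capture group onto the number and optional space instead of the mm:

```python
import re

# Variant of the pattern above: capture the digits and the optional space
# so the replacement can keep them and swap only the unit.
pattern = re.compile(r"(\d+\s?)mm\b")

# Hypothetical sample text, invented for this example.
sample = "width = 10mm; gap: 5 mm; summa gamma dummy"

# \1 re-inserts the captured number (and space, if any) before "cm".
result = pattern.sub(r"\1cm", sample)

print(result)  # width = 10cm; gap: 5 cm; summa gamma dummy
```

As with any plain text replacement, this changes the unit label only and does not rescale the values.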


Tom Wyllie answered Feb 05 '23 14:02