Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex: Lookahead and lookbehind, checking a . (dot) for use as decimal vs full-stop

I've had a good look around the net for an answer to this, but can't seem to get it working.

I have developed the following regex:

    (?<![^\d][\\])[\.](?![\d])

The objective is to identify any '.' (dots) that have not been escaped or that are part of a decimal number.

ie)

  • abc.co.uk, both dots should match
  • ab0.co.uk, both dots should match
  • abc.0.uk, both dots should match
  • abc\.co.uk, only the second dot gets matched
  • 0.00, dot should NOT match
  • abc0.0.uk, first dot would NOT match (which is an acceptable outcome), second dot should

At moment it works for all the cases above, except:

  • abc.0.uk, both dots should match

Any thoughts? It seems the look-behind is working correctly, however the look-ahead is not.

Am sure it'll be an easy one for any Regex gurus!

FYI. I'm developing this in .net 4

like image 733
killercowuk Avatar asked Nov 10 '11 11:11

killercowuk


People also ask

What is lookahead and Lookbehind in regex?

Lookahead allows to add a condition for “what follows”. Lookbehind is similar, but it looks behind. That is, it allows to match a pattern only if there's something before it.

What is a Lookbehind regex?

Lookbehind has the same effect, but works backwards. It tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there. (? <!a)b matches a “b” that is not preceded by an “a”, using negative lookbehind.

What is lookahead assertion in regex?

A lookahead assertion has the form (?= test) and can appear anywhere in a regular expression. MATLAB® looks ahead of the current location in the text for the test condition. If MATLAB matches the test condition, it continues processing the rest of the expression to find a match.

What is Lookbehind assertion?

Regex Lookbehind is used as an assertion in Python regular expressions(re) to determine success or failure whether the pattern is behind i.e to the right of the parser's current position. They don't match anything. Hence, Regex Lookbehind and lookahead are termed as a zero-width assertion.


1 Answers

Try this one

(?<![\\\d])\.(?=\d)|(?<=[^\D\\])\.(?!\d)|(?<=[^\d\\])\.(?!\d)

See it here on Regexr

I broke it down in three steps.

  1. Match if before is not a escape character and not a digit and behind is a digit.

  2. Match if before is not a escape character and a digit and behind is not a digit

  3. Match if before is not a escape character and not a digit and behind is not a digit

like image 135
stema Avatar answered Nov 01 '22 01:11

stema