
How to add missing spaces after periods using regex, without changing decimals

Tags:

python

regex

I have a large piece of text that is missing spaces after some of the periods. However, the text also contains decimal numbers, and the periods inside those must stay as they are.

Here's what I have so far to fix the problem with a regex (I'm using Python):

re.sub(r"(?!\d\.\d)(?!\. )\.", '. ', my_string)

But the first negative lookahead, (?!\d\.\d), doesn't seem to work: the pattern still matches periods in decimal numbers.
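Running the attempt on the decimal line of the sample shows what goes wrong. The lookahead (?!\d\.\d) is tested at the position where \. is about to match, so the first character it inspects is the dot itself, not a digit. It can never match there, so the negative lookahead never blocks anything. A minimal check:

```python
import re

sample = "this also should NOT match 1.23"

# The attempt from the question. At the dot's position, (?!\d\.\d) sees "."
# first, so "digit, dot, digit" can never match and the lookahead always passes.
attempt = re.sub(r"(?!\d\.\d)(?!\. )\.", '. ', sample)
print(attempt)  # this also should NOT match 1. 23
```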

Here is some sample text to check any potential solution against:

this is a.match
this should also match.1234
and this should 123.match

this should NOT match. Has space after period
this also should NOT match 1.23
Idodo asked Dec 17 '21 13:12




1 Answer

You can use

re.sub(r'\.(?!(?<=\d\.)\d) ?', '. ', text)

See the regex demo. The trailing space is matched optionally: if it is already there, it is consumed and then put back by the '. ' replacement, so that period ends up unchanged.
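The optional space matters for lines that are already correct. A small comparison on one sample line, with and without the trailing " ?", shows the difference:

```python
import re

line = "this should NOT match. Has space after period"

# Without " ?", the existing space survives and the replacement adds a second one.
without_space = re.sub(r'\.(?!(?<=\d\.)\d)', '. ', line)
# With " ?", the existing space is consumed and put back, so the line is unchanged.
with_space = re.sub(r'\.(?!(?<=\d\.)\d) ?', '. ', line)

print(repr(without_space))  # 'this should NOT match.  Has space after period'
print(repr(with_space))     # 'this should NOT match. Has space after period'
```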

Details

  • \. - a dot
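  • (?!(?<=\d\.)\d) - a negative lookahead that fails the match if the dot just matched sits between two digits (the character before it and the character after it are both digits)
  • ? - an optional space (note the literal space before the ? quantifier)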
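To make the rule in the list above concrete, here is a plain-loop sketch that applies the same test to each dot: leave it alone only when the characters on both sides are digits; otherwise write '. ' and swallow one existing space. The function name add_spaces_after_dots is hypothetical. It uses str.isdecimal() because, for str patterns, \d matches Unicode decimal digits (category Nd), which is what isdecimal() checks.

```python
import re

PATTERN = r'\.(?!(?<=\d\.)\d) ?'


def add_spaces_after_dots(text: str) -> str:
    # Hypothetical helper: spells out, character by character, the rule that
    # PATTERN encodes with a lookbehind nested inside a negative lookahead.
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        i += 1
        if ch != '.':
            out.append(ch)
            continue
        # The dot is at index i - 1; check its neighbours.
        prev_is_digit = i - 2 >= 0 and text[i - 2].isdecimal()
        next_is_digit = i < n and text[i].isdecimal()
        if prev_is_digit and next_is_digit:
            # (?<=\d\.)\d succeeds, so the negative lookahead fails: a decimal point.
            out.append('.')
            continue
        out.append('. ')
        if i < n and text[i] == ' ':
            i += 1  # " ?" consumes one existing space; '. ' already puts it back
    return ''.join(out)


print(add_spaces_after_dots("and this should 123.match, not 1.23"))
```

Running both versions side by side on the sample text is an easy way to confirm the loop and the regex agree.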

See a Python demo:

import re
text = "this is a.match\nthis should also match.1234\nand this should 123.match\n\nthis should NOT match. Has space after period\nthis also should NOT match 1.23"
print(re.sub(r'\.(?!(?<=\d\.)\d) ?', '. ', text))

Output:

this is a. match
this should also match. 1234
and this should 123. match

this should NOT match. Has space after period
this also should NOT match 1.23
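Since the question mentions a large piece of text, the pattern can also be compiled once and wrapped in a small function. This is mostly a readability choice, as Python's re module already caches recently compiled patterns. The names MISSING_SPACE and fix_periods are hypothetical:

```python
import re

# Compiled once, so the same pattern can be reused across many strings or lines.
MISSING_SPACE = re.compile(r'\.(?!(?<=\d\.)\d) ?')


def fix_periods(text: str) -> str:
    # Hypothetical helper name; applies the answer's pattern and replacement.
    return MISSING_SPACE.sub('. ', text)


print(fix_periods("version 1.2.3 is out.See the notes"))
# version 1.2.3 is out. See the notes
```

One side effect worth checking on real data: a period at the end of a line or of the whole text, as in "end.\nNext", also gains a trailing space, because the pattern only excludes dots that sit between two digits.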

Alternatively, use a (?! ) negative lookahead, as in your attempt, so that a dot already followed by a space is not matched at all:

re.sub(r'\.(?!(?<=\d\.)\d)(?! )', '. ', text)

See the regex demo and the Python demo.
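The two variants always give the same result: when a space follows the dot, the first rewrites ". " as ". " while the second skips it; in every other case both insert ". ". The second simply performs fewer substitutions. A quick check on the sample text:

```python
import re

text = ("this is a.match\nthis should also match.1234\nand this should 123.match\n\n"
        "this should NOT match. Has space after period\nthis also should NOT match 1.23")

# Variant 1: consume an optional space and always write ". " back.
optional_space = re.sub(r'\.(?!(?<=\d\.)\d) ?', '. ', text)
# Variant 2: do not match a dot that is already followed by a space.
no_space_ahead = re.sub(r'\.(?!(?<=\d\.)\d)(?! )', '. ', text)

print(optional_space == no_space_ahead)  # True
```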

Wiktor Stribiżew answered Oct 19 '22 16:10