Overlapping text substitution with Perl regular expression

Question

I have a text file that contains a bunch of sentences. The sentences use whitespace (spaces, tabs, newlines) to separate words consisting of letters and/or digits. I want to find each word "123" or "-123" and insert a dot (.) just before the digits, so that every "123" becomes ".123" and every "-123" becomes "-.123".

I was trying this with the following:

$line =~ s/(\s+-*123\s+)/getNewWord($1)/ge

Here $line contains a line read from the file, and the function getNewWord puts the dot (.) at the appropriate place in the matched word.

But it doesn't work when two "123" words are consecutive, as in " 123 123 ". Once the first "123" has been replaced by " .123 ", the space following it has already been consumed by that match, so the second "123" is not matched: the regex engine has no preceding whitespace left to match in front of it.
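The failure can be reproduced with a short script. The body of getNewWord is not shown here, so the version below is only an assumed stand-in that puts a dot before the first digit; $fixed is likewise just an illustrative name.

```perl
use strict;
use warnings;

# Assumed stand-in for getNewWord (its real body is not shown in the question):
# insert a dot before the first digit of the matched text.
sub getNewWord {
    my ($word) = @_;
    $word =~ s/(\d)/.$1/;   # no /g, so only the first digit is affected
    return $word;
}

my $line = " 123 123 ";

# The trailing \s+ consumes the space that the second "123" needs in front of it.
(my $fixed = $line) =~ s/(\s+-*123\s+)/getNewWord($1)/ge;

print "[$fixed]\n";   # prints "[ .123 123 ]": the second word is untouched
```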

Can anyone help me with this? Thanks!

ruakh · Accepted Answer

I agree with MRAB (and have +1'd his/her answer), but there's no real need for the getNewWord function. I'd change the entire statement to something like one of these:

$line =~ s/((?:^|\s)-?)(123)(?=\s|$)/$1.$2/g;

$line =~ s/(?:^|(?<=\s))(-?)(123)(?=\s|$)/$1.$2/g;

$line =~ s/(?:^|(?<=\s)|(?<=\s-))(?=123(?:\s|$))/./g;
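To see the three statements side by side, here is a small sketch that applies each one to the same string. The sample input and the names $input, @variants and @results are made up for illustration; only the three substitutions themselves come from the answer above.

```perl
use strict;
use warnings;

# Illustrative input: three words that should change, two that should not.
my $input = "123 123 -123 x123 1234";

# Each sub applies one of the three substitutions to a copy of its argument.
my @variants = (
    sub { my ($s) = @_; $s =~ s/((?:^|\s)-?)(123)(?=\s|$)/$1.$2/g; return $s },
    sub { my ($s) = @_; $s =~ s/(?:^|(?<=\s))(-?)(123)(?=\s|$)/$1.$2/g; return $s },
    sub { my ($s) = @_; $s =~ s/(?:^|(?<=\s)|(?<=\s-))(?=123(?:\s|$))/./g; return $s },
);

my @results = map { $_->($input) } @variants;

# Every variant prints: .123 .123 -.123 x123 1234
print "$_\n" for @results;
```

All three use a lookahead for the trailing whitespace, so that whitespace is never consumed and stays available as the leading delimiter of the next word. One difference to be aware of: as far as I can tell, the third statement leaves a "-123" at the very start of the line unchanged, because none of its three alternatives can match between a line-initial "-" and the digits, while the first two turn it into "-.123".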

jfs · Answer

The following might be slightly faster (no explicit capture), and it also works on input that has no leading or trailing whitespace:

$ echo '123 -123 -123  123' | perl -pe's/(?:^|\s+)\K(?=-?123\b)/./g'
.123 .-123 .-123  .123

To put the dot after the minus sign instead:

$ echo '123 -123 -123  123' | perl -pe's/(?:^|\s+)-*\K(?=123\b)/./g'
.123 -.123 -.123  .123
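For anyone unfamiliar with \K (available since Perl 5.10): it drops everything matched so far from the text that gets replaced, so the prefix does not have to be captured and put back. A rough sketch of the equivalence, with $with_k and $with_capture as illustrative names only:

```perl
use strict;
use warnings;

my $input = "123 -123 -123  123";

# \K version: the whitespace and minus signs are matched but kept out of the replaced text.
(my $with_k = $input) =~ s/(?:^|\s+)-*\K(?=123\b)/./g;

# Capture version: the same prefix is captured and written back by hand.
(my $with_capture = $input) =~ s/((?:^|\s+)-*)(?=123\b)/$1./g;

print "$with_k\n";         # .123 -.123 -.123  .123
print "$with_capture\n";   # .123 -.123 -.123  .123
```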

Tags: regex, perl

Golam Kawsar
