
Overlapping text substitution with Perl regular expression

Tags: regex, perl

I have a text file that contains a bunch of sentences. The words consist of letters and/or digits and are separated by whitespace (spaces, tabs, newlines). I want to find every word "123" or "-123" and insert a dot (.) just before the digits, so that every "123" becomes ".123" and every "-123" becomes "-.123".

I was trying this with the following:

$line =~ s/(\s+-*123\s+)/getNewWord($1)/ge

Here $line contains a line read from the file, and the function getNewWord puts the dot (.) at the appropriate place in the matched word.

But it doesn't work when two "123" words are adjacent, as in " 123 123 ". When the first "123" is replaced by " .123 ", the space following it has already been consumed by that match, so the second "123" is not matched: the regex engine has no preceding whitespace left to match for it.

Can anyone help me with this? Thanks!

Golam Kawsar asked Dec 16 '22 05:12


2 Answers

I agree with MRAB (and have +1'd his/her answer), but there's no real need for the getNewWord function. I'd change the entire statement to something like one of these:

$line =~ s/((?:^|\s)-?)(123)(?=\s|$)/$1.$2/g;

$line =~ s/(?:^|(?<=\s))(-?)(123)(?=\s|$)/$1.$2/g;

$line =~ s/(?:^|(?<=\s)|(?<=\s-))(?=123(?:\s|$))/./g;
ruakh answered Jan 11 '23 22:01


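The following might be slightly faster (it uses \K, available since Perl 5.10, instead of explicit captures), and it also works on input without leading/trailing whitespace. Note that this first version puts the dot before the minus sign: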

$ echo '123 -123 -123  123' | perl -pe's/(?:^|\s+)\K(?=-?123\b)/./g'
.123 .-123 .-123  .123

To put the dot after the minus sign instead:

$ echo '123 -123 -123  123' | perl -pe's/(?:^|\s+)-*\K(?=123\b)/./g'
.123 -.123 -.123  .123
jfs answered Jan 11 '23 22:01