 

Perl - replacing sequences of identical characters

Tags: regex, perl

I am trying to write a regex that, given a string, checks for a run of at least 3 identical characters and replaces it with two of that character. For example, I want to turn the following string:

sstttttrrrrrrriing

into

ssttrriing 

I am thinking of something along the lines of...

$string =~ s/(\D{3,})/substr($1, 0, 2)/e;

But this will not work because:

  1. It doesn't check that the characters are identical. \D matches any non-digit, so \D{3,} also matches a run of three or more distinct characters.
  2. It only replaces the first match; I need it to replace every match in the string.
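Both problems are easy to confirm. The sketch below runs the attempt on the sample string and on a made-up second input, `abc1def`, chosen only because it contains two separate runs of distinct non-digits.

```perl
use strict;
use warnings;

# The attempt from the question, applied to a copy of the sample string.
# \D{3,} is greedy and every character here is a non-digit, so the whole
# string matches and is cut down to its first two characters.
(my $attempt = 'sstttttrrrrrrriing') =~ s/(\D{3,})/substr($1, 0, 2)/e;
print "$attempt\n";

# 'abc1def' is a made-up input with two runs of distinct non-digits.
# Without /g only the first run is shortened; with /g both are.
(my $once = 'abc1def') =~ s/(\D{3,})/substr($1, 0, 2)/e;
(my $all  = 'abc1def') =~ s/(\D{3,})/substr($1, 0, 2)/eg;
print "$once\n$all\n";
```

This prints `ss`, `ab1def` and `ab1de`: the /g modifier fixes the second problem, but the first one needs a backreference so the pattern only matches repeats of the same character.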

Can anyone help me?

asked Dec 07 '22 14:12 by Dan


2 Answers

You can capture one character in a group, match further copies of it with the backreference \1, and put two copies of the captured character back in the replacement.

$ perl -plwe 's/(.)\1{2,}/$1$1/g'
sstttttrrrrrrriing
ssttrriing
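As a standalone script, the same substitution can be wrapped in a small function. The name `collapse_runs` is hypothetical; the regex is the one from the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper name; the substitution is the one from the answer.
# (.) captures one character and \1{2,} requires at least two more copies
# of it, so only runs of three or more match; /g handles every run.
sub collapse_runs {
    my ($s) = @_;
    $s =~ s/(.)\1{2,}/$1$1/g;
    return $s;
}

for my $input ('sstttttrrrrrrriing', 'aab', 'aaab', 'x') {
    printf "%-20s -> %s\n", $input, collapse_runs($input);
}
```

Runs of exactly two (`aab`) and single characters are left alone, while `aaab` becomes `aab`. By default `.` does not match a newline, so runs of newlines are not collapsed; adding the /s modifier would change that.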

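Or you can use the \K (keep) escape. Everything matched before \K is left in place, so the first two characters of the run stay and only the surplus copies are deleted; nothing has to be re-inserted.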

s/(.)\1\K\1+//g
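A sketch comparing the two forms on a few inputs, assuming Perl 5.14 or later for the /r modifier (which returns the modified copy instead of changing the original); \K itself needs 5.10. Both helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the \K form. The captured character and its first
# repeat are matched before \K and therefore kept; only \1+ is removed.
sub collapse_keep { return $_[0] =~ s/(.)\1\K\1+//gr }

# Hypothetical helper: the re-inserting form from the one-liner above.
sub collapse_reinsert { return $_[0] =~ s/(.)\1{2,}/$1$1/gr }

for my $input ('sstttttrrrrrrriing', 'aaab', 'aab', 'zzzzzz') {
    my $kept = collapse_keep($input);
    die "forms disagree on '$input'\n" unless $kept eq collapse_reinsert($input);
    print "$input -> $kept\n";
}
```

The loop dies if the two forms ever disagree; on these inputs they both produce `ssttrriing`, `aab`, `aab` and `zz`.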

Replace the wildcard . with a more specific character class if needed. For example, to collapse only runs of letters:

perl -plwe 's/(\pL)\1\K\1+//g'
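A sketch of the letters-only variant, assuming Perl 5.14+ for /r; `collapse_letters` is a hypothetical name. Digits and punctuation pass through unchanged, and an accented letter counts as a letter. When feeding non-ASCII text to the one-liner itself, the input most likely also needs decoding, for example with the -CS switch, or multi-byte characters will not be seen as single letters.

```perl
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: \pL (Unicode "Letter" property) replaces the wildcard,
# so runs of digits or punctuation are left alone.
sub collapse_letters { return $_[0] =~ s/(\pL)\1\K\1+//gr }

# "\x{e9}" is U+00E9 (e with acute accent), written as an escape so the
# source file stays ASCII.
for my $input ('aaa1111', 'nooooo!!!', ("\x{e9}" x 4) . 'h') {
    print "$input -> ", collapse_letters($input), "\n";
}
```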
answered Dec 26 '22 18:12 by TLP


$ echo "sssssttttttrrrrriiiinnnnggg" | perl -pe 's/(.)\1+/$1$1/g'
ssttrriinngg
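This pattern uses \1+, which matches runs of two or more rather than three or more. The output is the same, because a run of exactly two is replaced by two copies of itself, but the substitution does extra work. The sketch below (variable names are illustrative) counts the substitutions for both forms on a made-up input. It writes the replacement as $1$1, since with warnings enabled perl reports that \1 is better written as $1 on the replacement side.

```perl
use strict;
use warnings;

my $text = 'aabbbc';

# In scalar context s///g returns how many substitutions it made.
# \1+ also matches the run of exactly two ("aa") and rewrites it to itself;
# \1{2,} only touches runs of three or more ("bbb").
my $n_plus   = (my $plus   = $text) =~ s/(.)\1+/$1$1/g;
my $n_braces = (my $braces = $text) =~ s/(.)\1{2,}/$1$1/g;

print "\\1+    : $plus, substitutions: $n_plus\n";
print "\\1{2,} : $braces, substitutions: $n_braces\n";
```

Both lines show `aabbc`, with 2 substitutions for \1+ and 1 for \1{2,}.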
answered Dec 26 '22 17:12 by Sparky