How to replace pattern of repeating characters/words only at the beginning of the string?

Note that this question is in the context of Julia, and therefore (to my knowledge) PCRE.

Suppose that you had a string like this:

"sssppaaasspaapppssss"

and you wanted to match, individually, the repeating characters at the end of the string. For this string that means the four trailing "s" characters, so that matchall gives ["s","s","s","s"] rather than ["ssss"]. This is easy:

r"(.)(?=\1*$)"

It's practically trivial and easy to use: replace("hell",r"(.)(?=\1*$)","k") gives "hekk", while replace("hello",r"(.)(?=\1*$)","k") gives "hellk". It can also be generalised to repeating patterns by swapping the dot for something more complex:

r"(\S+)(?=( \1)*$)"

which will, for instance, independently match the last three instances of "abc" in "abc abc defg abc h abc abc abc".
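
For anyone wanting to check the two end-of-string patterns above, here is a minimal sketch. It assumes Julia 1.x, where matchall no longer exists (eachmatch is used instead) and replace takes a pattern => replacement pair; the variable names are only illustrative.

```julia
# Assumes Julia 1.x syntax (eachmatch instead of matchall, Pair form of replace).
trailing_char = r"(.)(?=\1*$)"        # single repeated character at the end
trailing_word = r"(\S+)(?=( \1)*$)"   # repeated space-separated word at the end

# Each trailing "s" is reported as its own match.
char_matches = [m.match for m in eachmatch(trailing_char, "sssppaaasspaapppssss")]
println(char_matches)

# Only the trailing run is replaced.
println(replace("hell", trailing_char => "k"))    # hekk
println(replace("hello", trailing_char => "k"))   # hellk

# The word version: record where each match starts (1-based) and what it matched.
sentence = "abc abc defg abc h abc abc abc"
word_matches = [(m.offset, m.match) for m in eachmatch(trailing_word, sentence)]
println(word_matches)
```

The offsets show that only the last three "abc" words match; the earlier ones fail the lookahead because something other than a repeat of the same word stands between them and the end of the string.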

Which then leads to the question... how would you match the repeating character or pattern at the start of the string, instead? Specifically, using regex in the way it's used above.

The obvious approach would be to reverse the direction of the above regex as r"(?<=^\1*)(.)", but PCRE (and therefore Julia) doesn't allow variable-length lookbehinds, except where each alternative has its own fixed length, as in (?<=ab|cde), and so it throws an error. The next thought is to use \K, along the lines of r"^\1*\K(.)", but this only matches the first character (presumably because the search advances past that character, after which the caret can no longer match).
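
Both failed attempts can be reproduced with the sketch below, again assuming Julia 1.x. The pattern is built with the Regex constructor rather than the r"..." literal so that the compilation error is raised at run time and can be caught; the exact error text depends on the bundled PCRE version, so it is printed rather than assumed. The input "aab" is just an illustrative string.

```julia
# Attempt 1: reversed lookahead. An unbounded lookbehind is rejected when the
# pattern is compiled, so capture the error instead of letting it propagate.
lookbehind_error = try
    Regex("(?<=^\\1*)(.)")
    nothing
catch err
    err
end
println(lookbehind_error)

# Attempt 2: \K. The pattern compiles, but it can only match at the very start
# of the string, so just the first character is replaced.
k_result = replace("aab", r"^\1*\K(.)" => "x")
println(k_result)
```

In the second attempt the second "a" is left alone even though it repeats the first, which is the behaviour the paragraph above describes.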

For clarity: I'm seeking a regex that will, for instance, result in

replace("abc abc defg abc h abc abc abc",<regex here>,"hello")

producing

"hello hello defg abc h abc abc abc"

As you can see, it's replacing each "abc" from the start with "hello", but only until the first non-match. The reverse one I provide above does this at the other end of the string:

replace("abc abc defg abc h abc abc abc",r"(\S+)(?=( \1)*$)","hello")

produces

"abc abc defg abc h hello hello hello"
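
To pin down the target behaviour, here is a sketch that states it as plain code next to the working end-of-string regex. This is not the regex being asked for: replace_leading_run is a hypothetical helper written only to make the expected result executable, and it assumes single-space separators and Julia 1.x syntax.

```julia
# Hypothetical helper (not a regex solution): replace every occurrence of the
# first word, but only while the run at the start of the string is unbroken.
function replace_leading_run(s::AbstractString, repl::AbstractString)
    words = split(s, ' ')
    first_word = words[1]
    n = 0
    for w in words
        w == first_word || break   # stop at the first non-match
        n += 1
    end
    return join(String[fill(repl, n); words[n+1:end]], ' ')
end

sentence = "abc abc defg abc h abc abc abc"

# Desired result at the start of the string.
leading = replace_leading_run(sentence, "hello")
println(leading)

# What the existing regex already does at the end of the string.
trailing = replace(sentence, r"(\S+)(?=( \1)*$)" => "hello")
println(trailing)
```

A regex answer would need to produce the first printed line from a single replace call, mirroring what the lookahead version does for the second.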