
R regexp swapping text parts

Tags: regex, r

I want to swap text parts from left to right and vice versa. This is working perfectly from left to right:

sub("(^x+)(.+)", "\\2\\1","xxxxx6.0")
[1] "6.0xxxxx"

while the other direction does not:

sub("(.+)(x+$)", "\\2\\1","6.0xxxxx")
[1] "x6.0xxxx"

What am I missing?

Andri Signorell asked Mar 15 '23 12:03


2 Answers

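The issue lies in your second regex. The .+ in the first group is greedy: it first consumes as many characters as it can, then gives back only as many as the rest of the pattern needs. Since x+$ is satisfied by a single trailing x, the first group keeps everything else, and the match splits like this: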

(6.0xxxx)(x)

The parentheses indicate the two groups that would get matched by your regex.
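
To see this split directly in R, here is a minimal sketch that extracts the capture groups with regexec and regmatches. The variable names are only illustrative, and perl = TRUE is an assumption made here so the behaviour does not depend on the default engine:

```r
# Inspect what each capture group grabs with the greedy pattern.
# `input`, `m` and `groups` are illustrative names, not from the question.
input <- "6.0xxxxx"
m <- regmatches(input, regexec("(.+)(x+$)", input, perl = TRUE))[[1]]

# m[1] is the full match; the remaining elements are the capture groups.
groups <- m[-1]
print(groups)
```

The first element is everything except the last x and the second is that single x, which is why the swap only moves one character.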

There are two ways to solve this. The first is to use a lazy quantifier instead of a greedy quantifier:

/(.+?)(x+$)/

The question mark makes the + lazy, so it takes the fewest characters possible instead of the most. This would group like

(6.0)(xxxxx)

which is what you want.
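
Applied to the call from the question, the lazy version could look like the sketch below. In R the pattern is passed as a plain string without the surrounding slashes. perl = TRUE is an assumption here to pin down how the lazy quantifier is handled, and swapped_lazy is just an illustrative name:

```r
# Lazy first group: (.+?) grows one character at a time until x+$ can match
# the rest of the string, so the whole trailing run of x lands in group 2.
swapped_lazy <- sub("(.+?)(x+$)", "\\2\\1", "6.0xxxxx", perl = TRUE)
print(swapped_lazy)
```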

The other option is to match, instead of every possible character, all characters that are not x.

/(^[^x]+)(x+$)/

The caret inside the square brackets negates the character class, so [^x] matches any character that is not x. Group 1 therefore matches everything up to the first x, which produces the desired groups.
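
A short sketch of the negated-class approach in R, using the pattern with the quantifier after the x (x+). The names swapped_negated and unchanged are illustrative. The second call shows one caveat: if the leading part itself contains an x, the pattern does not match and sub returns the input unchanged:

```r
# Negated class: group 1 stops at the first x, group 2 takes the trailing run.
swapped_negated <- sub("(^[^x]+)(x+$)", "\\2\\1", "6.0xxxxx")
print(swapped_negated)

# Caveat: an x inside the leading part prevents any match,
# so the string comes back as it went in.
unchanged <- sub("(^[^x]+)(x+$)", "\\2\\1", "ax1xx")
print(unchanged)
```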

Strikeskids answered Mar 19 '23 14:03


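You can use a negative lookbehind (?<!...) in the second regex to get around the fact that (.+) is greedy. The lookbehind only lets (x+$) start at a position that is not immediately preceded by x, so the second group has to capture the whole trailing run of x. Lookbehind is not supported by R's default regex engine, so perl=TRUE is required: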

sub("(.+)(?<!x)(x+$)", "\\2\\1", "6.0xxxxx", perl=TRUE)
#[1] "xxxxx6.0"

Jota answered Mar 19 '23 14:03