
How to replace the first occurrences of a string only if it appears more than once in R?

Tags: r, gsub

I have strings that look like this:

problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

In between each group, there's " & ". I want to use R (either sub() or something from the stringr package) to replace every " &" with a "," when there's more than one "&" present. However, I don't want the final "&" to be changed. How would I do that so it looks like:

#Note: Only the 3rd and 4th strings should be changed
solution <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1, GROUP 2 & GROUP 3", "GROUP 1, GROUP 2, GROUP 3 & GROUP 4")

In the actual strings, there could be any number of "&"s, so I don't want to hard-code a limit if possible.

J.Sabree asked Oct 08 '21 18:10



2 Answers

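We could use a regular expression with a lookahead assertion (see "Regex lookahead, lookbehind and atomic groups"). The pattern matches " &" only when another "&" follows later in the string, so the final "&" is left untouched.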

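library(stringr)
# " &" is matched only if another "&" appears later (lookahead).
# The space after "&" is not part of the match, so the replacement is "," alone;
# using ", " would leave a double space.
str_replace_all(problem, " &(?=.*?&)", ",")

output:

[1] "GROUP 1"
[2] "GROUP 1 & GROUP 2"
[3] "GROUP 1, GROUP 2 & GROUP 3"
[4] "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"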
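
Since the question also mentions sub(), here is a minimal base R sketch of the same lookahead idea, assuming the separator is always exactly " & ". Base R's default regex engine does not support lookahead, so perl = TRUE is needed. The variable name result is only for illustration.

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
solution <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1, GROUP 2 & GROUP 3",
              "GROUP 1, GROUP 2, GROUP 3 & GROUP 4")

# Replace " &" with "," only where another "&" follows later in the string.
# perl = TRUE switches to PCRE, which supports the (?=...) lookahead.
result <- gsub(" &(?=.*&)", ",", problem, perl = TRUE)

print(result)
# Compare against the desired vector from the question
identical(result, solution)
```

gsub() is used rather than sub() because every "&" except the last has to be replaced, not just the first one.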
TarJae answered Nov 14 '22 21:11



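Using strsplit(): split each string on the " & " separators, rejoin the pieces with ", " via toString(), then turn the last ", " back into " & ".

sapply(strsplit(problem, "\\s+&\\s+"),
    # ", " is matched outside the capture group so no extra space is carried over
    function(x) sub(", ([^,]+)$", " & \\1", toString(x)))

output:

[1] "GROUP 1"                             "GROUP 1 & GROUP 2"
[3] "GROUP 1, GROUP 2 & GROUP 3"          "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"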
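
The split-and-rejoin idea can also be written without a second regex, which may be easier to read. This is only a sketch: join_groups is a hypothetical helper name, and it assumes the pieces are separated by "&" with surrounding whitespace, as in the question.

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
solution <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1, GROUP 2 & GROUP 3",
              "GROUP 1, GROUP 2, GROUP 3 & GROUP 4")

# join_groups is a hypothetical helper, not part of any package.
join_groups <- function(x) {
  parts <- strsplit(x, "\\s+&\\s+")[[1]]
  n <- length(parts)
  # Nothing to rejoin when there is no "&" at all
  if (n < 2) return(x)
  # Comma-join everything but the last piece, then attach the last with " & "
  paste(paste(parts[-n], collapse = ", "), parts[n], sep = " & ")
}

joined <- vapply(problem, join_groups, character(1), USE.NAMES = FALSE)
print(joined)
```

vapply() is used instead of sapply() so the result is guaranteed to be a character vector with one element per input string.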
akrun answered Nov 14 '22 23:11
