
Is there an R function to replace a matched RegEx with a string of characters with the same length? [duplicate]

Tags: regex, replace, r

I have a vector

test <- c("NNNCTCGTNNNGTCGTNN", "NNNNNCGTNNNGTCGTGN")

and I want to replace every leading N in each element with "-", so that the run of dashes has the same length as the run of Ns it replaces. When I use gsub, the whole leading run is replaced with a single "-".

gsub("^N+", "-", test)
# [1] "-CTCGTNNNGTCGTNN" "-CGTNNNGTCGTGN"  

But I want the result to look like this:

# "---CTCGTNNNGTCGTNN", "-----CGTNNNGTCGTGN"

Is there any R function that can do this? Thanks for your patience and advice.

asked Jul 18 '20 03:07 by Chao Tang


2 Answers

You can write:

test <- c("NNNCTCGTNNNGTCGTNN", "NNNNNCGTNNNGTCGTGN", "XNNNNNCGTNNNGTCGTGN")

gsub("\\GN", "-", test, perl = TRUE)

which returns:

"---CTCGTNNNGTCGTNN"  "-----CGTNNNGTCGTGN"  "XNNNNNCGTNNNGTCGTGN"

\G is supported by Perl, PCRE (the engine R uses when perl=TRUE, also used by PHP), Ruby, the regex module on PyPI for Python, and others. It asserts that the current position is at the beginning of the string for the first match and at the end of the previous match thereafter.

If the string were "NNNCTCGTNNNGTCGTNN", the first three "N"s would each be matched (and replaced with a hyphen by gsub). The next attempt, at "C", would fail to match "N". Because \G ties every later match to the end of the previous one, none of the remaining "N"s can match, so replacement stops there.
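
To make the effect of \G concrete, here is a small sketch that runs the same replacement with and without the anchor on the first test string. The variable names x, all_n and leading_n are illustrative only and do not appear in the answer above.

```r
x <- "NNNCTCGTNNNGTCGTNN"

# Without \G, gsub replaces every N in the string, not only the leading run.
all_n <- gsub("N", "-", x)

# With \G (PCRE only, hence perl = TRUE), each match must begin where the
# previous one ended, so matching stops at the first non-N character.
leading_n <- gsub("\\GN", "-", x, perl = TRUE)

print(all_n)      # [1] "---CTCGT---GTCGT--"
print(leading_n)  # [1] "---CTCGTNNNGTCGTNN"
```

The only difference between the two calls is the \G anchor (and perl = TRUE, which it requires), which is what restricts the replacement to the head of the string.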

answered Oct 16 '22 11:10 by Cary Swoveland


One approach would be to use the stringr functions, which support regex callbacks:

library(stringr)

test <- c("NNNCTCGTNNNGTCGTNN", "NNNNNCGTNNNGTCGTGN")
# Callback: turn every N in the matched text into a dash
repl <- function(x) { gsub("N", "-", x) }
str_replace_all(test, "^N+", function(m) repl(m))

[1] "---CTCGTNNNGTCGTNN" "-----CGTNNNGTCGTGN"

The strategy here is to first match ^N+ to capture one or more leading N. Then, we pass that match to a callback function which replaces each N with a dash.
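
For comparison, the same match-then-expand idea can be sketched in base R without a callback. This is an alternative under stated assumptions, not part of the answer above: it assumes R 3.3.0 or later for strrep(), and the names lead and result are illustrative.

```r
test <- c("NNNCTCGTNNNGTCGTNN", "NNNNNCGTNNNGTCGTGN", "XNNNNNCGTNNNGTCGTGN")

# Length of the leading run of N in each element. "^N*" always matches at
# position 1, with length 0 when the string does not start with N.
lead <- attr(regexpr("^N*", test), "match.length")

# Build a run of dashes of that length and append the rest of the string.
result <- paste0(strrep("-", lead), substring(test, lead + 1))

print(result)
# [1] "---CTCGTNNNGTCGTNN"  "-----CGTNNNGTCGTGN"  "XNNNNNCGTNNNGTCGTGN"
```

Using "^N*" rather than "^N+" means strings with no leading N get a match of length zero and pass through unchanged, so no special case is needed.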

answered Oct 16 '22 10:10 by Tim Biegeleisen