I have a character vector:
test <- c("NNNCTCGTNNNGTCGTNN", "NNNNNCGTNNNGTCGTGN")
and I want to replace every leading N in each element with "-", so that each element keeps its original length.
When I use gsub, the whole run of leading N's is replaced with a single "-":
gsub("^N+", "-", test)
# [1] "-CTCGTNNNGTCGTNN" "-CGTNNNGTCGTGN"
But I want the result to look like this:
# "---CTCGTNNNGTCGTNN", "-----CGTNNNGTCGTGN"
Is there any R function that can do this? Thanks for your patience and advice.
You can write:
test <- c("NNNCTCGTNNNGTCGTNN", "NNNNNCGTNNNGTCGTGN", "XNNNNNCGTNNNGTCGTGN")
gsub("\\GN", "-", test, perl = TRUE)
which returns:
"---CTCGTNNNGTCGTNN" "-----CGTNNNGTCGTGN" "XNNNNNCGTNNNGTCGTGN"
\G, which is supported by Perl (and by PCRE, the engine R uses when perl = TRUE, as well as Ruby, Python's third-party regex module on PyPI and others), asserts that the current position is at the beginning of the string for the first match and at the end of the previous match thereafter.
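If the string were "NNNCTCGTNNNGTCGTNN", the first three "N"s would each be matched (and replaced with a hyphen by gsub). The attempt to match "N" against the following "C" would then fail, and because \G cannot be satisfied at any later position, matching and replacement stop there. For the same reason, a string that does not start with "N" is left unchanged.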
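To make the effect of the \G anchor concrete, here is a small sketch that runs the same replacement with and without it. The variable name x is only for illustration; both calls use base R's gsub and nothing else.

```r
x <- "NNNCTCGTNNNGTCGTNN"

# Without \G, every N in the string is replaced, not just the leading run
plain <- gsub("N", "-", x)
plain
# [1] "---CTCGT---GTCGT--"

# With \G, each match must start where the previous one ended,
# so replacement stops at the first character that is not N
anchored <- gsub("\\GN", "-", x, perl = TRUE)
anchored
# [1] "---CTCGTNNNGTCGTNN"
```

Note that perl = TRUE is required here, since \G is not part of R's default regex syntax.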
One approach would be to use the stringr functions, which support regex callbacks:
library(stringr)
test <- c("NNNCTCGTNNNGTCGTNN", "NNNNNCGTNNNGTCGTGN")
# Callback: turn every N in the matched text into a dash
repl <- function(x) { gsub("N", "-", x) }
str_replace_all(test, "^N+", function(m) repl(m))
[1] "---CTCGTNNNGTCGTNN" "-----CGTNNNGTCGTGN"
The strategy here is to first match ^N+ to capture one or more leading N. Then, we pass that match to a callback function which replaces each N with a dash.
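If you would rather not depend on stringr, the same match-then-transform idea can probably be sketched in base R with regexpr and regmatches<-. This is only one possible variant, not the method from either answer above; the names m, hit and out are made up for the example, and strrep needs R 3.3.0 or later.

```r
test <- c("NNNCTCGTNNNGTCGTNN", "NNNNNCGTNNNGTCGTGN", "XNNNNNCGTNNNGTCGTGN")

# Locate the leading run of N in each element (-1 where there is none)
m <- regexpr("^N+", test)
hit <- m > 0

# Build one replacement per matched element: as many dashes as matched N's
out <- test
regmatches(out, m) <- strrep("-", attr(m, "match.length")[hit])
out
# [1] "---CTCGTNNNGTCGTNN"  "-----CGTNNNGTCGTGN"  "XNNNNNCGTNNNGTCGTGN"
```

Elements without a leading N are skipped by regmatches<-, which is why the replacement vector is built only from the matched elements.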