I have this code for truncating strings at the first underscore "_", but I don't understand the operators/arguments passed to gsub that make this manipulation possible. In particular, why do I have to replace with "\\1" instead of ""? I notice that replacing with "" removes the entire string. I am also a bit confused by how the operators are being used, particularly the parentheses and brackets:
AAA <- "ATGAS_1121"
(aa <- gsub("([^_]*).*", "\\1", AAA))
## [1] "ATGAS"
Please note: this post draws heavily from "R remove part of string".
Thanks, I appreciate it.
In regex, (..) is called a capturing group: it captures all the characters matched by the pattern inside it. You can refer to those characters later by back-referencing the group's index number, e.g. \\1 for the first group.
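A minimal sketch of back-referencing with two groups (the pattern below is illustrative and not part of the original answer): \\1 and \\2 refer to the text captured by the first and second groups, so they can be reordered in the replacement.

```r
AAA <- "ATGAS_1121"

# Group 1 captures the leading letters, group 2 the digits after "_";
# the replacement puts group 2 first, then "_", then group 1
swapped <- sub("([A-Z]+)_([0-9]+)", "\\2_\\1", AAA)
print(swapped)
# [1] "1121_ATGAS"
```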
gsub("([^_]*).*", "\\1", AAA)
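([^_]*) captures zero or more characters at the start of the string that are not _. The square brackets define a character class, and a ^ as the first character inside them negates it, so [^_] means "any single character except _". The following .* matches all the remaining characters. gsub replaces everything that was matched with the replacement string. If your code is: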
gsub("([^_]*).*", "", AAA)
it would remove all the characters: the pattern matches the entire string, and "" discards everything, including what group 1 captured. Replacing the whole match with \\1, the text captured by group 1 (the leading characters that are not _), gives you the part before the _ symbol.
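To see what the pattern matched versus what group 1 captured, base R's regexec() combined with regmatches() returns the full match followed by each captured group. This sketch also puts the two replacements side by side.

```r
AAA <- "ATGAS_1121"
pattern <- "([^_]*).*"

# First element: the full match; second element: the text captured by group 1
parts <- regmatches(AAA, regexec(pattern, AAA))[[1]]
print(parts)
# [1] "ATGAS_1121" "ATGAS"

# Replacing the whole match with nothing leaves an empty string...
print(gsub(pattern, "", AAA))
# [1] ""

# ...while replacing it with group 1 keeps only the part before "_"
print(gsub(pattern, "\\1", AAA))
# [1] "ATGAS"
```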
You can achieve the same result using \K:
> gsub("[^_]*\\K.*", "", AAA, perl = TRUE)
[1] "ATGAS"
Since \K is a PCRE feature, you must set perl = TRUE. \K keeps the text matched so far out of the overall regex match, so only the part matched by .* (the first _ and everything after it) is replaced with "".
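As a quick check, here is a sketch applying both patterns to a small vector. The extra inputs (a string with no underscore and one with several) are made up for illustration. Both approaches keep everything before the first _ and leave a string without an _ unchanged.

```r
# Hypothetical test inputs: normal case, no underscore, several underscores
x <- c("ATGAS_1121", "NOUNDERSCORE", "A_B_C")

# Capturing-group version: replace the whole match with group 1
with_group <- gsub("([^_]*).*", "\\1", x)

# \K version: needs PCRE, hence perl = TRUE
with_k <- gsub("[^_]*\\K.*", "", x, perl = TRUE)

# with_group is c("ATGAS", "NOUNDERSCORE", "A")
print(with_group)
print(identical(with_group, with_k))
# [1] TRUE
```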