s <- "YXABCDXABCDYX"
I want to use a regular expression to return ABCDABCD, i.e. the 4 characters on each side of the central "X", but not including the "X" itself.
Note that "X" is always in the center with 6 letters on each side.
I can find the central pattern with e.g. "[A-Z]{4}X[A-Z]{4}", but can I somehow let the return be the first and third group in "([A-Z]{4})(X)([A-Z]{4})"?
Your regex "([A-Z]{4})(X)([A-Z]{4})" matches only the middle of your string, so a replacement leaves the characters before and after the match in place. To consume the whole string, add .* on both sides of the capture groups to match any character (.) 0 or more times (*).
You can reference the groups in the replacement argument of gsub using \\n, where n is the nth capture group:
s <- "YXABCDXABCDYX"
gsub('.*([A-Z]{4})(X)([A-Z]{4}).*', '\\1\\3', s)
# [1] "ABCDABCD"
This matches the entire string and replaces it with whatever was captured in groups 1 and 3, pasted together.
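To see why the leading and trailing .* matter, here is a small comparison of the same replacement with and without them. The variable names partial and full are just illustrative.

```r
s <- "YXABCDXABCDYX"

# Without .* only the middle nine characters are matched and replaced,
# so the unmatched "YX" at each end stays in the result.
partial <- gsub('([A-Z]{4})(X)([A-Z]{4})', '\\1\\3', s)
print(partial)
# [1] "YXABCDABCDYX"

# With .* on both sides the whole string is matched,
# so only groups 1 and 3 remain.
full <- gsub('.*([A-Z]{4})(X)([A-Z]{4}).*', '\\1\\3', s)
print(full)
# [1] "ABCDABCD"
```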
Another way would be to use (?i), which turns on case-insensitive matching, along with [a-z] or \\w (note that \\w also matches digits and underscores):
gsub('(?i).*(\\w{4})(x)(\\w{4}).*', '\\1\\3', s)
# [1] "ABCDABCD"
Or, if you like dots:
gsub('.*(.{4})X(.{4}).*', '\\1\\2', s)
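If you would rather extract the groups than replace around them, one possible alternative (not part of the approach above) is base R's regexec together with regmatches, which returns the full match followed by each capture group. A minimal sketch, with m and result as illustrative names:

```r
s <- "YXABCDXABCDYX"

# regexec records the positions of the full match and of each capture group;
# regmatches turns those positions into the matched substrings.
m <- regmatches(s, regexec('([A-Z]{4})X([A-Z]{4})', s))[[1]]

# m[1] is the full match, m[2] and m[3] are the two capture groups.
result <- paste0(m[2], m[3])
print(result)
# [1] "ABCDABCD"
```

Here no .* is needed, because nothing outside the match is carried into the result.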