Regex in R: replace only part of a pattern

Question

s <- "YXABCDXABCDYX"

I want to use a regular expression to return ABCDABCD, i.e. 4 characters on each side of central "X" but not including the "X". Note that "X" is always in the center with 6 letters on each side.

I can find the central pattern with e.g. "[A-Z]{4}X[A-Z]{4}", but can I somehow let the return be the first and third group in "([A-Z]{4})(X)([A-Z]{4})"?

rawr · Accepted Answer

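Your regex "([A-Z]{4})(X)([A-Z]{4})" only matches the middle of your string: there are characters before the first capture group ([A-Z]{4}) and after the last one, and gsub leaves everything outside the match untouched. So you can add .* on both sides to match any character (.) 0 or more times (*), which makes the match cover the whole string.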

You can reference the groups in the replacement argument of gsub using \\n, where n is the number of the capture group (the backslash has to be doubled inside an R string):

s <- "YXABCDXABCDYX"

gsub('.*([A-Z]{4})(X)([A-Z]{4}).*', '\\1\\3', s)
# [1] "ABCDABCD"

which is basically matching the entire string and replacing it with whatever was captured in groups 1 and 3 and pasting that together.
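To see what the surrounding .* contribute, here is a small sketch that runs the bare pattern and the padded pattern on the same string. The names bare and padded are only for illustration and are not part of the original answer.

```r
s <- "YXABCDXABCDYX"

# Without the surrounding .*, only the middle "ABCDXABCD" is matched and
# replaced, so the leading "YX" and the trailing "YX" stay in place.
bare <- gsub('([A-Z]{4})(X)([A-Z]{4})', '\\1\\3', s)
bare
# [1] "YXABCDABCDYX"

# With .* on both sides the match spans the whole string, so the entire
# string is replaced by groups 1 and 3.
padded <- gsub('.*([A-Z]{4})(X)([A-Z]{4}).*', '\\1\\3', s)
padded
# [1] "ABCDABCD"
```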

Another way would be to use (?i), which turns on case-insensitive matching, along with [a-z] or \w (note that \w also matches digits and the underscore):

gsub('(?i).*(\\w{4})(x)(\\w{4}).*', '\\1\\3', s)
# [1] "ABCDABCD"
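If the inline (?i) flag feels cryptic, gsub also takes an ignore.case argument that should have the same effect. A minimal sketch, assuming a made-up mixed-case input s2 (not from the question) and using perl = TRUE:

```r
# s2 is a hypothetical mixed-case variant of the original string
s2 <- "yxAbCdxaBcDyx"

# ignore.case = TRUE lets [a-z] and the literal x match either case;
# only two capture groups are used here, hence \\1\\2.
res <- gsub('.*([a-z]{4})x([a-z]{4}).*', '\\1\\2', s2,
            ignore.case = TRUE, perl = TRUE)
res
# [1] "AbCdaBcD"
```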

Or, if you like dots, drop the group around X and use only two capture groups:

gsub('.*(.{4})X(.{4}).*', '\\1\\2', s)
# [1] "ABCDABCD"
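If you would rather extract the groups than replace around them, base R's regexec and regmatches can return the capture groups directly, so no .* padding is needed. This is a sketch of an alternative, not something from the answer above. The names m and out are illustrative.

```r
s <- "YXABCDXABCDYX"

# regexec() records the position of the full match and of each capture group;
# regmatches() turns those positions into the matched substrings.
m <- regmatches(s, regexec('([A-Z]{4})(X)([A-Z]{4})', s))[[1]]

# m holds the full match "ABCDXABCD" first, then the groups "ABCD", "X", "ABCD",
# so groups 1 and 3 are elements 2 and 4.
out <- paste0(m[2], m[4])
out
# [1] "ABCDABCD"
```

Note that regexec only reports the first match in each string, which is enough here because the question has a single central X.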

Tags:

regex

r

user3375672
