I am trying to use the stringr package to extract the part of a string that lies between two particular patterns.
For example, I have:
my.string <- "nanaqwertybaba"
left.border <- "nana"
right.border <- "baba"
and, using the str_extract(string, pattern) function (where pattern is defined by a POSIX regular expression), I would like to get:
"qwerty"
The solutions I found via Google did not work.
In base R you can use gsub. The parentheses in the pattern create numbered capturing groups. Here we select the second group in the replacement, i.e. the group between the borders. The . matches any character. The * means that the preceding element is matched zero or more times:
gsub(pattern = "(.*nana)(.*)(baba.*)",
replacement = "\\2",
x = "xxxnanaRisnicebabayyy")
# "Risnice"
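The same idea can be wrapped in a small helper. This is only a sketch: extract_between is a hypothetical name, not an existing function, and it assumes that both borders occur in the string and contain no regex metacharacters. If the pattern does not match, sub returns the input unchanged, and with repeated borders the greedy .* decides which occurrence is used.

```r
# Hypothetical helper: returns the text between two literal borders.
# Assumes the borders contain no regex metacharacters.
extract_between <- function(x, left, right) {
  # One capturing group for the part between the borders
  pattern <- paste0(".*", left, "(.*)", right, ".*")
  # Replace the whole string with the captured group
  sub(pattern, "\\1", x)
}

extract_between("nanaqwertybaba", "nana", "baba")
# [1] "qwerty"
```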
I do not know whether and how this is possible with the functions provided by stringr, but you can also use base regexpr and substring:
pattern <- paste0("(?<=", left.border, ")[a-z]+(?=", right.border, ")")
# "(?<=nana)[a-z]+(?=baba)"
rx <- regexpr(pattern, text=my.string, perl=TRUE)
# [1] 5
# attr(,"match.length")
# [1] 6
substring(my.string, rx, rx+attr(rx, "match.length")-1)
# [1] "qwerty"
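For completeness, here is a sketch of how the same lookaround pattern might be used with stringr, which is what the question asked about. It assumes the stringr package is installed and that the borders contain no regex metacharacters; res_extract and res_match are illustrative names. The lazy .*? is used instead of [a-z]+ so that the middle part may contain any characters.

```r
library(stringr)

my.string <- "nanaqwertybaba"
left.border <- "nana"
right.border <- "baba"

# Lookbehind/lookahead keep the borders out of the match itself
pattern <- paste0("(?<=", left.border, ").*?(?=", right.border, ")")
res_extract <- str_extract(my.string, pattern)
res_extract
# [1] "qwerty"

# Alternative: capture the middle part; column 2 of the result
# matrix holds the first capturing group
res_match <- str_match(my.string, paste0(left.border, "(.*?)", right.border))[, 2]
res_match
# [1] "qwerty"
```

str_extract returns NA when there is no match, which is easier to detect than the unchanged input that gsub returns.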