I've seen many iterations of extracting with gsub, but they mostly deal with extracting from left to right or after a single occurrence. I want to match from right to left: count four occurrences of - from the end of the string and extract everything between the 3rd and 4th occurrence.
For example:
string                      outcome
here-are-some-words-to-try  some
a-b-c-d-e-f-g-h-i           f
Here are a few references I've tried using:
Find third occurrence of a special character and drop everything before that in R
regex - return all before the second occurrence
You could use
([^-]+)(?:-[^-]+){3}$
See a demo on regex101.com.
In R, this could be:
library(dplyr)
library(stringr)
df <- data.frame(string = c('here-are-some-words-to-try', 'a-b-c-d-e-f-g-h-i', ' no dash in here'), stringsAsFactors = FALSE)
df <- df %>%
  mutate(outcome = str_match(string, '([^-]+)(?:-[^-]+){3}$')[, 2])
df
And yields
                      string outcome
1 here-are-some-words-to-try    some
2          a-b-c-d-e-f-g-h-i       f
3            no dash in here    <NA>
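If you would rather avoid the dplyr and stringr dependencies, the same pattern should also work in base R. Here is a minimal sketch, assuming R 3.3.0 or later (where regexec() accepts perl = TRUE); the helper name extract_fourth_from_end is made up for illustration and is not part of the answer above.

```r
# Hypothetical helper (name chosen for illustration only).
extract_fourth_from_end <- function(x) {
  pattern <- "([^-]+)(?:-[^-]+){3}$"
  # regexec() returns the position of the full match and of each capture group;
  # regmatches() turns those positions into the matched substrings.
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each element is c(full match, group 1), or character(0) if nothing matched.
  vapply(
    hits,
    function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
    character(1)
  )
}

extract_fourth_from_end(
  c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i", "no dash in here")
)
# [1] "some" "f"    NA
```

The capture group ([^-]+) is the piece you want; the non-capturing group (?:-[^-]+){3} followed by $ pins it to exactly three dash-separated pieces before the end of the string.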
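x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i")
sapply(x, function(strings) {
  # Positions of every "-" in the string (-1 if there is none)
  ind <- unlist(gregexpr(pattern = "-", text = strings))
  if (length(ind) < 4) {
    NA  # fewer than four dashes: nothing to extract
  } else {
    # Text between the 4th-from-last and 3rd-from-last dash
    substr(strings, ind[length(ind) - 3] + 1, ind[length(ind) - 2] - 1)
  }
})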
# here-are-some-words-to-try          a-b-c-d-e-f-g-h-i
#                     "some"                        "f"
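Another way to think about "between the 3rd and 4th dash from the right" is as the fourth piece from the end once the string is split on -. This is a rough sketch of that idea, not a drop-in replacement for the code above: it returns NA when there are fewer than four pieces (the same cutoff as the regex answer, whereas the gregexpr version requires four dashes), and strings with trailing or doubled dashes may behave differently.

```r
x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i", "no dash in here")

# Split each string on a literal "-" (fixed = TRUE avoids regex interpretation)
parts <- strsplit(x, "-", fixed = TRUE)

# Take the fourth piece counting from the end, or NA if there are too few pieces
fourth_from_end <- vapply(parts, function(p) {
  n <- length(p)
  if (n < 4) NA_character_ else p[n - 3]
}, character(1))

fourth_from_end
# [1] "some" "f"    NA
```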