 

R - Extract info after nth occurrence of a character from the right of string

I've seen many examples of extracting with gsub, but they mostly deal with extracting from left to right or after a single occurrence. I want to match from right to left, counting four occurrences of -, and extract everything between the 3rd and 4th occurrence.

For example:

string                       outcome
here-are-some-words-to-try   some
a-b-c-d-e-f-g-h-i            f
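As a quick check of the rule against both examples, here is a minimal base-R sketch (an illustration, not taken from either answer below). It splits on "-" and takes the fourth field counted from the right, which is the field between the 3rd and 4th dash. The helper name third_to_fourth_from_right is hypothetical, and it assumes that strings with fewer than four dashes should give NA.

```r
# Hypothetical helper: the field between the 3rd and 4th "-" counted from
# the right, or NA when there are fewer than four dashes.
third_to_fourth_from_right <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  vapply(parts, function(p) {
    n <- length(p)
    # Four dashes give at least five fields; counting from the right,
    # the field after the 4th dash is p[n - 3]
    if (n < 5) NA_character_ else p[n - 3]
  }, character(1))
}

x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i", "no dash in here")
third_to_fourth_from_right(x)  # returns c("some", "f", NA)
```

One caveat: strsplit() drops a trailing empty field, so a string that ends in "-" is counted one field short by this sketch.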

Here are a few references I've tried using:

  • Find third occurrence of a special character and drop everything before that in R

  • regex - return all before the second occurrence

alexb523 asked Nov 03 '17 16:11


2 Answers

You could use

([^-]+)(?:-[^-]+){3}$

See a demo on regex101.com.


In R, this could be:
library(dplyr)
library(stringr)

df <- data.frame(string = c('here-are-some-words-to-try', 'a-b-c-d-e-f-g-h-i', 'no dash in here'),
                 stringsAsFactors = FALSE)

# str_match() returns a matrix: column 1 is the full match, column 2 the first capture group
df <- df %>%
  mutate(outcome = str_match(string, '([^-]+)(?:-[^-]+){3}$')[, 2])
df

This yields:

                      string outcome
1 here-are-some-words-to-try    some
2          a-b-c-d-e-f-g-h-i       f
3            no dash in here    <NA>
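For comparison, a minimal sketch of the same pattern in base R, without dplyr or stringr (an illustrative variant, not part of the answer above). The group ([^-]+) captures one run of non-dash characters, and (-[^-]+){3}$ requires exactly three more dash-separated fields after it, up to the end of the string. The non-capturing (?:...) is written as a plain group here to stay within POSIX extended syntax for the default engine; regexec() then reports the whole match followed by each group, so the wanted field is the second element. Note that the pattern only checks for three dashes after the field, so a string such as "a-b-c-d" returns "a" rather than NA.

```r
x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i", "no dash in here")

# regexec() records the match and its groups; regmatches() extracts them as
# a list with one character vector per input string
m <- regmatches(x, regexec("([^-]+)(-[^-]+){3}$", x))

# Element 1 is the whole match and element 2 the first group;
# strings without a match give character(0), mapped here to NA
outcome <- vapply(m, function(v) if (length(v) >= 2) v[2] else NA_character_,
                  character(1))
outcome  # returns c("some", "f", NA)
```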
Jan answered Oct 11 '22 04:10


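x = c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i")
sapply(x, function(strings){
    # Positions of every "-" in the string
    ind = unlist(gregexpr(pattern = "-", text = strings))
    # Fewer than four dashes (gregexpr gives -1 for none): nothing to extract
    if (length(ind) < 4) {
        NA
    } else {
        # Text between the 4th and 3rd dash counted from the right
        substr(strings, ind[length(ind) - 3] + 1, ind[length(ind) - 2] - 1)
    }
})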
#here-are-some-words-to-try          a-b-c-d-e-f-g-h-i 
#                    "some"                        "f" 
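The title asks about the nth occurrence in general, so the position-based approach above can be parameterized. This is a minimal sketch, assuming you want the text that follows the nth separator counted from the right, up to the next separator (or to the end of the string when n is 1). The function name between_from_right and its sep argument are hypothetical, and n is assumed to be a positive integer.

```r
# Hypothetical helper generalizing the gregexpr() approach: return the text
# after the nth occurrence of sep counted from the right, up to the next
# occurrence (or to the end of the string when n is 1).
between_from_right <- function(x, n, sep = "-") {
  vapply(x, function(s) {
    # Start positions of every sep; gregexpr() returns -1 when there is none
    ind <- as.integer(gregexpr(sep, s, fixed = TRUE)[[1]])
    ind <- ind[ind > 0]
    k <- length(ind)
    if (k < n) return(NA_character_)
    # ind[k - n + 1] is the nth occurrence from the right
    start <- ind[k - n + 1] + nchar(sep)
    end <- if (n == 1) nchar(s) else ind[k - n + 2] - 1
    substr(s, start, end)
  }, character(1), USE.NAMES = FALSE)
}

x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i")
between_from_right(x, 4)  # returns c("some", "f")
between_from_right(x, 1)  # returns c("try", "i")
```

With n = 4 this reproduces the answer above, since ind[k - 3] and ind[k - 2] are the same positions the original code uses.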
d.b answered Oct 11 '22 03:10