I would like to control the priority of the elements I extract from a search string.
Specifically, in the string "425 million won", I would like to extract "won" if it appears, and fall back to "n" only if "won" doesn't appear.
I want the result to be "won" for the following call, which currently returns "n" because the "n" in "million" is the leftmost match:
stringr::str_extract("425 million won", "won|n")
Note that specifying a space before "won" in my regex is inadequate because of other limitations in my data (there may not be a space between "million" and "won"). Ideally, I would like to do this with a regex rather than if-else clauses, for performance reasons.
You can do this with a PCRE pattern in base R (note perl = TRUE):
pattern <- "^(?:(?!won).)*\\K(?:won|n)"
s <- "425 million won"
m <- gregexpr(pattern, s, perl = TRUE)  # perl = TRUE is needed for \K and the lookahead
regmatches(s, m)[[1]]
# [1] "won"
^ asserts position at the start of the line.
(?:(?!won).)* is a tempered greedy token: it matches any character, as many times as possible, as long as that character does not begin "won".
\K resets the starting point of the match, so the characters consumed so far are not included in the final match.
(?:won|n) matches either won or n.

If you just want to extend the code you already have:
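# str_extract() is vectorised over patterns: try "won", then "n", and keep the first non-NA hit
na.omit(stringr::str_extract("420 million won", c("won", "n")))[1]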
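
To check how the two approaches behave on more than one input, here is a minimal sketch in base R. The helper names `extract_regex` and `extract_fallback` and the sample strings are made up for illustration; `extract_fallback` treats the patterns as literal text, which is an assumption that holds for "won" and "n" but not for arbitrary regexes.

```r
# Hypothetical helper: apply the \K pattern to every element of a character vector.
extract_regex <- function(x) {
  pattern <- "^(?:(?!won).)*\\K(?:won|n)"
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- as.vector(m) != -1            # TRUE where the pattern matched
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)         # regmatches() returns matched elements only
  out
}

# Hypothetical helper: try each literal pattern in priority order, keep the first hit.
extract_fallback <- function(x, patterns = c("won", "n")) {
  out <- rep(NA_character_, length(x))
  for (p in patterns) {
    todo <- is.na(out) & grepl(p, x, fixed = TRUE)
    out[todo] <- p
  }
  out
}

samples <- c("425 million won", "425 millionwon", "425 million", "425")
result_regex <- extract_regex(samples)
result_fallback <- extract_fallback(samples)
print(result_regex)
# [1] "won" "won" "n"   NA
```

Both helpers give the same values for these inputs. When "won" is absent, the regex version matches the last "n" in the string, because the greedy token consumes everything and then backtracks; for a single-character fallback the extracted text is the same either way.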