Extract first value in boolean search string

Question

I would like to control the priority of the elements I extract from a search string.

Specifically, in the string "425 million won", I would like to extract "won" if it appears, and fall back to "n" only if "won" doesn't appear.

I want the result to be "won" for the following:

stringr::str_extract("425 million won", "won|n")

Note that requiring a space before "won" in my regex is not an option, because my data does not always have a space between "million" and "won". Ideally, I would like to do this with a regex rather than if-else clauses, for performance reasons.
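To illustrate why the alternation alone does not give the desired priority, here is a minimal sketch in base R (used here instead of stringr only to keep the snippet dependency-free). Regex engines report the leftmost match in the string, so the "n" in "million" is found before "won" regardless of the order of the alternatives.

```r
s <- "425 million won"

# The engine scans left to right and stops at the first position where
# any alternative matches; "n" in "million" comes before "won".
m <- regexpr("won|n", s)
first_hit <- regmatches(s, m)   # "n"
start_pos <- as.integer(m)      # 11, the "n" at the end of "million"

# Swapping the alternatives makes no difference to the result.
swapped_hit <- regmatches(s, regexpr("n|won", s))  # "n"
```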

ctwheels · Accepted Answer

pattern <- "^(?:(?!won).)*\\K(?:won|n)"  # backslash doubled for the R string literal
s <- "425 million won"
m <- gregexpr(pattern, s, perl = TRUE)     # perl = TRUE is required for (?!...) and \K
regmatches(s, m)[[1]]

Explanation

^ Assert position at the start of the line
(?:(?!won).)* Tempered greedy token: matches any character, zero or more times, as long as won does not start at that position
\K Resets the starting point of the match. Any previously consumed characters are no longer included in the final match
(?:won|n) Match either won or n
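As a rough check of the explanation above, the sketch below applies the same pattern to a few made-up inputs, including one without a space before "won" and one without "won" at all. The helper name `extract_won_or_n` and the sample strings are hypothetical, introduced only for illustration.

```r
pattern <- "^(?:(?!won).)*\\K(?:won|n)"

# Hypothetical helper: returns the match per element, or NA when nothing matches.
extract_won_or_n <- function(x) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the elements that matched, in order.
  out[m != -1] <- regmatches(x, m)
  out
}

samples <- c("425 million won", "425 millionwon", "425 million", "425 usd")
result <- extract_won_or_n(samples)
# "won" "won" "n" NA
```

When "won" is absent, the greedy token first consumes the whole string and then backtracks, so the "n" that is returned is the last one in the string rather than the first. The extracted value is the same either way.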

Daniel · Answer

If you just want to build on the code you already have:

na.omit(stringr::str_extract("425 million won", c("won", "n")))[1]
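The one-liner above handles a single string. For a whole vector of strings, the same idea (try each pattern in priority order and keep the first hit) could be sketched as follows in base R. The function name `extract_by_priority` and the sample data are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical helper: for each string, return the match of the first
# pattern (in priority order) that is found, or NA if none match.
extract_by_priority <- function(x, patterns) {
  out <- rep(NA_character_, length(x))
  for (p in patterns) {
    todo <- is.na(out)            # strings still without a match
    if (!any(todo)) break
    m <- regexpr(p, x[todo], perl = TRUE)
    hit <- m != -1
    out[todo][hit] <- regmatches(x[todo], m)
  }
  out
}

samples <- c("425 million won", "425 million", "425 usd")
picked <- extract_by_priority(samples, c("won", "n"))
# "won" "n" NA
```

This runs one vectorized pass per pattern rather than one if-else per string, which may matter for the performance concern raised in the question.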

Tags: regex, r, stringr

matsuo_basho
