 

Extract first value in boolean search string

Tags: regex, r, stringr

I would like to control the order of precedence among the alternatives I extract from a search string.

Specifically, in the string "425 million won", I would like to extract "won" if it appears anywhere, and fall back to "n" only if "won" does not appear.

I want the result of the following call to be "won", but it returns "n" (the last letter of "million"):

stringr::str_extract("425 million won", "won|n")

Note that putting a space before "won" in my regex is not enough, because of other limitations in my data (there is not always a space between "million" and "won"). Ideally, I would like to do this with a regex rather than with if-else clauses, for performance reasons.
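The call above returns "n" because the regex engine scans the string from left to right and reports the first position at which any alternative matches. The order of the alternatives only decides which one wins when several can match at the same starting position. A small sketch illustrating this with stringr (the test strings are illustrative only):

```r
library(stringr)

s <- "425 million won"

# The "n" at the end of "million" (position 11) starts earlier than "won",
# so it is reported even though "won" is listed first.
str_extract(s, "won|n")   # "n"
str_locate(s, "won|n")    # start 11, end 11

# Alternative order only breaks ties between matches at the same position.
str_extract("won", "w|won")   # "w"
str_extract("won", "won|w")   # "won"
```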

asked Nov 27 '25 01:11 by matsuo_basho


2 Answers


pattern <- "^(?:(?!won).)*\\K(?:won|n)"   # prefer "won", fall back to "n"
s <- "425 million won"
m <- gregexpr(pattern, s, perl = TRUE)      # lookahead and \K need PCRE (perl = TRUE)
regmatches(s, m)[[1]]                       # "won"

Explanation

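  • ^ Assert position at the start of the string
  • (?:(?!won).)* Tempered greedy token: match any character, one at a time, as long as won does not begin at that position
  • \K Reset the starting point of the match; any previously consumed characters are not included in the final match
  • (?:won|n) Match either won or n. If the token above stopped in front of a won, that is what matches; if the string contains no won, the engine backtracks through the token until an n can match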
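The same pattern can be applied to a whole character vector at once. A minimal sketch, assuming you want NA for strings that contain neither "won" nor "n"; extract_preferred() is a hypothetical helper name, and regexpr() is used instead of gregexpr() because the leading ^ allows at most one match per string:

```r
# Hypothetical helper: returns "won" if present, else "n", else NA.
extract_preferred <- function(x) {
  pattern <- "^(?:(?!won).)*\\K(?:won|n)"
  m <- regexpr(pattern, x, perl = TRUE)   # -1 where nothing matched
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)        # regmatches() drops non-matches
  out
}

x <- c("425 million won", "425 millionwon", "425 million", "425 dollars")
extract_preferred(x)
```

For the example vector this gives "won", "won", "n" and NA. Note that the missing space in "425 millionwon" does not matter, since the tempered token stops in front of the first "won" wherever it occurs.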
answered Nov 29 '25 17:11 by ctwheels


If you just want to build on the code you already have:

 library(stringr)
 na.omit(str_extract("425 million won", c("won", "n")))[1]  # first non-NA hit wins
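The same priority-list idea can be kept vectorised over many input strings, without an if-else per element. A sketch, assuming an ordered vector of patterns where earlier entries take precedence; extract_by_priority() is a hypothetical helper name, not part of stringr:

```r
library(stringr)

# Hypothetical helper: try each pattern in order and only fill positions that
# are still NA, so earlier (higher-priority) patterns win.
extract_by_priority <- function(x, patterns) {
  out <- rep(NA_character_, length(x))
  for (p in patterns) {
    todo <- is.na(out)
    if (!any(todo)) break                      # every string already resolved
    out[todo] <- str_extract(x[todo], p)       # NA where p does not match
  }
  out
}

x <- c("425 million won", "425 millionwon", "425 million", "425 dollars")
extract_by_priority(x, c("won", "n"))
```

The loop runs once per pattern, not once per string, so each str_extract() call still processes all remaining strings in one vectorised pass.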
answered Nov 29 '25 17:11 by Daniel


