
Extracting string between words using logical operators in rm_between function

I am trying to extract strings between words. Consider this example -

x <-  "There are 2.3 million species in the world"

This may also take another form which is

x <-  "There are 2.3 billion species in the world"

I need the text between There and either million or billion, including both markers. Whether million or billion appears is only known at run time, not beforehand. So the output I need from this sentence is

[1] There are 2.3 million OR
[2] There are 2.3 billion

I am using the rm_between function from the qdapRegex package for this. With the command below I can extract only one of them at a time.

library(qdapRegex)
rm_between(x, 'There', 'million', extract=TRUE, include.markers = TRUE) 

OR I have to use

rm_between(x, 'There', 'billion', extract=TRUE, include.markers = TRUE)

How can I write a single command that checks for either million or billion in the same sentence? Something like this -

rm_between(x, 'There', 'billion' || 'million', extract=TRUE, include.markers = TRUE)

I hope this is clear. Any help would be appreciated.

Ronak Shah asked Oct 30 '25 16:10


2 Answers

The left and right arguments of rm_between each take a vector of character/numeric symbols, so you can pass vectors of equal length to both arguments.

 library(qdapRegex)
 unlist(rm_between(x, rep('There',2), c('million', 'billion'),
                         extract=TRUE, include.markers=TRUE))
 #[1] "There are 2.3 million" "There are 2.3 billion"
 unlist(rm_between(x1, rep('There',2), c('million', 'billion'),
                         extract=TRUE, include.markers=TRUE))
 #[1] "There are 2.3 million"

 unlist(rm_between(x2, rep('There',2), c('million', 'billion'),
                         extract=TRUE, include.markers=TRUE))
 #[1] "There are 2.3 billion"

Or

  sub('\\s*species.*', '', x)
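The sub() call above depends on the word "species" following the number. As a hedged alternative, here is a minimal base R sketch that matches from "There" to whichever of "million" or "billion" comes first, using regex alternation. The names pattern and result are illustrative only, and x is redefined so the snippet runs on its own.

```r
# Example input, copied from the question
x <- c("There are 2.3 million species in the world",
       "There are 2.3 billion species in the world")

# Lazy match from "There" up to the first "million" or "billion"
pattern <- "There.*?(?:million|billion)"

# regexpr() locates the first match in each element;
# regmatches() extracts the matched text
result <- regmatches(x, regexpr(pattern, x, perl = TRUE))
print(result)
```

Note that regmatches() drops elements that have no match, so the result can be shorter than x.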

data

 x <-  c("There are 2.3 million species in the world", 
   "There are 2.3 billion species in the world")
 x1 <- "There are 2.3 million species in the world"
 x2 <- "There are 2.3 billion species in the world"
akrun answered Nov 01 '25 05:11


You may use str_extract_all (for global matching) or str_extract (single match)

library(stringr)
str_extract_all(s, "\\bThere\\b.*?\\b(?:million|billion)\\b")

or

str_extract_all(s, "(?<!\\S)There(?=\\s+).*?\\s(?:million|billion)(?!\\S)")
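Since the question says the closing word is only known at run time, the alternation can also be assembled from a character vector. The following is a rough sketch, not part of the original answer: s is assumed to hold the sentences from the question, end_words is a hypothetical vector of candidate closing words, and the words are assumed to contain no regex metacharacters.

```r
library(stringr)

# Assumed input: the two sentences from the question
s <- c("There are 2.3 million species in the world",
       "There are 2.3 billion species in the world")

# Hypothetical vector of closing words, could be supplied at run time
end_words <- c("million", "billion")

# Build "\\bThere\\b.*?\\b(?:million|billion)\\b" from the vector
pattern <- paste0("\\bThere\\b.*?\\b(?:",
                  paste(end_words, collapse = "|"),
                  ")\\b")

# str_extract() returns the first match per element (NA if none)
matches <- str_extract(s, pattern)
print(matches)
```

Unlike str_extract_all, which returns a list, str_extract returns a character vector of the same length as s.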
Avinash Raj answered Nov 01 '25 05:11


