
How to get the text between two words in R?

Tags: string, r

I am trying to get the text between two words in a sentence.
For example, given the sentence:

x <-  "This is my first sentence"

I want the text between "This" and "first", which is "is my". I have tried various R functions such as grep, grepl, pmatch and str_split, but none of them gave me exactly what I want.

The closest I have come is with gsub:

gsub(".*This\\s*|first*", "", x)

The output it gives is

 [1] "is my  sentence"

In reality, what I need is only

[1] "is my"

Any help would be appreciated.

Ronak Shah asked Jul 21 '15 05:07


3 Answers

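You need .* after 'first' to match zero or more characters following it, so that the rest of the sentence is removed as well. In your pattern, first* only matches "firs" followed by zero or more "t", which is why " sentence" is left behind.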

 gsub('^.*This\\s*|\\s*first.*$', '', x)
 #[1] "is my"
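
As a side-by-side illustration, here is a minimal sketch comparing the original attempt with a capture-group variant. The capture-group call is an alternative added for illustration, not part of the answer above, and the variable names attempt and between are hypothetical.

```r
x <- "This is my first sentence"

# Original attempt from the question: "first*" removes only the word itself,
# so the trailing " sentence" is kept.
attempt <- gsub(".*This\\s*|first*", "", x)

# Alternative sketch (an assumption, not the answerer's code): match the whole
# string and keep only the lazily captured middle part via the backreference.
between <- sub("^.*This\\s*(.*?)\\s*first.*$", "\\1", x, perl = TRUE)
print(between)
# [1] "is my"
```

Note that sub returns the input unchanged when the pattern does not match, so this variant is only safe when both words are known to be present.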
akrun answered Oct 19 '22 20:10


Another approach uses rm_between from the qdapRegex package.

library(qdapRegex)
rm_between(x, 'This', 'first', extract=TRUE)[[1]]
# [1] "is my"
hwnd answered Oct 19 '22 19:10


Since this question is used as a reference, I'll add some possible solutions to build a complete overview. Both are based on a look-ahead/look-behind regex pattern.

base R

regmatches( x, gregexpr("(?<=This ).*(?= first)", x, perl = TRUE ) )

stringr

stringr::str_extract_all( x, "(?<=This ).+(?= first)" )
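
If the two words vary, the look-around pattern could be wrapped in a small helper. The following is only a sketch under stated assumptions: text_between is a hypothetical name, the boundary words are treated as literal text via PCRE's \Q...\E quoting (so they must not contain "\E" themselves), and only the first match per string is returned.

```r
# Hypothetical helper (not from the answers above): extract the text between
# two literal boundary strings using a look-behind / look-ahead pattern.
text_between <- function(x, left, right) {
  # \Q...\E quotes the boundaries so regex metacharacters are taken literally
  pattern <- paste0("(?<=\\Q", left, "\\E).*?(?=\\Q", right, "\\E)")
  # regexpr finds the first match; regmatches drops elements without a match
  m <- regmatches(x, regexpr(pattern, x, perl = TRUE))
  # strip the whitespace that surrounded the extracted text
  trimws(m)
}

text_between("This is my first sentence", "This", "first")
# [1] "is my"
```

Because the spaces are trimmed afterwards rather than hard-coded into the pattern, the helper also works when the boundaries are not followed or preceded by exactly one space.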
Wimpel answered Oct 19 '22 20:10