Is there a more efficient method? How can I do this without stringr?
txt <- "I want to extract the words between this and that, this goes with that, this is a long way from that"
library(stringr)
w_start <- "this"
w_end <- "that"
pattern <- paste0(w_start, "(.*?)", w_end)
wordsbetween <- unlist(str_extract_all(txt, pattern))
gsub("^\\s+|\\s+$", "", str_sub(wordsbetween, nchar(w_start)+1, -nchar(w_end)-1))
[1] "and" "goes with" "is a long way from"
When working with text data, we sometimes need to extract the values between two words. The two words may be close together, at opposite ends of the string, or anywhere in between. To extract the strings between two words, the str_extract_all function of the stringr package can be used.
To extract words from a string, we can use the word function of the stringr package. For example, if x is a string that contains 100 words, the first 20 words can be extracted with the command word(x, start = 1, end = 20, sep = fixed(" ")).
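Since the question asks how to avoid stringr, here is a rough base-R sketch of the same "first n words" idea. It is not how word() is implemented; first_words is a hypothetical helper name, and it assumes words are separated by single spaces.

```r
## Hypothetical helper: return the first n space-separated words of each string.
## Assumes a single literal separator between words (no regex, no repeated spaces).
first_words <- function(x, n, sep = " ") {
  pieces <- strsplit(x, sep, fixed = TRUE)
  vapply(pieces, function(p) paste(head(p, n), collapse = sep), character(1))
}

x <- "the quick brown fox jumps over the lazy dog"
first_four <- first_words(x, 4)
first_four
## [1] "the quick brown fox"
```

If a string has fewer than n words, head() simply returns the words that are there.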
The substr() and substring() functions in R can be used either to extract parts of character strings or to replace parts of character strings. Both are vectorised, so they work on a single string, a character vector, or a data frame column.
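A short illustration of both uses, with a made-up example string. Positions are 1-based and inclusive.

```r
x <- "extract this part"

## Extraction by start and stop position
mid_part <- substr(x, 9, 12)
mid_part
## [1] "this"

## substring() defaults its end position to the end of the string
tail_part <- substring(x, 9)
tail_part
## [1] "this part"

## substring() also recycles vectors of start and stop positions
both_parts <- substring(x, c(1, 9), c(7, 12))
both_parts
## [1] "extract" "this"

## Replacement form: overwrite characters 9 to 12 in place
substr(x, 9, 12) <- "THAT"
x
## [1] "extract THAT part"
```

Both functions need character positions, so on their own they only help here if you already know where the two delimiter words are.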
This is an approach I use in qdap:
Using qdap:
library(qdap)
genXtract(txt, "this", "that")
## > genXtract(txt, "this", "that")
## this : that1 this : that2 this : that3
## " and " " goes with " " is a long way from "
Without an add-on package:
regmatches(txt, gregexpr("(?<=this).*?(?=that)", txt, perl=TRUE))
## > regmatches(txt, gregexpr("(?<=this).*?(?=that)", txt, perl=TRUE))
## [[1]]
## [1] " and " " goes with " " is a long way from "
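To get the trimmed result from the question, the base-R call above can be wrapped in a small function. This is only a sketch: extract_between is a hypothetical name, and it assumes w_start and w_end are plain literal words with no regex metacharacters (a PCRE lookbehind also has to be fixed-length, which a literal word is).

```r
## Hypothetical helper: text between w_start and w_end, whitespace trimmed.
## Assumes both delimiters are plain words with no regex metacharacters.
extract_between <- function(txt, w_start, w_end) {
  ## Lookbehind/lookahead keep the delimiters out of the match itself
  pattern <- paste0("(?<=", w_start, ").*?(?=", w_end, ")")
  matches <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))
  ## One character vector per input string, with surrounding spaces removed
  lapply(matches, trimws)
}

txt <- "I want to extract the words between this and that, this goes with that, this is a long way from that"
res <- extract_between(txt, "this", "that")[[1]]
res
## [1] "and"                "goes with"          "is a long way from"
```

The delimiters are matched anywhere, including inside longer words; adding \\b around them in the pattern would restrict matching to whole words.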
Here's another rough attempt using strsplit, though it can probably be refined further:
txtspl <- unlist(strsplit(gsub("[[:punct:]]","",txt),"this|that"))
txtspl[txtspl!=" "][-1]
#[1] " and " " goes with " " is a long way from "