How do I extract text between two characters in R

Question

I'd like to extract text between two strings for all occurrences of a pattern. For example, I have this string:

x <- "
TYPE:    School
CITY:   ATLANTA


CITY:   LAS VEGAS

"

I'd like to extract the words ATLANTA and LAS VEGAS as such:

[1] "ATLANTA"   "LAS VEGAS"

I tried using gsub(".*CITY:\\s| ","",x). The output this yields is:

[1] "  LAS VEGAS"

I would like to output both cities (some patterns in the data include more than 2 cities) and to output them without the leading space.
I also tried the qdapRegex package but could not get close. I am not that good with regular expressions so help would be much appreciated.

Wiktor Stribiżew · Accepted Answer

You may use

> unlist(regmatches(x, gregexpr("CITY:\\s*\\K.*", x, perl=TRUE)))
[1] "ATLANTA"   "LAS VEGAS"

Here, CITY:\s*\K.* regex matches

CITY: - a literal substring CITY:
\s* - 0+ whitespaces
\K - match reset operator that discards the text matched so far (zeros the current match memory buffer)
.* - any 0+ chars other than line break chars, as many as possible.
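
As a minimal, self-contained sketch of the pattern broken down above: the helper name extract_after_label is hypothetical (it is not part of base R or of the answer), and it assumes the label contains no regex metacharacters. The backslashes are doubled because the regex sits inside an R string literal.

```r
# Sample input from the question, written with explicit newlines.
x <- "\nTYPE:    School\nCITY:   ATLANTA\n\n\nCITY:   LAS VEGAS\n\n"

# Hypothetical helper: return the rest of the line after each occurrence of `label`.
# Assumes `label` contains no regex metacharacters.
extract_after_label <- function(text, label) {
  # label, optional whitespace, \K to drop what was matched so far, then the rest of the line
  pattern <- paste0(label, "\\s*\\K.*")
  unlist(regmatches(text, gregexpr(pattern, text, perl = TRUE)))
}

cities <- extract_after_label(x, "CITY:")
print(cities)
# [1] "ATLANTA"   "LAS VEGAS"
```

Because \s also matches line breaks, a CITY: label with nothing after it on its line would pick up text from a following line; the sample data does not contain such a case.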

See the regex demo online.

Note that since it is a PCRE regex, perl=TRUE is indispensable.
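
If PCRE is not an option, one possible alternative (an assumption on my part, not something the answer proposes) is to match each whole CITY: line with the default regex engine and then strip the label with sub. The variable names here are illustrative only.

```r
# Sample input from the question, written with explicit newlines.
x <- "\nTYPE:    School\nCITY:   ATLANTA\n\n\nCITY:   LAS VEGAS\n\n"

# Step 1: grab "CITY:" plus everything up to the end of that line.
# [^\n]* is used instead of .* so the match cannot run past a line break.
city_lines <- regmatches(x, gregexpr("CITY:[[:space:]]*[^\n]*", x))[[1]]

# Step 2: remove the label and the whitespace that follows it.
cities_base <- sub("^CITY:[[:space:]]*", "", city_lines)
print(cities_base)
# [1] "ATLANTA"   "LAS VEGAS"
```

This takes two passes instead of one, which is the price of not using \K.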

Tags:

string

regex

r

Cordy
