Matching a word after another word in R regex

Question

I have a data frame in R with one column (called 'city') containing a text string. My goal is to extract only one word, i.e. the city name, from that string. The city name always follows the word 'in', e.g. the text might be:

'in London'
'in Manchester'

I tried to create a new column ('municipality'):

df$municipality <- gsub(".*in ?([A-Z]+).*$", "\\1", df$city)

This gives me the first letter following 'in', but I need the next word (ONLY the next word)

I then tried:

gsub(".*in ?([A-Z]\w+))")

which worked on a regex checker, but not in R. Can someone please help me? I know this is probably very simple, but I can't crack it. Thanks in advance.

akrun · Accepted Answer

We can use str_extract

library(stringr)
str_extract(df$city, '(?<=in\\s)\\w+')
#[1] "London"     "Manchester"
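
For readers without stringr, the same lookbehind can be sketched in base R with regexpr() and regmatches(). This is only an illustration: the sample data frame (including the extra non-matching row) is an assumption, not data from the question. Note that perl = TRUE is needed for the lookbehind, and that backslashes are doubled inside R string literals.

```r
# Hypothetical sample data; the third row is added to show a non-match
df <- data.frame(city = c("in London", "in Manchester", "no match here"))

# perl = TRUE enables the (?<=...) lookbehind in base R
m <- regexpr("(?<=in\\s)\\w+", df$city, perl = TRUE)

# regmatches() silently drops non-matching elements, so fill by position
hit <- m > 0
municipality <- rep(NA_character_, nrow(df))
municipality[hit] <- regmatches(df$city, m)

municipality
```

Rows without a match end up as NA here, which mirrors what str_extract returns for them.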

Tim Biegeleisen · Answer

The following regular expression will match the second word from your city column:

^in\s([^ ]*).*$

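This matches the word 'in' followed by a single whitespace character, followed by a capture group of any non-space characters, which comprises the city name. The trailing .*$ consumes the rest of the string, so replacing the whole match with the first capture group leaves only the city name.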

Example:

df <- data.frame(city=c("in London town", "in Manchester city"))

df$municipality <- gsub("^in\\s([^ ]*).*$", "\\1", df$city)

> df$municipality
[1] "London"     "Manchester"
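
One caveat with the replacement approach: when the pattern does not match, gsub() returns the input string unchanged rather than NA. A minimal sketch of one way to guard against that, assuming a hypothetical third row ("Paris") that lacks the 'in' prefix:

```r
# Hypothetical sample data; "Paris" does not start with "in "
df <- data.frame(city = c("in London town", "in Manchester city", "Paris"))

pattern <- "^in\\s([^ ]*).*$"

# Only substitute where the pattern matches; otherwise record NA
df$municipality <- ifelse(grepl(pattern, df$city),
                          sub(pattern, "\\1", df$city),
                          NA_character_)

df$municipality
```

Because the pattern is anchored at both ends, sub() and gsub() behave the same here; sub() is used simply because at most one match per string is possible.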

Tags: regex, r, gsub

Asked by RichS