
Matching a word after another word in R regex

Tags: regex, r, gsub

I have a data frame in R with one column (called 'city') containing a text string. My goal is to extract only one word, i.e. the city name, from that string. The city name always follows the word 'in', e.g. the text might be:

'in London'
'in Manchester'

I tried to create a new column ('municipality'):

df$municipality <- gsub(".*in ?([A-Z]+).*$","\\1",df$city)

This gives me the first letter following 'in', but I need the next word (ONLY the next word)

I then tried:

gsub(".*in ?([A-Z]\w+))")

which worked on a regex checker, but not in R. Can someone please help me? I know this is probably very simple, but I can't crack it. Thanks in advance.

RichS asked Jan 15 '16 05:01



2 Answers

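We can use str_extract from the stringr package with a lookbehind, which matches the word immediately after 'in' without including 'in' in the result: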

library(stringr)
str_extract(df$city, '(?<=in\\s)\\w+')
#[1] "London"     "Manchester"
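
If stringr is not available, the same lookbehind idea can be sketched in base R. This is a minimal sketch, not part of the original answer: the sample data frame is a hypothetical stand-in for the question's data, and perl = TRUE is needed because the default regex engine does not support lookbehind.

```r
# Hypothetical sample data mirroring the examples in the question
df <- data.frame(city = c("in London", "in Manchester"),
                 stringsAsFactors = FALSE)

# perl = TRUE switches to the PCRE engine, which supports (?<=...)
m <- regexpr("(?<=in\\s)\\w+", df$city, perl = TRUE)

# regexpr() returns -1 where nothing matched, so start from NA
# and fill in only the rows that did match
df$municipality <- NA_character_
df$municipality[m > 0] <- regmatches(df$city, m)

df$municipality
```

Rows with no match stay NA, which is the same behaviour str_extract gives.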
akrun answered Nov 15 '22 05:11



The following regular expression will match the second word from your city column:

^in\\s([^ ]*).*$

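This matches the word 'in' at the start of the string, followed by a single whitespace character, followed by a capture group of any non-space characters, which comprises the city name. The trailing .*$ consumes the rest of the string, so gsub replaces the whole string with just the captured group.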

Example:

df <- data.frame(city=c("in London town", "in Manchester city"))

df$municipality <- gsub("^in\\s([^ ]*).*$", "\\1", df$city)

> df$municipality
[1] "London"     "Manchester"
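
One caveat with the gsub approach is that a string which does not match the pattern is returned unchanged. The sketch below is one possible way to get NA for such rows instead; the "Leeds" row is a hypothetical non-matching value added for illustration and is not from the question.

```r
# Hypothetical data: the second row does not start with "in "
df <- data.frame(city = c("in London town", "Leeds", "in Manchester city"),
                 stringsAsFactors = FALSE)

pattern <- "^in\\s([^ ]*).*$"

# Check which rows match before substituting
matched <- grepl(pattern, df$city)

# Keep the captured word for matching rows, NA otherwise
df$municipality <- ifelse(matched,
                          sub(pattern, "\\1", df$city),
                          NA_character_)

df$municipality
```

Because the pattern is anchored with ^ and $, it can match at most once per string, so sub and gsub behave identically here.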
Tim Biegeleisen answered Nov 15 '22 03:11
