I have a dataframe in R with one column (called 'city') containing a text string. My goal is to extract only one word ie the city text from the text string. The city text always follows the word 'in', eg the text might be:
'in London'
'in Manchester'
I tried to create a new column ('municipality'):
df$municipality <- gsub(".*in ?([A-Z+).*$","\\1",df$city)
This gives me the first letter following 'in', but I need the next word (ONLY the next word)
I then tried:
gsub(".*in ?([A-Z]\w+))")
which worked on a regex checker, but not in R. Can someone please help me. I know this is probably very simple but I can't crack it. Thanks in advance.
We can use str_extract
library(stringr)
str_extract(df$city, '(?<=in\\s)\\w+')
#[1] "London" "Manchester"
The following regular expression will match the second word from your city
column:
^in\\s([^ ]*).*$
This matches the word in
followed a single space, followed by a capture group of any non space characters, which comprises the city name.
Example:
df <- data.frame(city=c("in London town", "in Manchester city"))
df$municipality <- gsub("^in\\s([^ ]*).*$", "\\1", df$city)
> df$municipality
[1] "London" "Manchester"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With