R extract items from string

Tags:

regex

r

I am trying to extract all words that contain two adjacent vowels from the following string.

x <- "The team sat next to each other all year and still failed."

The expected result is "team", "each", "year", "failed".

So far I have tried the pattern [aeiou][aeiou] with regmatches, but it only returns part of each word.
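To illustrate the problem: a bare vowel-pair pattern makes gregexpr match only the two vowels themselves, so regmatches returns fragments rather than whole words. A minimal sketch (the variable name fragments is only for illustration):

```r
x <- "The team sat next to each other all year and still failed."

# The pattern covers just the two adjacent vowels, so each match is
# only those two characters, not the word around them
fragments <- regmatches(x, gregexpr("[aeiou][aeiou]", x))[[1]]
print(fragments)
# [1] "ea" "ea" "ea" "ai"
```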

Thanks.

asked Nov 26 '25 20:11 by jason.nash



2 Answers

You can place \w* before and after the vowel pattern to match "zero or more" word characters on either side, so each match extends to the whole word.

x <- "The team sat next to each other all year and still failed."
regmatches(x, gregexpr('\\w*[aeiou]{2}\\w*', x))[[1]]
# [1] "team"   "each"   "year"   "failed"
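regmatches returns a list with one element per input string, and the [[1]] above just picks the first. That means the same pattern also works on several strings at once. A minimal sketch, where vowel_pair_words is a hypothetical helper name and the extra sentences are made up for illustration:

```r
# Hypothetical helper: returns a list holding one character vector per input string
vowel_pair_words <- function(strings) {
  regmatches(strings, gregexpr("\\w*[aeiou]{2}\\w*", strings))
}

sentences <- c("The team sat next to each other all year and still failed.",
               "Bread and cheese, please.",
               "No luck here.")
result <- vowel_pair_words(sentences)

print(result[[2]])
# [1] "Bread"  "cheese" "please"
# A string with no vowel pair gives character(0)
print(result[[3]])
```

The pattern is case-sensitive, so a word whose vowel pair is uppercase will not match unless you pass ignore.case = TRUE to gregexpr.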
answered Nov 28 '25 17:11 by hwnd



# Split on spaces, then keep the words that contain two adjacent vowels
words <- unlist(strsplit(x, " "))
words[grepl("[aeiou]{2}", words)]
#[1] "team"    "each"    "year"    "failed."

If you also want to strip the punctuation, split on punctuation as well as spaces:

words <- unlist(strsplit(x, "[[:punct:] ]"))
words[grepl("[aeiou]{2}", words)]
#[1] "team"   "each"   "year"   "failed"
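Here is a variation on the split approach, assuming the text may contain commas or repeated spaces. Splitting on "[[:punct:] ]" leaves an empty string wherever a comma is followed by a space. grepl drops those empty strings here, but they would matter if you counted words. Splitting on runs of non-letters with "[^[:alpha:]]+" avoids them. The sentence with commas is made up for illustration:

```r
x <- "The team sat next to each other, all year, and still failed."

# Split on any run of non-letter characters, so ", " yields no empty strings
words <- unlist(strsplit(x, "[^[:alpha:]]+"))
vowel_words <- words[grepl("[aeiou]{2}", words)]
print(vowel_words)
# [1] "team"   "each"   "year"   "failed"
```

Note that this split also breaks contractions such as "don't" into two pieces, which may or may not be what you want.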
answered Nov 28 '25 15:11 by IRTFM
