 

R code: removing words containing @

Tags:

regex

r

gsub

I want to replace every word containing the symbol @ with a specific word. I am using gsub on a character vector. The problem that keeps occurring is that when I use:

gsub(".*@.*", "email", data) 

all of the text in that element of the character vector gets replaced, not just the email address.

There are many different email addresses, all of different lengths, so I can't hard-code a fixed number of characters before and after the @.

Any suggestions?

I've done my fair share of reading about regex, but nothing I tried worked.

Here's an example:

data <- c("This is an example. Here is my email: [email protected]. Thank you")

data <- gsub(".*@.*", "email", data)

It returns [1] "email",

but I want [1] "This is an example. Here is my email: email. Thank you"
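To see why the original pattern wipes out the whole element, it helps to look at what .*@.* actually matches. This is a minimal sketch that uses a hypothetical address (jane.doe@example.com) in place of the one redacted in the question. Because .* is greedy and also matches spaces, the match runs from the start of the string to its end.

```r
# Hypothetical address standing in for the redacted one in the question
data <- c("This is an example. Here is my email: jane.doe@example.com. Thank you")

# regexpr() finds the first match; regmatches() extracts the matched text.
# ".*" is greedy and matches spaces, so the match covers the whole string.
whole <- regmatches(data, regexpr(".*@.*", data))
identical(whole, data)
# [1] TRUE

# Because the match is the entire element, gsub() swaps all of it for "email".
gsub(".*@.*", "email", data)
# [1] "email"
```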

asked Feb 12 '23 16:02 by user3772674



1 Answer

You can use the following:

gsub('\\S+@\\S+', 'email', data)

Explanation:

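\S matches any non-whitespace character, so the pattern matches one or more non-whitespace characters, followed by @, followed by one or more non-whitespace characters. Because the match cannot cross a space, it stays within the single word that contains the @, however long that word is.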
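A quick check of the answer's pattern on the question's example, again with a hypothetical address (jane.doe@example.com) standing in for the redacted one. One thing to watch: the period right after the address is also non-whitespace, so \\S+@\\S+ swallows it too. The second pattern is an alternative that is not part of the answer; it assumes addresses end in a letter or digit and forces the match to end on one, which leaves the trailing period in place.

```r
# Hypothetical address standing in for the redacted one in the question
data <- c("This is an example. Here is my email: jane.doe@example.com. Thank you")

# The answer's pattern: non-whitespace run, then @, then non-whitespace run.
# The sentence-ending "." after the address is non-whitespace, so it is replaced too.
answer <- gsub("\\S+@\\S+", "email", data)
print(answer)
# [1] "This is an example. Here is my email: email Thank you"

# Alternative (an assumption, not from the answer): require the match to end on a
# letter or digit, so trailing punctuation such as "." or "," is kept.
keep_punct <- gsub("\\S+@\\S*[[:alnum:]]", "email", data)
print(keep_punct)
# [1] "This is an example. Here is my email: email. Thank you"
```

Both calls work element-wise on a longer character vector as well, since gsub is vectorised over its x argument.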

answered Feb 15 '23 09:02 by hwnd
