I have a variable a created by readLines of a file which contains some emails. I have already filtered only those rows with the @ symbol, and am now struggling to grab the emails. The text in my variable looks like this:
> dput(a[1:5])
c("buenas tardes. excelente. por favor a: [email protected]",
"[email protected] ", "Aprecio tu aporte , mi correo es [email protected] , Muchas Gracias",
"gracias [email protected]", "Me apunto, muchas gracias mi dirección [email protected] me será de mucha utilidad. "
)
From this question in SO I got a starting point to extract the emails (@Aaron Haurun's answer), which, slightly modified (I added a [\w.] before the @ to address emails with . between names), worked well in regex101.com to extract the emails. However, it fails when I port it to gsub:
> gsub("()(\\w[\\w.]+@[\\w.-]+|\\{(?:\\w+, *)+\\w+\\}@[\\w.-]+)()",
"\\2",
a[1:5],
perl = FALSE) ## It doesn't matter if I use perl = TRUE
[1] "buenas tardes. excelente. por favor a: [email protected]" "[email protected] "
[3] "Aprecio tu aporte , mi correo es [email protected] , Muchas Gracias" "gracias [email protected]"
[5] "Me apunto, muchas gracias mi dirección [email protected] me será de mucha utilidad. "
What am I doing wrong and how can I grab those emails? Thanks!
We can try str_extract() from the stringr package:
library(stringr)
str_extract(a, "\\S*@\\S*")
[1] "[email protected]"
[2] "[email protected]"
[3] "[email protected]"
[4] "[email protected]"
[5] "[email protected]"
where \\S* matches any number of non-whitespace characters.
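As for why the gsub call in the question returns its input unchanged: gsub only rewrites the part of the string that the pattern matches, so replacing a match with its own capture group puts the same text back. A minimal base R sketch of that behaviour and two ways around it follows. The vector x and its addresses are made up for illustration, and the simpler \\S+@\\S+ pattern is used in place of the one from the question.

```r
# Hypothetical sample data; these addresses are invented for illustration.
x <- c("please send it to: ana.perez@example.com",
       "luis@example.org ",
       "my address is maria_1@example.net , many thanks")

# gsub() replaces only the matched text, so substituting a match with
# itself leaves every string exactly as it was.
unchanged <- gsub("(\\S+@\\S+)", "\\1", x)

# Option 1: make the pattern cover the whole string and keep only the group.
via_sub <- sub("^.*?(\\S+@\\S+).*$", "\\1", x, perl = TRUE)

# Option 2: regexpr() locates the first match in each element and
# regmatches() pulls it out. Elements without a match are dropped.
via_regmatches <- regmatches(x, regexpr("\\S+@\\S+", x))

print(identical(unchanged, x))
print(via_sub)
print(via_regmatches)
```

Note that regmatches with regexpr silently drops elements that contain no match, so the result can be shorter than the input. Here that is not a problem because the rows were already filtered on the @ symbol.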
Adapting the regex from the answer you linked in your question:
library(stringr)
str_extract(a, '\\S+@\\S+|\\{(?:\\w+, *)+\\w+\\}@[\\w.-]+')
#[1] "[email protected]" "[email protected]" "[email protected]" "[email protected]"
#[5] "[email protected]"
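str_extract returns only the first match in each string. If a row might hold more than one address, something along these lines could collect them all in base R. This is only a sketch: the vector x is hypothetical sample data, and the same simple \\S+@\\S+ pattern is assumed.

```r
# Hypothetical sample data; these addresses are invented for illustration.
x <- c("write to ana@example.com or to luis@example.org",
       "no address here")

# gregexpr() finds every match in each element; regmatches() then returns
# a list with one character vector per input element.
all_matches <- regmatches(x, gregexpr("\\S+@\\S+", x))

# Number of addresses found in each element.
n_found <- lengths(all_matches)

print(all_matches)
print(n_found)
```

Unlike the regexpr version, the result keeps one list entry per input element, with an empty character vector where nothing matched.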