I want to extract the tags (twitter handles) from tweets.
tweet <- "@me bla bla bla bla @him some text @her"
With:
at <- regexpr('@[[:alnum:]]*', tweet)
handle <- substr(tweet,at+1,at+attr(at,"match.length")-1)
I successfully extract the first handle:
handle
[1] "me"
However, I cannot find a way to extract the others. Does anyone know a way to do this? Thanks.
library(stringr)
# perl() is no longer part of stringr; a plain pattern string supports look-behind
str_extract_all(tweet, "(?<=@)\\w+")[[1]]
#[1] "me" "him" "her"
Or, for faster processing, using stringi:
library(stringi)
stri_extract_all_regex(tweet, "(?<=@)\\w+")[[1]]
#[1] "me" "him" "her"
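Both calls above end in `[[1]]` because these extractors return a list with one element per input string. The following is a minimal base R sketch of that shape, assuming a hypothetical vector `tweets` (not part of the original question); it uses `regmatches()` and `gregexpr()` so that it runs without extra packages.

```r
# Hypothetical input: several tweets, one of them without any handle
tweets <- c("@me hi @him", "no handles here", "@her bye")

# One list element per tweet; perl = TRUE enables the look-behind (?<=@)
handles <- regmatches(tweets, gregexpr("(?<=@)\\w+", tweets, perl = TRUE))

# Number of handles found in each tweet
n_handles <- lengths(handles)

# All handles flattened into a single character vector
all_handles <- unlist(handles)
```

Here `handles[[1]]` holds the handles of the first tweet, and a tweet without handles yields an empty character vector.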
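# Benchmark on the tweet repeated 100,000 times
tweet1 <- rep(tweet, 1e5)
# f1: base R, match "@handle" and strip the "@" afterwards
f1 <- function() {
  m <- regmatches(tweet1, gregexpr("@[a-z]+", tweet1))[[1]]
  substring(m, 2)
}
# f2: stringi with a look-behind
f2 <- function() {stri_extract_all_regex(tweet1, "(?<=@)\\w+")[[1]]}
# f3: base R with a Perl-style look-behind
f3 <- function() {regmatches(tweet1, gregexpr("(?<=@)[a-z]+", tweet1, perl=TRUE))}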
library(microbenchmark)
microbenchmark(f1(), f2(), f3(), unit="relative")
#Unit: relative
# expr      min       lq   median       uq      max neval
# f1() 5.387274 5.253141 5.143694 5.166854 4.544567   100
# f2() 1.000000 1.000000 1.000000 1.000000 1.000000   100
# f3() 5.523090 5.440423 5.301971 5.335775 4.721337   100
I would suggest:
tweet <- "@me bla bla bla bla @him some text @her"
regmatches(tweet, gregexpr("(?<=@)[a-z]+", tweet, perl=TRUE))
## [[1]]
## [1] "me" "him" "her"
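Note that `[a-z]+` only matches lowercase letters, so a handle containing uppercase letters, digits, or underscores would be cut short or missed. A possible variant, sketched here with a hypothetical `tweet2` that is not from the original question, swaps in `\\w+` as the first answer does:

```r
# Hypothetical tweet with mixed-case handles, digits and underscores
tweet2 <- "@Me_01 bla bla @HIM some text @her99"

# \\w matches letters, digits and the underscore
handles2 <- regmatches(tweet2, gregexpr("(?<=@)\\w+", tweet2, perl = TRUE))[[1]]
handles2
## [1] "Me_01" "HIM"   "her99"
```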