I'm working on a Twitter dataset in R and I'm finding it difficult to remove usernames from tweets.
This is an example of the tweets in the tweet column of my dataset:
[1] "@danimottale: 2 bad our inalienable rights offend their sensitivities. U cannot reason with obtuse zealotry. // So very well said."
[2] "@FreeMktMonkey @drleegross Want to build HSA throughout lifetime for when older thus need HDHP not to deplete it if ill before 65y/o.thanks"
I want to remove/replace all words starting with "@" to get this output:
[1] "2 bad our inalienable rights offend their sensitivities. U cannot reason with obtuse zealotry. // So very well said."
[2] "Want to build HSA throughout lifetime for when older thus need HDHP not to deplete it if ill before 65y/o.thanks"
This gsub call only removes the "@" symbol itself:
gsub("@", "", tweetdata$tweets)
What I want to express is: remove the "@" symbol and the characters following it until a space or punctuation mark is encountered.
I started by trying to handle just the space case, but to no avail:
gsub("@.*[:space:]$", "", tweetdata$tweets)
This removes the second tweet entirely.
gsub("@.*[:blank:]$", "", tweetdata$tweets)
This doesn't change the output.
I will be grateful for your help.
You can use the following. \S+ matches one or more non-whitespace characters, and \s then matches the single whitespace character that follows them. Inside an R string the backslashes are doubled, so they are written \\S and \\s.
gsub('@\\S+\\s', '', noRT$text)
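To check the pattern against the two tweets from the question, here is a small self-contained sketch. The vector name tweets is only an illustrative stand-in for the real column.

```r
# Sample tweets copied from the question; `tweets` is an illustrative name,
# not the asker's actual data frame column.
tweets <- c(
  "@danimottale: 2 bad our inalienable rights offend their sensitivities. U cannot reason with obtuse zealotry. // So very well said.",
  "@FreeMktMonkey @drleegross Want to build HSA throughout lifetime for when older thus need HDHP not to deplete it if ill before 65y/o.thanks"
)

# Remove "@", the run of non-whitespace characters after it,
# and the single whitespace character that follows.
cleaned <- gsub("@\\S+\\s", "", tweets)
print(cleaned)
```

Because \S+ also consumes punctuation, the colon in "@danimottale:" is removed together with the username. gsub returns a new vector, so assign the result back to the column if you want to keep it.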
EDIT: A negated character class would also work fine (matching up to a literal space character):
gsub('@[^ ]+ ', '', tweetdata$tweets)
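As for why the attempts in the question misbehave: as far as I can tell, with R's default regex engine a bare [:space:] is read as a bracket expression containing the characters ":", "s", "p", "a", "c" and "e". The POSIX class has to sit inside another pair of brackets, as [[:space:]]. Combined with the greedy .* and the $ anchor, the first attempt matches from the first "@" to the end of any tweet whose last character is one of those letters, which is why the tweet ending in "thanks" disappears. The sketch below shows the negated class next to a POSIX-class variant. The vector x and the optional trailing whitespace are my own assumptions for illustration, not part of the original answer.

```r
# Illustrative input; the second element has a mention at the very end.
x <- c("@FreeMktMonkey @drleegross Want to build HSA", "thanks @drleegross")

# Negated class from the answer: it needs a literal space after the username,
# so a mention at the end of the string is left untouched.
with_space <- gsub("@[^ ]+ ", "", x)
# with_space is c("Want to build HSA", "thanks @drleegross")

# Variant (an assumption, not from the answer): POSIX class inside brackets,
# with the trailing whitespace made optional; trimws() drops leftover spaces.
cleaned2 <- trimws(gsub("@[^[:space:]]+[[:space:]]*", "", x))
# cleaned2 is c("Want to build HSA", "thanks")
```

Whether trailing mentions should be removed depends on your data, so treat the second pattern as one option rather than a drop-in replacement.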