I want to do some mining on tweets. Is there a more specific stop-word list for tweets, e.g. one that removes "lol" and other Twitter smileys?
Click More from the side navigation menu, then click Settings and privacy. Click the Privacy and safety tab, then click Mute and block. Click Muted words. Click the word or hashtag you'd like to edit or unmute.
To mute words or phrases on an Android phone: select your icon in the upper-left corner. Select “Settings and privacy” > “Privacy and safety” > “Mute and block.” Tap “Muted words.” Tap the plus sign and enter the word you want to mute.
With the Twitter mobile app open, tap on your profile photo in the upper left-hand corner. In the menu that appears, scroll toward the bottom of the list and tap on “Settings and privacy.” In the “Settings” menu, find and tap on “Privacy and Safety.” Tap on “Mute and block.”
To mute keywords on Twitter, simply press the “more” button on the Twitter website, select “Settings and Privacy,” head to the “Privacy and Safety” tab, and select “Mute and Block.” You can then choose which words you want to mute.
I guess you should merge an ordinary stop-word list, like this one or that one, with a dedicated acronym dictionary, e.g. this slang dictionary, or that one, or that one, or that one (the last one seems to be the easiest to parse; see the comments here for the idea).
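A minimal Python sketch of that merge, assuming both lists have already been loaded into sets. The entries in `BASE_STOPWORDS` and `TWITTER_SLANG` are hypothetical placeholders, not the contents of the dictionaries mentioned above, and the tokenizer is a plain regex rather than a Twitter-aware one.

```python
import re

# Hypothetical stand-in for an ordinary English stop-word list
# (in practice you would load the full list from a file or library).
BASE_STOPWORDS = frozenset({"a", "an", "the", "and", "or", "is", "are", "to", "of", "i", "you"})

# Hypothetical stand-in for a Twitter slang/acronym dictionary.
TWITTER_SLANG = frozenset({"lol", "lmao", "omg", "rt", "smh", "btw", "idk"})

# The merged stop list is simply the union of the two sets.
STOPWORDS = BASE_STOPWORDS | TWITTER_SLANG


def remove_stopwords(text: str, stopwords: frozenset[str] = STOPWORDS) -> list[str]:
    """Lowercase the text, split it into simple word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in stopwords]
```

For example, `remove_stopwords("LOL the game is great, omg RT")` keeps only `['game', 'great']`.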
I'm not aware of a specific stop-word list, but you could get a list of the most frequent single words here: http://clic.cimec.unitn.it/amac/twitter_ngram/ (download en.1grams.gz)
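One way to turn that download into a stop list is to keep the N most frequent tokens. This sketch assumes each line of `en.1grams.gz` holds a token followed by an integer count, separated by whitespace; that layout is an assumption, so check the actual file before relying on it. `top_frequent_words` and `load_stopwords` are hypothetical helper names, and the cutoff `n` is something you would tune by inspecting the output.

```python
import gzip
from typing import Iterable


def top_frequent_words(lines: Iterable[str], n: int) -> list[str]:
    """Return the n most frequent tokens from lines shaped like 'token count'.

    ASSUMPTION: token first, integer count last, whitespace-separated.
    Adjust the parsing if the real file is laid out differently.
    """
    counts = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        token, count = parts[0], parts[-1]
        try:
            counts.append((int(count), token))
        except ValueError:
            continue  # skip lines whose last field is not an integer
    # Highest counts first; the stable sort keeps file order for ties.
    counts.sort(key=lambda pair: pair[0], reverse=True)
    return [token for _, token in counts[:n]]


def load_stopwords(path: str = "en.1grams.gz", n: int = 200) -> set[str]:
    """Read the gzipped 1-gram file in text mode and keep the top-n tokens."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return set(top_frequent_words(fh, n))
```

Usage would look like `stopwords = load_stopwords("en.1grams.gz", n=300)`. Frequency alone will also pull in content words that happen to be common on Twitter, so it is worth skimming the result by hand.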
To detect and then ignore smileys, use: https://github.com/brendano/tweetmotif
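If you just want a rough idea of what emoticon filtering involves, here is a deliberately small regex sketch. It is not tweetmotif's pattern (tweetmotif's tokenizer is far more thorough); `EMOTICON_RE` and `strip_emoticons` are hypothetical names, and the pattern only recognises a handful of Western-style faces such as `:)`, `;-P` and `<3` when they stand alone between spaces.

```python
import re

# Simplified emoticon pattern (an illustration, NOT tweetmotif's):
# eyes [:;=], optional nose [-o*'], then a mouth character; or a "<3" heart.
EMOTICON_RE = re.compile(
    r"(?<!\S)"                                # start of string or after whitespace
    r"(?:[:;=][\-o*']?[)\](\[dDpP/\\|]|<3)"   # the emoticon itself
    r"(?!\S)"                                 # end of string or before whitespace
)


def strip_emoticons(text: str) -> tuple[str, list[str]]:
    """Return the text with emoticons removed, plus the emoticons that were found."""
    found = EMOTICON_RE.findall(text)
    # Replace each match with a space, then collapse repeated whitespace.
    cleaned = " ".join(EMOTICON_RE.sub(" ", text).split())
    return cleaned, found
```

Requiring whitespace on both sides avoids false matches such as the `:/` inside `http://`, at the cost of missing emoticons glued to a word (`great:)`).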
You may also find these tools useful: https://github.com/willf/segment (if you want to segment hashtags) https://github.com/amacinho/Rovereto-Twitter-Tokenizer (if you don't)
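Hashtag segmentation is often framed as choosing the split whose words are jointly most probable under a unigram model. The sketch below implements that idea with dynamic programming over a tiny, made-up count table (`UNIGRAM_COUNTS` is hypothetical); it is not the API of willf/segment, and the unknown-word penalty and the 20-character word limit are arbitrary choices.

```python
import math
from functools import lru_cache

# Hypothetical toy unigram counts; a real segmenter would use corpus counts.
UNIGRAM_COUNTS = {
    "i": 500, "love": 120, "new": 150, "york": 40,
    "city": 60, "the": 900, "game": 80, "of": 700, "thrones": 5,
}
TOTAL = sum(UNIGRAM_COUNTS.values())
MAX_WORD_LEN = 20  # arbitrary cap on candidate word length


def word_log_prob(word: str) -> float:
    """Log-probability of a word; unseen words are penalised more as they get longer."""
    count = UNIGRAM_COUNTS.get(word)
    if count:
        return math.log(count / TOTAL)
    return math.log(10.0 / (TOTAL * 10 ** len(word)))


def segment(text: str) -> list[str]:
    """Split text into the word sequence with the highest total log-probability."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def best(start: int) -> tuple[float, tuple[str, ...]]:
        # Best (score, words) for the suffix text[start:].
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
            word = text[start:end]
            rest_score, rest_words = best(end)
            candidates.append((word_log_prob(word) + rest_score, (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])
```

With these toy counts, `segment("ILoveNewYork")` returns `['i', 'love', 'new', 'york']`; the quality on real hashtags depends entirely on the counts you plug in.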