
Python module to remove internet jargon/slang/acronyms

Is there a Python module (perhaps in NLTK) to remove internet slang or chat slang such as "lol" and "brb"? If not, can someone provide a CSV file containing a comprehensive list of such slang?

The website http://www.netlingo.com/acronyms.php lists such acronyms, but I cannot find a CSV file of them to use in my program.

Rkz asked Dec 14 '11 09:12


1 Answer

Code to scrape http://www.netlingo.com/acronyms.php and save the acronyms to a JSON file:

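from bs4 import BeautifulSoup
import json
import requests

# Fetch the acronym page and parse the HTML.
resp = requests.get("http://www.netlingo.com/acronyms.php")
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

slangdict = {}
# Each <li> holds an <a> with the acronym; the text after it is the meaning.
for div in soup.find_all('div', attrs={'class': 'list_box3'}):
    for li in div.find_all('li'):
        for a in li.find_all('a'):
            key = a.text
            if not key:
                continue  # an empty string cannot be used as a split separator
            # Split only on the first occurrence of the acronym.
            value = li.text.split(key, 1)[1].strip()
            slangdict[key] = value

# Save the acronym -> meaning mapping as JSON.
with open('myslang.json', 'w') as f:
    json.dump(slangdict, f, indent=2)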
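
The JSON file written above maps each acronym to its meaning. Here is a minimal sketch of how that mapping might then be used to drop slang from a sentence, which is what the question asks for. The function names `load_slang` and `remove_slang`, the punctuation set, and the small sample dictionary are illustrative assumptions, not part of any library or of the scraped data:

```python
import json


def load_slang(path):
    """Load an acronym -> meaning mapping from a JSON file such as myslang.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def remove_slang(text, slang_terms):
    """Drop whitespace-separated tokens that match a known slang term.

    Matching is case-insensitive and ignores punctuation around the token.
    """
    known = {term.strip().lower() for term in slang_terms if term.strip()}
    kept = []
    for token in text.split():
        core = token.strip(".,!?;:\"'()").lower()
        if core not in known:
            kept.append(token)
    return " ".join(kept)


# Hand-written sample standing in for the scraped dictionary (assumption).
sample = {"LOL": "laughing out loud", "BRB": "be right back"}
print(remove_slang("ok lol see you later brb", sample))  # prints "ok see you later"
```

To run this against the scraped data instead of the sample, pass `load_slang('myslang.json')` as the second argument. Because whole tokens are compared, a word such as "lollipop" is left untouched even though it starts with "lol".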

CKM answered Sep 24 '22 03:09