I have a large amount of text in MySQL tables. I want to do some statistical analysis, and later some NLP, on my text using the NLTK toolkit. I have two choices:

1. Extract the text from the database into plain-text files and run NLTK on those files.
2. Override NLTK's corpus reader methods so that it reads directly from the MySQL database.
The latter seems quite complicated, and I haven't found any articles that actually describe how to do it. The only thing I found is Creating a MongoDB backed corpus reader, which uses MongoDB as its database; the code is quite complicated and also requires knowing MongoDB. On the other hand, the former seems really straightforward but results in the overhead of extracting the texts from the DB.
Now the question is: what are the advantages of a corpus in NLTK? In other words, if I take on the challenge and dig into overriding NLTK's methods so they can read from a MySQL database, would it be worth the hassle? Does turning my text into a corpus give me something that I cannot do (or can do only with great difficulty) with ordinary NLTK functions?
Also, if you know something about connecting MySQL to NLTK, please let me know. Thanks!
Well, after reading a lot, I found the answer.
There are several very useful functions, such as collocations(), concordance() (search), common_contexts(), and similar(), that can be used on texts stored as a corpus in NLTK; implementing them yourself would take quite some time. If I select my text from the database, put it in a file, and wrap it with nltk.Text, then I can use all the functions I mentioned above without writing many lines of code or overriding methods to connect to MySQL. Here is the link for more info: nltk.Text
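For concreteness, here is a minimal sketch of that approach. The table and column names (`documents`, `body`), the connection parameters, and the query are hypothetical placeholders for your own schema; it assumes the `mysql-connector-python` package is installed and that NLTK's `punkt` tokenizer data can be downloaded.

```python
import mysql.connector  # assumes mysql-connector-python is installed
import nltk
from nltk.tokenize import word_tokenize

# Pull the raw text out of MySQL (hypothetical schema: documents.body).
conn = mysql.connector.connect(
    host="localhost", user="user", password="secret", database="mydb"
)
cursor = conn.cursor()
cursor.execute("SELECT body FROM documents")
raw = " ".join(row[0] for row in cursor.fetchall())
cursor.close()
conn.close()

# Tokenize and wrap in nltk.Text to get the corpus-style helpers.
nltk.download("punkt", quiet=True)  # tokenizer models, needed once
text = nltk.Text(word_tokenize(raw))

text.collocations()                       # frequent word pairs
text.concordance("database")              # keyword-in-context search
text.similar("database")                  # words appearing in similar contexts
text.common_contexts(["text", "corpus"])  # contexts shared by two words
```

This gets all of the functions mentioned above without touching NLTK's corpus reader machinery; the only MySQL-specific code is the handful of lines that fetch the rows.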