Python's NLTK vs. related Java Libraries? [closed]

I've used LingPipe, Stanford's NER, RiTa and various sentence similarity libraries for my previous Java projects that focused on text (pre)processing (indexing, XML tagging, topic detection, etc.) of large amounts of English text (around 10,000 documents totalling more than 1 GB of text). Maybe I'm a bad Java programmer, but I find myself writing a lot of code and pulling in a lot of libraries whenever I switch to a different corpus. Overall, I feel like there might be a better tool for the job.

I guess my question is, will I benefit from switching to Python and NLTK for information retrieval / language processing? Or are there enough pros and cons to make it very subjective? Is NLTK intuitive enough to be learned quickly?

I'd get my hands dirty, but I won't have access to a personal machine for the next few days.

548 likes
wnewport asked Apr 08 '11 01:04


2 Answers

NLTK is good for natural language processing. I've used it for my data-mining project. You can train your own analyzer. The learning curve is not steep.

NLTK comes with a large collection of corpora for training your analyzer. You can also provide your own data set, for example a journal that has been part-of-speech tagged.
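To make "train your own analyzer on your own tagged data" a bit more concrete, here is a minimal sketch of the idea using only the standard library. It is not NLTK's implementation: the function names, the tiny hand-tagged sample, and the default tag `NN` are all made up for illustration. NLTK's own trainable taggers, such as `nltk.UnigramTagger`, are trained on the same kind of (word, tag) sentences.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences):
    """Learn the most frequent tag for each (lower-cased) word."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    # Keep only the single most common tag per word.
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model, words, default="NN"):
    """Tag each word with its learned tag, or `default` if it was never seen."""
    return [(w, model.get(w.lower(), default)) for w in words]


# Hypothetical hand-tagged sample standing in for "a part-of-speech tagged journal".
TRAINING = [
    [("The", "DT"), ("journal", "NN"), ("publishes", "VBZ"), ("papers", "NNS")],
    [("The", "DT"), ("editor", "NN"), ("reads", "VBZ"), ("the", "DT"), ("papers", "NNS")],
]

model = train_unigram_tagger(TRAINING)
result = tag(model, ["The", "editor", "publishes", "reviews"])
print(result)
```

A word the model has never seen ("reviews" here) falls back to the default tag, which is roughly why people chain a trained tagger with a simpler backoff.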

Because Python is very good at text processing, you may want to give it a try. Plus, NLTK has an online tutorial.
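As a rough illustration of how little code basic text processing takes in Python, here is a sketch that counts the most frequent terms across a few documents using only the standard library. The regex tokenizer, the stopword list, and the sample documents are assumptions for the example, not anything NLTK prescribes.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = frozenset({"the", "a", "of", "and", "is", "to"})


def tokenize(text):
    """Lower-case the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(documents, n=3):
    """Return the n most common non-stopword tokens across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return counts.most_common(n)


DOCS = [
    "The parser reads the corpus.",
    "The corpus is large, and the parser is fast.",
]

top = top_terms(DOCS, n=2)
print(top)
```

Switching to a different corpus mostly means changing what goes into `DOCS`, which is the kind of flexibility the question is asking about.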

When this was written, the advice was to use a Python 2.x version (try Python 2.6), since NLTK might not work well with Python 3.x. NLTK 3.0 and later support Python 3.

66 likes
lamwaiman1988 answered Nov 04 '22 12:11


If you already understand the basics of NLP, I think NLTK should be pretty easy to pick up. It's got a bunch of documentation, 2 books, and I've written a number of articles & tutorials on streamhacker.com. And if there's anything from the Java packages you don't want to lose, you could theoretically combine it with NLTK using Jython (and perhaps execnet).

You also may want to take a look at the Pattern library.

37 likes
Jacob answered Nov 04 '22 10:11