
NLTK vs Stanford NLP

I have recently started to use NLTK toolkit for creating few solutions using Python.

I hear a lot of community activity around Stanford NLP. Can anyone tell me the difference between NLTK and Stanford NLP? Are they two different libraries? I know that NLTK has an interface to Stanford NLP, but can anyone shed some light on a few basic differences, or go into more detail?

Can Stanford NLP be used from Python?

RData asked Oct 13 '16

People also ask

What is the difference between NLP and NLTK?

Natural language processing (NLP) is a field that focuses on making natural human language usable by computer programs. NLTK, or Natural Language Toolkit, is a Python package that you can use for NLP. A lot of the data that you could be analyzing is unstructured data and contains human-readable text.
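For instance, a minimal NLTK tokenization sketch (using the rule-based `TreebankWordTokenizer`, which needs no extra data downloads):

```python
from nltk.tokenize import TreebankWordTokenizer

# TreebankWordTokenizer is rule-based, so it works without
# downloading any NLTK corpora or models.
tokenizer = TreebankWordTokenizer()
tokens = tokenizer.tokenize("NLTK makes text processing in Python straightforward.")
print(tokens)
# ['NLTK', 'makes', 'text', 'processing', 'in', 'Python', 'straightforward', '.']
```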

Which is better NLTK or spaCy?

While NLTK provides access to many algorithms to get something done, spaCy provides the best way to do it. It provides the fastest and most accurate syntactic analysis of any NLP library released to date. It also offers access to larger word vectors that are easier to customize.

What is Stanford NLP library?

CoreNLP enables users to derive linguistic annotations for text, including token and sentence boundaries, parts of speech, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment, quote attributions, and relations.


2 Answers

Can anyone tell me the difference between NLTK and Stanford NLP? Are they two different libraries? I know that NLTK has an interface to Stanford NLP, but can anyone shed some light on a few basic differences, or go into more detail?

(I'm assuming you mean "Stanford CoreNLP".)

They are two different libraries.

  • Stanford CoreNLP is written in Java
  • NLTK is a Python library

The main functional difference is that NLTK offers multiple implementations of (or interfaces to) each NLP task, while Stanford CoreNLP only ships its own. NLTK also supports installing third-party Java projects, and even includes instructions on its wiki for installing some Stanford NLP packages.

Both have good support for English, but if you are dealing with other languages:

  • Stanford CoreNLP comes with models for English, Chinese, French, German, Spanish, and Arabic.
  • NLTK comes with corpora in additional languages like Portuguese, Russian, and Polish. Individual tools may support even more languages (e.g. there are no Danish corpora, but NLTK has a DanishStemmer).
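As a quick illustration of the stemmer point: NLTK's Snowball-based DanishStemmer is rule-based and ships with the library itself, so it works without downloading any Danish corpus.

```python
from nltk.stem.snowball import DanishStemmer

# Snowball stemmers are pure-Python rule sets bundled with NLTK,
# so no corpus download is required.
stemmer = DanishStemmer()
print(stemmer.stem("bogen"))  # strips the Danish definite-article suffix
```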

That said, which one is "best" will depend on your specific application and required performance (what features you are using, language, vocabulary, desired speed, etc.).

Can Stanford NLP be used from Python?

Yes, there are a number of interfaces and packages for using Stanford CoreNLP in Python (independent of NLTK).
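For example, NLTK itself ships a client for a running CoreNLP server. This is only a sketch: it assumes you have downloaded CoreNLP and started the Java server separately on port 9000.

```python
from nltk.parse.corenlp import CoreNLPParser

# Assumes a CoreNLP server was started separately, e.g.:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
parser = CoreNLPParser(url='http://localhost:9000')

# With the server running, this would yield a constituency parse tree:
# tree = next(parser.raw_parse("Stanford CoreNLP can be called from Python."))
# tree.pretty_print()
```

The Stanford NLP group also maintains an official Python package, stanza, which runs its models natively without a Java server.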

user812786 answered Sep 19 '22


The choice will depend on your use case. NLTK is great for pre-processing and tokenizing text, and it also includes a good POS tagger. Using Stanford CoreNLP only for tokenizing/POS tagging is overkill, because it requires far more resources.
One fundamental difference, though, is that you can't parse syntactic dependencies out of the box with NLTK. You need to specify a grammar for that, which can be very tedious if the text domain is not restricted. Stanford CoreNLP, on the other hand, provides a probabilistic parser for general text as a downloadable model, and it is quite accurate. It also has built-in NER (Named Entity Recognition) and more. I would also recommend taking a look at spaCy, which is written in Python, easy to use, and much faster than CoreNLP.
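A small sketch of the spaCy route: a blank English pipeline tokenizes without any downloaded model, while POS tags and dependency parses require a pretrained model such as en_core_web_sm (installed via `python -m spacy download en_core_web_sm`).

```python
import spacy

# spacy.blank("en") builds a tokenizer-only pipeline with no model download.
# For POS tags and dependency parses, load a pretrained model instead:
#   nlp = spacy.load("en_core_web_sm")
nlp = spacy.blank("en")
doc = nlp("spaCy parses general text out of the box.")
print([token.text for token in doc])
```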

0x5050 answered Sep 19 '22