Natural language processing [closed]

Tags:

programming-languages

This question is probably (about 100%) subjective, but I need advice. What is the best language for natural language processing? I know Java and C++, but is there an easier way to do it? To be more specific, I need to process text from a lot of sites and extract information from it.

706

asked Nov 06 '10 22:11

Damir

2 Answers

As I said in the comments, the question is not about a language but about a suitable library, and there are a lot of NLP libraries in both Java and C++. I believe you should inspect some of them (in both languages) and then, once you know the range of available libraries, create some kind of "big plan" for how to implement your task. So here I'll just give you some links with a brief explanation of what is what.

Java

GATE - it is exactly what its name says: General Architecture for Text Engineering. An application in GATE is a pipeline. You put language processing resources such as tokenizers, POS taggers, morphological analyzers, etc. on it and run the process. The result is represented as a set of annotations - meta information attached to a piece of text (e.g. a token). In addition to a great number of plugins (including plugins for integration with other NLP resources like WordNet or the Stanford Parser), it has many predefined dictionaries (cities, names, etc.) and its own regex-like language, JAPE. GATE comes with its own IDE (GATE Developer), where you can try out your pipeline setup, then save it and load it from Java code.
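To make the "pipeline that produces annotations" idea concrete, here is a minimal sketch in plain Python. It is not GATE's API: the names `Annotation`, `Document`, `tokenizer`, `gazetteer` and `run_pipeline`, as well as the tiny city list, are made up for illustration only.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Meta information attached to a span of text (hypothetical structure)."""
    kind: str
    start: int
    end: int
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, ann):
        # The piece of text an annotation points at.
        return self.text[ann.start:ann.end]


def tokenizer(doc):
    # Stage 1: add one "Token" annotation per word or punctuation mark.
    for match in re.finditer(r"\w+|[^\w\s]", doc.text):
        doc.annotations.append(Annotation("Token", match.start(), match.end()))


CITIES = {"London", "Paris"}  # toy dictionary, an assumption for this example


def gazetteer(doc):
    # Stage 2: mark tokens found in the dictionary as "Location".
    for ann in list(doc.annotations):
        if ann.kind == "Token" and doc.covered_text(ann) in CITIES:
            doc.annotations.append(
                Annotation("Location", ann.start, ann.end, {"type": "city"})
            )


def run_pipeline(text, stages):
    # Each stage reads the document and adds its own annotations.
    doc = Document(text)
    for stage in stages:
        stage(doc)
    return doc


doc = run_pipeline("Damir flew from Paris to London.", [tokenizer, gazetteer])
locations = [doc.covered_text(a) for a in doc.annotations if a.kind == "Location"]
print(locations)
```

Later stages only depend on the annotations left by earlier ones, which is why resources can be swapped in and out of such a pipeline independently.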

UIMA - or Unstructured Information Management Architecture. It is very similar to GATE in terms of architecture. It also represents a pipeline and produces a set of annotations. Like GATE, it has a visual IDE where you can try out your future application. The difference is that UIMA is mostly concerned with information extraction, while GATE performs text processing without explicit consideration of its purpose. UIMA also comes with a simple REST server.

OpenNLP - they call themselves an organizational center for open source projects on NLP, and this is the most appropriate definition. The main direction of development is using machine learning algorithms for the most general NLP tasks, like part-of-speech tagging, named entity recognition, coreference resolution and so on. It also has good integration with UIMA, so its tools are available there as well.

Stanford NLP - probably the best choice for engineers and researchers with NLP and ML knowledge. Unlike libraries such as GATE and UIMA, it doesn't aim to provide as many tools as possible, but instead concentrates on idiomatic models. E.g. you don't get comprehensive dictionaries, but you can train a probabilistic algorithm to create them! In addition to its CoreNLP component, which provides the most widely used tools like tokenization, POS tagging, NER, etc., it has several very interesting subprojects. E.g. their Dependency framework allows you to extract the complete sentence structure. That is, you can, for example, easily extract information about the subject and object of a verb in question, which is much harder with other NLP tools.
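To show why dependency output makes the subject/object lookup easy, here is a small Python sketch. It does not call the Stanford tools: the triples are written by hand in the (relation, head, dependent) style with the `nsubj` and `dobj` labels from Stanford typed dependencies, and the `Dep` tuple and `subject_and_object` helper are hypothetical names for this example.

```python
from collections import namedtuple

# One dependency edge: relation label, head word, dependent word.
Dep = namedtuple("Dep", ["relation", "head", "dependent"])

# Hand-written parse of "Apple bought the startup" (an assumption, not parser output).
parse = [
    Dep("nsubj", "bought", "Apple"),
    Dep("dobj", "bought", "startup"),
    Dep("det", "startup", "the"),
]


def subject_and_object(deps, verb):
    """Return the (subject, object) attached to the given verb, or None if missing."""
    subject = next(
        (d.dependent for d in deps if d.head == verb and d.relation == "nsubj"),
        None,
    )
    obj = next(
        (d.dependent for d in deps if d.head == verb and d.relation == "dobj"),
        None,
    )
    return subject, obj


print(subject_and_object(parse, "bought"))
```

With a real parser, the list of edges would come from the parser's output for each sentence; the lookup itself would stay this simple.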

C++

UIMA - yes, there are complete implementations for both Java and C++.

Stanford Parser - some of Stanford's projects are available only in Java, others only in C++, and some in both languages. You can find many of them here.

APIs

A number of web service APIs perform specific language processing, including:

Alchemy API - language identification, named entity recognition, sentiment analysis and much more! Take a look at their main page - it is quite self-descriptive.

OpenCalais - this service tries to build a giant graph of everything. You pass it a web page URL and it enriches the page text with the entities it finds, together with the relations between them. For example, you pass it a page with "Steve Jobs" and it returns "Apple Inc." (roughly speaking) together with the probability that this is the same Steve Jobs.

Other recommendations

And yes, you should definitely take a look at Python's NLTK. It is not only a powerful and easy-to-use NLP library, but also part of an excellent scientific stack created by an extremely friendly community.

Update (2017-11-15): 7 years later, there are even more impressive tools, cool algorithms and interesting tasks. One comprehensive description can be found here:

https://tomassetti.me/guide-natural-language-processing/

161

answered Oct 13 '22 01:10

ffriend

Python and NLTK

answered Oct 13 '22 01:10

pmav99
