I'm looking to do some sentence analysis (mostly for Twitter apps) and infer some general characteristics. Are there any good natural language processing libraries for this sort of thing in Ruby?
This is similar to "Is there a good natural language processing library", but for Ruby. I'd prefer something very general, but any leads are appreciated!
Even though NLP has grown significantly since its humble beginnings, industry experts say that its implementation still remains one of the biggest big data challenges of 2021.
Natural language processing (NLP) describes the interaction between human language and computers. It's a technology that many people use daily and that has been around for years, but it is often taken for granted. Everyday examples of NLP include spell check and autocomplete.
Scikit-learn is an open-source machine learning library for Python (not Ruby) that data scientists often use for NLP tasks. It provides a large number of algorithms for building machine learning models.
In natural language processing, human language is separated into fragments so that the grammatical structure of sentences and the meaning of words can be analyzed and understood in context. This helps computers read and understand spoken or written text in the same way as humans.
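To make the idea of separating language into fragments concrete, here is a minimal sketch in plain Ruby that splits text into sentences and then into tokens. The method names `split_sentences` and `tokenize` are made up for this illustration, and the regular expressions are deliberately naive: they assume sentences end in `.`, `!` or `?` followed by whitespace, so abbreviations and URLs will trip them up. The dedicated segmenter gems mentioned in the answer handle those cases far better.

```ruby
# Naive sentence and word splitting with regular expressions.
# Hypothetical helper names; not the API of any gem.

def split_sentences(text)
  # Break after ., ! or ? when followed by whitespace.
  text.split(/(?<=[.!?])\s+/).map(&:strip).reject(&:empty?)
end

def tokenize(sentence)
  # Keep words (including internal apostrophes), @mentions and #hashtags;
  # all other punctuation is dropped.
  sentence.scan(/[@#]?\w+(?:'\w+)*/)
end

text = "Ruby is fun! Isn't NLP hard? Ask @matz about #ruby."
sentences = split_sentences(text)
tokens = sentences.map { |s| tokenize(s) }

sentences.each_with_index do |s, i|
  puts "#{s.inspect} -> #{tokens[i].inspect}"
end
```

Keeping `@` and `#` attached to their words is a choice made with tweets in mind; for general prose you might strip them instead.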
Three excellent and mature NLP packages are Stanford CoreNLP, OpenNLP and LingPipe. There are Ruby bindings to the Stanford CoreNLP tools (GPL license) as well as the OpenNLP tools (Apache License).
On the more experimental side of things, I maintain a Text Retrieval, Extraction and Annotation Toolkit (Treat), released under the GPL, that provides a common API for almost every NLP-related gem that exists for Ruby. The following list of Treat's features can also serve as a good reference in terms of stable natural language processing gems compatible with Ruby 1.9.
- Text segmenters and tokenizers (punkt-segmenter, tactful_tokenizer, srx-english, scalpel).
- Natural language parsers (stanford-core-nlp).
- Word inflection and conjugation (linguistics), stemming (ruby-stemmer, uea-stemmer, lingua, etc.).
- WordNet interface (rwordnet), POS taggers (rbtagger, engtagger, etc.).
- Language (whatlanguage), date/time (chronic, kronic, nickel), keyword (lda-ruby) extraction.
- Text retrieval with full-text search (ferret).
- Named entity extraction (stanford-core-nlp).
- Basic machine learning with decision trees (decisiontree), MLPs (ruby-fann), SVMs (rb-libsvm) and linear classification (tomz-liblinear-ruby-swig).
- Text similarity metrics (levenshtein-ffi, fuzzy-string-match, tf-idf-similarity).

Not included in Treat, but relevant to NLP: hotwater (string distance algorithms), yomu (bindings to Apache Tika for reading .doc, .docx, .pages, .odt, .rtf, .pdf), graph-rank (an implementation of GraphRank).
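For a sense of what the string-distance gems above compute, here is a rough pure-Ruby sketch of Levenshtein edit distance. It is not the API of levenshtein-ffi or hotwater (those wrap native code and will be much faster); the method names `levenshtein` and `similarity` are just illustrative, and the normalisation into a 0..1 score is one simple choice among several.

```ruby
# Levenshtein edit distance via dynamic programming, keeping one row at a time.
# Illustrative only; the gems listed above provide faster native versions.

def levenshtein(a, b)
  a_chars = a.chars
  b_chars = b.chars
  # prev[j] = distance between the empty string and the first j chars of b
  prev = (0..b_chars.length).to_a

  a_chars.each_with_index do |ca, i|
    curr = [i + 1] # distance from the first i+1 chars of a to the empty string
    b_chars.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [
        prev[j + 1] + 1, # deletion
        curr[j] + 1,     # insertion
        prev[j] + cost   # substitution (or match)
      ].min
    end
    prev = curr
  end
  prev.last
end

# Assumed normalisation: 1.0 means identical, 0.0 means nothing in common.
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?
  1.0 - levenshtein(a, b).to_f / longest
end

puts levenshtein("kitten", "sitting")
puts similarity("ruby", "rubby").round(2)
```

For short texts such as tweets, a score like this can help group near-duplicate messages or match misspelled hashtags, though a token-based measure such as tf-idf is usually a better fit for comparing whole documents.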