
Collocations with spaCy

Tags: python, nlp, spacy

I've been using NLTK to find collocations, or n-grams, and have recently discovered the spaCy module for NLP. I've only just begun familiarizing myself with it and have, so far, seen little mention of supported collocation functions.

Can spaCy be used to find collocations directly?

I have read through the documentation but haven't seen any mention of this.

asked Aug 31 '16 20:08 by alphazwest




1 Answer

Collocation detection can also be based on dependency parsing, but spaCy has no built-in support for it. You can use spaCy as part of an approach, but not to find collocations directly.
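As a rough illustration of using spaCy "as part of an approach", the sketch below scores adjacent word pairs by pointwise mutual information (PMI). PMI is swapped in here as one common collocation measure; it is not something the answer prescribes. The function name `pmi_bigrams` and its `min_count` parameter are made up for this example, and the scoring itself is plain Python. The commented lines show how the tokens might come from spaCy, assuming the `en_core_web_sm` model is installed.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Score adjacent token pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; not part of spaCy.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            # Rare pairs get unreliable PMI scores, so skip them.
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# With spaCy installed, the tokens could be produced along these lines
# (assumes the en_core_web_sm model has been downloaded):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   tokens = [t.lower_ for t in nlp(text) if t.is_alpha]
tokens = ("machine learning is fun and machine learning is useful "
          "but learning a machine is hard").split()

for pair, score in pmi_bigrams(tokens):
    print(pair, round(score, 2))
```

On a real corpus you would likely raise `min_count` and filter stop words or punctuation first, which is where spaCy's token attributes become useful.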

You may also want to consider gensim: https://radimrehurek.com/gensim/models/phrases.html

I hope this helps.

answered Oct 05 '22 14:10 by Vinicius Woloszyn