I've been using NLTK for finding collocations, or n-grams, and have recently discovered the spaCy module for NLP. I've only just begun familiarizing myself with it and have, thus far, seen little mention of supported collocation functions.
Can spaCy be used to find collocations directly?
I have read through the documentation, but haven't seen any mention of it.
Collocations are phrases or expressions containing multiple words that are highly likely to co-occur. For example: 'social media', 'school holiday', 'machine learning', 'Universal Studios Singapore', etc.
spaCy is designed specifically for production use and helps you build applications that process and “understand” large volumes of text. It can be used to build information extraction or natural language understanding systems, or to pre-process text for deep learning.
While NLTK provides access to many algorithms for a given task, spaCy takes a more opinionated approach and generally offers one implementation per task. It is known for fast and accurate syntactic analysis, and it also offers access to larger word vectors that are easier to customize.
How do you find collocations in text? A collocation is a sequence of words that occurs together unusually often. NLTK has a function, bigrams, that returns adjacent word pairs. What remains is to find the bigrams that occur more often than the frequencies of their individual words would predict.
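To make that last step concrete, here is a minimal pure-Python sketch that scores adjacent word pairs with pointwise mutual information (PMI), one common way of comparing a pair's frequency with the frequencies of its words. The function name pmi_bigrams, the min_count cutoff, and the toy text are assumptions made up for illustration; NLTK's own collocation tools would normally replace this hand-rolled version.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; min_count drops rare pairs,
    whose PMI scores tend to be unreliable.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI > 0 means the pair occurs more often than chance would predict.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy text, whitespace-tokenized for simplicity.
text = (
    "machine learning is fun and machine learning is useful "
    "but learning a new machine takes time and social media is fun "
    "and social media takes time"
)
tokens = text.split()
ranked = pmi_bigrams(tokens)

for pair, score in ranked:
    print(pair, round(score, 2))
```

On real text you would want a proper tokenizer and a larger min_count, since PMI favors pairs of rare words.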
Collocation detection can also be based on dependency parsing, but spaCy does not have built-in support for it. You can use spaCy as part of an approach, but not directly.
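As a rough sketch of what "part of an approach" could look like, the example below uses spaCy only for tokenization and stop-word filtering, and does the pair counting with the standard library. It assumes spaCy is installed; spacy.blank("en") builds a tokenizer-only pipeline, so no trained model is needed. The sample text is made up, and a dependency-based variant would additionally need a trained pipeline with a parser.

```python
from collections import Counter

import spacy

# A blank English pipeline only tokenizes; no trained model is required.
nlp = spacy.blank("en")

# Made-up sample text for illustration.
text = (
    "Social media shapes how people share news. "
    "Many students follow machine learning on social media. "
    "Machine learning courses are popular."
)
tokens = list(nlp(text))

# Count adjacent pairs in which both tokens are alphabetic non-stop words.
pair_counts = Counter()
for first, second in zip(tokens, tokens[1:]):
    if first.is_alpha and second.is_alpha and not (first.is_stop or second.is_stop):
        pair_counts[(first.lower_, second.lower_)] += 1

# Keep pairs seen at least twice as collocation candidates.
candidates = [pair for pair, count in pair_counts.items() if count >= 2]
print(candidates)
```

Raw counts are only a first filter; the candidates could then be scored with an association measure such as those NLTK provides.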
You may also want to consider gensim: https://radimrehurek.com/gensim/models/phrases.html
I hope this helps.