NLTK collocations for specific words

Question

I know how to get bigram and trigram collocations using NLTK and I apply them to my own corpora. The code is below.

I'm not sure however about (1) how to get the collocations for a particular word? (2) does NLTK have a collocation metric based on Log-Likelihood Ratio?

import nltk
from nltk.collocations import *
from nltk.tokenize import word_tokenize

text = "this is a foo bar bar black sheep  foo bar bar black sheep foo bar bar black  sheep shep bar bar black sentence"

trigram_measures = nltk.collocations.TrigramAssocMeasures()
finder = TrigramCollocationFinder.from_words(word_tokenize(text))

for i in finder.score_ngrams(trigram_measures.pmi):
    print i

bogs · Accepted Answer

Try this code:

import nltk
from nltk.collocations import *
bigram_measures = nltk.collocations.BigramAssocMeasures()
trigram_measures = nltk.collocations.TrigramAssocMeasures()

# Ngrams with 'creature' as a member
creature_filter = lambda *w: 'creature' not in w


## Bigrams
finder = BigramCollocationFinder.from_words(
   nltk.corpus.genesis.words('english-web.txt'))
# only bigrams that appear 3+ times
finder.apply_freq_filter(3)
# only bigrams that contain 'creature'
finder.apply_ngram_filter(creature_filter)
# return the 10 n-grams with the highest PMI
print finder.nbest(bigram_measures.likelihood_ratio, 10)


## Trigrams
finder = TrigramCollocationFinder.from_words(
   nltk.corpus.genesis.words('english-web.txt'))
# only trigrams that appear 3+ times
finder.apply_freq_filter(3)
# only trigrams that contain 'creature'
finder.apply_ngram_filter(creature_filter)
# return the 10 n-grams with the highest PMI
print finder.nbest(trigram_measures.likelihood_ratio, 10)

It uses the likelihood measure and also filters out Ngrams that don't contain the word 'creature'

dmvianna · Answer

Question 1 - Try:

target_word = "electronic" # your choice of word
finder.apply_ngram_filter(lambda w1, w2, w3: target_word not in (w1, w2, w3))
for i in finder.score_ngrams(trigram_measures.likelihood_ratio):
print i

The idea is to filter out whatever you don't want. This method is normally used to filter out words in specific parts of the ngram, and you can tweak that to your heart's content.

NLTK collocations for specific words

Tags:

python

nltk

collocation

Sabba

2 Answers

bogs

dmvianna

Recent Activity

Donate For Us

NLTK collocations for specific words

Tags:

python

nltk

collocation

Sabba

2 Answers

bogs

dmvianna

Related questions

Recent Activity

Donate For Us