I would like to find out what text comes after each instance that concordance returns. For example, in the 'Searching Text' section they get the concordance of the word 'monstrous'. How would you get the words that come right after each instance of monstrous?
nltk.text module: this module brings together a variety of NLTK functionality for text analysis, and provides simple, interactive interfaces. Functionality includes: concordancing, collocation discovery, regular expression search over tokenized strings, and distributional similarity.
A concordance view shows us every occurrence of a given word, together with some context.
Concordance: this function lists each instance of a word in the text and displays a list of sentences where it is present (see figure 5).
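import nltk
import nltk.book as book  # loads the example texts (text1 is Moby Dick)

text1 = book.text1
# Index the tokens case-insensitively, the same way Text.concordance does.
c = nltk.ConcordanceIndex(text1.tokens, key=lambda s: s.lower())
# offsets() returns the token positions of the word; offset + 1 is the next token.
print([text1.tokens[offset + 1] for offset in c.offsets('monstrous')])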
yields
['size', 'bulk', 'clubs', 'cannibal', 'and', 'fable', 'Pictures', 'pictures', 'stories', 'cabinet', 'size']
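To get more than one following word, the same idea extends to a slice. This is only a minimal sketch: words_after is a hypothetical helper (not part of NLTK), and the toy token list stands in for text1.tokens so the example runs without the corpus. With the real data you would pass text1.tokens and c.offsets('monstrous') instead.

```python
def words_after(tokens, offsets, n=1):
    """Return, for each offset, the list of up to n tokens that follow it."""
    # Slicing never raises IndexError, so a match at the very end of the
    # text simply yields a shorter (possibly empty) list.
    return [tokens[i + 1:i + 1 + n] for i in offsets]


# Toy data standing in for text1.tokens and c.offsets('monstrous').
tokens = "a monstrous size and a most monstrous bulk".split()
offsets = [i for i, tok in enumerate(tokens) if tok.lower() == "monstrous"]
following = words_after(tokens, offsets, n=2)
print(following)
```

Using a slice rather than tokens[i + 1] also avoids an error in the edge case where the word is the last token of the text.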
I found this by looking up how the concordance method is defined.
This shows text1.concordance is defined in /usr/lib/python2.7/dist-packages/nltk/text.py:
In [107]: text1.concordance?
Type: instancemethod
Base Class: <type 'instancemethod'>
String Form: <bound method Text.concordance of <Text: Moby Dick by Herman Melville 1851>>
Namespace: Interactive
File: /usr/lib/python2.7/dist-packages/nltk/text.py
In that file you'll find:
def concordance(self, word, width=79, lines=25):
    ...
    self._concordance_index = ConcordanceIndex(self.tokens,
                                               key=lambda s:s.lower())
    ...
    self._concordance_index.print_concordance(word, width, lines)
This shows how to instantiate ConcordanceIndex objects.
And in the same file you'll also find:
class ConcordanceIndex(object):
    def __init__(self, tokens, key=lambda x:x):
        ...
    def print_concordance(self, word, width=75, lines=25):
        ...
        offsets = self.offsets(word)
        ...
        right = ' '.join(self._tokens[i+1:i+context])
With some experimentation in the IPython interpreter, this shows self.offsets('monstrous') gives a list of numbers (offsets) where the word monstrous can be found. You can access the actual words with self._tokens[offset], which is the same as text1.tokens[offset].
So the next word after monstrous is given by text1.tokens[offset+1].
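To make the offsets idea concrete, here is a rough plain-Python stand-in for what such an index does conceptually. It is not NLTK's actual implementation: build_offsets is a hypothetical name, and the short token list is made up for illustration.

```python
from collections import defaultdict


def build_offsets(tokens, key=lambda x: x):
    """Map key(token) to the list of positions where that token occurs."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        index[key(token)].append(position)
    return index


# Made-up tokens; with NLTK you would use text1.tokens instead.
tokens = ["Monstrous", "size", "and", "monstrous", "bulk"]
# key=str.lower makes the lookup case-insensitive, like the lambda above.
index = build_offsets(tokens, key=str.lower)
# Skip a match at the last position, which has no following token.
next_words = [tokens[i + 1] for i in index["monstrous"] if i + 1 < len(tokens)]
print(next_words)
```

The key function is why both "Monstrous" and "monstrous" end up under the same entry, so a single lookup finds every occurrence.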