Calling NLTK's concordance - how to get text before/after a word that was used?

I would like to find out what text comes after each match that concordance returns. For instance, in the example given in the 'Searching Text' section, they get the concordance of the word 'monstrous'. How would you get the words that come right after each instance of 'monstrous'?


asked Jan 17 '12 16:01

dev.e.loper

1 Answer

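import nltk
import nltk.book as book

# text1 is Moby Dick, the text used in the 'Searching Text' example
text1 = book.text1
# Index the tokens case-insensitively, as Text.concordance does
c = nltk.ConcordanceIndex(text1.tokens, key=lambda s: s.lower())
# For each position of 'monstrous', print the token that follows it
print([text1.tokens[offset+1] for offset in c.offsets('monstrous')])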

yields

['size', 'bulk', 'clubs', 'cannibal', 'and', 'fable', 'Pictures', 'pictures', 'stories', 'cabinet', 'size']
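
Since the question also asks about the text before a match, here is a minimal sketch that collects both neighbours and guards against a match at the very start or end of the token list (where offset-1 or offset+1 would misbehave). The function name neighbors and the sample data are hypothetical; in practice you would pass text1.tokens and c.offsets('monstrous').

```python
def neighbors(tokens, offsets):
    """Return (previous, next) token pairs for each offset; None at the edges."""
    pairs = []
    for i in offsets:
        prev_tok = tokens[i - 1] if i > 0 else None
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        pairs.append((prev_tok, next_tok))
    return pairs


# Hypothetical sample data standing in for text1.tokens and c.offsets('monstrous')
sample_tokens = ["a", "monstrous", "size", "and", "most", "monstrous"]
sample_offsets = [1, 5]
print(neighbors(sample_tokens, sample_offsets))
```

The second match is the last token, so its "next" entry is None rather than an IndexError.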

I found this by looking up how the concordance method is defined.

IPython's ? introspection shows that text1.concordance is defined in /usr/lib/python2.7/dist-packages/nltk/text.py:

In [107]: text1.concordance?
Type:       instancemethod
Base Class: <type 'instancemethod'>
String Form:    <bound method Text.concordance of <Text: Moby Dick by Herman Melville 1851>>
Namespace:  Interactive
File:       /usr/lib/python2.7/dist-packages/nltk/text.py

In that file you'll find:

def concordance(self, word, width=79, lines=25):
    ... 
        self._concordance_index = ConcordanceIndex(self.tokens,
                                                   key=lambda s:s.lower())
    ...            
    self._concordance_index.print_concordance(word, width, lines)

This shows how to instantiate ConcordanceIndex objects.

And in the same file you'll also find:

class ConcordanceIndex(object):
    def __init__(self, tokens, key=lambda x:x):
        ...
    def print_concordance(self, word, width=75, lines=25):
        ...
        offsets = self.offsets(word)
        ...
        right = ' '.join(self._tokens[i+1:i+context])

Experimenting in the IPython interpreter shows that self.offsets('monstrous') returns a list of integer positions (offsets) at which the word 'monstrous' occurs in the token list. You can access the actual word at each position with self._tokens[offset], which is the same as text1.tokens[offset].
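
To make the idea of offsets concrete, here is a simplified stand-in for such an index, written in plain Python. It is not NLTK's actual implementation; build_offsets and the sample tokens are hypothetical names chosen for illustration. It only shows the general technique: map each normalised token to the list of positions where it occurs.

```python
from collections import defaultdict


def build_offsets(tokens, key=lambda s: s):
    """Map key(token) to the list of positions where that token occurs."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        index[key(token)].append(position)
    return index


# Hypothetical token list; the key lower-cases tokens so lookups ignore case
tokens = ["Monstrous", "clubs", "and", "monstrous", "size"]
index = build_offsets(tokens, key=lambda s: s.lower())

positions = index["monstrous"]
print(positions)                        # positions of both spellings
print([tokens[i] for i in positions])   # the original tokens at those positions
```

Each position indexes straight back into the original token list, which is why tokens[offset] recovers the matched word with its original capitalisation.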

So the word right after 'monstrous' is given by text1.tokens[offset+1], and the word right before it (for any offset greater than 0) by text1.tokens[offset-1].
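
The same indexing extends to several words of context on each side by slicing, much like the right = ' '.join(...) line in print_concordance. The sketch below is an assumption-laden illustration, not part of NLTK: context_window, its width parameter, and the sample sentence are all hypothetical.

```python
def context_window(tokens, offset, width=3):
    """Return (left tokens, matched token, right tokens) around one offset."""
    # max(0, ...) stops a negative start index from wrapping around the list
    left = tokens[max(0, offset - width):offset]
    # Slicing past the end of a list is safe and simply returns fewer tokens
    right = tokens[offset + 1:offset + 1 + width]
    return left, tokens[offset], right


# Hypothetical sample sentence standing in for text1.tokens
sample = "one was of a most monstrous size and of great bulk".split()
left, word, right = context_window(sample, sample.index("monstrous"), width=2)
print(" ".join(left), "[" + word + "]", " ".join(right))
```

With real data you would call context_window(text1.tokens, offset) for each offset in c.offsets('monstrous').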

answered Sep 28 '22 00:09

unutbu


Tags: python, nltk
