I am going through this wonderful tutorial.
I downloaded a collection called book:
>>> import nltk
>>> nltk.download()
and imported texts:
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
I can then run commands on these texts:
>>> text1.concordance("monstrous")
How can I run these NLTK commands on my own dataset? Are these collections the same kind of object as book in Python?
A concordance view shows us every occurrence of a given word, together with some context.
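To make "context" concrete, here is a minimal pure-Python sketch of what a concordance does (the function name `simple_concordance` and the window size are illustrative, not part of NLTK):

```python
def simple_concordance(tokens, word, width=3):
    """Return each occurrence of `word` with `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == word.lower():
            left = tokens[max(0, i - width):i]   # up to `width` tokens before the hit
            right = tokens[i + 1:i + 1 + width]  # up to `width` tokens after the hit
            lines.append(' '.join(left + [tok] + right))
    return lines

tokens = "the whale was a monstrous size and a monstrous sight".split()
for line in simple_concordance(tokens, "monstrous"):
    print(line)
# prints:
# whale was a monstrous size and a
# size and a monstrous sight
```

NLTK's real `concordance` does the same kind of windowed lookup, with nicer alignment and character-based widths.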
We can access the corpus as a list of words, or a list of sentences (where each sentence is itself just a list of words). We can optionally specify particular categories or files to read:
>>> from nltk.corpus import brown
>>> brown.words()
You're right that it's quite hard to find the documentation for the book.py module, so we have to get our hands dirty and look at the code (see here). Looking at book.py, here is how to do the concordance and all the fancy stuff from the book module:
First, you have to put your raw texts into NLTK's corpus class; see Creating a new corpus with NLTK for more details. Second, you read the corpus words into NLTK's Text class. Then you can use the functions that you see in http://nltk.org/book/ch01.html
from nltk.corpus import PlaintextCorpusReader
from nltk.text import Text
# For example, create two example text files
text1 = '''
This is a story about a foo bar. Foo likes to go to the bar and his last name is also bar. At home, he kept a lot of gold chocolate bars.
'''
text2 = '''
One day, foo went to the bar in his neighborhood and was shot down by a sheep, a blah blah black sheep.
'''
# Creating the corpus directory and writing the files into it
import os
corpusdir = './mycorpus/'
os.makedirs(corpusdir, exist_ok=True)
with open(corpusdir + 'text1.txt', 'w') as fout:
    fout.write(text1)
with open(corpusdir + 'text2.txt', 'w') as fout:
    fout.write(text2)
# Read the example corpus into NLTK's corpus class.
mycorpus = PlaintextCorpusReader(corpusdir, '.*')
# Read the NLTK corpus into NLTK's Text class,
# where the book-like concordance search is available
mytext = Text(mycorpus.words())
mytext.concordance('foo')
NOTE: you can use other NLTK CorpusReaders and even specify custom paragraph/sentence/word tokenizers and encodings, but for now we'll stick to the defaults.
Text Analysis with NLTK Cheatsheet from blogs.princeton.edu: https://blogs.princeton.edu/etc/files/2014/03/Text-Analysis-with-NLTK-Cheatsheet.pdf
Working with your own texts:
Open a file for reading
f = open('myfile.txt')
Make sure you are in the correct directory before starting Python - or give the full path specification.
Read the file:
t = f.read()
Tokenize the text:
tokens = nltk.word_tokenize(t)
Convert to NLTK Text object:
text = nltk.Text(tokens)
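Putting the cheatsheet steps together, here is a self-contained sketch (the filename `myfile.txt` and its contents are placeholders; note that nltk.word_tokenize requires the 'punkt' tokenizer data to be downloaded, so this sketch uses a plain split() as a dependency-free stand-in):

```python
import nltk

# Create a small placeholder file so the example is self-contained
with open('myfile.txt', 'w') as f:
    f.write('Foo went to the bar. The bar was closed.')

# Open and read the file
with open('myfile.txt') as f:
    t = f.read()

# Tokenize: the cheatsheet uses nltk.word_tokenize(t), which needs
# the 'punkt' models; split() is a crude stand-in for this sketch
tokens = t.split()

# Convert to an NLTK Text object and search it
text = nltk.Text(tokens)
text.concordance('bar')
```

Once you have an nltk.Text object, all the book-style functions (concordance, similar, collocations, and so on) work on your own data just as they do on text1 through text9.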