
How do I use the book functions (e.g. concordance) in NLTK?

Tags: python, nlp, nltk

I am going through this wonderful tutorial.

I downloaded a collection called book:

>>> import nltk
>>> nltk.download()

and imported texts:

>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811

I can then run commands on these texts:

>>> text1.concordance("monstrous")

How can I run these NLTK commands on my own dataset? Are these collections the same as the book object in Python?

Alex Gordon asked Jul 18 '13 21:07



2 Answers

You're right that it's quite hard to find documentation for the book.py module, so we have to get our hands dirty and look at the code. Judging from book.py, doing a concordance and all the other fancy stuff from the book module takes two steps:

First, put your raw texts into one of NLTK's corpus reader classes; see "Creating a new corpus with NLTK" for more details.

Second, read the corpus words into NLTK's Text class. Then you can use the functions shown in http://nltk.org/book/ch01.html

import os

from nltk.corpus import PlaintextCorpusReader
from nltk.text import Text

# For example, I create two example texts
text1 = '''
This is a story about a foo bar. Foo likes to go to the bar and his last name is also bar. At home, he kept a lot of gold chocolate bars.
'''
text2 = '''
One day, foo went to the bar in his neighborhood and was shot down by a sheep, a blah blah black sheep.
'''
# Creating the corpus (the directory must exist before writing to it)
corpusdir = './mycorpus/'
os.makedirs(corpusdir, exist_ok=True)
with open(corpusdir + 'text1.txt', 'w') as fout:
    fout.write(text1)
with open(corpusdir + 'text2.txt', 'w') as fout:
    fout.write(text2)

# Read the example corpus into NLTK's corpus class.
mycorpus = PlaintextCorpusReader(corpusdir, '.*')

# Read NLTK's corpus into NLTK's Text class,
# where your book-like concordance search is available
mytext = Text(mycorpus.words())

mytext.concordance('foo')

NOTE: you can use NLTK's other CorpusReaders and even specify custom paragraph/sentence/word tokenizers and an encoding, but for now we'll stick to the defaults.
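As a rough sketch of the customization mentioned in the note, here is one way to pass a custom word tokenizer and an encoding to PlaintextCorpusReader. The file name story.txt, the temporary directory, the sample sentence, and the latin-1 encoding are all made up for illustration; the RegexpTokenizer pattern is just one possible choice, not something the book module itself uses.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader
from nltk.text import Text
from nltk.tokenize import RegexpTokenizer

# Hypothetical corpus: a single small file in a temporary directory,
# deliberately saved as latin-1 rather than UTF-8.
corpusdir = tempfile.mkdtemp()
with open(os.path.join(corpusdir, "story.txt"), "w", encoding="latin-1") as fout:
    fout.write("Foo runs a caf\u00e9 next to the bar. Foo likes the bar.\n")

# Custom word tokenizer: keep runs of word characters, drop punctuation.
word_tokenizer = RegexpTokenizer(r"\w+")

# Only pick up .txt files, and tell the reader how the files are encoded.
mycorpus = PlaintextCorpusReader(
    corpusdir,
    r".*\.txt",
    word_tokenizer=word_tokenizer,
    encoding="latin-1",
)

words = list(mycorpus.words())
mytext = Text(words)
mytext.concordance("bar")
```

With this tokenizer the full stops never reach the Text object, so they will not show up in the concordance context either. Whether that is desirable depends on what you want to search for.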

alvas answered Nov 02 '22 16:11


Text Analysis with NLTK Cheatsheet from blogs.princeton.edu: https://blogs.princeton.edu/etc/files/2014/03/Text-Analysis-with-NLTK-Cheatsheet.pdf

Working with your own texts (after running import nltk):

Open a file for reading:

file = open('myfile.txt')

Make sure you are in the correct directory before starting Python, or give the full path to the file.

Read the file:

t = file.read() 

Tokenize the text:

tokens = nltk.word_tokenize(t)

Convert to NLTK Text object:

text = nltk.Text(tokens)
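The four steps above can be wrapped in one small helper. This is only a sketch: load_text and the sample file are hypothetical names, and it swaps nltk.word_tokenize for wordpunct_tokenize, a simpler regex-based tokenizer, because word_tokenize needs the Punkt models to be downloaded first. If you already have those models, word_tokenize should work in its place.

```python
import os
import tempfile

import nltk
from nltk.tokenize import wordpunct_tokenize


def load_text(path, encoding="utf-8"):
    """Read a plain-text file and wrap its tokens in an nltk.Text."""
    # The with-block closes the file even if reading fails.
    with open(path, encoding=encoding) as f:
        raw = f.read()
    # Regex-based tokenizer: splits into alphanumeric and punctuation runs.
    tokens = wordpunct_tokenize(raw)
    return nltk.Text(tokens)


# Hypothetical sample file, written to a temporary directory.
sample_path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("Foo went to the bar. The bar was closed.")

text = load_text(sample_path)
text.concordance("bar")
```

The returned object supports the same methods as text1 to text9 from nltk.book, for example text.count("bar") or text.concordance("bar").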
C2Infinity answered Nov 02 '22 18:11