 

Difference between similar() and concordance() in NLTK

I have read about text1.similar("monstrous") and text1.concordance("monstrous") from this.

However, I couldn't find a satisfactory explanation of the difference between text1.concordance('monstrous') and text1.similar('monstrous') in the Natural Language Toolkit (NLTK) for Python.

Could you please explain the difference in detail, with an example?

dex asked Apr 16 '17 14:04


2 Answers

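Using concordance(token) gives you the context surrounding the argument token: it prints every occurrence of token (up to 25 lines by default), each shown inside a window of the surrounding text with the matches aligned in one column.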
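As a rough illustration of what a concordance is, here is a minimal keyword-in-context (KWIC) sketch in plain Python. It is not NLTK's implementation: the function name kwic and its per-side width parameter are hypothetical, and matching is a simple case-insensitive token comparison.

```python
def kwic(tokens, word, width=40):
    """Return one keyword-in-context line per occurrence of `word`.

    Hypothetical helper: the left context is trimmed to its last `width`
    characters and right-aligned, so every match starts in the same column;
    the right context is trimmed to its first `width` characters.
    """
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[:i])[-width:]
            right = " ".join(tokens[i + 1:])[:width]
            lines.append(f"{left:>{width}} {tok} {right}")
    return lines


tokens = "the whale was of a most monstrous size and the monstrous pictures".split()
for line in kwic(tokens, "monstrous", width=20):
    print(line)
```

Aligning the keyword in a fixed column is what makes a concordance easy to scan by eye.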

Using similar(token) prints a list of other words that appear in the same contexts as token. Here the context is just the words directly on either side of token.
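The idea behind similar() can be sketched the same way: record the (left, right) neighbours of every word, then rank other words by how often they appear in a context that the target word also occupies. This is a simplified re-implementation for illustration only (build_context_index and similar_words are hypothetical names), not NLTK's source, so its ranking may differ from what text1.similar() prints.

```python
from collections import Counter, defaultdict


def build_context_index(tokens):
    """Map each lowercased alphabetic word to its list of (left, right) neighbours."""
    words = [t.lower() for t in tokens]
    index = defaultdict(list)
    for i, w in enumerate(words):
        if not w.isalpha():  # only index real words; punctuation can still be a neighbour
            continue
        left = words[i - 1] if i > 0 else "*START*"
        right = words[i + 1] if i + 1 < len(words) else "*END*"
        index[w].append((left, right))
    return index


def similar_words(tokens, word, num=20):
    """Rank other words by how often they occur in a context shared with `word`."""
    index = build_context_index(tokens)
    target = word.lower()
    if target not in index:
        return []
    shared = set(index[target])
    counts = Counter(
        w
        for w, ctxs in index.items()
        if w != target
        for c in ctxs
        if c in shared
    )
    return [w for w, _ in counts.most_common(num)]


tokens = "a most monstrous size . a most doleful size . the true pictures".split()
print(similar_words(tokens, "monstrous"))  # ['doleful']
```

On this toy input, 'doleful' is the only other word that fills the same most _____ size slot as 'monstrous'.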

So, looking at the Moby Dick text (text1), we can check the concordance of 'monstrous':

text1.concordance('monstrous')

# prints:
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u

Then we can get a list of words that appear in contexts similar to those of 'monstrous'. For example, the context in the first concordance line above is 'most _____ size'.

text1.similar('monstrous')

# prints:
determined maddens contemptible modifies abundant tyrannical puzzled
trustworthy impalpable gamesome curious mean pitiable untoward
christian subtly passing domineering uncommon true
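To make the 'most _____ size' context concrete, here is a minimal sketch in plain Python (not NLTK; the helper name contexts is hypothetical) that pulls out the left and right neighbours of each occurrence of a word, using whitespace-split tokens from the first concordance line of 'monstrous'.

```python
def contexts(tokens, word):
    """Return the (left, right) neighbours of every occurrence of `word`.

    None marks a missing neighbour at the start or end of the token list.
    """
    target = word.lower()
    return [
        (
            tokens[i - 1].lower() if i > 0 else None,
            tokens[i + 1].lower() if i + 1 < len(tokens) else None,
        )
        for i, tok in enumerate(tokens)
        if tok.lower() == target
    ]


# Whitespace-split tokens taken from the first concordance line of 'monstrous'.
tokens = "one was of a most monstrous size .".split()
for left, right in contexts(tokens, "monstrous"):
    print(f"{left} _____ {right}")  # most _____ size
```

Any other word found between 'most' and 'size' elsewhere in the text would count as sharing this context.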

If we take the word 'true' and check its concordance with text1.concordance('true'), we get back the first 25 of 87 uses of the word 'true'. This isn't terribly useful, but NLTK provides an additional method called common_contexts that shows which surrounding words a list of words have in common.

text1.common_contexts(['monstrous', 'true'])

# prints:
the_pictures

This result tells us that the phrases "the monstrous pictures" and "the true pictures" both appear in Moby Dick.
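The result above can be reproduced with a small sketch of the same idea: collect the (left, right) contexts of each word and intersect them, printing each shared context as left_right. This is a hypothetical plain-Python helper, not NLTK's common_contexts; in particular, it sorts the shared contexts alphabetically and makes no attempt to rank them by frequency.

```python
from collections import defaultdict


def common_contexts(tokens, words):
    """Return 'left_right' strings for contexts in which every word in `words` occurs."""
    lowered = [t.lower() for t in tokens]
    ctx = defaultdict(set)
    for i, w in enumerate(lowered):
        left = lowered[i - 1] if i > 0 else "*START*"
        right = lowered[i + 1] if i + 1 < len(lowered) else "*END*"
        ctx[w].add((left, right))
    targets = [w.lower() for w in words]
    if any(w not in ctx for w in targets):  # a missing word shares nothing
        return []
    shared = set.intersection(*(ctx[w] for w in targets))
    return sorted(f"{left}_{right}" for left, right in shared)


tokens = "the monstrous pictures of whales and the true pictures of whales".split()
print(common_contexts(tokens, ["monstrous", "true"]))  # ['the_pictures']
```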

James answered Sep 19 '22 09:09


I will explain with an example:

text1.similar("monstrous")

will output words that occur in similar contexts, i.e. in the pattern word1 ______ word2. For example, one of the words it outputs is doleful. If you run:

text1.concordance("monstrous")

You will see among the matches the line:

that has survived the flood ; most monstrous and most mountainous ! That Himmal

If you run:

text1.concordance("doleful")

You will see among the matches the line:

ite perspectives . There ' s a most doleful and most mocking funeral ! The sea

And

text1.common_contexts(["monstrous", "doleful"])

will output the surrounding words that monstrous and doleful have in common, which are "most" and "and":

most_and
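The two concordance lines quoted above are enough to check this by hand. The sketch below (plain Python, with a hypothetical helper neighbours) splits each line on whitespace, takes the neighbour pair around the target word, and intersects the two sets. It skips the first and last token of each line because the concordance lines are cut off at both ends.

```python
def neighbours(tokens, word):
    """Set of (left, right) neighbour pairs around each occurrence of `word`.

    Edge tokens are skipped because concordance lines are truncated fragments.
    """
    target = word.lower()
    return {
        (tokens[i - 1].lower(), tokens[i + 1].lower())
        for i in range(1, len(tokens) - 1)
        if tokens[i].lower() == target
    }


line_monstrous = "that has survived the flood ; most monstrous and most mountainous ! That Himmal"
line_doleful = "ite perspectives . There ' s a most doleful and most mocking funeral ! The sea"

shared = neighbours(line_monstrous.split(), "monstrous") & neighbours(line_doleful.split(), "doleful")
print(["_".join(pair) for pair in sorted(shared)])  # ['most_and']
```

This shared slot, most _____ and, is exactly the kind of context that lets similar("monstrous") list doleful.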

torayeff answered Sep 20 '22 09:09