I need to classify words into their parts of speech, such as verb, noun, adverb, etc. I used the following:
nltk.word_tokenize() # to identify the words in a sentence
nltk.pos_tag() # to identify the parts of speech
nltk.ne_chunk() # to identify named entities
The output of this is a tree. E.g.:
>>> sentence = "I am Jhon from America"
>>> sent1 = nltk.word_tokenize(sentence )
>>> sent2 = nltk.pos_tag(sent1)
>>> sent3 = nltk.ne_chunk(sent2, binary=True)
>>> sent3
Tree('S', [('I', 'PRP'), ('am', 'VBP'), Tree('NE', [('Jhon', 'NNP')]), ('from', 'IN'), Tree('NE', [('America', 'NNP')])])
When accessing the elements of this tree, I did it as follows:
>>> sent3[0]
('I', 'PRP')
>>> sent3[0][0]
'I'
>>> sent3[0][1]
'PRP'
But when accessing a Named Entity:
>>> sent3[2]
Tree('NE', [('Jhon', 'NNP')])
>>> sent3[2][0]
('Jhon', 'NNP')
>>> sent3[2][1]
Traceback (most recent call last):
  File "<pyshell#121>", line 1, in <module>
    sent3[2][1]
  File "C:\Python26\lib\site-packages\nltk\tree.py", line 139, in __getitem__
    return list.__getitem__(self, index)
IndexError: list index out of range
I got the above error.
What I want is to get the output 'NE', similar to the previous 'PRP', so that I can identify which word is a named entity. Is there any way of doing this with NLTK in Python? If so, please post the command. Or is there a function in the tree library to do this? I need the node value 'NE'.
Named entities are persons, locations, organizations, time expressions, etc. A POS tagger does not look for relationships between the words in a document, whereas NER does. The output of POS tagging is used as input for NER.
Named entity recognition (NER) is a common data preprocessing task. It involves identifying key information in the text and classifying it into a set of predefined categories. An entity is basically the thing that is consistently talked about or referred to in the text.
Generally, when we read a text, we recognize entities such as people, values, and locations straight away. For example, in the sentence “Alexander the Great was a king of the ancient Greek kingdom of Macedonia.”, we can identify three types of entities as follows: Person: Alexander.
This answer may be off base, in which case I'll delete it, as I don't have NLTK installed here to try it, but I think you can just do:
>>> sent3[2].node
'NE'
sent3[2][0]
returns the first child of the tree, not the node itself
Edit: I tried this when I got home, and it does indeed work.
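For anyone on a newer NLTK: as far as I know, the .node attribute was replaced by the label() method in NLTK 3, so the sketch below uses label(). It rebuilds the tree from the question by hand (so no tagger or chunker models are needed) and walks it to pair each word with its entity label. The helper name tag_entities is made up for illustration, and the small fallback Tree class is only a stand-in so the snippet runs where NLTK is not installed. It also assumes the chunks are flat, as they are in the question.

```python
try:
    from nltk.tree import Tree
except ImportError:
    # Stand-in for illustration only (an assumption, not NLTK's real class):
    # it mimics just the two things used below, list behaviour and label().
    class Tree(list):
        def __init__(self, label, children):
            super().__init__(children)
            self._label = label

        def label(self):
            return self._label


def tag_entities(tree):
    """Yield (word, pos, entity) for each token; entity is None outside a chunk.

    Hypothetical helper: assumes each chunk holds only (word, pos) leaves.
    """
    for child in tree:
        if isinstance(child, Tree):
            # A chunk such as Tree('NE', [...]): its label is the entity type.
            for word, pos in child:
                yield word, pos, child.label()
        else:
            # A plain (word, pos) tuple that is not part of any named entity.
            word, pos = child
            yield word, pos, None


# The same tree that nltk.ne_chunk(..., binary=True) returned in the question.
sent3 = Tree('S', [('I', 'PRP'), ('am', 'VBP'),
                   Tree('NE', [('Jhon', 'NNP')]),
                   ('from', 'IN'),
                   Tree('NE', [('America', 'NNP')])])

print(sent3[2].label())  # the node value of the third child
print([word for word, pos, entity in tag_entities(sent3) if entity == 'NE'])
```

This also shows why sent3[2][1] fails: the 'NE' subtree has a single child, ('Jhon', 'NNP'), so index 1 is out of range. The label lives on the subtree itself, not among its children.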