
Named Entity Recognition for NLTK in Python. Identifying the NE

I need to classify words into their parts of speech (verb, noun, adverb, etc.). I used the following:

nltk.word_tokenize() # to split a sentence into words
nltk.pos_tag()       # to identify the parts of speech
nltk.ne_chunk()      # to identify named entities

The output of this is a tree, e.g.:

>>> import nltk
>>> sentence = "I am Jhon from America"
>>> sent1 = nltk.word_tokenize(sentence)
>>> sent2 = nltk.pos_tag(sent1)
>>> sent3 = nltk.ne_chunk(sent2, binary=True)
>>> sent3
Tree('S', [('I', 'PRP'), ('am', 'VBP'), Tree('NE', [('Jhon', 'NNP')]), ('from', 'IN'), Tree('NE', [('America', 'NNP')])])

To access an ordinary element of this tree, I did the following:

>>> sent3[0]
('I', 'PRP')
>>> sent3[0][0]
'I'
>>> sent3[0][1]
'PRP'

But when accessing a Named Entity:

>>> sent3[2]
Tree('NE', [('Jhon', 'NNP')])
>>> sent3[2][0]
('Jhon', 'NNP')
>>> sent3[2][1]    
Traceback (most recent call last):
  File "<pyshell#121>", line 1, in <module>
    sent3[2][1]
  File "C:\Python26\lib\site-packages\nltk\tree.py", line 139, in __getitem__
    return list.__getitem__(self, index)
IndexError: list index out of range

I got the above error.

What I want is to get 'NE' as the output, similar to 'PRP' in the previous case, so that I can identify which words are named entities. Is there any way of doing this with NLTK in Python? If so, please post the command. Or is there a function in the tree library to do this? I need the node value 'NE'.

asked Apr 18 '11 20:04 by Asl506


1 Answer

This answer may be off base, in which case I'll delete it, since I don't have NLTK installed here to try it, but I think you can just do:

   >>> sent3[2].node
   'NE'

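sent3[2][0] returns the first child of the subtree, not the label of the node itself. That subtree has only one child, so sent3[2][1] is out of range, which is why you get the IndexError.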

Edit: I tried this when I got home, and it does indeed work.
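A note for anyone on a newer NLTK: as far as I know, NLTK 3.x replaced the .node attribute with a label() method, so sent3[2].label() is the equivalent there. Below is a rough sketch of how you could walk the whole tree and attach the entity label to each word. The helper names node_label and tag_entities are my own and not part of NLTK, and the tree is rebuilt by hand from the output in the question, so the sketch assumes only that NLTK itself is installed (no tagger or chunker models).

```python
from nltk.tree import Tree


def node_label(subtree):
    """Return the label of a Tree node (hypothetical helper, not an NLTK function)."""
    # NLTK 3.x exposes label(); NLTK 2.x, as used in the answer above, had .node.
    if hasattr(subtree, "label"):
        return subtree.label()
    return subtree.node


def tag_entities(chunked):
    """Flatten an ne_chunk-style tree into (word, pos, entity_label_or_None) tuples."""
    rows = []
    for child in chunked:
        if isinstance(child, Tree):
            # A subtree marks a named-entity chunk; its leaves are (word, pos) pairs.
            for word, pos in child.leaves():
                rows.append((word, pos, node_label(child)))
        else:
            # A plain (word, pos) tuple is a token outside any named entity.
            word, pos = child
            rows.append((word, pos, None))
    return rows


# The tree from the question, built by hand instead of calling nltk.ne_chunk().
sent3 = Tree('S', [('I', 'PRP'), ('am', 'VBP'),
                   Tree('NE', [('Jhon', 'NNP')]),
                   ('from', 'IN'),
                   Tree('NE', [('America', 'NNP')])])

rows = tag_entities(sent3)
for row in rows:
    print(row)
```

Checking isinstance(child, Tree) is what separates the named-entity chunks from ordinary tokens, since only the chunks are subtrees; everything else is a plain (word, tag) tuple.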

answered Sep 21 '22 06:09 by bdk