Named Entity Recognition for NLTK in Python. Identifying the NE

I need to classify words by part of speech (verb, noun, adverb, etc.). I used the following functions:

nltk.word_tokenize()  # split a sentence into words
nltk.pos_tag()        # tag each word with its part of speech
nltk.ne_chunk()       # group the tagged words into named-entity chunks

The output of this is a tree. For example:

>>> import nltk
>>> sentence = "I am Jhon from America"
>>> sent1 = nltk.word_tokenize(sentence)
>>> sent2 = nltk.pos_tag(sent1)
>>> sent3 = nltk.ne_chunk(sent2, binary=True)
>>> sent3
Tree('S', [('I', 'PRP'), ('am', 'VBP'), Tree('NE', [('Jhon', 'NNP')]), ('from', 'IN'), Tree('NE', [('America', 'NNP')])])

When accessing an element of this tree, I did it as follows:

>>> sent3[0]
('I', 'PRP')
>>> sent3[0][0]
'I'
>>> sent3[0][1]
'PRP'

But when accessing a Named Entity:

>>> sent3[2]
Tree('NE', [('Jhon', 'NNP')])
>>> sent3[2][0]
('Jhon', 'NNP')
>>> sent3[2][1]    
Traceback (most recent call last):
  File "<pyshell#121>", line 1, in <module>
    sent3[2][1]
  File "C:\Python26\lib\site-packages\nltk\tree.py", line 139, in __getitem__
    return list.__getitem__(self, index)
IndexError: list index out of range

I got the above error.

What I want is to get 'NE' as the output, similar to 'PRP' above, so that I can identify which words are named entities. Is there a way to do this with NLTK in Python? If so, please post the command. Or is there a function in the tree library for this? I need the node value 'NE'.

asked Apr 18 '11 20:04

Asl506

1 Answer

This answer may be off base (in which case I'll delete it), since I don't have NLTK installed here to try it, but I think you can just do:

   >>> sent3[2].node
   'NE'

sent3[2][0] returns the first child of the subtree, not the subtree's own label. The 'NE' subtree has only one child, which is why sent3[2][1] raises IndexError.

Edit: I tried this when I got home, and it does indeed work.
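
A note for newer NLTK versions: as far as I know, NLTK 3 replaced the .node attribute with a label() method, so sent3[2].label() is the equivalent call there. Below is a minimal sketch, not an official NLTK recipe, of walking the chunked sentence and pairing every word with its chunk label. The helper name tokens_with_chunk_labels is my own, and it assumes the flat structure shown in the question: plain tokens are (word, tag) tuples and each entity is a one-level subtree of such tuples.

```python
def tokens_with_chunk_labels(tree):
    """Yield (word, pos_tag, chunk_label) for each token in a chunked sentence.

    chunk_label is None for tokens that are not inside a chunk.
    Hypothetical helper: assumes the flat shape produced by
    nltk.ne_chunk, i.e. leaves are (word, tag) tuples and
    chunks are subtrees containing only such tuples.
    """
    for child in tree:
        if isinstance(child, tuple):
            # Plain token outside any chunk, e.g. ('I', 'PRP')
            word, pos = child
            yield word, pos, None
        else:
            # Subtree: newer NLTK exposes label(), older releases used .node
            label = child.label() if hasattr(child, "label") else child.node
            for word, pos in child:
                yield word, pos, label


# Usage with the question's sent3 (requires nltk and its data packages):
#   for word, pos, label in tokens_with_chunk_labels(sent3):
#       print(word, pos, label)
```

With binary=True every entity subtree carries the label 'NE', so filtering on label == 'NE' gives just the named entities.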

answered Sep 21 '22 06:09

bdk
