How can I extract noun phrases from text using spacy? I am not referring to part of speech tags. In the documentation I cannot find anything about noun phrases or regular parse trees.

Noun phrases with spacy

4 Answers

If you want base NPs, i.e. NPs without coordination, prepositional phrases or relative clauses, you can use the noun_chunks iterator on the Doc and Span objects:

>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")  # spacy.en.English was removed in later spaCy releases
>>> doc = nlp('The cat and the dog sleep in the basket near the door.')
>>> for np in doc.noun_chunks:
...     np.text
...
'The cat'
'the dog'
'the basket'
'the door'

If you need something else, the best way is to iterate over the words of the sentence and consider the syntactic context to determine whether the word governs the phrase-type you want. If it does, yield its subtree:

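from spacy.symbols import nsubj, nsubjpass, dobj, iobj, pobj

# Dependency labels whose tokens typically head a noun phrase (probably others too)
np_labels = {nsubj, nsubjpass, dobj, iobj, pobj}

def iter_nps(doc):
    for word in doc:
        # word.dep is the integer ID of the label, comparable to the symbols above
        if word.dep in np_labels:
            # word.subtree is a generator over the word and all of its descendants
            yield word.subtree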
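
Because `word.subtree` is a generator of tokens rather than a phrase, it can help to turn each subtree into a `Span`. The following is a minimal sketch of that idea, not part of the original answer. It assumes spaCy v3 (where `Doc` accepts `heads` as absolute indices), and the name `iter_np_spans` and the hand-annotated parse are illustrative only; with a real pipeline you would pass the `doc` returned by `nlp(...)` instead.

```python
import spacy
from spacy.tokens import Doc
from spacy.symbols import nsubj, nsubjpass, dobj, iobj, pobj

NP_LABELS = {nsubj, nsubjpass, dobj, iobj, pobj}

def iter_np_spans(doc):
    """Yield a Span covering the subtree of each token whose dep is in NP_LABELS."""
    for word in doc:
        if word.dep in NP_LABELS:
            # left_edge and right_edge are the first and last tokens of the subtree
            yield doc[word.left_edge.i : word.right_edge.i + 1]

# Hand-annotated parse so the sketch runs without a trained model (assumption:
# spaCy v3, where `heads` holds the absolute index of each token's head).
nlp = spacy.blank("en")
words = ["The", "cat", "sleeps", "in", "the", "basket", "."]
heads = [1, 2, 2, 2, 5, 3, 2]
deps = ["det", "nsubj", "ROOT", "prep", "det", "pobj", "punct"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

phrases = [span.text for span in iter_np_spans(doc)]
print(phrases)  # ['The cat', 'the basket']
```

Slicing the `Doc` between the subtree edges keeps the tokens in document order and gives access to `Span` attributes such as `.text` and `.root`.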

answered Oct 04 '22 01:10

syllogism_

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp('Bananas are an excellent source of potassium.')
for np in doc.noun_chunks:
    print(np.text)
'''
  Bananas
  an excellent source
  potassium
'''

for word in doc:
    print('word.dep:', word.dep, ' | ', 'word.dep_:', word.dep_)
'''
  word.dep: 429  |  word.dep_: nsubj
  word.dep: 8206900633647566924  |  word.dep_: ROOT
  word.dep: 415  |  word.dep_: det
  word.dep: 402  |  word.dep_: amod
  word.dep: 404  |  word.dep_: attr
  word.dep: 443  |  word.dep_: prep
  word.dep: 439  |  word.dep_: pobj
  word.dep: 445  |  word.dep_: punct
'''

from spacy.symbols import nsubj, nsubjpass, dobj, iobj, pobj
np_labels = {nsubj, nsubjpass, dobj, iobj, pobj}
print('np_labels:', np_labels)
'''
  np_labels: {416, 422, 429, 430, 439}
'''

For background on the yield keyword, see: https://www.geeksforgeeks.org/use-yield-keyword-instead-return-keyword-python/

def iter_nps(doc):
    for word in doc:
        if word.dep in np_labels:
            yield word.dep_

iter_nps(doc)
'''
  <generator object iter_nps at 0x7fd7b08b5bd0>
'''

## Modified method:
def iter_nps(doc):
    for word in doc:
        if word.dep in np_labels:
            print(word.text, word.dep_)

iter_nps(doc)
'''
  Bananas nsubj
  potassium pobj
'''

doc = nlp('BRCA1 is a tumor suppressor protein that functions to maintain genomic stability.')
for np in doc.noun_chunks:
    print(np.text)
'''
  BRCA1
  a tumor suppressor protein
  genomic stability
'''

iter_nps(doc)
'''
  BRCA1 nsubj
  that nsubj
  stability dobj
'''
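
The `<generator object iter_nps ...>` output above appears because calling a generator function does not run its body; the body only runs when the generator is iterated. As a minimal sketch, the `yield` version can be kept and simply consumed with `list()` instead of being rewritten with `print`. The `Tok` namedtuple here is a hypothetical stand-in for spaCy tokens so the example runs without a model; the labels are copied from the output shown earlier.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy token (only the two fields used here)
Tok = namedtuple("Tok", ["text", "dep_"])

NP_DEPS = {"nsubj", "nsubjpass", "dobj", "iobj", "pobj"}

def iter_nps(tokens):
    for tok in tokens:
        if tok.dep_ in NP_DEPS:
            yield tok.text, tok.dep_

tokens = [
    Tok("Bananas", "nsubj"), Tok("are", "ROOT"), Tok("an", "det"),
    Tok("excellent", "amod"), Tok("source", "attr"), Tok("of", "prep"),
    Tok("potassium", "pobj"), Tok(".", "punct"),
]

gen = iter_nps(tokens)  # nothing has run yet; this is only a generator object
results = list(gen)     # iterating the generator executes the loop body
print(results)          # [('Bananas', 'nsubj'), ('potassium', 'pobj')]
```

A generator can be consumed only once, so call `iter_nps` again if the results are needed a second time.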

answered Oct 04 '22 03:10

Victoria Stuart

You can also get the nouns from a sentence by filtering on part-of-speech tags. Note that this returns individual tokens, not whole phrases:

    import spacy

    nlp = spacy.load("en_core_web_sm")
    # The doc text is from the spaCy website
    doc = nlp("When Sebastian Thrun started working on self-driving cars at "
              "Google in 2007, few people outside of the company took him "
              "seriously. “I can tell you very senior CEOs of major American "
              "car companies would shake my hand and turn away because I wasn’t "
              "worth talking to,” said Thrun, in an interview with Recode earlier "
              "this week.")

    # Print every token tagged as a noun, proper noun, or pronoun
    for x in doc:
        if x.pos_ in ("NOUN", "PROPN", "PRON"):
            print(x)

answered Oct 04 '22 02:10

Talha Tayyab

If you want to specify more exactly which kind of noun phrase you want to extract, you can use textacy's matches function. You can pass any combination of POS tags. For example,

textacy.extract.matches(doc, "POS:ADP POS:DET:? POS:ADJ:? POS:NOUN:+")

will return every span made up of a preposition, optionally followed by a determiner and/or an adjective, followed by one or more nouns.

Textacy is built on spaCy and operates directly on spaCy Doc objects, so the two work together.
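
If textacy is not available, roughly the same pattern can be expressed with spaCy's own `Matcher`. This is a sketch of a swapped-in technique, not the textacy call above. It assumes spaCy v3 (for the `greedy` argument and the `pos` argument to `Doc`); the rule name `PREP_NP` and the hand-assigned POS tags are illustrative, and with a real pipeline you would use the `doc` returned by `nlp(...)`.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Preposition, optional determiner, optional adjective, one or more nouns
pattern = [
    {"POS": "ADP"},
    {"POS": "DET", "OP": "?"},
    {"POS": "ADJ", "OP": "?"},
    {"POS": "NOUN", "OP": "+"},
]
matcher = Matcher(nlp.vocab)
# greedy="LONGEST" keeps only the longest of any overlapping matches
matcher.add("PREP_NP", [pattern], greedy="LONGEST")

# Hand-assigned POS tags so the sketch runs without a trained model
words = ["Bananas", "are", "an", "excellent", "source", "of", "potassium", "."]
pos = ["NOUN", "AUX", "DET", "ADJ", "NOUN", "ADP", "NOUN", "PUNCT"]
doc = Doc(nlp.vocab, words=words, pos=pos)

matched = [doc[start:end].text for _, start, end in matcher(doc)]
print(matched)  # ['of potassium']
```

Note that the match includes the preposition itself; drop the first token of each span if only the noun phrase is wanted.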

answered Oct 04 '22 01:10

Suzana

Tags:

python

spacy

CentAu
