How to get all noun phrases in Spacy

Tags: python, nlp, spacy

I am new to spaCy and I would like to extract "all" the noun phrases from a sentence, but I am not sure how to do it. I have the following code:

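import spacy

# "en" was a shortcut link in spaCy 2.x; spaCy 3 needs the full package name
nlp = spacy.load("en_core_web_sm")

# the context manager closes the file once it has been read
with open("E:/test.txt", "r") as file:
    doc = nlp(file.read())
for np in doc.noun_chunks:
    print(np.text)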

But it returns only the base noun phrases, that is, phrases that do not contain any other NP. For example, for the following sentence I get the result below:

Phrase: We try to explicitly describe the geometry of the edges of the images.

Result: We, the geometry, the edges, the images.

Expected result: We, the geometry, the edges, the images, the geometry of the edges of the images, the edges of the images.

How can I get all the noun phrases, including nested phrases?


asked Feb 22 '18 10:02

user1419243

1 Answer

Please see the commented code below, which recursively combines each noun with the rest of its subtree. The code is inspired by the spaCy documentation.

import spacy

# "en" was a shortcut link in spaCy 2.x; spaCy 3 needs the full package name
nlp = spacy.load("en_core_web_sm")

text = "We try to explicitly describe the geometry of the edges of the images."
doc = nlp(text)

for np in doc.noun_chunks:  # use np instead of np.text
    print(np)

print()

# code to recursively combine nouns
# 'We' is actually a pronoun but included in your question
# hence the token.pos_ == "PRON" part in the last if statement
# suggest you extract PRON separately like the noun-chunks above

# positions of all tokens tagged as nouns
noun_indices = [token.i for token in doc if token.pos_ == "NOUN"]
print(noun_indices)

for idx in noun_indices:
    # re-parse on every pass, because merging modifies the Doc in place
    doc = nlp(text)
    # the span covering the noun and everything in its subtree
    span = doc[doc[idx].left_edge.i : doc[idx].right_edge.i + 1]
    # span.merge() was removed in spaCy 3; the retokenizer replaces it
    with doc.retokenize() as retokenizer:
        retokenizer.merge(span)

    # the merged span is now a single token that keeps its root's labels
    for token in doc:
        if token.dep_ in ("dobj", "pobj") or token.pos_ == "PRON":
            print(token.text)
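
The same subtree idea can also be written without merging or re-parsing. The following is only a minimal sketch, assuming spaCy 3: the helper name all_noun_phrases and the NOMINAL_POS set are made up for illustration, and the dependency parse is written out by hand (heads, deps and pos are my own annotation of the example sentence) so that the result does not depend on which model is installed. For each noun or pronoun it collects the span from the token's left edge up to the token itself (roughly the base phrase) and the span covering the whole subtree (the nested phrase).

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hypothetical choice: which coarse POS tags count as heads of a noun phrase.
NOMINAL_POS = {"NOUN", "PROPN", "PRON"}


def all_noun_phrases(doc):
    """Return base and nested noun phrases, in order of first appearance."""
    seen = set()
    phrases = []
    for token in doc:
        if token.pos_ not in NOMINAL_POS:
            continue
        start = token.left_edge.i
        # First the span ending at the head noun, then the full subtree.
        for end in (token.i + 1, token.right_edge.i + 1):
            if (start, end) not in seen:
                seen.add((start, end))
                phrases.append(doc[start:end].text)
    return phrases


# Hand-annotated parse of the example sentence (an assumption, not model
# output). With a loaded pipeline you would simply use doc = nlp(text).
words = ["We", "try", "to", "explicitly", "describe", "the", "geometry",
         "of", "the", "edges", "of", "the", "images", "."]
pos = ["PRON", "VERB", "PART", "ADV", "VERB", "DET", "NOUN",
       "ADP", "DET", "NOUN", "ADP", "DET", "NOUN", "PUNCT"]
# Absolute index of each token's head; the root points at itself.
heads = [1, 1, 4, 4, 1, 6, 4, 6, 9, 7, 9, 12, 10, 1]
deps = ["nsubj", "ROOT", "aux", "advmod", "xcomp", "det", "dobj",
        "prep", "det", "pobj", "prep", "det", "pobj", "punct"]

doc = Doc(Vocab(), words=words, pos=pos, heads=heads, deps=deps)
phrases = all_noun_phrases(doc)
for phrase in phrases:
    print(phrase)
```

On a real parse the subtree of a noun also includes relative clauses and conjuncts, so the nested phrases can be longer than expected; treat the output as candidates rather than a definitive list.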


answered Oct 12 '22 19:10

Adnan S
