
How to get all noun phrases in Spacy

Tags: python, nlp, spacy

I am new to spaCy and would like to extract "all" the noun phrases from a sentence. How can I do that? I have the following code:

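import spacy

# Load the model by its full package name; the "en" shortcut was removed in spaCy v3
nlp = spacy.load("en_core_web_sm")

# Read the file inside a context manager so it is closed afterwards
with open("E:/test.txt", "r") as file:
    doc = nlp(file.read())

for np in doc.noun_chunks:
    print(np.text)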

But it returns only the base noun phrases, that is, phrases that do not contain any other NP. For example, for the following phrase, I get the result below:

Phrase: We try to explicitly describe the geometry of the edges of the images.

Result: We, the geometry, the edges, the images.

Expected result: We, the geometry, the edges, the images, the geometry of the edges of the images, the edges of the images.

How can I get all the noun phrases, including nested phrases?

user1419243 asked Feb 22 '18 10:02



1 Answer

Please see the commented code below, which merges each noun with its whole subtree to build the nested phrases. The code was inspired by the spaCy docs.

import spacy

# The "en" shortcut was removed in spaCy v3; load the model by its package name
nlp = spacy.load("en_core_web_sm")

text = "We try to explicitly describe the geometry of the edges of the images."
doc = nlp(text)

for np in doc.noun_chunks: # use np instead of np.text
    print(np)

print()

# code to combine each noun with its whole subtree
# 'We' is actually a pronoun but included in your question
# hence the token.pos_ == "PRON" part in the last if statement
# suggest you extract PRON separately like the noun-chunks above

# indices of all tokens tagged as nouns
noun_indices = [token.i for token in doc if token.pos_ == "NOUN"]
print(noun_indices)

for idx in noun_indices:
    # re-parse so that every merge starts from the original tokenization
    doc = nlp(text)
    span = doc[doc[idx].left_edge.i : doc[idx].right_edge.i + 1]
    # Span.merge() was removed in spaCy v3; merge through the retokenizer instead
    with doc.retokenize() as retokenizer:
        retokenizer.merge(span)

    for token in doc:
        if token.dep_ in ("dobj", "pobj") or token.pos_ == "PRON":
            print(token.text)
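
To make the left_edge / right_edge idea above easier to follow, here is a minimal sketch in plain Python that does not need spaCy at all. It assumes a hand-written dependency parse of the example sentence (the WORDS, POS and HEADS lists are my own annotation, and a real parser may attach some words differently), and the helper names subtree_span and nested_noun_phrases are hypothetical. For every noun or pronoun it takes the span from the leftmost to the rightmost word of that word's subtree, which is what doc[token.left_edge.i : token.right_edge.i + 1] does in the code above.

```python
# Hand-written parse of the example sentence (an assumption for illustration,
# not actual parser output).
WORDS = ["We", "try", "to", "explicitly", "describe", "the", "geometry",
         "of", "the", "edges", "of", "the", "images", "."]
POS = ["PRON", "VERB", "PART", "ADV", "VERB", "DET", "NOUN",
       "ADP", "DET", "NOUN", "ADP", "DET", "NOUN", "PUNCT"]
# HEADS[i] is the index of the head of word i; the root points to itself.
HEADS = [1, 1, 4, 4, 1, 6, 4, 6, 9, 7, 9, 12, 10, 1]


def subtree_span(heads, i):
    """Return (start, end) covering word i and all of its descendants."""
    def is_descendant(j):
        # Walk up the tree from j until we reach i or the root.
        while True:
            if j == i:
                return True
            if heads[j] == j:  # reached the root without meeting i
                return False
            j = heads[j]

    members = [j for j in range(len(heads)) if is_descendant(j)]
    return min(members), max(members) + 1


def nested_noun_phrases(words, pos, heads, head_tags=("NOUN", "PROPN", "PRON")):
    """Collect the full subtree phrase of every noun-like word."""
    phrases = []
    for i, tag in enumerate(pos):
        if tag in head_tags:
            start, end = subtree_span(heads, i)
            phrases.append(" ".join(words[start:end]))
    return phrases


phrases = nested_noun_phrases(WORDS, POS, HEADS)
for phrase in phrases:
    print(phrase)
```

This yields the largest phrase headed by each noun. The base phrases such as "the geometry" and "the edges" still come from doc.noun_chunks, so taking the union of both lists should give the full set asked for in the question.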
Adnan S answered Oct 12 '22 19:10