
spaCy: How to get all words that describe a noun?

Tags:

python

nlp

spacy

I am new to spaCy and to NLP overall.

To understand how spaCy works, I would like to create a function that takes a sentence and returns a dictionary, tuple, or list with each noun and the words describing it.

I know that spaCy builds a dependency tree of the sentence and knows the role of each word (as shown in displaCy).

But what's the right way to get from:

"A large room with two yellow dishwashers in it"

To:

{noun:"room",adj:"large"} {noun:"dishwasher",adj:"yellow",adv:"two"}

Or any other solution that gives me all related words in a usable bundle.

Thanks in advance!

asked Jun 03 '21 by mathi1651



2 Answers

This is a very straightforward use of the DependencyMatcher.

import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")

# Match a noun together with any adjectival or numeric modifiers attached to it.
pattern = [
  {
    "RIGHT_ID": "target",
    "RIGHT_ATTRS": {"POS": "NOUN"}
  },
  # noun -> adjectival or numeric modifier
  {
    "LEFT_ID": "target",
    "REL_OP": ">",
    "RIGHT_ID": "modifier",
    "RIGHT_ATTRS": {"DEP": {"IN": ["amod", "nummod"]}}
  },
]

matcher = DependencyMatcher(nlp.vocab)
matcher.add("NOUN_MODIFIERS", [pattern])

text = "A large room with two yellow dishwashers in it"
doc = nlp(text)
for match_id, (target, modifier) in matcher(doc):
    print(doc[modifier], doc[target], sep="\t")

Output:

large   room
two dishwashers
yellow  dishwashers

It should be easy to turn that into a dictionary or whatever you'd like. You might also want to modify it to take proper nouns as the target, or to support other kinds of dependency relations, but this should be a good start.
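
If you do want a dictionary, here is a minimal sketch that builds on the loop above; the list-of-dicts shape it produces is just one possible format, not anything prescribed by spaCy:

from collections import defaultdict

# Group modifier token indices under the index of the noun they modify.
bundles = defaultdict(list)
for match_id, (target, modifier) in matcher(doc):
    bundles[target].append(modifier)

result = [
    {"noun": doc[t].text, "modifiers": [doc[m].text for m in mods]}
    for t, mods in bundles.items()
]
print(result)
# e.g. [{'noun': 'room', 'modifiers': ['large']},
#       {'noun': 'dishwashers', 'modifiers': ['two', 'yellow']}]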

You may also want to look at the noun chunks feature.

answered Nov 11 '22 by polm23


What you want to do is called "noun chunks":

import spacy
nlp = spacy.load('en_core_web_md')
txt = "A large room with two yellow dishwashers in it"
doc = nlp(txt)

chunks = []
for chunk in doc.noun_chunks:
    out = {}
    root = chunk.root           # head token of the noun phrase
    out[root.pos_] = root
    for tok in chunk:           # the remaining tokens in the chunk describe the root
        if tok != root:
            out[tok.pos_] = tok
    chunks.append(out)
print(chunks)

[
 {'NOUN': room, 'DET': A, 'ADJ': large}, 
 {'NOUN': dishwashers, 'NUM': two, 'ADJ': yellow}, 
 {'PRON': it}
]

You may notice that a noun chunk's root is not guaranteed to be a noun (it can be a pronoun, as with "it" above). Should you wish to restrict your results to nouns only:

chunks = []
for chunk in doc.noun_chunks:
    out = {}
    noun = chunk.root
    if noun.pos_ != 'NOUN':
        continue
    out['noun'] = noun
    for tok in chunk:
        if tok != noun:
            out[tok.pos_] = tok
    chunks.append(out)
    
print(chunks)

[
 {'noun': room, 'DET': A, 'ADJ': large}, 
 {'noun': dishwashers, 'NUM': two, 'ADJ': yellow}
]
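
Note that the values in these dictionaries are spaCy Token objects (which is why they print without quotes). If you want plain strings, you can convert them; a small sketch on top of the chunks list above:

# Replace each Token value with its surface string.
string_chunks = [{k: v.text for k, v in d.items()} for d in chunks]
print(string_chunks)
# [{'noun': 'room', 'DET': 'A', 'ADJ': 'large'},
#  {'noun': 'dishwashers', 'NUM': 'two', 'ADJ': 'yellow'}]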
answered Nov 11 '22 by Sergey Bushmanov