I am new to spaCy and to NLP overall.
To understand how spaCy works, I would like to create a function that takes a sentence and returns a dictionary, tuple, or list with each noun and the words describing it.
I know that spaCy builds a dependency tree of the sentence and knows the role of each word (shown in displaCy).
But what's the right way to get from:
"A large room with two yellow dishwashers in it"
To:
{noun: "room", adj: "large"} {noun: "dishwasher", adj: "yellow", adv: "two"}
Or any other solution that gives me all related words in a usable bundle.
Thanks in advance!
This is a very straightforward use of the DependencyMatcher.
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")

pattern = [
    # anchor token: a noun
    {
        "RIGHT_ID": "target",
        "RIGHT_ATTRS": {"POS": "NOUN"},
    },
    # modifier: a direct dependent of the noun via amod (adjective) or nummod (numeral)
    {
        "LEFT_ID": "target",
        "REL_OP": ">",
        "RIGHT_ID": "modifier",
        "RIGHT_ATTRS": {"DEP": {"IN": ["amod", "nummod"]}},
    },
]

matcher = DependencyMatcher(nlp.vocab)
matcher.add("MODIFIERS", [pattern])

text = "A large room with two yellow dishwashers in it"
doc = nlp(text)

# each match gives token indices in pattern order: (target, modifier)
for match_id, (target, modifier) in matcher(doc):
    print(doc[modifier], doc[target], sep="\t")
Output:
large room
two dishwashers
yellow dishwashers
It should be easy to turn that into a dictionary or whatever you'd like. You might also want to modify it to take proper nouns as the target, or to support other kinds of dependency relations, but this should be a good start.
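For instance, grouping the matcher's (modifier, target) pairs by noun could look like the following minimal sketch. Plain strings stand in for the spaCy tokens so it runs on its own; the `pairs` list mirrors what the loop above prints:

```python
from collections import defaultdict

# (modifier, target) pairs as the DependencyMatcher loop above would produce;
# plain strings stand in for spaCy tokens to keep the sketch self-contained
pairs = [("large", "room"), ("two", "dishwashers"), ("yellow", "dishwashers")]

# group all modifiers under their noun
bundles = defaultdict(list)
for modifier, target in pairs:
    bundles[target].append(modifier)

result = [{"noun": noun, "modifiers": mods} for noun, mods in bundles.items()]
print(result)
# [{'noun': 'room', 'modifiers': ['large']},
#  {'noun': 'dishwashers', 'modifiers': ['two', 'yellow']}]
```

In the real pipeline you would append `doc[modifier]` and key on `doc[target]` (or its text) instead of the placeholder strings.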
You may also want to look at the noun chunks feature.
What you want to do is called "noun chunks":
import spacy

nlp = spacy.load('en_core_web_md')
txt = "A large room with two yellow dishwashers in it"
doc = nlp(txt)

chunks = []
for chunk in doc.noun_chunks:
    out = {}
    root = chunk.root
    out[root.pos_] = root
    for tok in chunk:
        if tok != root:
            out[tok.pos_] = tok
    chunks.append(out)

print(chunks)
[
{'NOUN': room, 'DET': A, 'ADJ': large},
{'NOUN': dishwashers, 'NUM': two, 'ADJ': yellow},
{'PRON': it}
]
You may notice a "noun chunk" doesn't guarantee its root will always be a noun ("it" above is a pronoun). Should you wish to restrict your results to nouns only:
chunks = []
for chunk in doc.noun_chunks:
    out = {}
    noun = chunk.root
    if noun.pos_ != 'NOUN':
        continue
    out['noun'] = noun
    for tok in chunk:
        if tok != noun:
            out[tok.pos_] = tok
    chunks.append(out)

print(chunks)
[
{'noun': room, 'DET': A, 'ADJ': large},
{'noun': dishwashers, 'NUM': two, 'ADJ': yellow}
]