I would like to extract "all" the noun phrases from a sentence. I'm wondering how I can do it. I have the following code:
doc2 = nlp("what is the capital of Bangladesh?")
for chunk in doc2.noun_chunks:
print(chunk)
1. what
2. the capital
3. bangladesh
the capital of Bangladesh
I have tried answers from spacy doc and StackOverflow. Nothing worked. It seems only cTakes and Stanford core NLP can give such complex NP.
Any help is appreciated.
For those who are still looking for this answer
noun_pharses=set()
for nc in doc.noun_chunks:
for np in [nc, doc[nc.root.left_edge.i:nc.root.right_edge.i+1]]:
noun_pharses.add(np)
This is how I get all the complex noun phrase
Spacy clearly defines a noun chunk as:
A base noun phrase, or "NP chunk", is a noun phrase that does not permit other NPs to be nested within it – so no NP-level coordination, no prepositional phrases, and no relative clauses." (https://spacy.io/api/doc#noun_chunks)
If you process the dependency parse differently, allowing prepositional modifiers and nested phrases/chunks, then you can end up with what you're looking for.
I bet you could modify the existing spacy code fairly easily to do what you want:
https://github.com/explosion/spaCy/blob/06c6dc6fbcb8fbb78a61a2e42c1b782974bd43bd/spacy/lang/en/syntax_iterators.py
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With