 

Using spaCy to extract specific noun phrases

Can I use spaCy in Python to find noun phrases (NPs) with specific neighbours? I want the noun phrases in my text that have a verb both before and after them.

Vivek Khetan asked Jun 20 '17 19:06


1 Answer

  1. Merge the noun phrases so that each one becomes a single token instead of being tokenized separately.
  2. Inspect the parsed document and check the POS tags of the neighbouring tokens.

    >>> import spacy
    >>> nlp = spacy.load('en')  # spaCy 1.x/2.x shortcut; v3 needs a full name such as 'en_core_web_sm'
    >>> sent = u'run python program run, to make this work'
    >>> parsed = nlp(sent)
    >>> list(parsed.noun_chunks)
    [python program]
    >>> # Copy the chunks into a list first, because merging modifies the doc.
    >>> # Span.merge works in spaCy 1.x/2.x (deprecated in 2.1, removed in v3).
    >>> for noun_phrase in list(parsed.noun_chunks):
    ...     noun_phrase.merge(noun_phrase.root.tag_, noun_phrase.root.lemma_, noun_phrase.root.ent_type_)
    ... 
    python program
    >>> [(token.text,token.pos_) for token in parsed]
    [(u'run', u'VERB'), (u'python program', u'NOUN'), (u'run', u'VERB'), (u',', u'PUNCT'), (u'to', u'PART'), (u'make', u'VERB'), (u'this', u'DET'), (u'work', u'NOUN')]
    
  3. By checking whether the tokens immediately before and after each merged noun phrase are tagged VERB, you can keep only the noun phrases you want.

  4. A better approach is to analyse the dependency parse tree and look at the lefts and rights of the noun phrase. That way, even if punctuation or a token with another POS tag sits between the noun phrase and the verb, you can still find the match, which increases your search coverage.
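The adjacency check in steps 2 and 3 can be sketched in plain Python over the (text, POS) pairs printed in the session above. This is a minimal sketch, not the answerer's code: the names `neighbour_pos`, `verb_sandwiched_nouns` and the `skip` parameter are hypothetical. Passing `skip=("PUNCT",)` looks past punctuation, which is a simpler substitute for the dependency-tree analysis in step 4, not an implementation of it. It also treats every NOUN/PROPN token as a candidate, so single nouns that were not part of a noun chunk are checked too.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# Each token is a (text, coarse POS) pair, e.g. built with
# [(token.text, token.pos_) for token in doc] after noun chunks are merged.
Token = Tuple[str, str]


def neighbour_pos(tokens: Sequence[Token], i: int, step: int,
                  skip: frozenset) -> Optional[str]:
    """POS of the nearest token from index i in direction step (-1 or +1),
    ignoring tokens whose POS is in skip. None if the sentence edge is hit."""
    j = i + step
    while 0 <= j < len(tokens):
        pos = tokens[j][1]
        if pos not in skip:
            return pos
        j += step
    return None


def verb_sandwiched_nouns(tokens: Sequence[Token],
                          noun_tags: Iterable[str] = ("NOUN", "PROPN"),
                          skip: Iterable[str] = ()) -> List[str]:
    """Texts of noun tokens whose nearest neighbour on both sides is a VERB.

    skip=() checks only the immediately adjacent tokens. skip=("PUNCT",)
    looks past punctuation: a hypothetical, simpler stand-in for the
    dependency-tree search, not an implementation of it.
    """
    noun_tags = frozenset(noun_tags)
    skip = frozenset(skip)
    hits = []
    for i, (text, pos) in enumerate(tokens):
        if pos not in noun_tags:
            continue
        if (neighbour_pos(tokens, i, -1, skip) == "VERB"
                and neighbour_pos(tokens, i, +1, skip) == "VERB"):
            hits.append(text)
    return hits


if __name__ == "__main__":
    # The (text, POS) pairs from the spaCy session above.
    tokens = [("run", "VERB"), ("python program", "NOUN"), ("run", "VERB"),
              (",", "PUNCT"), ("to", "PART"), ("make", "VERB"),
              ("this", "DET"), ("work", "NOUN")]
    print(verb_sandwiched_nouns(tokens))  # prints ['python program']
```

With spaCy v3, where the `'en'` shortcut and `Span.merge` no longer exist, the input pairs can be built by merging inside `with doc.retokenize() as retokenizer:` (calling `retokenizer.merge(span)` for each span in `list(doc.noun_chunks)`) and then reading `[(t.text, t.pos_) for t in doc]`. The pipeline name, for example `en_core_web_sm`, depends on which model you have installed.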
DhruvPathak answered Sep 22 '22 17:09