 

How can I get the noun clause that is the object of a certain verb?

Tags: python, nlp, spacy

I am working with data from pharmaceutical labels. The text is always structured using the verb phrase 'indicated for'.

For example:

sentence = "Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis"

I have already used SpaCy to filter down to only sentences that contain the phrase 'indicated for'.

I now need a function that will take in the sentence, and return the phrase that is the object of 'indicated for'. So for this example, the function, which I have called extract(), would operate like this:

extract(sentence)
>> 'relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis'

Is there functionality to do this using spacy?

EDIT: Simply splitting after 'indicated for' won't work for complicated examples.

Here are some examples:

'''buprenorphine and naloxone sublingual tablets are indicated for the maintenance treatment of opioid dependence and should be used as part of a complete treatment plan to include counseling and psychosocial support buprenorphine and naloxone sublingual tablets contain buprenorphine a partial opioid agonist and naloxone an opioid antagonist and is indicated for the maintenance treatment of opioid dependence'''

'''ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below conjunctivitis gram positive bacteria gram negative bacteria staphylococcus aureus staphylococcus epidermidis streptococcus pneumoniae enterobacter cloacae haemophilus influenzae proteus mirabilis pseudomonas aeruginosa corneal ulcers gram positive bacteria gram negative bacteria staphylococcus aureus staphylococcus epidermidis streptococcus pneumoniae pseudomonas aeruginosa serratia marcescens'''

where I just want the bold parts.

asked Mar 28 '18 19:03 by max


2 Answers

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import spacy

nlp = spacy.load('en_core_web_sm')
text = 'Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.'
doc = nlp(text)
for word in doc:
    # pobj = object of a preposition; print each one together with its full subtree
    if word.dep_ == 'pobj':
        subtree_span = doc[word.left_edge.i : word.right_edge.i + 1]
        print(subtree_span.text)

Output:

relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
the signs and symptoms of osteoarthritis and rheumatoid arthritis
osteoarthritis and rheumatoid arthritis

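Several lines are printed because the sentence contains several pobj (object of preposition) tokens, and each one is printed with its own subtree. The first line, the object of 'for', is the phrase you are after.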

Edit 2:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import spacy

nlp = spacy.load('en_core_web_sm')
para = '''Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.
Ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below.'''
doc = nlp(para)

# To extract sentences based on a key phrase
indicated_for_sents = [sent for sent in doc.sents if 'indicated for' in sent.text]
print(indicated_for_sents)
print('')
# To extract objects of prepositions (pobj), each with its full subtree
for word in doc:
    if word.dep_ == 'pobj':
        subtree_span = doc[word.left_edge.i : word.right_edge.i + 1]
        print(subtree_span.text)

output:

[Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.
, Ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below.]

relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
the signs and symptoms of osteoarthritis and rheumatoid arthritis
osteoarthritis and rheumatoid arthritis


the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below
infections caused by susceptible strains of the following bacteria in the conditions listed below
susceptible strains of the following bacteria in the conditions listed below
the following bacteria in the conditions listed below
the conditions listed below

For a fuller subject/object extraction example, see:

https://github.com/NSchrading/intro-spacy-nlp/blob/master/subject_object_extraction.py

answered Oct 28 '22 10:10 by Programmer_nltk


You need to use the dependency parsing feature of spaCy. Run the dependency parser on each selected sentence (the ones containing 'indicated for') to expose the relationships between its words. You can see a visualization of the dependency parse for the example sentence in your question here.

Once spaCy returns the dependency parse, search for the token "indicated" used as a verb and walk its children in the dependency tree. See example here. In your case, you match "indicated" as the verb and collect its children, rather than looking for the 'xcomp' or 'ccomp' dependents as the GitHub example does.
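The walk described above could look roughly like the sketch below. It is only an illustration, not a tested solution for real labels: it assumes the parser attaches 'for' to 'indicated' with the label 'prep' and marks the head of the wanted phrase as 'pobj' (the labels that appear in the first answer's output). The toy sentence and its heads/deps are annotated by hand, purely so the example runs without downloading a model; a real model may parse long label text differently.

```python
from typing import Optional

from spacy.tokens import Doc
from spacy.vocab import Vocab


def extract(doc: Doc) -> Optional[str]:
    """Return the phrase governed by the first 'indicated for', or None."""
    for token in doc:
        if token.text.lower() != "indicated":
            continue
        for prep in token.children:
            # Assumption: 'for' hangs off 'indicated' with the label 'prep'.
            if prep.dep_ != "prep" or prep.text.lower() != "for":
                continue
            for obj in prep.children:
                if obj.dep_ == "pobj":
                    # The object's full subtree is the phrase we want.
                    return doc[obj.left_edge.i : obj.right_edge.i + 1].text
    return None


# Hand-annotated toy parse (hypothetical), so no trained model is required.
words = ["Meloxicam", "is", "indicated", "for", "relief", "of", "pain"]
heads = [2, 2, 2, 2, 3, 4, 5]  # absolute index of each token's head
deps = ["nsubjpass", "auxpass", "ROOT", "prep", "pobj", "prep", "pobj"]
demo_doc = Doc(Vocab(), words=words, heads=heads, deps=deps)
result = extract(demo_doc)
print(result)
```

With a loaded model you would call it as extract(nlp(sentence)). Because it returns only the object attached to 'for' under 'indicated', you get one phrase instead of every nested pobj. For text with several 'indicated for' occurrences, you could collect the matches in a list instead of returning the first.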

answered Oct 28 '22 09:10 by Adnan S