So as a bit of a thought experiment I coded up a function in python that uses spaCy to find the subject of a news article, then replace it with a noun of choice. The problem is, it doesn't exactly work well, and I was hoping it could be improved. I don't exactly understand spaCy that well, and the documentation is a bit hard to understand.
First, the code:
doc=nlp(thetitle)
for text in doc:
#subject would be
if text.dep_ == "nsubj":
subject = text.orth_
#iobj for indirect object
if text.dep_ == "iobj":
indirect_object = text.orth_
#dobj for direct object
if text.dep_ == "dobj":
direct_object = text.orth_
try:
subject
except NameError:
if not thetitle: #if empty title
thetitle = "cat"
subject = "cat"
else: #if unknown subject
try: #do we have a direct object?
direct_object
except NameError:
try: #do we have an indirect object?
indirect_object
except NameError: #still no??
subject = random.choice(thetitle.split())
else:
subject = indirect_object
else:
subject = direct_object
else:
thecat = "cat" #do nothing here, everything went okay
newtitle = re.sub(r"\b%s\b" % subject, toreplace, thetitle)
if (newtitle == thetitle) : #if no replacement happened due to regex
newtitle = thetitle.replace(subject, toreplace)
return newtitle
the "cat" lines are filler lines that don't do anything. "thetitle" is a variable for a random news article title I'm pulling in from RSS feeds. "toreplace" is the variable that holds the string to replace whatever the found subject is.
Let's use an example:
"Video Games that Should Be Animated TV Shows - Screen Rant" And here's the displaCy breakdown of that: https://demos.explosion.ai/displacy/?text=Video%20Games%20that%20Should%20Be%20Animated%20TV%20Shows%20-%20Screen%20Rant&model=en&cpu=1&cph=1
The word the code decided to replace ended up being "that", which isn't even a noun in this sentence, but seems to have resulted in the random word choice fallback, since it couldn't find a subject, indirect object, or direct object. My hope is that it would find something more like "Video games" in this example.
I should note if I take the last bit out (which appears to be the source for the news article) in displaCy: https://demos.explosion.ai/displacy/?text=Video%20Games%20that%20Should%20Be%20Animated%20TV%20Shows&model=en&cpu=1&cph=1 it seems to think "that" is the subject, which is incorrect.
What is a better way to parse this? Should I look for proper nouns first?
Not directly answering your question, I think the code below is far more readable because the conditions are explicit, and what happens when a condition is valid is not buried in an else
clause far away. This code also takes care of the cases with multiple objects.
To your problem: any natural language processing tool will have a hard time to find the subject (or maybe rather topic) of a sentence fragment, they are trained with complete sentences. I'm not even sure if such fragments technically have subjects (I'm not an expert, though). You could try to train your own model, but then you will have to provide labeled sentences, I don't know if such a thing already exists for sentence fragments.
I am not fully sure what you want to achieve, looking at the common nouns and pronouns might likely contain the word you want to replace, and the first one appearing is likely the most important.
import spacy
import random
import re
from collections import defaultdict
def replace_subj(sentence, nlp):
doc = nlp(sentence)
tokens = defaultdict(list)
for text in doc:
tokens[text.dep_].append(text.orth_)
if not sentence:
return "cat"
if "nsubj" in tokens:
subject = tokens["nsubj"][0]
elif "dobj" in tokens:
subject = tokens["dobj"][0]
elif "iobj" in tokens:
subject = tokens["iobj"][0]
else:
subject = random.choice(sentence.split())
return re.sub(r"\b{}\b".format(subject), "cat", sentence)
if __name__ == "__main__":
sentence = """Video Games that Should Be Animated TV Shows - Screen Rant"""
nlp = spacy.load("en")
print(replace_subj(sentence, nlp))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With