I try to find the dependency path between two words in Python given dependency tree.
For sentence
Robots in popular culture are there to remind us of the awesomeness of unbound human agency.
I used practnlptools (https://github.com/biplab-iitb/practNLPTools) to get the dependency parsing result like:
nsubj(are-5, Robots-1)
xsubj(remind-8, Robots-1)
amod(culture-4, popular-3)
prep_in(Robots-1, culture-4)
root(ROOT-0, are-5)
advmod(are-5, there-6)
aux(remind-8, to-7)
xcomp(are-5, remind-8)
dobj(remind-8, us-9)
det(awesomeness-12, the-11)
prep_of(remind-8, awesomeness-12)
amod(agency-16, unbound-14)
amod(agency-16, human-15)
prep_of(awesomeness-12, agency-16)
which can also be visualized as (picture taken from https://demos.explosion.ai/displacy/)
The path length between "robots" and "are" is 1, the path length between "robots" and "awesomeness" would be 4.
My question is given above dependency parse result, how can I get dependency path or dependency path length between two words?
From my current search result, would nltk's ParentedTree help?
Thanks!
Your problem can easily be conceived as a graph problem where we have to find the shortest path between two nodes.
To convert your dependency parse in a graph, we first have to deal with the fact that it comes as a string. You want to get this:
'nsubj(are-5, Robots-1)\nxsubj(remind-8, Robots-1)\namod(culture-4, popular-3)\nprep_in(Robots-1, culture-4)\nroot(ROOT-0, are-5)\nadvmod(are-5, there-6)\naux(remind-8, to-7)\nxcomp(are-5, remind-8)\ndobj(remind-8, us-9)\ndet(awesomeness-12, the-11)\nprep_of(remind-8, awesomeness-12)\namod(agency-16, unbound-14)\namod(agency-16, human-15)\nprep_of(awesomeness-12, agency-16)'
to look like this:
[('are-5', 'Robots-1'), ('remind-8', 'Robots-1'), ('culture-4', 'popular-3'), ('Robots-1', 'culture-4'), ('ROOT-0', 'are-5'), ('are-5', 'there-6'), ('remind-8', 'to-7'), ('are-5', 'remind-8'), ('remind-8', 'us-9'), ('awesomeness-12', 'the-11'), ('remind-8', 'awesomeness-12'), ('agency-16', 'unbound-14'), ('agency-16', 'human-15'), ('awesomeness-12', 'agency-16')]
This way you can feed the tuple list to a graph constructor from the networkx module that will analyze the list and build a graph for you, plus give you a neat method that gives you the length of the shortest path between two given nodes.
Necessary imports
import re
import networkx as nx
from practnlptools.tools import Annotator
How to get your string in the desired tuple list format
annotator = Annotator()
text = """Robots in popular culture are there to remind us of the awesomeness of unbound human agency."""
dep_parse = annotator.getAnnotations(text, dep_parse=True)['dep_parse']
dp_list = dep_parse.split('\n')
pattern = re.compile(r'.+?\((.+?), (.+?)\)')
edges = []
for dep in dp_list:
m = pattern.search(dep)
edges.append((m.group(1), m.group(2)))
How to build the graph
graph = nx.Graph(edges) # Well that was easy
How to compute shortest path length
print(nx.shortest_path_length(graph, source='Robots-1', target='awesomeness-12'))
This script will reveal that the shortest path given the dependency parse is actually of length 2, since you can get from Robots-1
to awesomeness-12
by going through remind-8
1. xsubj(remind-8, Robots-1)
2. prep_of(remind-8, awesomeness-12)
If you don't like this result, you might want to think about filtering some dependencies, in this case not allow the xsubj
dependency to be added to the graph.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With