I am trying to write a program that takes in two sentences and checks whether they are similar. I didn't want to use a full-fledged parser, so I created one using a simple grammar that covers the constructions I expect to encounter most often. Now, my interest is in the noun phrases in the sentences. Checking the subtrees tagged as noun phrases for equality would be easy enough. I want to go further and let the user decide whether missing or mismatched determiners are acceptable (partial matches).
The output tree is of the form (S (NP The/DT bag/NN) is/VBZ (JP blue/JJ)), where I have defined grammar rules for noun phrases (NP) and adjective phrases (JP).
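For readers who want to reproduce a tree of this shape, here is a minimal sketch using nltk.RegexpParser. The grammar is a hypothetical stand-in (the question does not show the real rules), and the tagged tokens are typed in by hand instead of coming from nltk.pos_tag, so no tagger model needs to be downloaded. Note that the leaves of the resulting tree are (word, tag) tuples.

```python
import nltk

# Hypothetical chunk grammar: the question does not show the real rules.
# NP: optional determiner, any adjectives, then one or more nouns.
# JP: one or more adjectives that are not already inside an NP.
GRAMMAR = r"""
NP: {<DT>?<JJ>*<NN.*>+}
JP: {<JJ>+}
"""

chunker = nltk.RegexpParser(GRAMMAR)

# Hand-tagged tokens stand in for nltk.pos_tag output here.
tagged = [("The", "DT"), ("bag", "NN"), ("is", "VBZ"), ("blue", "JJ")]
tree = chunker.parse(tagged)

print(tree)  # (S (NP The/DT bag/NN) is/VBZ (JP blue/JJ))
```

The "The/DT" notation is only how NLTK prints a tuple leaf; the object stored in the tree is ("The", "DT").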
To go about matching, I've considered a few routes:
I'm new to Python and am facing a few problems here:
If I write a recursive function that traverses the noun phrase tree until it reaches a leaf with a determiner, I am unable to modify the value in the original tree, as it's only passing the value.
The only delete function I found for NLTK trees requires the exact index of the node relative to the root of the tree, in a format such as [0,0] for the leftmost child of the leftmost child of the root node. This is tricky to get, as it would most likely involve maintaining, for each node, a list of integers that grows with the height of the tree.
I created a list of lists, where each list has all the leaves from one noun phrase excluding the determiners, and compared these.
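The list-of-lists route can be expressed with Tree.subtrees(), which takes an optional filter function. A minimal sketch, assuming tuple leaves of the form (word, tag); np_words_without_dt is a hypothetical helper name.

```python
from nltk.tree import Tree


def np_words_without_dt(tree):
    """One list of words per NP subtree, with determiners left out.

    Hypothetical helper; assumes leaves are (word, tag) tuples.
    """
    return [
        [word for word, tag in phrase.leaves() if tag != "DT"]
        for phrase in tree.subtrees(lambda t: t.label() == "NP")
    ]


with_dt = Tree("S", [Tree("NP", [("The", "DT"), ("bag", "NN")]),
                     ("is", "VBZ"), Tree("JP", [("blue", "JJ")])])
without_dt = Tree("S", [Tree("NP", [("bag", "NN")]),
                        ("is", "VBZ"), Tree("JP", [("blue", "JJ")])])

print(np_words_without_dt(with_dt))  # [['bag']]
print(np_words_without_dt(with_dt) == np_words_without_dt(without_dt))  # True
```

If the grammar can produce nested NPs, the inner phrase appears both on its own and inside its parent's list, which may or may not be the desired behavior.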
So, my questions are,
How do I delete a node from an NLTK tree without first obtaining its index in the form [0,0,1,0,...]?
How do I modify a leaf value, again without using an index? (I would like to use a recursive function that modifies a leaf whenever it reaches one I want to change.)
If these aren't possible, how can I obtain the index of a leaf? I'm stumped here. NLTK trees have a treeposition function, but it only works for subtrees. Does Python consider a leaf to be a different type from the other nodes? treeposition isn't working for my leaves. This might be because my leaves are tuples rather than plain strings, but I don't know how to change that, since that's the POS tagger's output. So is there some way to replace a leaf that is a tuple such as ('the', 'DT'), printed as the/DT, with a subtree of the form (DT the)? Defining recursive procedures again won't modify the original tree.
Any suggestions/observations?
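On the last question: leaf positions do not have to be assembled by hand. Tree.treepositions("leaves") returns the position tuple of every leaf, tuple leaves included, and assigning to tree[position] changes the original tree in place, because a Tree is a mutable list. A sketch of the requested conversion, turning each (word, tag) leaf into a (tag word) subtree; the function name tuples_to_preterminals is hypothetical.

```python
from nltk.tree import Tree


def tuples_to_preterminals(tree):
    """Replace every (word, tag) leaf with Tree(tag, [word]), in place.

    Hypothetical helper. Replacing a leaf does not shift any sibling, so
    the positions computed before the loop stay valid throughout it.
    """
    for pos in tree.treepositions("leaves"):
        word, tag = tree[pos]
        tree[pos] = Tree(tag, [word])
    return tree


tree = Tree("S", [Tree("NP", [("The", "DT"), ("bag", "NN")]),
                  ("is", "VBZ"), Tree("JP", [("blue", "JJ")])])
tuples_to_preterminals(tree)
print(tree)  # (S (NP (DT The) (NN bag)) (VBZ is) (JP (JJ blue)))
print(tree.leaves())  # ['The', 'bag', 'is', 'blue']
```

After the conversion the leaves are plain word strings, and each word's tag is the label of its parent subtree.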
Ok, let's tackle your questions one by one.
from nltk.tree import Tree
tree = Tree.fromstring("(S (NP The/DT bag/NN) is/VBZ (JP blue/JJ))")  # Tree.parse in NLTK 2
Deleting a node:
tree.remove(Tree('JP', ['blue/JJ']))
tree.remove('is/VBZ')
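A self-contained version of the deletion example, for trying it out. remove() is the ordinary list method, so it deletes the first direct child that compares equal to its argument (two Tree objects are equal when their labels and children match) and only searches the node it is called on. To delete a deeper node, call it on that node's parent, reached here by list indexing.

```python
from nltk.tree import Tree

tree = Tree.fromstring("(S (NP The/DT bag/NN) is/VBZ (JP blue/JJ))")

# Both calls look only at the root's direct children.
tree.remove(Tree("JP", ["blue/JJ"]))
tree.remove("is/VBZ")
print(tree)  # (S (NP The/DT bag/NN))

# To delete a leaf inside the NP, call remove() on the NP subtree itself.
tree[0].remove("The/DT")
print(tree)  # (S (NP bag/NN))
```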
Modifying a value: you could do this by getting the index of a child of the Tree (remember, Tree inherits from list):
tree.index('is/VBZ')
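but this is not a good approach, because index() only searches the direct children of the node it is called on, so it cannot reach nested leaves.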
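For completeness, a runnable version of the index-based edit. The replacement value was/VBD is purely illustrative.

```python
from nltk.tree import Tree

tree = Tree.fromstring("(S (NP The/DT bag/NN) is/VBZ (JP blue/JJ))")

# "is/VBZ" is the second child of the root, so index() returns 1.
i = tree.index("is/VBZ")
tree[i] = "was/VBD"  # plain list assignment, so the tree changes in place
print(i)  # 1
print(tree)  # (S (NP The/DT bag/NN) was/VBD (JP blue/JJ))
```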
The best way to traverse the leaves is to get them with tree.leaves(), then get each leaf's position with tree.leaf_treeposition(index), and use these positions to modify or delete the leaf in place.
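A sketch of that approach applied to the original goal of ignoring determiners, assuming the string leaves produced by Tree.fromstring; leaf_tag and drop_leaves are hypothetical helper names, and leaf_tag also accepts the (word, tag) tuples a chunker produces. Deleting from the last leaf backwards keeps the positions of the leaves still to be visited valid.

```python
from nltk.tree import Tree


def leaf_tag(leaf):
    """Tag of a leaf, for either ('the', 'DT') tuples or 'the/DT' strings."""
    if isinstance(leaf, tuple):
        return leaf[1]
    return leaf.rsplit("/", 1)[-1]


def drop_leaves(tree, tag):
    """Delete every leaf with the given tag, in place (hypothetical helper)."""
    leaves = tree.leaves()
    # Walk backwards: deleting leaf i never moves any leaf before it,
    # so leaf_treeposition() stays correct for the remaining indexes.
    for i in reversed(range(len(leaves))):
        if leaf_tag(leaves[i]) == tag:
            del tree[tree.leaf_treeposition(i)]
    return tree


tree = Tree.fromstring("(S (NP The/DT bag/NN) is/VBZ (JP blue/JJ))")

# Modify a leaf in place through its position.
pos = tree.leaf_treeposition(1)  # the second leaf, "bag/NN"
print(pos)  # (0, 1)
tree[pos] = "purse/NN"

drop_leaves(tree, "DT")
print(tree)  # (S (NP purse/NN) is/VBZ (JP blue/JJ))
```

One caveat: if a noun phrase consists of nothing but a determiner, its NP subtree is left behind empty, which may need separate handling.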