I'm trying to parse the verbs in a corpus, list them in dictionaries, and count how many times each verb appears as transitive, intransitive and ditransitive. How can I use spaCy to go through the verbs and label each one as transitive, intransitive or ditransitive?
Here I summarize the code from Mirith/Verb-categorizer. Basically, you can loop through the VERB tokens and look at their children to classify each verb as transitive, intransitive or ditransitive. An example follows.
First, import spacy:
import spacy
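# The 'en' shortcut was removed in spaCy 3; load an installed model by its full name
# (install it with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')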
Suppose you have parsed some example text:
tokens = nlp('I like this dog. It is pretty good. I saw a bird. We arrived at the classroom door with only seven seconds to spare.')
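Before classifying anything, it can help to see which dependency labels the parser actually attaches to each verb's children. The following is a minimal sketch; the helper name `verb_children` is hypothetical, and it only assumes spaCy-style tokens that expose `.pos_`, `.text`, `.children` and `.dep_`.

```python
def verb_children(doc):
    """Return (verb text, [(child text, child dependency label), ...]) per verb.

    `doc` is any iterable of spaCy-style tokens; the function name is a
    hypothetical helper, not part of spaCy.
    """
    return [
        (token.text, [(child.text, child.dep_) for child in token.children])
        for token in doc
        if token.pos_ == 'VERB'
    ]

# Usage with the `tokens` document created above:
# for verb, children in verb_children(tokens):
#     print(verb, children)
```

The labels you see depend on the model you loaded, so check them against the labels the classifier tests for.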
You can then write the following function to relabel each VERB token with the type you want:
def check_verb(token):
    """Check verb type given spacy token"""
    if token.pos_ == 'VERB':
        indirect_object = False
        direct_object = False
        for item in token.children:
            # spaCy's English models label the indirect object "dative";
            # "pobj" normally hangs off a preposition, so it rarely matches here
            if item.dep_ in ("iobj", "dative", "pobj"):
                indirect_object = True
            if item.dep_ == "dobj":
                direct_object = True
        if indirect_object and direct_object:
            return 'DITRANVERB'
        elif direct_object and not indirect_object:
            return 'TRANVERB'
        elif not direct_object and not indirect_object:
            return 'INTRANVERB'
        else:
            # indirect object without a direct object: leave unclassified
            return 'VERB'
    else:
        # non-verbs keep their coarse part-of-speech tag
        return token.pos_
Example:
[check_verb(t) for t in tokens] # ['PRON', 'TRANVERB', 'DET', 'NOUN', 'PUNCT', ...]
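To get the per-verb counts the question asks for, you can tally the labels by lemma. This is a minimal sketch under a few assumptions: `count_verb_types` is a hypothetical helper name, the tokens expose `.pos_` and `.lemma_` as spaCy tokens do, and the classifier is passed in as an argument so that any function like `check_verb` can be used.

```python
from collections import Counter, defaultdict

def count_verb_types(doc, classify):
    """Tally, per verb lemma, how often each label from `classify` occurs.

    `doc` is an iterable of spaCy-style tokens and `classify` is a function
    such as `check_verb`; both names are assumptions of this sketch.
    """
    counts = defaultdict(Counter)
    for token in doc:
        if token.pos_ == 'VERB':
            # Group inflected forms ("saw", "sees") under one lemma ("see")
            counts[token.lemma_][classify(token)] += 1
    return counts

# Usage with the objects defined above:
# counts = count_verb_types(tokens, check_verb)
# for lemma, labels in counts.items():
#     print(lemma, dict(labels))
```

Keying on the lemma rather than the surface form is a design choice: it keeps one dictionary entry per verb, which is usually what you want for corpus counts.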