I would like to programmatically detect the tense (and mood) of German sentences, preferably with spaCy. I can find the root of a sentence and determine whether it is a finite verb. However, searching spaCy's documentation, I did not find a way to determine the tense. Is this possible with spaCy, or do I need to build my own solution?
If it is possible with spaCy, how?
If not, what would be a good approach? My first idea is to recognize Perfekt and Plusquamperfekt by the presence of a participle, and to identify Futur by checking whether the root is a form of werden with a dependent infinitive, plus some extra logic for Futur II, analogous to the check for Plusquamperfekt. To distinguish Präteritum from Präsens I would look the verb form up in a verb table. Is that a good idea, or is there a better approach, maybe a prebuilt tool?
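To make the idea concrete, here is a minimal sketch of those rules. It assumes the verbs of a single clause have already been reduced to plain dicts with the hypothetical keys "lemma", "form" (Fin, Part or Inf) and "tense" (Pres or Past, for the finite verb only); where those values come from (a verb table or a tagger) is left open. It ignores mood and voice, so "würde" plus infinitive and a Zustandspassiv such as "ist geöffnet" would be mislabelled.

```python
from typing import Dict, List, Optional

PERFECT_AUX = {"haben", "sein"}  # auxiliaries that form Perfekt/Plusquamperfekt


def classify_tense(verbs: List[Dict]) -> Optional[str]:
    """Guess the tense of one clause from its verb tokens (hypothetical format)."""
    finite = next((v for v in verbs if v["form"] == "Fin"), None)
    if finite is None:
        return None  # no finite verb, nothing to classify
    has_participle = any(v["form"] == "Part" for v in verbs)
    infinitives = [v for v in verbs if v["form"] == "Inf"]

    if finite["lemma"] == "werden" and infinitives:
        # "wird geflogen sein": participle plus infinitive of haben/sein
        if has_participle and any(v["lemma"] in PERFECT_AUX for v in infinitives):
            return "Futur II"
        return "Futur I"
    if finite["lemma"] in PERFECT_AUX and has_participle:
        return "Perfekt" if finite["tense"] == "Pres" else "Plusquamperfekt"
    # Simple tenses: decided by the finite verb alone
    return "Präsens" if finite["tense"] == "Pres" else "Präteritum"


# "Ich bin nach Rom geflogen."
clause = [
    {"lemma": "sein", "form": "Fin", "tense": "Pres"},
    {"lemma": "fliegen", "form": "Part", "tense": None},
]
print(classify_tense(clause))
```

The werden check comes first so that "wird gebaut" (passive, no infinitive) falls through to the simple-tense rule and is reported as Präsens.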
I have found this paper: Annotating tense, mood and voice for English, French and German, but the authors are not very explicit about how they do it; at least I am unable to reproduce their work.
I think spaCy's Morphologizer, whose output is available as a MorphAnalysis object on token.morph, gives you the result you want. I just figured it out myself.
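import spacy

# Load the large German pipeline (it has to be installed separately)
nlp = spacy.load("de_core_news_lg")
sent = "Ich flog nach Rom."
doc = nlp(sent)
# Print each token with its morphological features and its lemma
for token in doc:
    print(token.text, list(token.morph), token.lemma_)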
This might not be perfect, because the features of each token come back as a list of strings like this:
Ich ['Case=Nom', 'Number=Sing', 'Person=1', 'PronType=Prs'] Ich
flog ['Mood=Ind', 'Number=Sing', 'Person=1', 'Tense=Past', 'VerbForm=Fin'] fliegen
nach [] nach
Rom ['Case=Dat', 'Gender=Neut', 'Number=Sing'] Rom
. [] .
But I think from here it is not too difficult to get a better representation, such as a dict.
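As a rough sketch, the list of "Key=Value" strings can be turned into a dict with plain Python. The helper name morph_to_dict is my own and not part of spaCy. In spaCy v3, token.morph.to_dict() and token.morph.get("Tense") should give the same information directly, but I have only shown the string-based route here so it works without a loaded model.

```python
def morph_to_dict(features):
    """Turn ['Mood=Ind', 'Tense=Past'] into {'Mood': 'Ind', 'Tense': 'Past'}."""
    # Split each entry at the first '=' only, in case a value contains '='
    return dict(item.split("=", 1) for item in features)


# Feature list copied from the output for "flog" above
flog = morph_to_dict(
    ["Mood=Ind", "Number=Sing", "Person=1", "Tense=Past", "VerbForm=Fin"]
)
print(flog["Tense"], flog["Mood"])  # Past Ind
```

Tokens without features, such as "nach", simply give an empty dict, so flog.get("Tense") style lookups are safer than indexing when looping over a whole sentence.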
Otherwise, I would suggest using spaCy's Doc.to_json() method:
doc = nlp(sent)
doc.to_json()
Which returns:
{'text': 'Ich flog nach Rom.',
'ents': [{'start': 14, 'end': 17, 'label': 'LOC'}],
'sents': [{'start': 0, 'end': 18}],
'tokens': [{'id': 0,
'start': 0,
'end': 3,
'tag': 'PPER',
'pos': 'PRON',
'morph': 'Case=Nom|Number=Sing|Person=1|PronType=Prs',
'lemma': 'Ich',
'dep': 'sb',
'head': 1},
{'id': 1,
'start': 4,
'end': 8,
'tag': 'VVFIN',
'pos': 'VERB',
'morph': 'Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin',
'lemma': 'fliegen',
'dep': 'ROOT',
'head': 1},
{'id': 2,
'start': 9,
'end': 13,
'tag': 'APPR',
'pos': 'ADP',
'morph': '',
'lemma': 'nach',
'dep': 'mo',
'head': 1},
{'id': 3,
'start': 14,
'end': 17,
'tag': 'NE',
'pos': 'PROPN',
'morph': 'Case=Dat|Gender=Neut|Number=Sing',
'lemma': 'Rom',
'dep': 'nk',
'head': 2},
{'id': 4,
'start': 17,
'end': 18,
'tag': '$.',
'pos': 'PUNCT',
'morph': '',
'lemma': '.',
'dep': 'punct',
'head': 1}]}
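If only the finite verbs matter, the JSON can be filtered with a few lines of plain Python. This is a sketch under the assumption that the dict has the shape shown above; the function name finite_verbs is made up for illustration, and the sample dict is shortened to two tokens.

```python
def finite_verbs(doc_json):
    """Return (text, lemma, tense, mood) for every finite verb in the JSON."""
    text = doc_json["text"]
    found = []
    for tok in doc_json["tokens"]:
        # 'morph' is a '|'-separated string; it is empty for e.g. punctuation
        feats = dict(f.split("=", 1) for f in tok["morph"].split("|") if f)
        if feats.get("VerbForm") == "Fin":
            surface = text[tok["start"]:tok["end"]]
            found.append((surface, tok["lemma"], feats.get("Tense"), feats.get("Mood")))
    return found


# Shortened copy of the output above (only the fields used here)
sample = {
    "text": "Ich flog nach Rom.",
    "tokens": [
        {"start": 0, "end": 3, "lemma": "Ich",
         "morph": "Case=Nom|Number=Sing|Person=1|PronType=Prs"},
        {"start": 4, "end": 8, "lemma": "fliegen",
         "morph": "Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin"},
    ],
}
print(finite_verbs(sample))  # [('flog', 'fliegen', 'Past', 'Ind')]
```

Note that the morphological Tense only distinguishes Pres and Past on the finite verb; compound tenses like Perfekt or Futur still need extra logic on top of this.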
Let me know if this is what you were searching for. :)