How to do lemmatization on German text?

Question

I have a German text that I want to apply lemmatization to. If lemmatization is not possible, then I can live with stemming too.

Data: This is my German text:

mails=['Hallo. Ich spielte am frühen Morgen und ging dann zu einem Freund. Auf Wiedersehen', 'Guten Tag Ich mochte Bälle und will etwas kaufen. Tschüss']

Goal: After applying lemmatization it should look similar to this:

mails_lemma=['Hallo. Ich spielen am früh Morgen und gehen dann zu einer Freund. Auf Wiedersehen', 'Guten Tag Ich mögen Ball und wollen etwas kaufen Tschüss']

I tried using spaCy:

conda install -c conda-forge spacy

python -m spacy download de_core_news_md

import spacy
from spacy.lemmatizer import Lemmatizer
lemmatizer = Lemmatizer()
[lemmatizer.lookup(word) for word in mails]

I see the following problems:

My data is structured in sentences, not single words.
In my case, spaCy lemmatization doesn't seem to work even for single words.

Can you please tell me how this works?

cronoik · Accepted Answer

Just wrap it in a loop and take the lemma of each token:

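import spacy

# Load the medium German pipeline downloaded with "python -m spacy download de_core_news_md"
nlp = spacy.load('de_core_news_md')

mails=['Hallo. Ich spielte am frühen Morgen und ging dann zu einem Freund. Auf Wiedersehen', 'Guten Tag Ich mochte Bälle und will etwas kaufen. Tschüss']

mails_lemma = []

for mail in mails:
    # Run the whole pipeline: this splits the sentence into tokens and assigns each a lemma
    doc = nlp(mail)
    # Join the lemma of every token back into a single string
    result = ' '.join([x.lemma_ for x in doc])
    mails_lemma.append(result)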

Output:

['hallo . ich spielen am früh Morgen und gehen dann zu einer Freund . Auf Wiedersehen ',
 'Guten tagen ich mögen Ball und wollen etwas kaufen . Tschüss']
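The output above has a space before every punctuation mark ("hallo . ich") because the lemmas are joined with ' '. A minimal variant, assuming the same de_core_news_md pipeline, appends each token's whitespace_ attribute (the whitespace that followed the token in the input) instead, and uses nlp.pipe to process all mails as one stream. The function name lemmatize_texts is hypothetical, and the quality of the lemmas themselves (e.g. "tagen" for "Tag") still depends on the model.

```python
def lemmatize_texts(texts, nlp):
    """Return one lemmatized string per input text, keeping the original spacing.

    nlp is assumed to be a loaded spaCy pipeline, e.g. spacy.load("de_core_news_md").
    """
    results = []
    # nlp.pipe streams the texts through the pipeline and yields one Doc per text
    for doc in nlp.pipe(texts):
        # whitespace_ is the whitespace that followed the token in the input
        # ("" for "Hallo" in "Hallo."), so no extra spaces are introduced
        results.append("".join(tok.lemma_ + tok.whitespace_ for tok in doc))
    return results


# Example usage (needs the model installed with "python -m spacy download de_core_news_md"):
#   import spacy
#   nlp = spacy.load("de_core_news_md")
#   mails_lemma = lemmatize_texts(mails, nlp)
```

Because every lemma is followed by exactly the whitespace its original token had, the result keeps the input's layout, whereas ' '.join(...) puts a space between every pair of tokens.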

Tags:

nlp

lemmatization

spacy

PParker