
Extending the Lemma Lookup Table in spaCy

I am currently processing texts with the NLP library spaCy. However, spaCy does not lemmatize all words correctly, so I want to extend its lookup table. At the moment I merge spaCy's built-in lookup table with my extension and then overwrite spaCy's native lookup table with the result.

I have the feeling, however, that this approach may not be the best and most consistent one.

Question: Is there another way to update the lookup table in spaCy, e.g. an update or extend function? I have read the docs and could not find anything like that. Or is this approach "just fine"?

Working example of my current approach:

import spacy
nlp = spacy.load('de')
Spacy_lookup = spacy.lang.de.LOOKUP
New_lookup = {'AAA':'Anonyme Affen Allianz','BBB':'Berliner Bauern Bund','CCC':'Chaos Chaoten Club'}
Spacy_lookup.update(New_lookup)
spacy.lang.de.LOOKUP = Spacy_lookup
tagged = nlp("Die AAA besiegt die BBB und den CCC unverdient.")
for each in tagged:
    print(each.lemma_)

Die
Anonyme Affen Allianz
besiegen
der
Berliner Bauern Bund
und
der
Chaos Chaoten Club
unverdient
.
hou2zi0 asked Mar 22 '18 16:03


1 Answer

Your solution seems fine.
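One detail about the snippet in the question: `dict.update` modifies the dictionary in place, so, assuming `spacy.lang.de.LOOKUP` is a plain dict (as the `.update` call suggests), `Spacy_lookup` and `spacy.lang.de.LOOKUP` are the same object and the final reassignment changes nothing. A minimal sketch with a stand-in dictionary (`base_lookup` and its single entry are made up for illustration, not spaCy's real table):

```python
# Stand-in for spacy.lang.de.LOOKUP; the entry is hypothetical.
base_lookup = {"besiegt": "besiegen"}
custom_lookup = {"AAA": "Anonyme Affen Allianz"}

alias = base_lookup           # no copy: both names refer to the same dict
alias.update(custom_lookup)   # mutates base_lookup in place
same_object = alias is base_lookup

# Building a merged copy instead leaves the original table untouched.
original = {"besiegt": "besiegen"}
merged = {**original, **custom_lookup}
```

If the goal is only to extend the table globally, the in-place `update` is enough on its own. The merged-copy form is useful when the original table should stay unmodified.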

However, a cleaner approach would be to take advantage of spaCy's custom pipeline feature. Specifically, you can create a new component that overrides the lemma attribute of every token whose text appears in your custom lookup table, and then add that component to your pipeline.

Example code:

import spacy
custom_lookup = {'AAA':'Anonyme Affen Allianz','BBB':'Berliner Bauern Bund','CCC':'Chaos Chaoten Club'}

def change_lemma_property(doc):
    for token in doc:
        if (token.text in custom_lookup):
            token.lemma_ = custom_lookup[token.text]
    return doc

nlp = spacy.load('de')
nlp.add_pipe(change_lemma_property, first=True)
text = 'Die AAA besiegt die BBB und den CCC unverdient.'
doc = nlp(text)
for x in doc:
    print(x.lemma_)
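Note that the component matches on the exact token text, so a differently cased form such as 'aaa' is left unchanged. The matching logic can be sketched without spaCy, using plain strings in place of tokens (`lookup_lemma` is a hypothetical helper for illustration, not a spaCy function):

```python
custom_lookup = {
    "AAA": "Anonyme Affen Allianz",
    "BBB": "Berliner Bauern Bund",
    "CCC": "Chaos Chaoten Club",
}

def lookup_lemma(text, default):
    """Return the custom lemma for an exact text match, else the default."""
    return custom_lookup.get(text, default)

# Plain strings stand in for token texts; each word is its own default lemma here.
words = ["Die", "AAA", "aaa"]
lemmas = [lookup_lemma(word, word) for word in words]
```

If case-insensitive matching is wanted, one option would be to normalize both the keys and the token text before the lookup.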
gdaras answered Sep 20 '22 13:09