
Use spaCy to find lemmas for Russian (and other languages that don't have a model)

Tags:

nlp

spacy

I have downloaded the spaCy English model and am finding lemmas using this code:

import spacy
nlp = spacy.load('en')
doc = nlp(u'Two apples')
for token in doc:
    # token.lemma is the integer hash ID, token.lemma_ is the lemma string
    print(token, token.lemma, token.lemma_)

Output:

Two 11711838292424000352 two
apples 8566208034543834098 apple

Now I want to do the same thing for Russian, but spaCy doesn't have a model for Russian. However, I have seen the Russian language code in their GitHub repository, and I think that code could be used to find lemmas.

I am new to spaCy and need a starting point for languages that don't have models. I have also noticed that for some languages, for example Urdu, they provide a lookup dictionary for lemmatization.

I want to extend this to all languages that don't have models.

Note: I believe the code above could be improved further. In my case I need only the lemma, so which parts can I turn off, and how?

Hammad Hassan asked Feb 04 '19 08:02


2 Answers

  • Without a trained model for the language, spaCy's results will not be optimal.
  • StanfordNLP covers more languages and has Russian models: https://stanfordnlp.github.io/stanfordnlp/installation_download.html

spaCy recently launched a handy wrapper around StanfordNLP, so you can use StanfordNLP models seamlessly within spaCy pipelines:

https://github.com/explosion/spacy-stanfordnlp

The code would look something like this (not tested):

import stanfordnlp
from spacy_stanfordnlp import StanfordNLPLanguage

# Download the Russian models (only needed once)
stanfordnlp.download("ru")

# Build the StanfordNLP pipeline and wrap it as a spaCy language object
snlp = stanfordnlp.Pipeline(lang="ru")
nlp = StanfordNLPLanguage(snlp)

doc = nlp("Привет мир, это Россия")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)
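
If neither a spaCy model nor a StanfordNLP pipeline is available for a language, the lookup-dictionary approach mentioned in the question can be approximated in plain Python. The sketch below is only an illustration, not spaCy's own implementation: the `LEMMA_LOOKUP` table, its few entries, and the `lemmatize` / `lemmatize_text` helpers are hypothetical names made up for this example, and a real table would have to come from a language resource.

```python
# Hypothetical lookup table: surface form -> lemma.
# These entries are illustrative only; a real table would be much larger
# and loaded from a language resource file.
LEMMA_LOOKUP = {
    "яблоки": "яблоко",
    "яблок": "яблоко",
    "миру": "мир",
    "книги": "книга",
}


def lemmatize(token, table=LEMMA_LOOKUP):
    """Return the lemma for a token, falling back to the lowercased token."""
    key = token.lower()
    return table.get(key, key)


def lemmatize_text(text, table=LEMMA_LOOKUP):
    """Split on whitespace, strip surrounding punctuation, look up each token."""
    lemmas = []
    for raw in text.split():
        token = raw.strip(".,!?;:\"'()")
        if token:
            lemmas.append(lemmatize(token, table))
    return lemmas


print(lemmatize_text("Яблоки и книги"))  # ['яблоко', 'и', 'книга']
```

A plain lookup like this cannot disambiguate forms that map to different lemmas depending on context, which is one reason a trained model tends to give better results where one exists.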
DhruvPathak answered Oct 10 '22 13:10


You can use spaCy with the Russian model ru2 from this project. It works.

Mikhail Gerasimov answered Oct 10 '22 12:10