I have downloaded the spaCy English model and am finding lemmas using this code:
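import spacy
# 'en' was a spaCy v2 shortcut link; the full package name also works in v3
nlp = spacy.load('en_core_web_sm')
doc = nlp('Two apples')
for token in doc:
    # lemma is the hash ID in the string store; lemma_ is its text
    print(token, token.lemma, token.lemma_)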
Output:
Two 11711838292424000352 two
apples 8566208034543834098 apple
Now I want to do the same thing for Russian, but spaCy doesn't have a model for Russian. However, I can see Russian language code in their GitHub repository, and I think that code could be used to find lemmas.
I am new to spaCy and need a starting point for languages that don't have models. I have also noticed that for some languages, say Urdu, they provide a lookup dictionary for lemmatization.
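For reference, a lookup-based lemmatizer of the kind mentioned for Urdu boils down to a dictionary from inflected forms to lemmas, with the word itself as the fallback. The sketch below is a minimal pure-Python illustration of that idea, not spaCy's implementation; the table `LOOKUP_TABLE` and the helpers `lookup_lemma` and `lemmatize` are hypothetical, and the few Russian entries are only examples.

```python
from typing import Dict, List

# Hypothetical lookup table: inflected form -> lemma.
# A real table would need an entry for every inflected form of every word.
LOOKUP_TABLE: Dict[str, str] = {
    "два": "два",         # "two"
    "яблоко": "яблоко",   # "apple" (nominative singular)
    "яблока": "яблоко",   # genitive singular, as in "два яблока" ("two apples")
    "яблоки": "яблоко",   # nominative plural
    "яблок": "яблоко",    # genitive plural
}


def lookup_lemma(word: str, table: Dict[str, str]) -> str:
    """Return the lemma of `word`, or the lowercased word if it is not in the table."""
    key = word.lower()
    return table.get(key, key)


def lemmatize(text: str, table: Dict[str, str]) -> List[str]:
    """Split naively on whitespace (no real tokenization) and look up each token."""
    return [lookup_lemma(token, table) for token in text.split()]


if __name__ == "__main__":
    print(lemmatize("Два яблока", LOOKUP_TABLE))  # prints ['два', 'яблоко']
```

The design is simple and fast, but its coverage is only as good as the table, which is why lookup data is typically large.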
I want to extend this to all languages that don't have models.
Note: I believe the code above could be further improved. In my case I only need lemmas, so which pipeline components can I turn off, and how?
- spaCy recently launched a handy wrapper around StanfordNLP, so you can use StanfordNLP's features seamlessly within spaCy pipelines:
https://github.com/explosion/spacy-stanfordnlp
The code would look something like this (not tested):
import stanfordnlp
from spacy_stanfordnlp import StanfordNLPLanguage
stanfordnlp.download("ru")
snlp = stanfordnlp.Pipeline(lang="ru")
nlp = StanfordNLPLanguage(snlp)
doc = nlp("Привет мир, это Россия")  # "Hello world, this is Russia"
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)
You can use spaCy with the Russian model ru2 from this project. It works.