How to use LanguageDetector() from spacy_langdetect package?

Tags:

python

spacy

I'm trying to use the spacy_langdetect package, and the only example code I can find (https://spacy.io/universe/project/spacy-langdetect) is:

import spacy
from spacy_langdetect import LanguageDetector
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(LanguageDetector(), name='language_detector', last=True)
text = 'This is an english text.'
doc = nlp(text)
print(doc._.language)

It throws the error: nlp.add_pipe now takes the string name of the registered component factory, not a callable component.

So I tried the following to add it to my nlp pipeline:

language_detector = LanguageDetector()
nlp.add_pipe("language_detector")

But this gives the error: Can't find factory for 'language_detector' for language English (en). This usually happens when spaCy calls nlp.create_pipe with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator @Language.component (for function components) or @Language.factory (for class components). Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, parser, beam_parser, entity_linker, ner, beam_ner, entity_ruler, lemmatizer, tagger, morphologizer, senter, sentencizer, textcat, textcat_multilabel, en.lemmatizer

I don't fully understand how to add it since it's not really a custom component.

user3242036 asked Mar 19 '21 17:03



1 Answer

With spaCy v3.0, components that are not built in, such as LanguageDetector, must be wrapped in a factory function and registered before they can be added to the pipeline. In your example, you can do the following:

import spacy
from spacy.language import Language
from spacy_langdetect import LanguageDetector

# Factory function: spaCy calls this to create the component instance.
def get_lang_detector(nlp, name):
    return LanguageDetector()

nlp = spacy.load("en_core_web_sm")

# Register the factory under a string name, then add the component by that name.
Language.factory("language_detector", func=get_lang_detector)
nlp.add_pipe('language_detector', last=True)

text = 'This is an english text.'
doc = nlp(text)
print(doc._.language)  # e.g. {'language': 'en', 'score': 0.99...}

For built-in components (e.g. tagger, parser, ner, etc.), see: https://spacy.io/usage/processing-pipelines
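By contrast, built-in components already ship with registered factories, so the bare string name is enough and no wrapper function is needed. A minimal sketch with the sentencizer:

```python
import spacy

# Built-in factories are pre-registered, so add_pipe takes just the name.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("First sentence. Second sentence.")
print([sent.text for sent in doc.sents])
# ['First sentence.', 'Second sentence.']
```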

Eric answered Nov 14 '22 22:11