I'm trying to use the spacy_langdetect package, and the only example code I can find is the following (from https://spacy.io/universe/project/spacy-langdetect):
import spacy
from spacy_langdetect import LanguageDetector
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(LanguageDetector(), name='language_detector', last=True)
text = 'This is an english text.'
doc = nlp(text)
print(doc._.language)
This throws the error:

nlp.add_pipe now takes the string name of the registered component factory, not a callable component.
So I tried the following to add it to my nlp pipeline:
language_detector = LanguageDetector()
nlp.add_pipe("language_detector")
But this gives the error:

Can't find factory for 'language_detector' for language English (en). This usually happens when spaCy calls nlp.create_pipe with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator @Language.component (for function components) or @Language.factory (for class components).

Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, parser, beam_parser, entity_linker, ner, beam_ner, entity_ruler, lemmatizer, tagger, morphologizer, senter, sentencizer, textcat, textcat_multilabel, en.lemmatizer
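As far as I understand, those decorators register a component under a string name so that nlp.add_pipe can look it up; for a function component it would look something like this (the names here are just illustrative):

import spacy
from spacy.language import Language

@Language.component("my_component")  # registers the function under a string name
def my_component(doc):
    return doc  # no-op, for illustration only

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("my_component", last=True)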
I don't fully understand how to add it since it's not really a custom component.
Spacy_language_detection is a fully customizable language detection pipeline for spaCy, forked from spacy-langdetect to fix the seed problem and to support spaCy 3.0. It detects the language of a document and of its individual sentences, and out of the box it uses langdetect under the hood on spaCy's Doc and Span objects.
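A sketch of its usage with spaCy 3.0, assuming its API mirrors spacy-langdetect and that the fixed seed is passed to the constructor (the seed argument is an assumption based on the fork's stated purpose):

import spacy
from spacy.language import Language
from spacy_language_detection import LanguageDetector

def get_lang_detector(nlp, name):
    # seed=42 makes detection deterministic (argument assumed from the fork's docs)
    return LanguageDetector(seed=42)

nlp = spacy.load("en_core_web_sm")
Language.factory("language_detector", func=get_lang_detector)
nlp.add_pipe("language_detector", last=True)

print(nlp("This is an English text.")._.language)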
Note that neither NLTK nor spaCy will automatically determine the language of a text and apply the appropriate algorithms. spaCy ships a separate model per language, each with its own methods and its own part-of-speech and dependency tag sets, and it also has a stopword list for each available language.
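For instance, each language pipeline carries its own resources (a minimal sketch; assumes the German model has been installed with python -m spacy download de_core_news_sm):

import spacy

nlp_en = spacy.load("en_core_web_sm")   # English pipeline
nlp_de = spacy.load("de_core_news_sm")  # German pipeline
# Stopword lists are defined per language, so the two sets differ
print(nlp_en.Defaults.stop_words == nlp_de.Defaults.stop_words)  # False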
With spaCy v3.0, components that are not built in, such as LanguageDetector, have to be wrapped in a factory function before they can be added to the nlp pipeline. In your example, you can do the following:
import spacy
from spacy.language import Language
from spacy_langdetect import LanguageDetector

# Factory function that returns the component instance
def get_lang_detector(nlp, name):
    return LanguageDetector()

nlp = spacy.load("en_core_web_sm")

# Register the factory under a string name, then add the component by that name
Language.factory("language_detector", func=get_lang_detector)
nlp.add_pipe('language_detector', last=True)

text = 'This is an English text.'
doc = nlp(text)
print(doc._.language)
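doc._.language is a dict holding the detected language code and a confidence score, something like {'language': 'en', 'score': 0.99}. Since spacy_langdetect also sets the extension on Span objects, you can inspect individual sentences as well:

for sent in doc.sents:
    print(sent.text, sent._.language)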
For built-in components (e.g. tagger, parser, ner), see: https://spacy.io/usage/processing-pipelines
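Those are already registered, so they can be added by their string name alone, for example:

nlp.add_pipe("sentencizer")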