I'm trying to use the spacy_langdetect package, and the only example code I can find is the following (from https://spacy.io/universe/project/spacy-langdetect):
import spacy
from spacy_langdetect import LanguageDetector
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(LanguageDetector(), name='language_detector', last=True)
text = 'This is an english text.'
doc = nlp(text)
print(doc._.language)
This throws the error:

nlp.add_pipe now takes the string name of the registered component factory, not a callable component.
So I tried the following to add it to my nlp pipeline:
language_detector = LanguageDetector()
nlp.add_pipe("language_detector")
But this gives the error:

Can't find factory for 'language_detector' for language English (en). This usually happens when spaCy calls nlp.create_pipe with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator @Language.component (for function components) or @Language.factory (for class components).

Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, parser, beam_parser, entity_linker, ner, beam_ner, entity_ruler, lemmatizer, tagger, morphologizer, senter, sentencizer, textcat, textcat_multilabel, en.lemmatizer
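As far as I understand, those decorators register a component under a string name so that nlp.add_pipe can look it up; for a function component it would look something like this (the names here are just illustrative):

import spacy
from spacy.language import Language

@Language.component("my_component")  # registers the function under a string name
def my_component(doc):
    return doc  # no-op, for illustration only

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("my_component", last=True)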
I don't fully understand how to add it since it's not really a custom component.
Spacy_language_detection is a fully customizable language detection pipeline for spaCy, forked from spacy-langdetect to fix the seed problem and to support spaCy 3.0. It detects the language of a document and of its individual sentences, and out of the box it uses langdetect under the hood on spaCy's Doc and Span objects.
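A sketch of its usage with spaCy 3.0, assuming its API mirrors spacy-langdetect and that the fixed seed is passed to the constructor (the seed argument is an assumption based on the fork's stated purpose):

import spacy
from spacy.language import Language
from spacy_language_detection import LanguageDetector

def get_lang_detector(nlp, name):
    # seed=42 makes detection deterministic (argument assumed from the fork's docs)
    return LanguageDetector(seed=42)

nlp = spacy.load("en_core_web_sm")
Language.factory("language_detector", func=get_lang_detector)
nlp.add_pipe("language_detector", last=True)

print(nlp("This is an English text.")._.language)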
Note that neither NLTK nor spaCy will automatically determine the language of a text and apply the appropriate algorithms. spaCy ships a separate model per language, each with its own methods and its own part-of-speech and dependency tag sets, and it also has a stopword list for each available language.
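For instance, each language pipeline carries its own resources (a minimal sketch; assumes the German model has been installed with python -m spacy download de_core_news_sm):

import spacy

nlp_en = spacy.load("en_core_web_sm")   # English pipeline
nlp_de = spacy.load("de_core_news_sm")  # German pipeline
# Stopword lists are defined per language, so the two sets differ
print(nlp_en.Defaults.stop_words == nlp_de.Defaults.stop_words)  # False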
With spaCy v3.0, components that are not built in, such as LanguageDetector, have to be wrapped in a factory function before they can be added to the nlp pipeline. In your example, you can do the following:
import spacy
from spacy.language import Language
from spacy_langdetect import LanguageDetector

# Factory function that returns the component instance
def get_lang_detector(nlp, name):
    return LanguageDetector()

nlp = spacy.load("en_core_web_sm")

# Register the factory under a string name, then add the component by that name
Language.factory("language_detector", func=get_lang_detector)
nlp.add_pipe('language_detector', last=True)

text = 'This is an English text.'
doc = nlp(text)
print(doc._.language)
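doc._.language is a dict holding the detected language code and a confidence score, something like {'language': 'en', 'score': 0.99}. Since spacy_langdetect also sets the extension on Span objects, you can inspect individual sentences as well:

for sent in doc.sents:
    print(sent.text, sent._.language)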
For built-in components (e.g. tagger, parser, ner), see: https://spacy.io/usage/processing-pipelines
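Those are already registered, so they can be added by their string name alone, for example:

nlp.add_pipe("sentencizer")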