I am running spaCy v2.x on a Windows box with Python 3. I do not have admin privileges, so I have to load the pipeline as:
nlp = en_core_web_sm.load()
When I run the same script on a *nix box, I can load the pipeline as:
nlp = spacy.load('en', disable=['ner', 'tagger', 'parser', 'textcat'])
All I am doing is tokenizing, so I do not need the entire pipeline. On the Windows box, if I load the pipeline like:
nlp = en_core_web_sm.load(disable=['ner', 'tagger', 'parser', 'textcat'])
Does that actually disable the components?
spaCy information on the nlp pipeline
An NLP pipeline is the set of steps used to build an end-to-end NLP application. Before we start, keep a few things in mind: the pipeline is not universal, deep-learning pipelines differ slightly, and a pipeline is not strictly linear.
When you call nlp on a text, spaCy first tokenizes the text to produce a Doc object. The Doc is then processed in several different steps – this is also referred to as the processing pipeline. The pipeline used by the trained pipelines typically includes a tagger, a lemmatizer, a parser and an entity recognizer.
For example, en_core_web_sm is a small English pipeline trained on written web text (blogs, news, comments), that includes vocabulary, syntax and entities.
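Since only tokenization is needed here, there is also the option of skipping the trained pipeline entirely. A sketch, assuming spaCy is installed (no model download required):

```python
import spacy

# spacy.blank gives a tokenizer-only pipeline for the given language --
# no tagger, parser, or entity recognizer is loaded at all
nlp = spacy.blank('en')

doc = nlp("spaCy tokenizes this sentence.")
print([token.text for token in doc])

print(nlp.pipe_names)  # [] -- no pipeline components to disable
```

This avoids the question of disabling components altogether, and works the same on Windows and *nix.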
nlp.pipe returns a generator on purpose! Generators are awesome. They are more memory-friendly in that they let you iterate over a series of objects, but unlike a list, they only evaluate the next object when it is needed, rather than all at once.
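A quick sketch of that lazy behavior, assuming spaCy is installed (a blank pipeline is used here so no model download is needed):

```python
import spacy

nlp = spacy.blank('en')
texts = ["First document.", "Second document."]

# nlp.pipe yields Doc objects lazily, one at a time
docs = nlp.pipe(texts)
print(type(docs))  # a generator, not a list

for doc in docs:
    # each Doc is only produced when the loop asks for it
    print([token.text for token in doc])
```

If you really need random access or a length, wrap it with list(nlp.pipe(texts)) at the cost of holding everything in memory.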
You can check the current pipeline components with:
print(nlp.pipe_names)
If you are not convinced by the output, you can check manually by using the component and printing its output, e.g. disable the parser and print the dependency tags.
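For example, with no parser in the pipeline, the dependency labels come back empty. A sketch using a blank pipeline (assumes only that spaCy is installed):

```python
import spacy

nlp = spacy.blank('en')  # no parser component at all
doc = nlp("The quick brown fox jumps.")

# Without a parser, every token's dependency label is the empty string
print([(token.text, token.dep_) for token in doc])
```

If the parser were active, you would instead see labels like 'det', 'nsubj', and 'ROOT'.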