 

textcat -> architecture extra fields not permitted

I've been trying to practise what I've learned from this tutorial (https://realpython.com/sentiment-analysis-python/) using PyCharm.

And this line:

textcat.add_label("pos")

generated a warning: Cannot find reference 'add_label' in '(Doc) -> Doc | (Doc) -> Doc'

I understand that this is because "nlp.create_pipe()" generates a Doc, not a string. Since I didn't know what to do about the warning, I ran the script anyway, and then got an error from this line:

textcat = nlp.create_pipe("textcat", config={"architecture": "simple_cnn"})

Error msg:

raise ConfigValidationError(
thinc.config.ConfigValidationError:

Config validation error

textcat -> architecture extra fields not permitted

{'nlp': <spacy.lang.en.English object at 0x0000015E74F625E0>, 'name': 'textcat', 'architecture': 'simple_cnn', 'model': {'@architectures': 'spacy.TextCatEnsemble.v2', 'linear_model': {'@architectures': 'spacy.TextCatBOW.v1', 'exclusive_classes': True, 'ngram_size': 1, 'no_output_layer': False}, 'tok2vec': {'@architectures': 'spacy.Tok2Vec.v2', 'embed': {'@architectures': 'spacy.MultiHashEmbed.v1', 'width': 64, 'rows': [2000, 2000, 1000, 1000, 1000, 1000], 'attrs': ['ORTH', 'LOWER', 'PREFIX', 'SUFFIX', 'SHAPE', 'ID'], 'include_static_vectors': False}, 'encode': {'@architectures': 'spacy.MaxoutWindowEncoder.v2', 'width': 64, 'window_size': 1, 'maxout_pieces': 3, 'depth': 2}}}, 'threshold': 0.5, '@factories': 'textcat'}

I'm using:

  • Pycharm v: 2019.3.4
  • python v: 3.8.6
  • spaCy v: 3.0.5
Amira asked Mar 24 '21 23:03


2 Answers

Man! Did that full spaCy upgrade really obliterate that tutorial or what...

There are a couple of things you might be able to get around. I haven't fully fixed that broken tutorial; it's on the to-do list. However, I did get around the exact issue you're having.

textcat = nlp.create_pipe("textcat", config={"architecture": "simple_cnn"})

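This create_pipe usage has been deprecated in spaCy v3, and as the error shows, "architecture" is no longer an accepted config key for textcat. You can add the component to the pipeline directly with add_pipe instead. So one thing you could do is the following: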

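import spacy
from thinc.api import Config
from spacy.pipeline.textcat import single_label_cnn_config

<more good code>

nlp = spacy.load("en_core_web_trf")
# single_label_cnn_config is a config string, so parse it into a Config first
config = Config().from_str(single_label_cnn_config)
if "textcat" not in nlp.pipe_names:
    nlp.add_pipe("textcat", config=config, last=True)
textcat = nlp.get_pipe("textcat")
textcat.add_label("pos")
textcat.add_label("neg")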
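
As for why the error shows up at all: as far as I can tell, spaCy v3 checks the config you pass against the parameters the component factory accepts, and the dict in the error message suggests the textcat factory only takes model and threshold (plus nlp and name). Here is a rough, stdlib-only sketch of that idea. It is not spaCy's actual validation code; make_textcat and validate_config are hypothetical names made up for illustration.

```python
import inspect


def make_textcat(nlp, name, model=None, threshold=0.5):
    """Hypothetical stand-in for a v3-style component factory."""
    return {"name": name, "model": model, "threshold": threshold}


def validate_config(factory, config):
    """Raise ValueError if config has keys the factory does not accept."""
    # nlp and name are supplied by the pipeline, not by the user config.
    allowed = set(inspect.signature(factory).parameters) - {"nlp", "name"}
    extra = sorted(set(config) - allowed)
    if extra:
        raise ValueError(
            ", ".join(f"{key}: extra fields not permitted" for key in extra)
        )
    return config


# The v2-style setting is rejected because the factory has no such parameter.
try:
    validate_config(make_textcat, {"architecture": "simple_cnn"})
except ValueError as err:
    message = str(err)
    print(message)

# Settings that match the factory's parameters pass validation.
accepted = validate_config(make_textcat, {"threshold": 0.5})
```

So under this reading, the architecture choice has to go inside the model block of the config rather than being passed as a top-level "architecture" key.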

Let me know if this makes sense and helps. I'll try to revamp the tutorial entirely from spaCy in the coming weeks.

jlarks32 answered Nov 12 '22 01:11


This seems to have worked with spaCy 3.1.0:

import en_core_web_md # or skip, see below
from spacy.pipeline.textcat import Config, single_label_cnn_config

nlp = en_core_web_md.load() # or nlp=spacy.load("en_core_web_sm")

config = Config().from_str(single_label_cnn_config)
if "textcat" not in nlp.pipe_names:
    nlp.add_pipe('textcat', config=config, last=True)

nlp.pipe_names
# ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner', 'textcat']
Marco.Gancitano answered Nov 11 '22 23:11