
Transformers v4.x: Convert slow tokenizer to fast tokenizer

I'm following the Transformers example for the pretrained model xlm-roberta-large-xnli:

from transformers import pipeline
classifier = pipeline("zero-shot-classification",
                      model="joeddav/xlm-roberta-large-xnli")

and I get the following error:

ValueError: Couldn't instantiate the backend tokenizer from one of: (1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one.

I'm using Transformers version 4.1.1.

Miguel Trejo asked Dec 23 '20 22:12


2 Answers

According to the Transformers v4.0.0 release notes, sentencepiece was removed as a required dependency. This means that

"The tokenizers that depend on the SentencePiece library will not be available with a standard transformers installation"

including the XLMRobertaTokenizer. However, sentencepiece can still be installed as an extra dependency:

pip install "transformers[sentencepiece]"

or

pip install sentencepiece

if you have transformers already installed.
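Before building the pipeline again, it can help to confirm that the package is actually importable in the interpreter you are running. The following is only a minimal sketch using the standard library; the helper name missing_packages is made up for illustration and is not part of Transformers.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be imported.

    Hypothetical helper for illustration only; not part of transformers.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# sentencepiece is the optional dependency the error message asks for.
missing = missing_packages(["sentencepiece"])
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Try: pip install " + " ".join(missing))
else:
    print("sentencepiece is importable; the tokenizer conversion should be able to find it.")
```

If the package was installed while a Python session or notebook kernel was already running, restart it before retrying the pipeline call.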

Miguel Trejo answered Oct 01 '22 07:10


If you are in Google Colab:

  1. Factory reset the runtime.
  2. Upgrade pip using the following command (!pip install --upgrade pip)
  3. Install sentencepiece using the following command (!pip install sentencepiece)
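
After running the steps above, one way to check what the runtime ended up with is to print the installed versions from a notebook cell. This is a rough sketch using only the standard library (Python 3.8+); installed_version is a hypothetical helper name, not something Colab or Transformers provides.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent.

    Hypothetical helper for illustration only.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages involved in the steps above.
for name in ("pip", "transformers", "sentencepiece"):
    print(name, installed_version(name) or "not installed")
```

If sentencepiece still shows as not installed, the install command most likely ran against a different environment than the notebook kernel.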
Ayan Mondal answered Oct 01 '22 07:10