
OSError when loading tokenizer for huggingface model

I am trying to use this huggingface model and have been following the example provided, but I am getting an error when loading the tokenizer:

from transformers import AutoTokenizer

task = 'sentiment'
MODEL = f"cardiffnlp/twitter-roberta-base-{task}"
tokenizer = AutoTokenizer.from_pretrained(MODEL)

OSError: Can't load tokenizer for 'cardiffnlp/twitter-roberta-base-sentiment'. Make sure that:

  • 'cardiffnlp/twitter-roberta-base-sentiment' is a correct model identifier listed on 'https://huggingface.co/models'

  • or 'cardiffnlp/twitter-roberta-base-sentiment' is the correct path to a directory containing relevant tokenizer files

Strangely, the script ran fine several times before the error started appearing, and I don't recall changing anything in between. Does anyone know what the solution is?


EDIT: Here is my entire script:

from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import TFAutoModelForSequenceClassification
import numpy as np
from scipy.special import softmax
import csv
import urllib.request

task = 'sentiment'
MODEL = f"nlptown/bert-base-multilingual-uncased-{task}"

tokenizer = AutoTokenizer.from_pretrained(MODEL)

labels = ['very_negative', 'negative', 'neutral', 'positive', 'very_positive']

model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.save_pretrained(MODEL)

text = "I love you"
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)

print(scores)

The error seems to start happening when I run model.save_pretrained(MODEL), but this might be a coincidence.

Asked by GSwart on May 31 '26 at 22:05


1 Answer

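1- Remove the locally saved pre-trained model folder (./cardiffnlp in your example).

2- Add tokenizer.save_pretrained(MODEL) before model.save_pretrained(MODEL).

Most likely, model.save_pretrained(MODEL) created a local folder with the same name as the model identifier, so on later runs from_pretrained(MODEL) reads that folder instead of the hub. The folder contains only the model files, which is why the tokenizer cannot be loaded. Saving the tokenizer as well puts the tokenizer files in the same folder, and the code works after that.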
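The two steps above can be sketched roughly as follows. This is only an illustration, not the one correct way to do it: the helper names (shadowing_dir_lacks_tokenizer, save_model_and_tokenizer), the save_dir parameter, and the list of tokenizer file names are my own assumptions, and the file list is not exhaustive. The first function checks whether a local folder named after the model identifier exists without any tokenizer files in it; the second saves the tokenizer and the model together.

```python
from pathlib import Path

# Illustrative, non-exhaustive list of file names a saved tokenizer may contain
# (an assumption for this sketch; the exact set depends on the tokenizer type).
TOKENIZER_FILES = {
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
    "vocab.txt",
    "merges.txt",
}


def shadowing_dir_lacks_tokenizer(model_id, base_dir="."):
    """Return True if base_dir/model_id is a directory with no tokenizer files.

    A directory like this is what model.save_pretrained(MODEL) leaves behind
    when MODEL is the hub identifier and the tokenizer was never saved.
    """
    local = Path(base_dir) / model_id
    if not local.is_dir():
        return False
    present = {p.name for p in local.iterdir()}
    return present.isdisjoint(TOKENIZER_FILES)


def save_model_and_tokenizer(model_id, save_dir):
    """Download the model and tokenizer, then save both into save_dir."""
    # Imported here so the check above can run without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    # Save the tokenizer first, then the model, into the same folder.
    tokenizer.save_pretrained(save_dir)
    model.save_pretrained(save_dir)
    return tokenizer, model
```

If shadowing_dir_lacks_tokenizer(MODEL) returns True, delete that folder before running the script again. Passing a save_dir that differs from the model identifier (for example a hypothetical "./saved_model") should also keep the saved copy from being picked up in place of the hub model.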

Answered by Mahoor13 on Jun 02 '26 at 10:06