
Downloading transformers models to use offline

I have a trained transformers NER model that I want to use on a machine not connected to the internet. When such a model is loaded, transformers currently downloads cache files into the .cache folder.

To load and run the model offline, you need to copy the files from the .cache folder to the offline machine. However, these files have long, non-descriptive names, which makes it really hard to identify the correct files if you have multiple models you want to use. Any thoughts on this?

Example of model files

Stefanie asked Jun 08 '20 12:06


People also ask

How do you use Huggingface offline?

🤗 Transformers is able to run in a firewalled or offline environment by only using local files. Set the environment variable TRANSFORMERS_OFFLINE=1 to enable this behavior. Add 🤗 Datasets to your offline training workflow by setting the environment variable HF_DATASETS_OFFLINE=1 .
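A minimal shell sketch of enabling offline mode before launching a script (the two variable names come from the transformers and datasets docs; the script name is just a placeholder):

```shell
# Force transformers and datasets to use only local files (no network calls)
export TRANSFORMERS_OFFLINE=1
export HF_DATASETS_OFFLINE=1

# ...then run your training/inference script as usual, e.g.:
#   python run_ner.py --model_name_or_path ./my_named_bert
```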

How do I manually download Huggingface?

The models are automatically cached locally when you first use them. So, to download a model, all you have to do is run the code provided in the model card (I chose the model card for bert-base-uncased ). When you run this code for the first time, a download bar will appear on screen.

Where are hugging face models downloaded?

On Linux, it is at ~/.cache/huggingface/transformers.
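The cache location can also be resolved programmatically; here is a small stdlib sketch (the TRANSFORMERS_CACHE override and the default path follow the library's documented conventions, but the exact layout varies between transformers versions, so check yours):

```python
import os

def transformers_cache_dir():
    """Return the directory where transformers caches downloaded models.

    Honors the TRANSFORMERS_CACHE environment-variable override; otherwise
    falls back to the default ~/.cache/huggingface/transformers location.
    """
    default = os.path.join(
        os.path.expanduser("~"), ".cache", "huggingface", "transformers"
    )
    return os.environ.get("TRANSFORMERS_CACHE", default)

print(transformers_cache_dir())
```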


2 Answers

One relatively easy way to deal with this issue is to simply "rename" the pretrained models, as is detailed in this thread.

Essentially, all you have to do is something like this for whatever model you're trying to work with:

from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
model.save_pretrained("./my_named_bert")

The thread also details how the local model folders are named, see LysandreJik's post:

Hi, they are named as such because that's a clean way to make sure the model on the S3 is the same as the model in the cache. The name is created from the etag of the file hosted on the S3. [...]

dennlinger answered Oct 06 '22 08:10


The first time, save the model using model.save_pretrained("./your_file_name"), then load it from that folder via BertModel.from_pretrained("./your_file_name"). Do the same for the tokenizer if you are using one.

model.save_pretrained("./your_file_name")
BertModel.from_pretrained("./your_file_name")
Sai Dinesh Pola answered Oct 06 '22 08:10