
Setting Huggingface cache in Google Colab notebook to Google Drive

I am using Google Colab to implement Huggingface code.

What is the best way to change the Huggingface cache directory in a Colab environment to my Google Drive (GDrive), so that cached content (language models, datasets, etc.) does not have to be downloaded every time a Colab environment is started? Ideally, Huggingface in Colab would simply be redirected to use GDrive.

I tried setting the related environment variables in Colab, but the content is still downloaded to the Colab runtime environment:

export TRANSFORMERS_CACHE='/content/drive/MyDrive/Colab Notebooks/NLP/HuggingfaceCash'
export HF_DATASETS_CACHE='/content/drive/MyDrive/Colab Notebooks/NLP/HuggingfaceCash/Datasets'
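
One likely reason these exports have no effect, assuming they were run as shell commands from a notebook cell (for example with a leading `!`): each such command runs in its own child shell, so anything it exports never reaches the Python process that later loads the models. The sketch below illustrates that behavior with the standard library only; `HF_CACHE_DEMO` is a made-up variable name used purely for the demonstration.

```python
import os
import subprocess

VAR = "HF_CACHE_DEMO"  # hypothetical variable name, used only for this demo

# Start from a clean state so the check below is meaningful.
os.environ.pop(VAR, None)

# Exporting inside a child shell: the variable disappears when that shell exits.
subprocess.run(f"export {VAR}=/tmp/demo", shell=True, check=True)
visible_after_export = VAR in os.environ  # False

# Setting it in the current Python process: visible to code running in this process.
os.environ[VAR] = "/tmp/demo"
visible_after_environ = VAR in os.environ  # True

print(visible_after_export, visible_after_environ)
```

In a notebook, the IPython `%env` magic is another way to set a variable in the kernel process itself rather than in a child shell.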
Mohammad Fasha asked Oct 24 '25 02:10


1 Answer

For anyone interested, I tried the following (using Python code), and it worked fine. The content is cached in one's Google Drive.

import os
os.environ['TRANSFORMERS_CACHE'] = '/content/drive/MyDrive/Colab Notebooks/NLP/HuggingfaceCash'
os.environ['HF_DATASETS_CACHE'] = '/content/drive/MyDrive/Colab Notebooks/NLP/HuggingfaceCash/Datasets'
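
A slightly more defensive variant of the same idea, as a minimal sketch: it creates the cache folders if they are missing and then sets both variables. The helper name `set_hf_cache` is hypothetical and not part of any Huggingface library. It assumes that Google Drive is already mounted and that the helper is called before `transformers` or `datasets` is imported, since these libraries generally read the variables when they are first loaded.

```python
import os
from pathlib import Path


def set_hf_cache(base_dir):
    """Point the Huggingface caches at base_dir, creating the folders if needed.

    Hypothetical helper for illustration; call it before importing
    transformers or datasets.
    """
    base = Path(base_dir)
    datasets_dir = base / "Datasets"

    # Create the datasets folder (and base_dir along with it) if missing.
    datasets_dir.mkdir(parents=True, exist_ok=True)

    os.environ["TRANSFORMERS_CACHE"] = str(base)
    os.environ["HF_DATASETS_CACHE"] = str(datasets_dir)
    return base, datasets_dir


# Example call in Colab, after mounting Drive (path taken from the question):
# set_hf_cache("/content/drive/MyDrive/Colab Notebooks/NLP/HuggingfaceCash")
```

In Colab, Drive is usually mounted first with `from google.colab import drive` followed by `drive.mount('/content/drive')`. Newer `transformers` releases may warn that `TRANSFORMERS_CACHE` is deprecated in favor of `HF_HOME`, so it is worth checking which variable the installed version expects.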

I also found an alternative on Stack Overflow, where you set the cache directory in the call itself. I did not try it, though:

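from transformers import AutoModelForMaskedLM, AutoTokenizer

# cache_dir sets where the downloaded files for this call are stored
tokenizer = AutoTokenizer.from_pretrained("roberta-base", cache_dir="new_cache_dir/")

model = AutoModelForMaskedLM.from_pretrained("roberta-base", cache_dir="new_cache_dir/")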
Mohammad Fasha answered Oct 27 '25 01:10