I am using Google Colab to implement Huggingface code.
What is the best method to change the Hugging Face cache directory in a Colab environment to my Google Drive (GDrive), so that cached content (language models, datasets, etc.) doesn't have to be downloaded every time a Colab session starts? In other words, just redirect Hugging Face in Colab to use GDrive.
I tried setting the related environment variables in Colab, but the content is still downloaded into the Colab runtime environment:
export TRANSFORMERS_CACHE='/content/drive/MyDrive/Colab Notebooks/NLP/HuggingfaceCash'
export HF_DATASETS_CACHE='/content/drive/MyDrive/Colab Notebooks/NLP/HuggingfaceCash/Datasets'
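(A likely explanation, assuming those lines were run with Colab's ! shell escape: each !-prefixed command runs in its own short-lived subshell, so export never reaches the Python kernel. A quick check:)

import os
# After running `!export TRANSFORMERS_CACHE=...` in a previous cell,
# the variable is not visible to the Python process:
print(os.environ.get('TRANSFORMERS_CACHE'))  # -> None: the export did not stick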
For anyone interested, I tried the following (using Python code in a notebook cell instead of shell commands), and it worked fine. The content is now cached in one's Google Drive.
import os
# Set these before importing transformers/datasets, since both libraries
# read the cache location at import time; Google Drive must already be
# mounted at /content/drive.
os.environ['TRANSFORMERS_CACHE'] = '/content/drive/MyDrive/Colab Notebooks/NLP/HuggingfaceCash'
os.environ['HF_DATASETS_CACHE'] = '/content/drive/MyDrive/Colab Notebooks/NLP/HuggingfaceCash/Datasets'
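For completeness, a minimal end-to-end cell might look like this (a sketch using the same cache path as above; "roberta-base" is just an example model):

import os
from google.colab import drive

# Mount Google Drive so the cache directory is reachable.
drive.mount('/content/drive')

# Point both caches at Drive *before* importing transformers/datasets.
cache_root = '/content/drive/MyDrive/Colab Notebooks/NLP/HuggingfaceCash'
os.environ['TRANSFORMERS_CACHE'] = cache_root
os.environ['HF_DATASETS_CACHE'] = os.path.join(cache_root, 'Datasets')

from transformers import AutoTokenizer

# The first run downloads into Drive; later sessions reuse the cached files.
tokenizer = AutoTokenizer.from_pretrained('roberta-base')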
I also found another alternative on Stack Overflow, where you set the cache directory in the call itself (I have not tried this myself, though):
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base", cache_dir="new_cache_dir/")
model = AutoModelForMaskedLM.from_pretrained("roberta-base", cache_dir="new_cache_dir/")
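One more option worth mentioning (assuming a reasonably recent library version): newer Hugging Face releases treat HF_HOME as the umbrella cache variable, deriving the hub and datasets cache locations from it, and recent transformers versions deprecate TRANSFORMERS_CACHE. Setting a single variable would then cover both:

import os
# HF_HOME is the umbrella cache variable in newer Hugging Face releases;
# hub downloads and datasets both end up under this directory.
os.environ['HF_HOME'] = '/content/drive/MyDrive/Colab Notebooks/NLP/HuggingfaceCash'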