The default cache directory is lack of disk capacity, I need change the configure of the default cache directory.
So if you don't have any specific environment variable set, the cache directory will be at ~/. cache/huggingface/transformers/ .
What is a Cached Dataset? Essentially, Cached Datasets are a way to pre-compute data for hundreds or thousands of entities at once. Once setup, you can retrieve results for any one of those entities instantly.
You can specify the cache directory everytime you load a model with .from_pretrained by the setting the parameter cache_dir
. You can define a default location by exporting an environment variable TRANSFORMERS_CACHE everytime before you use (i.e. before importing it!) the library).
Example for python:
import os
os.environ['TRANSFORMERS_CACHE'] = '/blabla/cache/'
Example for bash:
export TRANSFORMERS_CACHE=/blabla/cache/
As @cronoik mentioned, alternative to modify the cache path in the terminal, you can modify the cache directory directly in your code. I will just provide you with the actual code if you are having any difficulty looking it up on HuggingFace:
tokenizer = AutoTokenizer.from_pretrained("roberta-base", cache_dir="new_cache_dir/")
model = AutoModelForMaskedLM.from_pretrained("roberta-base", cache_dir="new_cache_dir/")
I'm writing this answer because there are other Hugging Face cache directories that also eat space in the home directory besides the model cache and the previous answers and comments did not make this clear.
The Transformers documentation describes how the default cache directory is determined:
Cache setup
Pretrained models are downloaded and locally cached at: ~/.cache/huggingface/transformers/. This is the default directory given by the shell environment variable TRANSFORMERS_CACHE. On Windows, the default directory is given by C:\Users\username.cache\huggingface\transformers. You can change the shell environment variables shown below - in order of priority - to specify a different cache directory:
- Shell environment variable (default): TRANSFORMERS_CACHE.
- Shell environment variable: HF_HOME + transformers/.
- Shell environment variable: XDG_CACHE_HOME + /huggingface/transformers.
What this piece of documentation doesn't explicitly mention is that HF_HOME defaults to $XDG_CACHE_HOME/huggingface and is used for other huggingface caches, e.g. the datasets cache, which is separate from the transformers cache. The value of XDG_CACHE_HOME is machine dependent, but usually it is $HOME/.cache (and HF defaults to this value if XDG_CACHE_HOME is not set) - thus the usual default $HOME/.cache/huggingface
So you probably will want to change the HF_HOME environment variable (and possibly set a symlink to catch cases where the environment variable is not set).
This environment variable is also respected by Hugging Face datasets library, although the documentation does not explicitly state this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With