Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Google Colab Storage

Does anybody know the storage limits for running Google Colab? I seem to run out of space after uploading 22gb zip file, and then trying to unzip it, suggesting <~40gb storage being available. At least this is my experience running the TPU instance.

like image 354
Ferhat Avatar asked Oct 27 '18 15:10

Ferhat


2 Answers

Presently, the amount of local storage in colab depends on the chosen hardware accelerator runtime type:

# Hardware accelerator none
!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay          49G   22G   26G  46% /

# Hardware accelerator GPU
!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay         359G   23G  318G   7% /

# Hardware accelerator TPU
!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay          49G   22G   26G  46% /

Even if you don't need a GPU, swithcing to that runtime type will provide you with an extra 310Gb of storage space.

like image 156
marsipan Avatar answered Sep 19 '22 01:09

marsipan


Yes, the Colab notebook local storage is about 40 GiB right now. One way to see the exact value (in Python 3):

import subprocess
p = subprocess.Popen('df -h', shell=True, stdout=subprocess.PIPE)
print(str(p.communicate()[0], 'utf-8'))

However: for large amounts of data, local storage is a non-optimal way to feed the TPU, which is not connected directly to the machine running the notebook. Instead, consider storing your large dataset in GCP storage, and sourcing that data from the Colab notebook. (Moreover, the amount of Colab local storage may change, and the Colab notebook itself will expire after a few hours, taking local storage with it.)

Take a look at the canonical TPU Colab notebook. At the bottom are some next steps, which include a link to Searching Shakespeare with TPUs. In that notebook is the following code fragment, which demonstrates GCP authentication to your Colab TPU. It looks like this:

from google.colab import auth
auth.authenticate_user()

if 'COLAB_TPU_ADDR' in os.environ:
  TF_MASTER = 'grpc://{}'.format(os.environ['COLAB_TPU_ADDR'])

  # Upload credentials to TPU.
  with tf.Session(TF_MASTER) as sess:    
    with open('/content/adc.json', 'r') as f:
      auth_info = json.load(f)
    tf.contrib.cloud.configure_gcs(sess, credentials=auth_info)
  # Now credentials are set for all future sessions on this TPU.
else:
  TF_MASTER=''
like image 20
Derek T. Jones Avatar answered Sep 19 '22 01:09

Derek T. Jones