
Interface between Google Colaboratory and Google Cloud

From Google Colaboratory, how can I read from and write to a folder in a bucket I created in Google Cloud Storage?

I have created a bucket and a folder within it, and uploaded a bunch of images into that folder. Now, from a Jupyter notebook in Colaboratory, I want to create multiple sub-directories to organise these images into train, validation and test folders.

I then want to access the respective folders for training, validating and testing the model.

With Google Drive, after authentication, we just point the path at a specific directory with the following commands:

import sys
sys.path.append('drive/xyz')

We do something similar on the desktop version as well:

import os
os.chdir(local_path)

Does something similar exist for Google Cloud Storage?

The Colaboratory FAQ describes a procedure for reading and writing a single file, where the entire path has to be given. That would be tedious for re-organising a main directory into sub-directories and accessing them separately.

Srinivasa Rao asked Feb 28 '18 03:02



1 Answer

In general it's not a good idea to try to mount a GCS bucket on the local machine (which would allow you to use it as you mentioned). From Connecting to Cloud Storage buckets:

Note: Cloud Storage is an object storage system that does not have the same write constraints as a POSIX file system. If you write data to a file in Cloud Storage simultaneously from multiple sources, you might unintentionally overwrite critical data.

Assuming you'd like to continue regardless of the warning: if you use a Linux OS, you may be able to mount the bucket using the Cloud Storage FUSE adapter. See the related question How to mount Google Bucket as local disk on Linux instance with full access rights.

The recommended way to access GCS from Python apps is the Cloud Storage Client Libraries, but accessing files works differently than in your snippets. You can find some examples at Python Client for Google Cloud Storage:

from google.cloud import storage

client = storage.Client()
# https://console.cloud.google.com/storage/browser/[bucket-id]/
bucket = client.get_bucket('bucket-id-here')
# Then do other things...
# get_blob() returns None if the object does not exist
blob = bucket.get_blob('remote/path/to/file.txt')
print(blob.download_as_string())  # returns bytes
# Overwrite the object's contents
blob.upload_from_string('New contents!')
# blob() only creates a local handle; nothing is sent until upload_*() is called
blob2 = bucket.blob('remote/path/storage.txt')
blob2.upload_from_filename(filename='/local/path.txt')
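
For the train/validation/test reorganisation asked about in the question, something along the following lines might work. It is only a sketch, under these assumptions: GCS has no real directories, so a "folder" is just a shared name prefix; all images sit directly under one prefix that ends in "/"; and the helper names `plan_splits` and `reorganize`, the 80/10/10 split and the fixed seed are illustrative choices, not part of the client library. The only library calls it relies on are `Bucket.list_blobs(prefix=...)` and `Bucket.rename_blob(blob, new_name)`.

```python
import random


def plan_splits(names, prefix, train_frac=0.8, val_frac=0.1, seed=0):
    """Map each object name under `prefix` to a new name under
    train/, validation/ or test/ (hypothetical helper, not a library API)."""
    # Skip "folder" placeholder objects, whose names end in '/'.
    files = sorted(n for n in names
                   if n.startswith(prefix) and not n.endswith("/"))
    # Shuffle deterministically so the split is reproducible.
    random.Random(seed).shuffle(files)
    n_train = int(len(files) * train_frac)
    n_val = int(len(files) * val_frac)
    plan = {}
    for i, name in enumerate(files):
        if i < n_train:
            split = "train"
        elif i < n_train + n_val:
            split = "validation"
        else:
            split = "test"  # whatever remains goes to the test split
        basename = name.rsplit("/", 1)[-1]
        plan[name] = f"{prefix}{split}/{basename}"
    return plan


def reorganize(bucket, prefix, **split_kwargs):
    """Rename every object under `prefix` according to plan_splits().

    `bucket` is expected to behave like google.cloud.storage.Bucket.
    rename_blob() is a copy followed by a delete, so this rewrites
    every object once.
    """
    blobs = {b.name: b for b in bucket.list_blobs(prefix=prefix)}
    plan = plan_splits(blobs.keys(), prefix, **split_kwargs)
    for old_name, new_name in plan.items():
        bucket.rename_blob(blobs[old_name], new_name)
    return plan
```

With a real bucket this would be called as `reorganize(client.get_bucket('bucket-id-here'), 'images/')`, where 'images/' stands in for your own folder name. Afterwards, `bucket.list_blobs(prefix='images/train/')` lists only the training images, which is the closest equivalent to changing into a sub-directory.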

Update:

The Colaboratory docs recommend another method that I forgot about, based on the Google API Client Library for Python. Note that it doesn't operate like a regular filesystem either: it goes through an intermediate file on the local filesystem:

  • uploading files to GCS
  • downloading files from GCS
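
As a rough illustration of those two operations, here is a minimal sketch assuming the `google-api-python-client` package is installed and the notebook is already authenticated. The wrapper names `upload_file` and `download_file` are hypothetical; the underlying calls are `objects().insert`, `objects().get_media`, `MediaFileUpload` and `MediaIoBaseDownload`. The imports are placed inside the functions only so that the definitions load even where the package is missing.

```python
def upload_file(gcs_service, bucket_name, local_path, object_name,
                mimetype="application/octet-stream"):
    """Upload a local file to gs://bucket_name/object_name."""
    from googleapiclient.http import MediaFileUpload

    media = MediaFileUpload(local_path, mimetype=mimetype, resumable=True)
    request = gcs_service.objects().insert(
        bucket=bucket_name, name=object_name, media_body=media)
    response = None
    # A resumable upload is sent in chunks; loop until the server responds.
    while response is None:
        _, response = request.next_chunk()
    return response


def download_file(gcs_service, bucket_name, object_name, local_path):
    """Download gs://bucket_name/object_name into a local file."""
    from googleapiclient.http import MediaIoBaseDownload

    with open(local_path, "wb") as f:
        request = gcs_service.objects().get_media(
            bucket=bucket_name, object=object_name)
        downloader = MediaIoBaseDownload(f, request)
        done = False
        # Downloads are chunked as well; loop until the last chunk arrives.
        while not done:
            _, done = downloader.next_chunk()
    return local_path


# Typical setup in a Colab cell (not executed here):
#   from google.colab import auth
#   auth.authenticate_user()
#   from googleapiclient.discovery import build
#   gcs_service = build("storage", "v1")
#   upload_file(gcs_service, "bucket-id-here", "/tmp/to_upload.txt", "to_upload.txt")
```

Both helpers move one object at a time through a local file, so working with a whole "folder" means listing the object names first and looping over them.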
Dan Cornilescu answered Nov 15 '22 00:11