
Persisting data in Google Colaboratory

Has anyone figured out a way to persist files across sessions in Google's newly open-sourced Colaboratory?

Using the sample notebooks, I'm successfully authenticating and transferring CSV files from my Google Drive instance and have stashed them in /tmp, my ~, and ~/datalab. Pandas can read them from disk just fine. But once the session times out, it looks like the whole filesystem is wiped and a new VM is spun up without the downloaded files.

I guess this isn't surprising given Google's Colaboratory FAQ:

Q: Where is my code executed? What happens to my execution state if I close the browser window?

A: Code is executed in a virtual machine dedicated to your account. Virtual machines are recycled when idle for a while, and have a maximum lifetime enforced by the system.

Given that, maybe this is a feature (i.e., "go use Google Cloud Storage, which works fine in Colaboratory")? When I first used the tool, I was hoping that any .csv files in the My File/Colab Notebooks Google Drive folder would also be loaded onto the VM instance that the notebook was running on :/

user3424705 asked Nov 09 '17 04:11



2 Answers

Put this at the top of your notebook so the file is downloaded again every time before your code runs:

!wget -q http://www.yoursite.com/file.csv 
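The same idea can be written in Python. This is only a minimal sketch: it assumes the file is reachable by URL, and it skips the download when the file is already on the VM's disk. The helper name ensure_file and the /tmp destination are illustrative choices, not part of the original answer.

```python
import os
import urllib.request


def ensure_file(url, dest):
    """Download url to dest unless dest already exists; return dest."""
    if not os.path.exists(dest):
        # Fresh VM (or first run): create the target directory and fetch the file.
        os.makedirs(os.path.dirname(os.path.abspath(dest)), exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    return dest


# Usage in the first notebook cell (placeholder URL, as in the wget line):
# csv_path = ensure_file("http://www.yoursite.com/file.csv", "/tmp/file.csv")
```

Because the check runs on every execution, the cell is cheap to re-run within a session and only downloads again after the VM has been recycled.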
Marcel Pinheiro answered Sep 29 '22 09:09


Your interpretation is correct. VMs are ephemeral and recycled after periods of inactivity. There's no mechanism for persistent data on the VM itself right now.

In order for data to persist, you'll need to store it somewhere outside of the VM, e.g., Drive, GCS, or any other cloud hosting provider.

Some recipes for loading and saving data from external sources are available in the I/O example notebook.
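As an illustration of storing data outside the VM, here is a minimal sketch that copies a file into a directory that outlives the session. It assumes Google Drive has been mounted in the notebook with drive.mount from google.colab (commented out below because it needs an interactive Colab session). The helper name persist, the colab_data folder, and the /content/drive/MyDrive path are assumptions for illustration, so check where Drive actually appears in your runtime.

```python
import os
import shutil


def persist(local_path, persistent_dir):
    """Copy a file from the ephemeral VM disk into persistent_dir; return the new path."""
    os.makedirs(persistent_dir, exist_ok=True)
    # copy2 keeps the file name and preserves timestamps where possible.
    return shutil.copy2(local_path, persistent_dir)


# In Colab (assumption: run in a notebook cell; prompts for authorization):
# from google.colab import drive
# drive.mount("/content/drive")
# persist("/tmp/results.csv", "/content/drive/MyDrive/colab_data")
```

On the next session, mounting Drive again makes the copied file readable from the same Drive path, even though /tmp on the new VM starts empty.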

Bob Smith answered Sep 29 '22 09:09