 

How to Upload Many Files to Google Colab?

I am working on an image segmentation machine learning project and would like to test it out on Google Colab.

For the training dataset, I have 700 images, mostly 256x256, that I need to load into a Python NumPy array for my project. I also have thousands of corresponding mask files to upload. They currently exist in a variety of subfolders on Google Drive, but I have been unable to upload them to Google Colab for use in my project.

So far I have attempted using Google Drive FUSE, which seems to have very slow upload speeds, and PyDrive, which has given me a variety of authentication errors. I have been using the Google Colab I/O example code for the most part.

How should I go about this? Would PyDrive be the way to go? Is there code somewhere for uploading a folder structure or many files at a time?

Asked Feb 19 '18 by Jesse Cambon

People also ask

Is there a limit on Google Colab?

Colab Pro limits RAM to 32 GB while Pro+ limits RAM to 52 GB. Colab Pro and Pro+ limit sessions to 24 hours.

How to upload files from local machine to Google Colab?

You can use the upload option at the top of the Files explorer to upload any file(s) from your local machine to Google Colab. Here is what you need to do: Step 1: Click the Files icon to open the "Files explorer" pane. Step 2: Click the upload icon and select the file(s) you wish to upload from the "File Upload" dialog window.
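The same upload can be done programmatically with the `files` helper from the `google.colab` package. A minimal sketch follows; since `google.colab` only exists inside a Colab runtime, the import is guarded for illustration.

```python
# Sketch: programmatic upload, equivalent to the Files-pane upload icon.
try:
    from google.colab import files
    uploaded = files.upload()   # opens a file picker; returns {filename: bytes}
    names = sorted(uploaded)
except ImportError:
    names = []                  # not running inside Colab; nothing uploaded
print(names)
```

Note that `files.upload()` writes into the ephemeral Colab filesystem, so uploads are lost when the runtime is recycled.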

How to upload a CSV file in Colab?

It is the easiest way to upload a CSV file in Colab. For this, go to the dataset in your GitHub repository, and then click on "View Raw". Copy the link to the raw dataset and pass it as a parameter to read_csv() in pandas to get the dataframe. Datasets that are uploaded on our Google Drive can also be imported.
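As a sketch of the raw-URL approach: `pd.read_csv` accepts a URL directly. The URL below is a placeholder for your repository's "View Raw" link, and the inline CSV stands in for the remote file so the resulting DataFrame can be shown without network access.

```python
# Sketch: pandas reads a CSV straight from a raw-file URL.
import pandas as pd
from io import StringIO

# url = 'https://raw.githubusercontent.com/<user>/<repo>/main/data.csv'  # placeholder
# df = pd.read_csv(url)

# read_csv works on any file-like object; a small inline CSV
# demonstrates the resulting DataFrame:
csv_text = "image,label\nimg_001.png,cat\nimg_002.png,dog\n"
df = pd.read_csv(StringIO(csv_text))
print(df.shape)  # (2, 2)
```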

How do I upload data to Google Drive?

The data is uploaded from and downloaded into your Google Drive only. You can then transfer that data onto your local machine. 1. Upload your dataset to free cloud storage like Dropbox, Openload, etc. (I used Dropbox). 2. Create a shareable link of your uploaded file and copy it. That's it! Zip your file first, then upload it to Google Drive.
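Since zipping before uploading is the suggested workflow, here is a sketch of extracting the archive once it is in the notebook environment. The file names are illustrative; in Colab the archive would sit under your mounted Drive directory, and a temporary folder with stand-in files is used here so the example is self-contained.

```python
# Sketch: zip a dataset, then unzip it after uploading.
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()

# Stand-in for "zip your file first": archive two small files.
for name in ('img_001.png', 'img_002.png'):
    with open(os.path.join(workdir, name), 'wb') as f:
        f.write(b'fake image bytes')

archive = os.path.join(workdir, 'dataset.zip')
with zipfile.ZipFile(archive, 'w') as zf:
    for name in ('img_001.png', 'img_002.png'):
        zf.write(os.path.join(workdir, name), arcname=name)

# After uploading, extract everything in one call:
outdir = os.path.join(workdir, 'unzipped')
with zipfile.ZipFile(archive) as zf:
    zf.extractall(outdir)

extracted = sorted(os.listdir(outdir))
print(extracted)  # ['img_001.png', 'img_002.png']
```

Uploading one archive and unzipping it in the notebook is far faster than transferring hundreds of small files individually.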

How do I connect my Google Drive to Colab?

Upload your data to Google Drive before getting started with the notebook. Then mount your Google Drive onto the Colab environment: this means that the Colab notebook can now access files in your Google Drive. 1. Mount your drive using drive.mount(). 2. Access anything in your Google Drive directly.
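The mount step above is a one-liner. A minimal sketch follows; `google.colab` only exists inside a Colab runtime, so the import is guarded for illustration.

```python
# Sketch: mount Google Drive with the built-in Colab helper.
try:
    from google.colab import drive
    drive.mount('/content/drive')   # Drive files appear under /content/drive/MyDrive
    mounted = True
except ImportError:
    mounted = False                 # not running inside Colab
print('mounted:', mounted)
```

After mounting, files can be read with ordinary paths, e.g. `open('/content/drive/MyDrive/somefile.txt')`.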


1 Answer

You can put all your data into your Google Drive and then mount the drive. This is how I have done it; let me explain in steps.

Step 1: Transfer your data into your Google Drive.

Step 2: Run the following code to mount your Google Drive.

# Install a Drive FUSE wrapper.
# https://github.com/astrada/google-drive-ocamlfuse
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse



# Generate auth tokens for Colab
from google.colab import auth
auth.authenticate_user()


# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}


# Create a directory and mount Google Drive using that directory.
!mkdir -p drive
!google-drive-ocamlfuse drive


!ls drive/

# Create a file in Drive.
!echo "This newly created file will appear in your Drive file list." > drive/created.txt

Step 3: Run the following line to check that you can see your desired data in the mounted drive.

!ls drive

Step 4:

Now load your data into a pandas dataframe (and from there a NumPy array) as follows. I had my Excel files containing my train, CV, and test data.

import pandas as pd

train_data = pd.read_excel(r'drive/train.xlsx')
test = pd.read_excel(r'drive/test.xlsx')
cv = pd.read_excel(r'drive/cv.xlsx')
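The original question was about images rather than spreadsheets, so here is a sketch of loading a folder of images from the mounted drive into a single NumPy array using PIL. The folder name is hypothetical (in Colab it would be something like 'drive/train_images'); a temporary folder with three synthetic 256x256 images stands in for the 700 real ones so the example is self-contained.

```python
# Sketch: load every image in a folder into one NumPy array.
import os
import tempfile
import numpy as np
from PIL import Image

folder = tempfile.mkdtemp()        # stand-in for e.g. 'drive/train_images'
for i in range(3):                 # stand-in for the 700 real images
    Image.new('RGB', (256, 256), color=(i, i, i)).save(
        os.path.join(folder, f'img_{i:03d}.png'))

# Sort paths so images and masks line up deterministically.
paths = sorted(
    os.path.join(folder, f) for f in os.listdir(folder)
    if f.endswith('.png'))
images = np.stack([np.asarray(Image.open(p)) for p in paths])
print(images.shape)  # (3, 256, 256, 3)
```

The same pattern, pointed at the mask subfolders, builds the corresponding mask array; sorting both file lists the same way keeps images and masks aligned.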

I hope this helps.

Edit

For uploading data from the Colab notebook environment into your Drive, you can run the following code.

# Install the PyDrive wrapper & import libraries.
# This only needs to be done once in a notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials



# Authenticate and create the PyDrive client.
# This only needs to be done once in a notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)



# Create & upload a file.
uploaded = drive.CreateFile({'title': 'data.xlsx'})
uploaded.SetContentFile('data.xlsx')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))
Answered Oct 12 '22 by Abdul Karim Khan