
Google Colab is so slow while reading images from Google Drive

I have my own dataset for a deep learning project. I uploaded it to Google Drive and linked it to a Colab notebook. But Colab can read only 2-3 images per second, while my computer can read dozens. (I used imread to read the images.)

There is no speed problem with Keras's model-compiling process, only with reading images from Google Drive. Does anybody know a solution? Someone suffered from this problem too, but it's still unsolved: Google Colab very slow reading data (images) from Google Drive. (I know this is something of a duplicate of the question in the link, but I reposted it because it is still unsolved. I hope this is not a violation of Stack Overflow rules.)

Edit: The code piece that I use for reading images:

import os

import numpy as np
from PIL import Image
from skimage import color


def getDataset(path, classes, pixel=32, rate=0.8):
    X = []
    Y = []

    # getting images:
    for root, _, files in os.walk(path):
        for file in files:
            imagePath = os.path.join(root, file)
            className = os.path.basename(root)

            try:
                image = Image.open(imagePath)
                image = np.asarray(image).astype('uint8')
                image = np.array(Image.fromarray(image).resize((pixel, pixel)))
                # ensure 3 channels (convert grayscale to RGB):
                image = image if len(image.shape) == 3 else color.gray2rgb(image)
                X.append(image)
                Y.append(classes[className])
            except (OSError, KeyError):
                print(file, "could not be opened")

    X = np.asarray(X, dtype=np.float32)
    Y = np.asarray(Y, dtype=np.int16).reshape(1, -1)

    return shuffleDataset(X, Y, rate)
asked Nov 30 '19 by Atreidex


2 Answers

I'd like to provide a more detailed answer about what unzipping the files actually looks like. This is the best way to speed up reading data, because unzipping the archive onto the VM's local disk is far faster than reading each file individually from Drive.

Let's say you have the desired images or data on your local machine in a folder called Data. Compress Data to get Data.zip and upload it to Drive.

Now, mount your drive and run the following command:

!unzip "/content/drive/My Drive/path/to/Data.zip" -d "/content"

Simply amend all your image paths to go through /content/Data, and reading your images will be much faster.
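The same extraction step can be sketched with Python's standard-library zipfile module, which is what !unzip amounts to here. The paths are assumptions matching a typical Colab layout; in a notebook you would first run drive.mount('/content/drive') so the archive is visible at a "/content/drive/My Drive/..." path:

```python
import zipfile


def extract_dataset(zip_path, dest="/content"):
    """Extract the archive once onto the VM's local disk.

    zip_path: e.g. "/content/drive/My Drive/path/to/Data.zip" after
    mounting Drive (path is an assumption; adjust to your layout).
    Returns the directory the archive's contents were extracted into.
    """
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return dest
```

After this runs, point getDataset at the extracted folder (e.g. "/content/Data") instead of the Drive mount, and every read hits local disk.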

answered Oct 17 '22 by Alex Trevithick


I recommend uploading your files to GitHub and then cloning the repository into Colab. That reduced my training time from 1 hour to 3 minutes.

answered Oct 17 '22 by Waranthorn Chansawang