 

Deep learning on Google Colab: loading a large image dataset is very slow, how can I accelerate the process?

I'm working on a deep learning model using Keras, and to speed up the computation I'd like to use the GPU available on Google Colab.

My image files are already uploaded to my Google Drive. I have 24,000 images for training and 4,000 for testing my model.

However, loading my images into an array takes a very long time (almost 2 hours), so it is not very convenient to do that every time I use a Google Colab notebook.

Would you know how to accelerate the process? This is my current code:

import os

import cv2
import numpy as np
from tqdm import tqdm

TRAIN_DIR = "Training_set/X"
TRAIN_DIR_Y = "Training_set/Y"
IMG_SIZE = 128

def parse_img_data(path):
    X_train = []
    img_ind = []
    for fname in tqdm(os.listdir(path)):
        # File names are "<index>.<ext>"; recover the zero-based index.
        img_ind.append(int(fname.split('.')[0]) - 1)
        # Join against the folder being parsed, not the global TRAIN_DIR.
        img = cv2.imread(os.path.join(path, fname), cv2.IMREAD_COLOR)
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
        X_train.append(img)
    return np.array(img_ind), np.array(X_train)

ind_train, X_train = parse_img_data(TRAIN_DIR)

I'd be very grateful if you would help me.

Xavier



People also ask

How do I make Google Colab train faster?

Before writing any Python code, first set up Colab's runtime environment to use a GPU or TPU instead of the CPU. Colab notebooks use CPUs by default; to change the runtime type, select "Change runtime type" under "Runtime" in Colab's menu bar.
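For example, after switching the runtime you can verify that a GPU is actually visible (a quick check, assuming the TensorFlow 2.x that Colab ships by default):

import tensorflow as tf

# An empty list here means the runtime is still CPU-only.
print(tf.config.list_physical_devices('GPU'))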

Does Internet speed affect Google Colab?

Yes, to some extent: the computation itself runs on Google's servers, but uploading data from your machine and interacting with the notebook both depend on your Internet connection.

How long can I use a Google Colab GPU?

Free Colab sessions disconnect after roughly 12 hours; Colab Pro and Pro+ extend the limit to 24 hours.


2 Answers

Not sure if you've solved the issue. I was having the same problem, and calling os.listdir on the data folder once before running the CNN fixed it for me:

print(os.listdir("./drive/My Drive/Colab Notebooks/dataset"))
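Presumably the one-off listing forces Colab's Google Drive mount to fetch and cache the folder's contents before training starts reading files. A minimal sketch of the full warm-up (drive.mount is the standard google.colab API; the dataset path is the one from this answer, under the default /content/drive mount point):

import os
from google.colab import drive

# Mount Google Drive under /content/drive (prompts for authorization).
drive.mount('/content/drive')

# List the dataset folder once so the Drive mount caches its contents
# before the CNN starts reading individual files.
print(os.listdir("/content/drive/My Drive/Colab Notebooks/dataset"))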
Cacey


from numpy import savez_compressed

# Parse once and save the arrays next to the data on Drive.
trainX, trainy = parse_img_data('/content/drive/My Drive/Training_set/')
savez_compressed('/content/drive/My Drive/dataset.npz', trainX, trainy)

Parse and save the data like this the first time; afterwards you can load the compressed archive and reuse it over and over again:

import numpy as np

data = np.load('/content/drive/My Drive/dataset.npz')
trainX, trainy = data['arr_0'], data['arr_1']
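Putting both halves together, a minimal sketch of the cache-or-parse workflow described above (reusing parse_img_data from the question and the dataset.npz path from this answer):

import os
import numpy as np

NPZ_PATH = '/content/drive/My Drive/dataset.npz'

if os.path.exists(NPZ_PATH):
    # Fast path: reload the pre-parsed arrays in seconds.
    data = np.load(NPZ_PATH)
    trainX, trainy = data['arr_0'], data['arr_1']
else:
    # Slow path, run once: parse the raw images, then cache the result.
    trainX, trainy = parse_img_data('/content/drive/My Drive/Training_set/')
    np.savez_compressed(NPZ_PATH, trainX, trainy)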

maki