Google colab pro GPU running extremely slow

I am running a ConvNet on Colab Pro with a GPU runtime. I have selected GPU as the runtime type and can confirm that a GPU is available. I am running exactly the same network as yesterday evening, but it is now taking about 2 hours per epoch, whereas last night it took about 3 minutes per epoch, and nothing has changed at all. I have a feeling Colab may have restricted my GPU usage, but I can't work out how to tell if this is the issue. Does GPU speed fluctuate much depending on time of day, etc.? Here are some diagnostics I have printed; does anyone know how I can investigate the root cause of this slow behaviour more deeply?

I also tried changing the accelerator in Colab to 'None', and my network ran at the same speed as with 'GPU' selected, implying that for some reason I am no longer training on the GPU, or resources have been severely limited. I am using TensorFlow 2.1.
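For what it's worth, a quick check like the following (a minimal sketch using the standard TensorFlow 2.x API) should show whether TensorFlow itself can even see the GPU, independently of the nvidia-smi diagnostics below:

import tensorflow as tf

# An empty list here means TensorFlow will silently fall back to the CPU,
# even though nvidia-smi reports a GPU on the machine.
print("TF version:", tf.__version__)
print("GPUs visible to TF:", tf.config.list_physical_devices('GPU'))
print("Built with CUDA:", tf.test.is_built_with_cuda())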

gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime → "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

Sun Mar 22 11:33:14 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P0    32W / 250W |   8747MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

# requires: !pip install gputil humanize (psutil is preinstalled on Colab)
import humanize
import psutil
import GPUtil

def mem_report():
  print("CPU RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available))

  GPUs = GPUtil.getGPUs()
  for i, gpu in enumerate(GPUs):
    print('GPU {:d} ... Mem Free: {:.0f}MB / {:.0f}MB | Utilization {:3.0f}%'.format(i, gpu.memoryFree, gpu.memoryTotal, gpu.memoryUtil*100))

mem_report()
CPU RAM Free: 24.5 GB
GPU 0 ... Mem Free: 7533MB / 16280MB | Utilization  54%
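A further way to narrow this down (a rough sketch, not one of the diagnostics above) is to time the same large matrix multiply on the CPU and on the GPU; if the GPU run is not dramatically faster, ops are not actually being placed on the GPU:

import time
import tensorflow as tf

def time_matmul(device):
    # Run one large matmul on the given device and return the wall-clock time.
    with tf.device(device):
        a = tf.random.normal((4000, 4000))
        b = tf.random.normal((4000, 4000))
        start = time.time()
        c = tf.linalg.matmul(a, b)
        _ = c.numpy()  # force the (possibly asynchronous) GPU op to finish
        return time.time() - start

print("CPU seconds:", time_matmul('/CPU:0'))
print("GPU seconds:", time_matmul('/GPU:0'))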

Still no luck speeding things up. Here is my code; maybe I have overlooked something. By the way, the images are from an old Kaggle competition and the data can be found here: https://www.kaggle.com/c/datasciencebowl. The training images are saved on my Google Drive.

# imports used by the code below
import os
import zipfile
import pathlib
import numpy as np
import tensorflow as tf
from PIL import Image
from IPython import display

#loading images from kaggle api

#os.environ['KAGGLE_USERNAME'] = ""
#os.environ['KAGGLE_KEY'] = ""

#!kaggle competitions download -c datasciencebowl

#unpacking zip files

#zipfile.ZipFile('./sampleSubmission.csv.zip', 'r').extractall('./')
#zipfile.ZipFile('./test.zip', 'r').extractall('./')
#zipfile.ZipFile('./train.zip', 'r').extractall('./')

data_dir = pathlib.Path('train')

image_count = len(list(data_dir.glob('*/*.jpg')))
CLASS_NAMES = np.array([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])

shrimp_zoea = list(data_dir.glob('shrimp_zoea/*'))
for image_path in shrimp_zoea[:5]:
    display.display(Image.open(str(image_path)))
image_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255,
                                                                  validation_split=0.2)
                                                                  #rotation_range = 40,
                                                                  #width_shift_range = 0.2,
                                                                  #height_shift_range = 0.2,
                                                                  #shear_range = 0.2,
                                                                  #zoom_range = 0.2,
                                                                  #horizontal_flip = True,
                                                                  #fill_mode='nearest')
validation_split = 0.2
BATCH_SIZE = 32
BATCH_SIZE_VALID = 10
IMG_HEIGHT = 224
IMG_WIDTH = 224
STEPS_PER_EPOCH = np.ceil(image_count*(1-(validation_split))/BATCH_SIZE)
VALIDATION_STEPS = np.ceil((image_count*(validation_split)/BATCH_SIZE))
train_data_gen = image_generator.flow_from_directory(directory=str(data_dir),
                                                     subset='training',
                                                     batch_size=BATCH_SIZE,
                                                     class_mode = 'categorical',
                                                     shuffle=True,
                                                     target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                     classes = list(CLASS_NAMES))

validation_data_gen = image_generator.flow_from_directory(directory=str(data_dir),
                                                     subset='validation',
                                                     batch_size=BATCH_SIZE_VALID,
                                                     class_mode = 'categorical',
                                                     shuffle=True,
                                                     target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                     classes = list(CLASS_NAMES))

model_basic = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1000, activation='relu'),
    tf.keras.layers.Dense(121, activation='softmax')
])

model_basic.summary()
model_basic.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model_basic.fit(
          train_data_gen,
          epochs=10,
          verbose=1,
          validation_data=validation_data_gen,
          steps_per_epoch=STEPS_PER_EPOCH,
          validation_steps=VALIDATION_STEPS,
          initial_epoch=0         
)
asked Mar 22 '20 by ojp

1 Answer

In the end the bottleneck seems to be loading images from Google Drive to Colab in each batch. Unpacking the images onto the Colab VM's local disk reduced the time per epoch to about 30 seconds. After uploading my train.zip file to Colab, here is the code I used to load the images onto local disk:

!mkdir train_local
!unzip train.zip -d train_local
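For completeness, if the archive lives on Google Drive rather than being uploaded manually, the same idea looks roughly like this (a sketch; the Drive path and the unpacked folder layout are assumptions you will need to adjust):

from google.colab import drive

# Mount Drive, copy the archive once onto the Colab VM's local disk, and
# unzip it there so each batch reads from fast local storage instead of
# going through the Drive network mount.
drive.mount('/content/drive')

!cp "/content/drive/My Drive/train.zip" /content/      # hypothetical Drive path
!unzip -q /content/train.zip -d /content/train_local

data_dir = pathlib.Path('/content/train_local/train')  # adjust to wherever the class folders land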

answered Nov 15 '22 by ojp