
How to reconnect to an ongoing process on Google Colab

I recently started using Google Colab to train my CNN model. Each training run takes 10+ hours. I can't stay in the same place for that long, so I always power off my notebook and let the process keep going.

My code saves models automatically, and I found that the process keeps saving models after I disconnect from Colab.

Here are the questions:

  1. When I try to reconnect to the Colab notebook, it always gets stuck at the "INITIALIZING" stage and can't connect. I'm sure the process is still running. How can I tell when the process has finished?

  2. Is there any way to reconnect to the ongoing process? It would be nice to observe the training losses during training.

Sorry for my poor English, and thanks a lot.

FrankCheng asked Nov 08 '22 08:11


1 Answer

Write your loss values to a log file saved on your Google Drive, and check this file periodically.

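You can run your training process like this (in a Colab cell, assuming Google Drive is mounted at /content/drive):

# Plain Python assignment: no leading "!", otherwise the line runs in a separate shell and the variable is lost
log_file = "/content/drive/My Drive/path/log.log"

# IPython substitutes $log_file before running the command; the quotes protect the space in "My Drive"
!python train.py > "$log_file"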
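
To check the file periodically, something like the following could be run from any session that can read the Drive folder. This is only a minimal sketch: the `tail` helper is a hypothetical name introduced here, not part of Colab, and the path is the same placeholder as above.

```python
from collections import deque
from pathlib import Path


def tail(path, n=10):
    """Return the last n lines of a text file, or [] if the file does not exist yet."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open("r", encoding="utf-8", errors="replace") as f:
        # A deque with maxlen keeps only the final n lines while streaming the file
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


# Placeholder path (assumption): the same log file the training command writes to
log_file = "/content/drive/My Drive/path/log.log"
for line in tail(log_file, n=5):
    print(line)
```

Note that when output is redirected to a file, Python buffers stdout, so new loss lines may show up in the log with a delay. Running the script as `python -u train.py` makes the output unbuffered.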
Samah J. Zaro answered Nov 14 '22 23:11