I would like to immediately delete temporary files saved from a Google Colaboratory notebook without them going to the Trash.
I am using Keras+Tensorflow in my script and have it save the complete model after every epoch of training. The main reason is that if the script is stopped for any reason, I can restart it later and it will read in the most recently saved model and continue training. In order to save disk space (it is using my Google Drive) I have it delete the previous version of the model every time it saves a new one. I did this with the standard python os.remove() only to find out later that I completely filled my Google Drive due to os.remove just moving the files to the Trash folder and not actually deleting them.
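The save-then-delete pattern described above can be sketched without Keras (the file names and the fake "weights" payload are just placeholders for the real model_epochNNN.h5 checkpoints):

```python
import os
import tempfile

def save_checkpoint(workdir, epoch):
    """Write the checkpoint for this epoch, then remove the previous one.

    On a locally mounted Google Drive, os.remove() only moves the old
    file to the Trash, which is exactly the quota problem described above.
    """
    new = os.path.join(workdir, 'model_epoch%03d.h5' % epoch)
    old = os.path.join(workdir, 'model_epoch%03d.h5' % (epoch - 1))
    with open(new, 'wb') as f:
        f.write(b'fake model weights')   # stand-in for model.save(new)
    if os.path.exists(old):
        os.remove(old)   # frees space on a local disk, but only trashes on Drive
    return new

workdir = tempfile.mkdtemp()
for epoch in range(1, 4):
    save_checkpoint(workdir, epoch)
print(sorted(os.listdir(workdir)))  # only the most recent checkpoint remains
```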
I looked around and found references to the google colab API that said you have to call the Delete method of the file object. However, getting a reference to the file object with just a file name seems ridiculously complicated. I assume I am not doing it correctly. The code below is the work-around I came up with. There is a comment that marks where I had to replace my one-liner with 25 lines of much less readable code.
I should also say that the documentation I found kept indicating that I should be able to find the file in essentially one call to gdrive.ListFile, using something like "name='myfile'", but whenever I tried that I kept getting HTTP errors.
!pip install -U -q PyDrive
import os
from google.colab import drive
drive.mount('/content/gdrive')
workdir = '/content/gdrive/My Drive/work/2019.03.26.trackingML/eff100_inverted'
os.chdir( workdir )
epoch = 170
fname = 'model_checkpoints/model_epoch%03d.h5' % (epoch)
#--------------------------------------------------------
# Everything below here is to replace the one line:
# os.remove(fname)
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
gdrive = GoogleDrive(gauth)
# Find the Google Drive file object corresponding to the path
fullpath = os.path.join(workdir, fname)
mydirs = fullpath.split('/')[3:]
curid = 'root'
for d in mydirs:
    file_list = gdrive.ListFile({'q': "'%s' in parents and trashed=false" % curid}).GetList()
    for file in file_list:
        if file['title'] == d:
            curid = file['id']
            break
if fname.endswith(file['title']):
    print('Found file %s with id %s' % (file['title'], file['id']))
    file.Delete()
else:
    print('Unable to find %s' % fname)
The above code pretty much does what I want, but seems ugly and bloated. I'm hoping someone can point me to the 1 or 2 line replacement for os.remove() that avoids filling my Trash (and quota).
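For what it's worth, the path-to-id walk in that workaround can be exercised in isolation against a toy folder tree (resolve_path, TREE, and the ids are invented for illustration; the real code queries gdrive.ListFile at each level instead of a dict):

```python
# toy tree: folder id -> list of (child_title, child_id)
TREE = {
    'root': [('work', 'id1')],
    'id1': [('eff100_inverted', 'id2')],
    'id2': [('model_checkpoints', 'id3')],
    'id3': [('model_epoch170.h5', 'id4')],
}

def resolve_path(path):
    """Walk the tree one path component at a time, as the loop above does."""
    curid = 'root'
    for part in path.split('/'):
        for title, cid in TREE.get(curid, []):
            if title == part:
                curid = cid
                break
        else:
            return None  # component not found under the current folder
    return curid

print(resolve_path('work/eff100_inverted/model_checkpoints/model_epoch170.h5'))
```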
Suppose your checkpoint file names start with "model_epoch".
1) In colab, write these statements in a cell at beginning:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
2) Go to Drive, right-click the folder that contains the checkpoint files, and select Get shareable link. The copied link contains the folder's id.
3) In colab, write this function in a cell:

def clearCheckPointFiles():
    file_list = drive.ListFile({'q': "'*******************' in parents and trashed=false"}).GetList()
    for f in file_list:
        if f['title'].startswith('model_epoch'):
            f.Delete()  # permanently deletes, bypassing the Trash
4) In the query string above, replace the asterisks with the folder id from the link copied in step 2.
5) call clearCheckPointFiles() just before saving new checkpoint.
6) Enjoy!
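The filtering logic in step 3 can be sanity-checked outside Colab against a toy stand-in for the PyDrive objects (FakeDrive and FakeFile are invented for illustration; the real GoogleDrive.ListFile takes the query dict shown in step 3 and its items expose the same ['title'] key and Delete() method):

```python
class FakeFile(dict):
    """Mimics a PyDrive GoogleDriveFile: dict access plus Delete()."""
    def __init__(self, title, id):
        super().__init__(title=title, id=id)
        self.deleted = False
    def Delete(self):
        self.deleted = True  # stands in for a permanent, Trash-bypassing delete

class FakeDrive:
    def __init__(self, files):
        self.files = files
    def ListFile(self, query):
        return self  # the toy ignores the query string
    def GetList(self):
        return [f for f in self.files if not f.deleted]

drive = FakeDrive([FakeFile('model_epoch001.h5', 'a'),
                   FakeFile('model_epoch002.h5', 'b'),
                   FakeFile('notes.txt', 'c')])

def clearCheckPointFiles():
    # same filter as step 3: delete every file whose title starts with model_epoch
    for f in drive.ListFile({'q': "'<folder-id>' in parents and trashed=false"}).GetList():
        if f['title'].startswith('model_epoch'):
            f.Delete()

clearCheckPointFiles()
print([f['title'] for f in drive.GetList()])  # only non-checkpoint files survive
```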