I'm building a Python application that uses the Google Drive APIs. So far development is going well, but I have a problem retrieving the entire Google Drive file tree. I need that for two purposes:
For now I have a function that fetches the root of Drive, and I can build the tree by recursively calling a function that lists the contents of a single folder. But this is extremely slow and can potentially make thousands of requests to Google, which is unacceptable.
Here is the function to get the root:
from apiclient import errors
from apiclient.discovery import build

import driveHelper  # my own module: handles authentication and credential storage


def drive_get_root():
    """Retrieve a root list of File resources.

    Returns:
        List of dictionaries.
    """
    # build the service; the driveHelper module takes care of authentication and credential storage
    drive_service = build('drive', 'v2', driveHelper.buildHttp())
    # the result will be a list
    result = []
    page_token = None
    while True:
        try:
            param = {}
            if page_token:
                param['pageToken'] = page_token
            files = drive_service.files().list(**param).execute()
            # add the files to the list
            result.extend(files['items'])
            page_token = files.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as _error:
            print('An error occurred: %s' % _error)
            break
    return result
And here is the one to get the files from a folder:
def drive_files_in_folder(folder_id):
    """Return the files belonging to a folder.

    Args:
        folder_id: ID of the folder to get files from.
    """
    # build the service; the driveHelper module takes care of authentication and credential storage
    drive_service = build('drive', 'v2', driveHelper.buildHttp())
    # the result will be a list
    result = []
    # code from Google, it is working so I didn't touch it
    page_token = None
    while True:
        try:
            param = {}
            if page_token:
                param['pageToken'] = page_token
            children = drive_service.children().list(folderId=folder_id, **param).execute()
            for child in children.get('items', []):
                # one extra request per child to fetch its full metadata
                result.append(drive_get_file(child['id']))
            page_token = children.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as _error:
            print('An error occurred: %s' % _error)
            break
    return result
And, for example, to check whether a file exists I'm currently using this:
def drive_path_exist(file_path, list=False):
    """
    This is a recursive function to check if the given path exists.
    """
    # if the list param is not given, use the root of Drive
    if list is False:
        list = drive_get_root()
    # split the string to get the first item and check if it is in the root
    file_path = file_path.split("/")
    # if there is only one element in the file path we are at the actual filename,
    # so if it is in this folder we can return it
    if len(file_path) == 1:
        exist = False
        for elem in list:
            if elem["title"] == file_path[0]:
                # set exist to the elem, because the elem is a dictionary with all the file info
                exist = elem
        return exist
    # if we are not at the last element we have to keep searching
    else:
        exist = False
        for elem in list:
            # check if the current item is in the folder
            if elem["title"] == file_path[0]:
                exist = True
                folder_id = elem["id"]
        # delete the first element and keep searching
        file_path.pop(0)
        if exist:
            # recursive call: rejoin the file path as a string and pass the
            # list returned by drive_files_in_folder
            return drive_path_exist("/".join(file_path), drive_files_in_folder(folder_id))
        return False
Any idea how to solve my problem? I saw a few discussions here on Stack Overflow, and in some answers people wrote that this is possible, but of course they didn't say how!
Thanks
In order to build a representation of a tree in your app, you need to do this ...
If you simply want to check if file-A exists in folder-B, the approach depends on whether the name "folder-B" is guaranteed to be unique.
If it's unique, just do a FilesList query for title='file-A', then do a Files Get for each of its parents and see if any of them are called 'folder-B'.
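As a rough sketch of that lookup, not tested against a live account: `file_in_named_folder` is a hypothetical helper name, `service` is assumed to be a Drive v2 service object like the `drive_service` built in the question, and the single-quote escaping and the `trashed = false` filter are my own additions.

```python
def file_in_named_folder(service, file_title, folder_title):
    """Return the first file titled file_title whose parent is titled folder_title, else None."""
    # Assumption: single quotes inside a title are escaped with a backslash in the query.
    safe_title = file_title.replace("'", "\\'")
    page_token = None
    while True:
        params = {"q": "title = '%s' and trashed = false" % safe_title}
        if page_token:
            params["pageToken"] = page_token
        response = service.files().list(**params).execute()
        for item in response.get("items", []):
            for parent in item.get("parents", []):
                # One Files.get per parent reference, just to read the parent's title.
                folder = service.files().get(fileId=parent["id"]).execute()
                if folder.get("title") == folder_title:
                    return item
        page_token = response.get("nextPageToken")
        if not page_token:
            return None
```

The cost is one list request per result page plus one get request per candidate parent, so it stays cheap as long as few files share the title.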
You don't say if these files and folders are being created by your app, or by the user with the Google Drive Webapp. If your app is the creator of these files/folders there is a trick you can use to restrict your searches to a single root. Say you have
MyDrive/app_root/folder-C/folder-B/file-A
you can make all of folder-C, folder-B and file-A children of app_root
That way you can constrain all of your queries to include
and 'app_root_id' in parents
NB. A previous version of this answer highlighted that Drive folders were not constrained to an inverted tree hierarchy, because a single folder could have multiple parents. As of 2021, this is no longer true and a Drive File (including Folders, which are simply special files) can only be created with a single parent.
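Because every item has at most one parent, a flat listing can be arranged into a tree in memory without one request per folder. A minimal sketch, assuming each item is a Drive v2 file dict with `id`, `title` and a `parents` list whose entries carry `id` and `isRoot`; `build_tree` and `render` are hypothetical names, not part of any library:

```python
def build_tree(items):
    """Group a flat list of Drive v2 file dicts by parent ID.

    Returns (children, roots): children maps a folder ID to the items
    directly inside it; roots holds items whose parent is the Drive root
    or is not present in `items`.
    """
    known_ids = set(item["id"] for item in items)
    children = {}
    roots = []
    for item in items:
        parents = item.get("parents", [])
        parent = parents[0] if parents else None  # assumption: single parent
        if parent is None or parent.get("isRoot") or parent["id"] not in known_ids:
            roots.append(item)
        else:
            children.setdefault(parent["id"], []).append(item)
    return children, roots


def render(children, nodes, depth=0):
    """Return the tree below `nodes` as a list of indented title lines."""
    lines = []
    for node in sorted(nodes, key=lambda n: n["title"]):
        lines.append("  " * depth + node["title"])
        lines.extend(render(children, children.get(node["id"], []), depth + 1))
    return lines
```

Fed with the combined `items` of a paginated `files().list()` call, the number of requests depends on the page count rather than on the number of folders.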
An easy way to check whether a file exists in a specific folder is:
drive_service.files().list(q="'THE_ID_OF_SPECIFIC_PATH' in parents and title='a file'").execute()
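The same query can be applied once per path component, which keeps a path check at one request per level. A sketch under assumptions: `resolve_path` is a hypothetical helper, `service` is a Drive v2 service object, `'root'` is used as the alias for the top-level folder, and only the first match of each lookup is followed, so duplicate titles in one folder are not disambiguated.

```python
FOLDER_TYPE = "application/vnd.google-apps.folder"


def resolve_path(service, path, root_id="root"):
    """Return the file dict at `path` (e.g. 'folder1/folder2/test.txt'), or None."""
    parts = [part for part in path.split("/") if part]
    parent_id = root_id
    found = None
    for index, name in enumerate(parts):
        # Assumption: single quotes inside a title are escaped with a backslash.
        safe_name = name.replace("'", "\\'")
        q = "'%s' in parents and title = '%s' and trashed = false" % (parent_id, safe_name)
        if index < len(parts) - 1:
            # Every component except the last one has to be a folder.
            q += " and mimeType = '%s'" % FOLDER_TYPE
        items = service.files().list(q=q).execute().get("items", [])
        if not items:
            return None
        found = items[0]  # first match only
        parent_id = found["id"]
    return found
```

For `root/folder1/folder2/test.txt` this makes three requests, where listing each folder and fetching every child separately can make hundreds.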
To walk all folders and files:
import sys, os
import socket
import googleDriveAccess
import logging
logging.basicConfig()

FOLDER_TYPE = 'application/vnd.google-apps.folder'

def getlist(ds, q, **kwargs):
    # collect all pages of a files().list() query into one result
    result = None
    npt = ''
    while npt is not None:
        if npt != '':
            kwargs['pageToken'] = npt
        entries = ds.files().list(q=q, **kwargs).execute()
        if result is None:
            result = entries
        else:
            result['items'] += entries['items']
        npt = entries.get('nextPageToken')
    return result

def uenc(u):
    # Python 2: encode unicode strings as UTF-8 before writing them out
    if isinstance(u, unicode):
        return u.encode('utf-8')
    else:
        return u

def walk(ds, folderId, folderName, outf, depth):
    spc = ' ' * depth
    outf.write('%s+%s\n%s %s\n' % (spc, uenc(folderId), spc, uenc(folderName)))
    # first recurse into the sub-folders of this folder
    q = "'%s' in parents and mimeType='%s'" % (folderId, FOLDER_TYPE)
    entries = getlist(ds, q, **{'maxResults': 200})
    for folder in entries['items']:
        walk(ds, folder['id'], folder['title'], outf, depth + 1)
    # then write out the plain files of this folder
    q = "'%s' in parents and mimeType!='%s'" % (folderId, FOLDER_TYPE)
    entries = getlist(ds, q, **{'maxResults': 200})
    for f in entries['items']:
        outf.write('%s -%s\n%s %s\n' % (spc, uenc(f['id']), spc, uenc(f['title'])))

def main(basedir):
    da = googleDriveAccess.DAClient(basedir) # clientId=None, script=False
    f = open(os.path.join(basedir, 'hierarchy.txt'), 'wb')
    walk(da.drive_service, 'root', u'root', f, 0)
    f.close()

if __name__ == '__main__':
    logging.getLogger().setLevel(getattr(logging, 'INFO'))
    try:
        main(os.path.dirname(__file__))
    except socket.gaierror as e:
        sys.stderr.write('socket.gaierror')
This uses googleDriveAccess: github.com/HatsuneMiku/googleDriveAccess