I am trying to grab all the files created under a parent directory. The parent directory has a lot of sub directories followed by files in those directories.
parent
--- sub folder1
--- file1
--- file2
Currently I am grabbing all the ids of sub folders and constructing a query such as q: 'subfolder1id' in parents or 'subfolder2id' in parents to find the list of files. Then I issue these in batches. If I have 100 folders, I issue 10 search queries for a batch size of 10.
Is there a better way of querying the files using google drive rest api that will get me all the files with one query?
The Google Drive API allows you to create apps that leverage Google Drive cloud storage. You can develop applications that integrate with Drive, and create robust functionality in your application using the Drive API.
Here is an answer to your question.
Same idea from your scenario:
folderA____ folderA1____folderA1a \____folderA2____folderA2a \___folderA2b
There 3 alternative answers that I think you can get an idea from.
Alternative 1. Recursion
The temptation would be to list the children of folderA, for any children that are folders, recursively list their children, rinse, repeat. In a very small number of cases, this might be the best approach, but for most, it has the following problems:-
- It is woefully time consuming to do a server round trip for each sub folder. This does of course depend on the size of your tree, so if you can guarantee that your tree size is small, it could be OK.
Alternative 2. The common parent
This works best if all of the files are being created by your app (ie. you are using drive.file scope). As well as the folder hierarchy above, create a dummy parent folder called say "MyAppCommonParent". As you create each file as a child of its particular Folder, you also make it a child of MyAppCommonParent. This becomes a lot more intuitive if you remember to think of Folders as labels. You can now easily retrieve all descdendants by simply querying
MyAppCommonParent in parents
.Alternative 3. Folders first
Start by getting all folders. Yep, all of them. Once you have them all in memory, you can crawl through their parents properties and build your tree structure and list of Folder IDs. You can then do a single
files.list?q='folderA' in parents or 'folderA1' in parents or 'folderA1a' in parents....
Using this technique you can get everything in two http calls.Alternative 2 is the most effificient, but only works if you have control of file creation. Alternative 3 is generally more efficient than Alternative 1, but there may be certain small tree sizes where 1 is best.
scope = ['https://www.googleapis.com/auth/drive']
credentials = ServiceAccountCredentials.from_json_keyfile_name('your JSON credentials' % path, scope)
service = build('drive', 'v3', credentials=credentials)
folder_tree = "NAME OF THE FOLDER YOU WANT TO START YOUR SEARCH"
folder_ids = {}
folder_ids['NAME OF THE FOLDER YOU WANT TO START YOUR SEARCH'] = folder_id
def check_for_subfolders(folder_id):
new_sub_patterns = {}
folders = service.files().list(q="mimeType='application/vnd.google-apps.folder' and parents in '"+folder_id+"' and trashed = false",fields="nextPageToken, files(id, name)",pageSize=400).execute()
all_folders = folders.get('files', [])
all_files = check_for_files(folder_id)
n_files = len(all_files)
n_folders = len(all_folders)
old_folder_tree = folder_tree
if n_folders != 0:
for i,folder in enumerate(all_folders):
folder_name = folder['name']
subfolder_pattern = old_folder_tree + '/'+ folder_name
new_pattern = subfolder_pattern
new_sub_patterns[subfolder_pattern] = folder['id']
print('New Pattern:', new_pattern)
all_files = check_for_files(folder['id'])
n_files =len(all_files)
new_folder_tree = new_pattern
if n_files != 0:
for file in all_files:
file_name = file['name']
new_file_tree_pattern = subfolder_pattern + "/" + file_name
new_sub_patterns[new_file_tree_pattern] = file['id']
print("Files added :", file_name)
else:
print('No Files Found')
else:
all_files = check_for_files(folder_id)
n_files = len(all_files)
if n_files != 0:
for file in all_files:
file_name = file['name']
subfolders[folder_tree + '/'+file_name] = file['id']
new_file_tree_pattern = subfolder_pattern + "/" + file_name
new_sub_patterns[new_file_tree_pattern] = file['id']
print("Files added :", file_name)
return new_sub_patterns
def check_for_files(folder_id):
other_files = service.files().list(q="mimeType!='application/vnd.google-apps.folder' and parents in '"+folder_id+"' and trashed = false",fields="nextPageToken, files(id, name)",pageSize=400).execute()
all_other_files = other_files.get('files', [])
return all_other_files
def get_folder_tree(folder_id):
global folder_tree
sub_folders = check_for_subfolders(folder_id)
for i,sub_folder_id in enumerate(sub_folders.values()):
folder_tree = list(sub_folders.keys() )[i]
print('Current Folder Tree : ', folder_tree)
folder_ids.update(sub_folders)
print('****************************************Recursive Search Begins**********************************************')
try:
get_folder_tree(sub_folder_id)
except:
print('---------------------------------No furtherance----------------------------------------------')
return folder_ids
folder_ids = get_folder_tree(folder_id)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With