
Search files recursively using the Google Drive REST API

I am trying to grab all the files created under a parent directory. The parent directory has many subdirectories, each containing files.

parent
--- sub folder1
    --- file1
    --- file2

Currently I grab the IDs of all the subfolders and construct a query such as q: 'subfolder1id' in parents or 'subfolder2id' in parents to find the list of files. I then issue these queries in batches: with 100 folders and a batch size of 10, I issue 10 search queries.
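For reference, the batching described above can be sketched roughly as follows. This only builds the query strings and makes no Drive call; the helper name build_parent_queries and the generated folder IDs are hypothetical, made up for illustration.

```python
def build_parent_queries(folder_ids, batch_size=10):
    """Split folder IDs into batches and build one search query per batch."""
    queries = []
    for start in range(0, len(folder_ids), batch_size):
        batch = folder_ids[start:start + batch_size]
        # Each clause matches files whose parent is one folder in the batch.
        queries.append(" or ".join("'%s' in parents" % fid for fid in batch))
    return queries


# Hypothetical IDs standing in for 100 real subfolder IDs.
folder_ids = ["subfolder%did" % n for n in range(1, 101)]
queries = build_parent_queries(folder_ids)
print(len(queries), "queries to issue")
```

Each string in queries would then be passed as the q parameter of a separate files.list request.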

Is there a better way to query with the Google Drive REST API that will get me all the files with one query?

work monitored asked Oct 03 '17 13:10




2 Answers

Here is an answer to your question.

Take a tree similar to your scenario:

folderA____ folderA1____folderA1a
       \____folderA2____folderA2a
                    \___folderA2b

There are 3 alternatives that I think you can draw ideas from.

Alternative 1. Recursion

The temptation would be to list the children of folderA and, for any children that are folders, recursively list their children, rinse, repeat. In a very small number of cases this might be the best approach, but for most it has the following problem:

  • It is woefully time consuming to do a server round trip for each sub folder. This does of course depend on the size of your tree, so if you can guarantee that your tree size is small, it could be OK.

Alternative 2. The common parent

This works best if all of the files are being created by your app (i.e. you are using the drive.file scope). As well as the folder hierarchy above, create a dummy parent folder called, say, "MyAppCommonParent". As you create each file as a child of its particular folder, you also make it a child of MyAppCommonParent. This becomes a lot more intuitive if you remember to think of folders as labels. You can now easily retrieve all descendants by simply querying 'MyAppCommonParent' in parents.

Alternative 3. Folders first

Start by getting all folders. Yep, all of them. Once you have them all in memory, you can crawl through their parents properties to build your tree structure and the list of folder IDs. You can then do a single files.list?q='folderA' in parents or 'folderA1' in parents or 'folderA1a' in parents.... Using this technique you can get everything in two HTTP calls, provided each result fits in a single page.
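A minimal sketch of the in-memory step of Alternative 3, assuming the folder list has already been fetched with the id and parents fields. The function names descendant_folder_ids and parents_query, and the hand-written folder list, are hypothetical and only there for illustration; no Drive call is made.

```python
from collections import defaultdict


def descendant_folder_ids(folders, root_id):
    """Return root_id plus the IDs of every folder below it.

    `folders` is a list of dicts with "id" and "parents" keys, the shape
    assumed here for folder resources returned by a folder listing.
    """
    children = defaultdict(list)
    for folder in folders:
        for parent_id in folder.get("parents", []):
            children[parent_id].append(folder["id"])

    found, seen, stack = [], set(), [root_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against visiting a folder twice
        seen.add(current)
        found.append(current)
        stack.extend(children[current])
    return found


def parents_query(folder_ids):
    """Join folder IDs into one "'id' in parents or ..." query string."""
    return " or ".join("'%s' in parents" % fid for fid in folder_ids)


# Hand-written stand-in for the "all folders" response (illustrative only).
folders = [
    {"id": "folderA1", "parents": ["folderA"]},
    {"id": "folderA2", "parents": ["folderA"]},
    {"id": "folderA1a", "parents": ["folderA1"]},
    {"id": "folderA2a", "parents": ["folderA2"]},
    {"id": "folderA2b", "parents": ["folderA2"]},
    {"id": "unrelated", "parents": ["elsewhere"]},
]
ids = descendant_folder_ids(folders, "folderA")
query = parents_query(sorted(ids))
print(query)
```

The resulting string is what would go into the q parameter of the second call. For a large tree the query may become long enough that it has to be split into batches; that limit is an assumption to check against your own data.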

Alternative 2 is the most efficient, but only works if you have control of file creation. Alternative 3 is generally more efficient than Alternative 1, but there may be certain small tree sizes where Alternative 1 is best.

MαπμQμαπkγVπ.0 answered Sep 22 '22 13:09



from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

scope = ['https://www.googleapis.com/auth/drive']

# Placeholder: path to your service account JSON key file.
credentials = ServiceAccountCredentials.from_json_keyfile_name('your JSON credentials', scope)

service = build('drive', 'v3', credentials=credentials)

FOLDER_MIME = 'application/vnd.google-apps.folder'

# Placeholders: name and Drive ID of the folder where the search starts.
folder_tree = "NAME OF THE FOLDER YOU WANT TO START YOUR SEARCH"
folder_id = "ID OF THE FOLDER YOU WANT TO START YOUR SEARCH"
folder_ids = {folder_tree: folder_id}

def list_children(folder_id, mime_clause):
    # Return every non-trashed child of folder_id that matches mime_clause,
    # following nextPageToken until the listing is exhausted.
    query = "%s and '%s' in parents and trashed = false" % (mime_clause, folder_id)
    children, page_token = [], None
    while True:
        response = service.files().list(q=query, fields="nextPageToken, files(id, name)",
                                        pageSize=400, pageToken=page_token).execute()
        children.extend(response.get('files', []))
        page_token = response.get('nextPageToken')
        if page_token is None:
            return children

def check_for_subfolders(folder_id):
    return list_children(folder_id, "mimeType = '%s'" % FOLDER_MIME)

def check_for_files(folder_id):
    return list_children(folder_id, "mimeType != '%s'" % FOLDER_MIME)

def get_folder_tree(folder_id, folder_tree):
    # Record the files directly inside this folder as path -> id.
    for file in check_for_files(folder_id):
        folder_ids[folder_tree + '/' + file['name']] = file['id']
        print("Files added :", file['name'])
    # Record each subfolder, then recurse into it (one round trip per folder).
    for folder in check_for_subfolders(folder_id):
        subfolder_pattern = folder_tree + '/' + folder['name']
        folder_ids[subfolder_pattern] = folder['id']
        print('Current Folder Tree : ', subfolder_pattern)
        get_folder_tree(folder['id'], subfolder_pattern)
    return folder_ids

folder_ids = get_folder_tree(folder_id, folder_tree)
tomdxb0004 answered Sep 24 '22 13:09
