
How do I search sub-folders and sub-sub-folders in Google Drive?

This is a commonly asked question.

The scenario is:-

folderA____folderA1____folderA1a
      \____folderA2____folderA2a
                   \___folderA2b

... and the question is how do I list all the files in all of the folders under the root folderA.

pinoyyid Avatar asked Jan 19 '17 12:01





2 Answers

EDIT (April 2020): Google has announced that multi-parent files will be disabled from September 2020. This alters the narrative below and means Alternative 2 is no longer an option. It might be possible to implement Alternative 2 using shortcuts. I will update this answer further as I test the new restrictions/features.

We are all used to the idea of folders (aka directories) in Windows, *nix, etc. In the real world, a folder is a container, into which documents are placed. It is also possible to place smaller folders inside bigger folders. Thus the big folder can be thought of as containing all of the documents inside its smaller child folders.

However, in Google Drive, a Folder is NOT a container, so much so that in the first release of Google Drive, they weren't even called Folders, they were called Collections. A Folder is simply a File with (a) no contents, and (b) a special mime-type (application/vnd.google-apps.folder). The way Folders are used is exactly the same way that tags (aka labels) are used. The best way to understand this is to consider GMail. If you look at the top of an open mail item, you see two icons. A folder with the tooltip "Move to" and a label with the tooltip "Labels". Click on either of these and the same dialogue box appears and is all about labels. Your labels are listed down the left hand side, in a tree display that looks a lot like folders. Importantly, a mail item can have multiple labels, or you could say, a mail item can be in multiple folders. Google Drive's Folders work in exactly the same way that GMail labels work.

Having established that a Folder is simply a label, there is nothing stopping you from organising your labels in a hierarchy that resembles a folder tree, in fact this is the most common way of doing so.
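To make the label idea concrete, here is a minimal sketch using made-up metadata records shaped like Drive file resources (id, name, mimeType, parents). The IDs and the `direct_children` helper are hypothetical and only mimic what an `'<id>' in parents` query matches; no API call is made.

```python
FOLDER_MIME = "application/vnd.google-apps.folder"

# Hypothetical records: a folder is just a file with a special mimeType,
# and "parents" is simply a list of labels attached to each item.
items = [
    {"id": "A", "name": "folderA", "mimeType": FOLDER_MIME, "parents": ["root"]},
    {"id": "A2", "name": "folderA2", "mimeType": FOLDER_MIME, "parents": ["A"]},
    {"id": "A2b", "name": "folderA2b", "mimeType": FOLDER_MIME, "parents": ["A2"]},
    {"id": "f1", "name": "MyFile", "mimeType": "text/plain", "parents": ["A2b"]},
]


def direct_children(items, folder_id):
    """Mimic "'<folder_id>' in parents": match on the parents label only."""
    return [item["name"] for item in items if folder_id in item.get("parents", [])]


print(direct_children(items, "A"))    # ['folderA2']
print(direct_children(items, "A2b"))  # ['MyFile']
```

Note that asking for items labelled with folderA's ID does not return MyFile, because MyFile only carries the folderA2b label.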

It should now be clear that a file (let's call it MyFile) in folderA2b is NOT a child or grandchild of folderA. It is simply a file with a label (confusingly called a Parent) of "folderA2b".

OK, so how DO I get all the files "under" folderA?

Alternative 1. Recursion

The temptation would be to list the children of folderA and, for any children that are folders, recursively list their children, rinse, repeat. In a very small number of cases this might be the best approach, but for most it has the following problem:

  • It is woefully time-consuming to do a server round trip for each subfolder. This does of course depend on the size of your tree, so if you can guarantee that your tree is small, it could be OK.
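As a rough illustration of that cost, the sketch below walks a small in-memory tree and counts how many "requests" the recursive approach would make. `TREE` and `list_children` are hypothetical stand-ins for Drive and for one `files.list` call per folder; nothing here talks to the real API.

```python
# Hypothetical in-memory tree: folder ID -> list of (child ID, is_folder).
TREE = {
    "folderA": [("folderA1", True), ("folderA2", True)],
    "folderA1": [("folderA1a", True), ("file1", False)],
    "folderA1a": [("file2", False)],
    "folderA2": [("folderA2a", True), ("folderA2b", True)],
    "folderA2a": [],
    "folderA2b": [("MyFile", False)],
}

calls = 0  # number of simulated server round trips


def list_children(folder_id):
    """Stand-in for one files.list request with q="'<folder_id>' in parents"."""
    global calls
    calls += 1
    return TREE[folder_id]


def list_files_recursively(folder_id):
    """Collect every file under folder_id, one simulated request per folder."""
    files = []
    for child_id, is_folder in list_children(folder_id):
        if is_folder:
            files.extend(list_files_recursively(child_id))
        else:
            files.append(child_id)
    return files


all_files = list_files_recursively("folderA")
print(all_files)  # ['file2', 'file1', 'MyFile']
print(calls)      # 6, one round trip per folder in the tree
```

The number of round trips grows with the number of folders, not the number of files, which is why this only suits small trees.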

Alternative 2. The common parent

This works best if all of the files are being created by your app (i.e. you are using the drive.file scope). As well as the folder hierarchy above, create a dummy parent folder called, say, "MyAppCommonParent". As you create each file as a child of its particular Folder, you also make it a child of MyAppCommonParent. This becomes a lot more intuitive if you remember to think of Folders as labels. You can now easily retrieve all descendants by simply querying 'MyAppCommonParent-ID' in parents.

Alternative 3. Folders first

Start by getting all folders. Yep, all of them. Once you have them all in memory, you can crawl through their parents properties and build your tree structure and list of Folder IDs. You can then do a single files.list?q='folderA-ID' in parents or 'folderA1-ID' in parents or 'folderA1a-ID' in parents.... Using this technique you can get everything in two HTTP calls.

The pseudo code for Alternative 3 is a bit like...

// get all folders from Drive
files.list?q=mimeType='application/vnd.google-apps.folder' and trashed=false&fields=files(id,name,parents)
// store in a Map, keyed by ID
// find the entry for folderA and note the ID
// find any entries where the ID is in the parents, note their IDs
// for each such entry, repeat recursively
// use all of the IDs noted above to construct a ...
// files.list?q='folderA-ID' in parents or 'folderA1-ID' in parents or 'folderA1a-ID' in parents...
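The in-memory part of that pseudo code could be sketched in Python roughly as follows. This is an illustration only: the folder records are made up, `descendant_folder_ids` and `build_query` are hypothetical helper names, and the sketch assumes the folders-only listing has already been fetched into a list of dicts with "id" and "parents" keys.

```python
from collections import defaultdict, deque


def descendant_folder_ids(folders, root_id):
    """Return root_id followed by the IDs of every folder beneath it."""
    # Index folders by parent once, so each lookup is cheap.
    children = defaultdict(list)
    for folder in folders:
        for parent_id in folder.get("parents", []):
            children[parent_id].append(folder["id"])

    found, seen, queue = [root_id], {root_id}, deque([root_id])
    while queue:
        for child_id in children[queue.popleft()]:
            if child_id not in seen:  # guard against repeated labels
                seen.add(child_id)
                found.append(child_id)
                queue.append(child_id)
    return found


def build_query(folder_ids):
    """Join the IDs into one "'<id>' in parents or ..." query string."""
    return " or ".join(f"'{fid}' in parents" for fid in folder_ids)


# Made-up result of the folders-only listing.
folders = [
    {"id": "A", "parents": ["root"]},
    {"id": "A1", "parents": ["A"]},
    {"id": "A1a", "parents": ["A1"]},
    {"id": "A2", "parents": ["A"]},
    {"id": "A2a", "parents": ["A2"]},
    {"id": "A2b", "parents": ["A2"]},
    {"id": "other", "parents": ["root"]},
]

ids = descendant_folder_ids(folders, "A")
print(ids)                   # ['A', 'A1', 'A2', 'A1a', 'A2a', 'A2b']
print(build_query(ids[:2]))  # 'A' in parents or 'A1' in parents
```

Building the parent-to-children index first avoids rescanning the whole folder list for every folder visited.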

Alternative 2 is the most efficient, but only works if you have control of file creation. Alternative 3 is generally more efficient than Alternative 1, but there may be certain small tree sizes where Alternative 1 is best.

pinoyyid Avatar answered Sep 17 '22 11:09



Sharing a Python solution to the excellent Alternative 3 by @pinoyyid, above, in case it's useful to anyone. I'm not a developer so it's probably hopelessly un-pythonic... but it works, only makes 2 API calls, and is pretty quick.

  1. Get a master list of all the folders in a drive.
  2. Test whether the folder-to-search is a parent (ie. it has subfolders).
  3. Iterate through subfolders of the folder-to-search testing whether they too are parents.
  4. Build a Google Drive file query with one '<folder-id>' in parents segment per subfolder found.

Interestingly, Google Drive seems to have a hard limit of 599 '<folder-id>' in parents segments per query, so if your folder-to-search has more subfolders than this, you need to chunk the list.

# drive_api_ref is assumed to be an authenticated Drive v3 service object created elsewhere,
# e.g. with googleapiclient.discovery.build('drive', 'v3', credentials=...).

FOLDER_TO_SEARCH = '123456789'  # ID of folder to search
DRIVE_ID = '654321'  # ID of shared drive in which it lives
MAX_PARENTS = 500  # Limit set safely below Google max of 599 parents per query.


def get_all_folders_in_drive():
    """
    Return a dictionary of all the folder IDs in a drive mapped to their parent folder IDs (or to the
    drive itself if a top-level folder). That is, flatten the entire folder structure.
    """
    folders_in_drive_dict = {}
    page_token = None
    max_allowed_page_size = 1000
    just_folders = "trashed = false and mimeType = 'application/vnd.google-apps.folder'"
    while True:
        results = drive_api_ref.files().list(
            pageSize=max_allowed_page_size,
            fields="nextPageToken, files(id, name, mimeType, parents)",
            includeItemsFromAllDrives=True, supportsAllDrives=True,
            corpora='drive',
            driveId=DRIVE_ID,
            pageToken=page_token,
            q=just_folders).execute()
        folders = results.get('files', [])
        page_token = results.get('nextPageToken', None)
        for folder in folders:
            folders_in_drive_dict[folder['id']] = folder['parents'][0]
        if page_token is None:
            break
    return folders_in_drive_dict


def get_subfolders_of_folder(folder_to_search, all_folders):
    """
    Yield subfolders of the folder-to-search, and then subsubfolders etc. Must be called by an iterator.
    :param all_folders: The dictionary returned by :func:`get_all_folders_in_drive`.
    """
    temp_list = [k for k, v in all_folders.items() if v == folder_to_search]  # Get all subfolders
    for sub_folder in temp_list:  # For each subfolder...
        yield sub_folder  # Return it
        yield from get_subfolders_of_folder(sub_folder, all_folders)  # Get subsubfolders etc


def get_relevant_files(relevant_folders):
    """
    Get files under the folder-to-search and all its subfolders.
    """
    relevant_files = {}
    # Split the folder list into chunks so no single query exceeds MAX_PARENTS segments
    chunked_relevant_folders_list = [relevant_folders[i:i + MAX_PARENTS] for i in
                                     range(0, len(relevant_folders), MAX_PARENTS)]
    for folder_list in chunked_relevant_folders_list:
        query_term = ' or '.join(f"'{f}' in parents" for f in folder_list)
        relevant_files.update(get_all_files_in_folders(query_term))
    return relevant_files


def get_all_files_in_folders(parent_folders):
    """
    Return a dictionary of file IDs mapped to file names for the specified parent folders.
    :param parent_folders: A query fragment of the form "'<id>' in parents or '<id>' in parents ...".
    """
    files_under_folder_dict = {}
    page_token = None
    max_allowed_page_size = 1000
    just_files = f"mimeType != 'application/vnd.google-apps.folder' and trashed = false and ({parent_folders})"
    while True:
        results = drive_api_ref.files().list(
            pageSize=max_allowed_page_size,
            fields="nextPageToken, files(id, name, mimeType, parents)",
            includeItemsFromAllDrives=True, supportsAllDrives=True,
            corpora='drive',
            driveId=DRIVE_ID,
            pageToken=page_token,
            q=just_files).execute()
        files = results.get('files', [])
        page_token = results.get('nextPageToken', None)
        for file in files:
            files_under_folder_dict[file['id']] = file['name']
        if page_token is None:
            break
    return files_under_folder_dict


if __name__ == "__main__":
    all_folders_dict = get_all_folders_in_drive()  # Flatten folder structure
    relevant_folders_list = [FOLDER_TO_SEARCH]  # Start with the folder-to-search
    for folder in get_subfolders_of_folder(FOLDER_TO_SEARCH, all_folders_dict):
        relevant_folders_list.append(folder)  # Recursively search for subfolders
    relevant_files_dict = get_relevant_files(relevant_folders_list)  # Get the files

James Leedham Avatar answered Sep 21 '22 11:09
