This is a commonly asked question.
The scenario is:
folderA____ folderA1____folderA1a
       \____folderA2____folderA2a
                    \___folderA2b
... and the question is: how do I list all the files in all of the folders under the root folderA?
EDIT (April 2020): Google has announced that multi-parent files will be disabled from September 2020. This alters the narrative below and means Alternative 2 is no longer an option. It might be possible to implement Alternative 2 using shortcuts. I will update this answer further as I test the new restrictions/features.
We are all used to the idea of folders (aka directories) in Windows, *nix, etc. In the real world, a folder is a container into which documents are placed. It is also possible to place smaller folders inside bigger folders. Thus the big folder can be thought of as containing all of the documents inside its smaller child folders.
However, in Google Drive, a Folder is NOT a container, so much so that in the first release of Google Drive they weren't even called Folders, they were called Collections. A Folder is simply a File with (a) no contents, and (b) a special mime-type (application/vnd.google-apps.folder). Folders are used in exactly the same way that tags (aka labels) are used.
The best way to understand this is to consider Gmail. If you look at the top of an open mail item, you see two icons: a folder with the tooltip "Move to" and a label with the tooltip "Labels". Click on either of these and the same dialogue box appears, and it is all about labels. Your labels are listed down the left-hand side, in a tree display that looks a lot like folders. Importantly, a mail item can have multiple labels, or you could say, a mail item can be in multiple folders. Google Drive's Folders work in exactly the same way that Gmail labels work.
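To make the label model concrete, here is a minimal in-memory sketch. It does not call the Drive API: the records, the IDs (reused as names for brevity), the file `MyFile`, and the helper `in_parents` are all hypothetical and only mimic how a `'<id>' in parents` query matches direct labels.

```python
FOLDER_MIME = "application/vnd.google-apps.folder"

# Hypothetical metadata records; a folder is just a file with a special mime-type.
items = [
    {"id": "folderA", "mimeType": FOLDER_MIME, "parents": ["root"]},
    {"id": "folderA2", "mimeType": FOLDER_MIME, "parents": ["folderA"]},
    {"id": "folderA2b", "mimeType": FOLDER_MIME, "parents": ["folderA2"]},
    {"id": "MyFile", "mimeType": "text/plain", "parents": ["folderA2b"]},
]


def in_parents(parent_id, records):
    """Mimic the query `'<parent_id>' in parents`: direct label matches only."""
    return [r["id"] for r in records if parent_id in r["parents"]]


# Only the direct child carries the folderA label; MyFile does not.
direct_children_of_a = in_parents("folderA", items)
direct_children_of_a2b = in_parents("folderA2b", items)
```

In this toy data, asking for folderA returns only folderA2, while MyFile shows up only when asking for folderA2b, the one label it actually carries.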
Having established that a Folder is simply a label, there is nothing stopping you from organising your labels in a hierarchy that resembles a folder tree, in fact this is the most common way of doing so.
It should now be clear that a file (let's call it MyFile) in folderA2b is NOT a child or grandchild of folderA. It is simply a file with a label (confusingly called a Parent) of "folderA2b".
OK, so how DO I get all the files "under" folderA?
Alternative 1. Recursion
The temptation would be to list the children of folderA and, for any children that are folders, recursively list their children, rinse, repeat. In a very small number of cases this might be the best approach, but for most it is inefficient, because it costs a server round trip for every subfolder.
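The cost of the recursive approach can be sketched without touching the network. In the example below, `list_children` is a hypothetical callable standing in for one `files.list` request per folder, and `FAKE_DRIVE` is made-up data shaped like the tree in the question; the counter shows one call per folder visited.

```python
FOLDER_MIME = "application/vnd.google-apps.folder"


def list_files_recursively(folder_id, list_children, counter):
    """Depth-first walk: one list_children call (one round trip) per folder."""
    counter["calls"] += 1
    found = []
    for child in list_children(folder_id):
        if child["mimeType"] == FOLDER_MIME:
            # Recurse into the subfolder, costing another call.
            found.extend(list_files_recursively(child["id"], list_children, counter))
        else:
            found.append(child["id"])
    return found


def _folder(item_id):
    return {"id": item_id, "mimeType": FOLDER_MIME}


def _file(item_id):
    return {"id": item_id, "mimeType": "text/plain"}


# Hypothetical stand-in for Drive: folder ID -> direct children.
FAKE_DRIVE = {
    "folderA": [_folder("folderA1"), _folder("folderA2")],
    "folderA1": [_folder("folderA1a")],
    "folderA1a": [_file("file1")],
    "folderA2": [_folder("folderA2a"), _folder("folderA2b")],
    "folderA2a": [],
    "folderA2b": [_file("MyFile")],
}

calls = {"calls": 0}
all_files = list_files_recursively(
    "folderA", lambda fid: FAKE_DRIVE.get(fid, []), calls)
```

With six folders in the tree, the walk makes six calls to find two files, and the number of calls grows with every subfolder added.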
Alternative 2. The common parent
This works best if all of the files are being created by your app (i.e. you are using the drive.file scope). As well as the folder hierarchy above, create a dummy parent folder called, say, "MyAppCommonParent". As you create each file as a child of its particular Folder, you also make it a child of MyAppCommonParent. This becomes a lot more intuitive if you remember to think of Folders as labels. You can now easily retrieve all descendants by simply querying 'MyAppCommonParent' in parents.
Alternative 3. Folders first
Start by getting all folders. Yep, all of them. Once you have them all in memory, you can crawl through their parents properties and build your tree structure and list of Folder IDs. You can then do a single files.list?q='folderA' in parents or 'folderA1' in parents or 'folderA1a' in parents... Using this technique you can get everything in two HTTP calls.
The pseudo code for option 3 is a bit like...
// get all folders from Drive
files.list?q=mimeType='application/vnd.google-apps.folder' and trashed=false&fields=files(id,name,parents)
// store in a Map, keyed by ID
// find the entry for folderA and note the ID
// find any entries where the ID is in the parents, note their IDs
// for each such entry, repeat recursively
// use all of the IDs noted above to construct a ...
// files.list?q='folderA-ID' in parents or 'folderA1-ID' in parents or 'folderA1a-ID' in parents...
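The in-memory half of this pseudo code can be sketched in Python as follows. This is only an illustration under assumptions: `parents_by_id` is a hypothetical map from folder ID to parent IDs (what the first `files.list` call would let you build), and both function names are made up for the example.

```python
from collections import defaultdict, deque


def collect_descendant_folder_ids(parents_by_id, root_id):
    """Return root_id plus every folder reachable below it, breadth first."""
    # Invert the child -> parents map into parent -> children.
    children = defaultdict(list)
    for folder_id, parents in parents_by_id.items():
        for parent_id in parents:
            children[parent_id].append(folder_id)

    ordered = [root_id]
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # guard against a folder listed twice
                seen.add(child)
                ordered.append(child)
                queue.append(child)
    return ordered


def build_parents_query(folder_ids):
    """Join the IDs into one `'<id>' in parents or ...` query string."""
    return " or ".join(f"'{fid}' in parents" for fid in folder_ids)


# Hypothetical result of the "get all folders" call, keyed by folder ID.
parents_by_id = {
    "folderA": ["root"],
    "folderA1": ["folderA"],
    "folderA1a": ["folderA1"],
    "folderA2": ["folderA"],
    "folderA2a": ["folderA2"],
    "folderA2b": ["folderA2"],
    "unrelated": ["root"],
}

ids = collect_descendant_folder_ids(parents_by_id, "folderA")
query = build_parents_query(ids)
```

Inverting the map once means each folder is examined a single time, rather than rescanning the whole folder list for every level of the tree.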
Alternative 2 is the most efficient, but only works if you have control of file creation. Alternative 3 is generally more efficient than Alternative 1, but there may be certain small tree sizes where 1 is best.
Sharing a Python solution to the excellent Alternative 3 by @pinoyyid, above, in case it's useful to anyone. I'm not a developer so it's probably hopelessly un-pythonic... but it works, only makes 2 API calls, and is pretty quick.
It builds the search string for the second API call with one '<folder-id>' in parents segment per subfolder found. Interestingly, Google Drive seems to have a hard limit of 599 '<folder-id>' in parents segments per query, so if your folder-to-search has more subfolders than this, you need to chunk the list.
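The chunking step can be checked on its own, without calling the API. A minimal sketch, where `chunk_parent_queries` is a hypothetical helper name and the default of 500 is simply a value chosen below the 599 limit observed above:

```python
def chunk_parent_queries(folder_ids, max_parents=500):
    """Build one `'<id>' in parents or ...` clause per group of folder IDs."""
    if max_parents < 1:
        raise ValueError("max_parents must be at least 1")
    queries = []
    for start in range(0, len(folder_ids), max_parents):
        chunk = folder_ids[start:start + max_parents]
        queries.append(" or ".join(f"'{fid}' in parents" for fid in chunk))
    return queries


# Made-up folder IDs, enough to need three queries.
fake_ids = [f"folder-{n}" for n in range(1201)]
queries = chunk_parent_queries(fake_ids)
segments_per_query = [q.count(" in parents") for q in queries]
```

Each returned clause can then be combined with the mime-type and trashed filters and sent as its own files.list request, with the results merged afterwards.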
FOLDER_TO_SEARCH = '123456789'  # ID of folder to search
DRIVE_ID = '654321'  # ID of shared drive in which it lives
MAX_PARENTS = 500  # Limit set safely below Google max of 599 parents per query.

# drive_api_ref is not defined in this snippet. It is assumed to be an
# authorized Drive v3 service object, created elsewhere, for example:
#   from googleapiclient.discovery import build
#   drive_api_ref = build('drive', 'v3', credentials=creds)


def get_all_folders_in_drive():
    """
    Return a dictionary of all the folder IDs in a drive mapped to their parent folder IDs
    (or to the drive itself if a top-level folder). That is, flatten the entire folder structure.
    """
    folders_in_drive_dict = {}
    page_token = None
    max_allowed_page_size = 1000
    just_folders = "trashed = false and mimeType = 'application/vnd.google-apps.folder'"
    while True:
        results = drive_api_ref.files().list(
            pageSize=max_allowed_page_size,
            fields="nextPageToken, files(id, name, mimeType, parents)",
            includeItemsFromAllDrives=True, supportsAllDrives=True,
            corpora='drive',
            driveId=DRIVE_ID,
            pageToken=page_token,
            q=just_folders).execute()
        folders = results.get('files', [])
        page_token = results.get('nextPageToken', None)
        for folder in folders:
            # Fall back to the drive ID if a folder comes back without parents.
            folders_in_drive_dict[folder['id']] = folder.get('parents', [DRIVE_ID])[0]
        if page_token is None:
            break
    return folders_in_drive_dict


def get_subfolders_of_folder(folder_to_search, all_folders):
    """
    Yield subfolders of the folder-to-search, and then subsubfolders etc.
    Must be called by an iterator.
    :param all_folders: The dictionary returned by :meth:`get_all_folders_in_drive`.
    """
    temp_list = [k for k, v in all_folders.items() if v == folder_to_search]  # Get all subfolders
    for sub_folder in temp_list:  # For each subfolder...
        yield sub_folder  # Return it
        yield from get_subfolders_of_folder(sub_folder, all_folders)  # Get subsubfolders etc


def get_relevant_files(relevant_folders):
    """
    Get files under the folder-to-search and all its subfolders.
    """
    relevant_files = {}
    # Split the folder list into chunks that stay below the per-query limit.
    chunked_relevant_folders_list = [relevant_folders[i:i + MAX_PARENTS]
                                     for i in range(0, len(relevant_folders), MAX_PARENTS)]
    for folder_list in chunked_relevant_folders_list:
        query_term = ' or '.join(f"'{f}' in parents" for f in folder_list)
        relevant_files.update(get_all_files_in_folders(query_term))
    return relevant_files


def get_all_files_in_folders(parent_folders):
    """
    Return a dictionary of file IDs mapped to file names for the specified parent folders.
    """
    files_under_folder_dict = {}
    page_token = None
    max_allowed_page_size = 1000
    just_files = f"mimeType != 'application/vnd.google-apps.folder' and trashed = false and ({parent_folders})"
    while True:
        results = drive_api_ref.files().list(
            pageSize=max_allowed_page_size,
            fields="nextPageToken, files(id, name, mimeType, parents)",
            includeItemsFromAllDrives=True, supportsAllDrives=True,
            corpora='drive',
            driveId=DRIVE_ID,
            pageToken=page_token,
            q=just_files).execute()
        files = results.get('files', [])
        page_token = results.get('nextPageToken', None)
        for file in files:
            files_under_folder_dict[file['id']] = file['name']
        if page_token is None:
            break
    return files_under_folder_dict


if __name__ == "__main__":
    all_folders_dict = get_all_folders_in_drive()  # Flatten folder structure
    relevant_folders_list = [FOLDER_TO_SEARCH]  # Start with the folder-to-archive
    for folder in get_subfolders_of_folder(FOLDER_TO_SEARCH, all_folders_dict):
        relevant_folders_list.append(folder)  # Recursively search for subfolders
    relevant_files_dict = get_relevant_files(relevant_folders_list)  # Get the files