I have a url for sharepoint directory(intranet) and need an api to return list of files in that directory given the url. how can I do that using python?
With built-in, optimized data processing, the CData Python Connector offers unmatched performance for interacting with live SharePoint data in Python.
The easiest way to search for documents in SharePoint Online is to use the search bar at the top of your site. By typing a phrase up here, SharePoint will show you a selection of files and folders that are related to your search query.
We recommend the third-party Python package "SharePlum," which provides an easy way to work with the SharePoint list and allows programmers to write clean Python code (Rollins, 2020). We create a fictitious project to demonstrate the CRUD operations on a SharePoint list.
Posting in case anyone else comes across this issue of getting files from a SharePoint folder from just the folder path. This link really helped me do this: https://github.com/vgrem/Office365-REST-Python-Client/issues/98. I found so much info about doing this for HTTP but not in Python so hopefully someone else needs more Python reference. I am assuming you are all setup with client_id and client_secret with the Sharepoint API. If not you can use this for reference: https://docs.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs
I basically wanted to grab the names/relative urls of the files within a folder and then get the most recent file in the folder and put into a dataframe. I'm sure this isn't the "Pythonic" way to do this but it works which is good enough for me.
!pip install Office365-REST-Python-Client
from office365.runtime.auth.client_credential import ClientCredential
from office365.runtime.client_request_exception import ClientRequestException
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
import io
import datetime
import pandas as pd
sp_site = 'https://<org>.sharepoint.com/sites/<my_site>/'
relative_url = "/sites/<my_site/Shared Documents/<folder>/<sub_folder>"
client_credentials = ClientCredential(credentials['client_id'], credentials['client_secret'])
ctx = ClientContext(sp_site).with_credentials(client_credentials)
libraryRoot = ctx.web.get_folder_by_server_relative_path(relative_url)
ctx.load(libraryRoot)
ctx.execute_query()
#if you want to get the folders within <sub_folder>
folders = libraryRoot.folders
ctx.load(folders)
ctx.execute_query()
for myfolder in folders:
print("Folder name: {0}".format(myfolder.properties["ServerRelativeUrl"]))
#if you want to get the files in the folder
files = libraryRoot.files
ctx.load(files)
ctx.execute_query()
#create a dataframe of the important file properties for me for each file in the folder
df_files = pd.DataFrame(columns = ['Name', 'ServerRelativeUrl', 'TimeLastModified', 'ModTime'])
for myfile in files:
#use mod_time to get in better date format
mod_time = datetime.datetime.strptime(myfile.properties['TimeLastModified'], '%Y-%m-%dT%H:%M:%SZ')
#create a dict of all of the info to add into dataframe and then append to dataframe
dict = {'Name': myfile.properties['Name'], 'ServerRelativeUrl': myfile.properties['ServerRelativeUrl'], 'TimeLastModified': myfile.properties['TimeLastModified'], 'ModTime': mod_time}
df_files = df_files.append(dict, ignore_index= True )
#print statements if needed
# print("File name: {0}".format(myfile.properties["Name"]))
# print("File link: {0}".format(myfile.properties["ServerRelativeUrl"]))
# print("File last modified: {0}".format(myfile.properties["TimeLastModified"]))
#get index of the most recently modified file and the ServerRelativeUrl associated with that index
newest_index = df_files['ModTime'].idxmax()
newest_file_url = df_files.iloc[newest_index]['ServerRelativeUrl']
# Get Excel File by newest_file_url identified above
response= File.open_binary(ctx, newest_file_url)
# save data to BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0) # set file object to start
# load Excel file from BytesIO stream
df = pd.read_excel(bytes_file_obj, sheet_name='Sheet1', header= 0)
Here is another helpful link of the file properties you can view: https://docs.microsoft.com/en-us/previous-versions/office/developer/sharepoint-rest-reference/dn450841(v=office.15). Scroll down to file properties section.
Hopefully this is helpful to someone. Again, I am not a pro and most of the time I need things to be a bit more explicit and written out. Maybe others feel that way too.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With