
Get a list of files in a SharePoint directory using Python

I have a URL for a SharePoint directory (intranet) and need an API that returns the list of files in that directory, given the URL. How can I do that using Python?

balakishore nadella asked May 25 '18 14:05



1 Answer

Posting in case anyone else needs to list the files in a SharePoint folder given just the folder path. This link really helped me: https://github.com/vgrem/Office365-REST-Python-Client/issues/98. I found plenty of information about doing this with raw HTTP requests but very little for Python, so hopefully this serves as a Python reference for someone else. I am assuming you are already set up with a client_id and client_secret for the SharePoint API. If not, you can use this for reference: https://docs.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs
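If you start from a full folder URL (as in the question) rather than a server-relative path, here is a minimal sketch of one way to derive the site URL and the server-relative path that the code further down expects. The helper name `split_folder_url` and the example host are made up for illustration, and it assumes the site lives directly under `/sites/<name>/`, which will not hold for every tenant (for example sites under `/teams/` or at the root).

```python
from urllib.parse import urlsplit, unquote


def split_folder_url(folder_url):
    """Split a full folder URL into (site_url, server_relative_path).

    Hypothetical helper: assumes the URL looks like
    https://<host>/sites/<site_name>/<library>/<folder>/...
    """
    parts = urlsplit(folder_url)
    # Decode %20 and friends, and drop any trailing slash
    path = unquote(parts.path).rstrip('/')
    segments = path.split('/')  # ['', 'sites', '<site_name>', ...]
    if len(segments) < 3 or segments[1] != 'sites':
        raise ValueError("Expected a URL of the form https://<host>/sites/<site_name>/...")
    site_url = "{0}://{1}/sites/{2}/".format(parts.scheme, parts.netloc, segments[2])
    return site_url, path


# Example with a made-up URL
sp_site, relative_url = split_folder_url(
    "https://contoso.sharepoint.com/sites/team/Shared%20Documents/reports/2022/"
)
print(sp_site)
print(relative_url)
```

The two returned values play the roles of `sp_site` and `relative_url` in the script that follows.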

I basically wanted to grab the names and relative URLs of the files within a folder, then take the most recent file in the folder and load it into a dataframe. I'm sure this isn't the "Pythonic" way to do this, but it works, which is good enough for me.

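# Install the client library first. The leading "!" is Jupyter/IPython syntax;
# in a regular shell, run: pip install Office365-REST-Python-Client
!pip install Office365-REST-Python-Client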
from office365.runtime.auth.client_credential import ClientCredential
from office365.runtime.client_request_exception import ClientRequestException
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
import io
import datetime
import pandas as pd


# replace the <...> placeholders with your own values
sp_site = 'https://<org>.sharepoint.com/sites/<my_site>/'
relative_url = "/sites/<my_site>/Shared Documents/<folder>/<sub_folder>"
# `credentials` is assumed to be a dict you have already loaded that holds your app's client_id and client_secret
client_credentials = ClientCredential(credentials['client_id'], credentials['client_secret'])
ctx = ClientContext(sp_site).with_credentials(client_credentials)
libraryRoot = ctx.web.get_folder_by_server_relative_path(relative_url)
ctx.load(libraryRoot)
ctx.execute_query()

#if you want to get the folders within <sub_folder> 
folders = libraryRoot.folders
ctx.load(folders)
ctx.execute_query()
for myfolder in folders:
    print("Folder name: {0}".format(myfolder.properties["ServerRelativeUrl"]))

#if you want to get the files in the folder        
files = libraryRoot.files
ctx.load(files)
ctx.execute_query()

#collect the important file properties for each file in the folder
rows = []
for myfile in files:
    #parse the timestamp so it can be compared as a datetime
    mod_time = datetime.datetime.strptime(myfile.properties['TimeLastModified'], '%Y-%m-%dT%H:%M:%SZ')
    #one row per file (not named `dict`, which would shadow the built-in)
    rows.append({'Name': myfile.properties['Name'], 'ServerRelativeUrl': myfile.properties['ServerRelativeUrl'], 'TimeLastModified': myfile.properties['TimeLastModified'], 'ModTime': mod_time})

    #print statements if needed
    # print("File name: {0}".format(myfile.properties["Name"]))
    # print("File link: {0}".format(myfile.properties["ServerRelativeUrl"]))
    # print("File last modified: {0}".format(myfile.properties["TimeLastModified"]))

#build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
df_files = pd.DataFrame(rows, columns=['Name', 'ServerRelativeUrl', 'TimeLastModified', 'ModTime'])

#get the index label of the most recently modified file and the ServerRelativeUrl associated with it
newest_index = df_files['ModTime'].idxmax()
newest_file_url = df_files.loc[newest_index, 'ServerRelativeUrl']

# Get the Excel file by the newest_file_url identified above
response = File.open_binary(ctx, newest_file_url)
# save data to BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)  # set file object to start
# load Excel file from BytesIO stream
df = pd.read_excel(bytes_file_obj, sheet_name='Sheet1', header=0)
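If all you need is the newest file, the dataframe step can be skipped. Below is a minimal sketch that picks the most recent entry from a list of property dicts using only the standard library. The function name `newest_file` and the sample data are made up for illustration, and it assumes `TimeLastModified` arrives in the same `%Y-%m-%dT%H:%M:%SZ` format parsed above.

```python
import datetime


def newest_file(file_props):
    """Return the property dict with the latest TimeLastModified.

    Hypothetical helper: expects dicts shaped like myfile.properties above.
    """
    def mod_time(props):
        # Same timestamp format as in the script above
        return datetime.datetime.strptime(props['TimeLastModified'], '%Y-%m-%dT%H:%M:%SZ')
    return max(file_props, key=mod_time)


# Made-up sample data standing in for the properties of each file
sample = [
    {'Name': 'a.xlsx', 'ServerRelativeUrl': '/sites/x/Docs/a.xlsx', 'TimeLastModified': '2022-01-05T10:00:00Z'},
    {'Name': 'b.xlsx', 'ServerRelativeUrl': '/sites/x/Docs/b.xlsx', 'TimeLastModified': '2022-03-01T08:30:00Z'},
    {'Name': 'c.xlsx', 'ServerRelativeUrl': '/sites/x/Docs/c.xlsx', 'TimeLastModified': '2021-12-31T23:59:59Z'},
]
latest = newest_file(sample)
print(latest['ServerRelativeUrl'])
```

With the script above, this would be called as `newest_file([f.properties for f in files])`, and the resulting `ServerRelativeUrl` passed to `File.open_binary`.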

Here is another helpful link that lists the file properties you can read: https://docs.microsoft.com/en-us/previous-versions/office/developer/sharepoint-rest-reference/dn450841(v=office.15). Scroll down to the File properties section.

Hopefully this is helpful to someone. Again, I am not a pro, and most of the time I need things to be a bit more explicit and written out. Maybe others feel that way too.

Madison_Wells answered Oct 02 '22 07:10