
Identify external workbook links using openpyxl

I am trying to identify all cells that contain external workbook references, using openpyxl in Python 3.4, so far without success. My first attempt was:

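def find_external_value(cell):
    # Returns True if the cell's value mentions an .xls/.xlsx file name.
    has_external_reference = False

    # cell.value may be None or a number, so only test strings
    if isinstance(cell.value, str) and '.xls' in cell.value:
        has_external_reference = True

    return has_external_reference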

However, when I print the values of the cells that contain external references to the console, I get this:

=[1]Sheet1!$B$4
=[2]Sheet1!$B$4

So openpyxl evidently does not represent formulas containing external references the way I imagined: the file name never appears in the cell value, only a bracketed index such as [1]. Since square brackets are also used in table formulas, there is no sense in trying to pick up on external links by searching for them either.

I dug a little deeper and found the detect_external_links function in the openpyxl.workbook.names.external module (reference). I have no idea if one can actually call this function to do what I want.

From the console results it seems as if openpyxl understands that there are references, and seems to contain them in a list of sorts. But can one access this list? Or detect if such a list exists?

Either way, all I need is to figure out whether a cell contains a link to an external workbook.

artifex_knowledge asked Dec 04 '15 19:12





2 Answers

I have found a solution to this. Use the openpyxl library to load the xlsx file, then read the workbook's _external_links list:

import openpyxl

wb = openpyxl.load_workbook("Myworkbook.xlsx")

# len(wb._external_links) gives the count of linked workbooks.
# Note: _external_links is a private attribute of the workbook object.
for link in wb._external_links:
    target = link.file_link.Target
    target = target.replace("file:///", "")
    print(target.replace("%20", " "))


----------------------------
Out[01]: ##Indicates that the workbook has 4 external workbook links##
/Users/myohannan/AppData/Local/Temp/49/orion/Extension Workpapers_Learning Extension Calc W_83180610.xlsx
/Users/lmmeyer/AppData/Local/Temp/orion/Complete Set of Workpapers_PPS Workpapers 123112_111698213.xlsx
\\SF-DATA-2\IBData\TEMP\ie5\Temporary Internet Files\OLK8A\LBO Models\PIGLET Current.xls
/WINNT/Temporary Internet Files/OLK3/WINDOWS/Temporary Internet Files/OLK8304/DEZ.XLS     
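
The list above tells you which workbooks are linked, but not which cells use them. One way to get down to cell level, offered here as a rough sketch rather than a tested solution, is to scan the formula strings for the bracketed index seen in the question (`=[1]Sheet1!$B$4`). The helper names `external_link_indices` and `cells_with_external_links` are made up for this example. The sketch assumes the workbook was loaded without `data_only=True` (so `cell.value` holds the formula text) and that the number in brackets is a 1-based position in the list of linked workbooks. The lookbehind skips table references such as `Table1[Amount]`, where the bracket directly follows a name.

```python
import re

# A bracketed number that does NOT directly follow a name or a closing
# bracket, e.g. "=[1]Sheet1!$B$4" but not "Table1[1]".
EXTERNAL_REF = re.compile(r"(?<![\w\]])\[(\d+)\]")


def external_link_indices(value):
    """Return the external link indices found in a formula string."""
    # Only formula strings can hold references; skip None, numbers, plain text.
    if not isinstance(value, str) or not value.startswith("="):
        return []
    return [int(n) for n in EXTERNAL_REF.findall(value)]


def cells_with_external_links(wb):
    """Yield (sheet title, cell coordinate, indices) for each matching cell."""
    for ws in wb.worksheets:
        for row in ws.iter_rows():
            for cell in row:
                indices = external_link_indices(cell.value)
                if indices:
                    yield ws.title, cell.coordinate, indices
```

If the index assumption holds, `wb._external_links[n - 1].file_link.Target` would give the file behind index `n`. This is a heuristic on the formula text, so unusual formulas may still produce false positives or misses.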
Ankur Chakravarty answered Sep 28 '22 16:09



I decided to go outside of openpyxl to achieve my goal. Even though openpyxl has numerous functions that refer to external links, I was unable to find a simple way to use them for this.

Instead I decided to use ZipFile to open the workbook in memory, then search for the externalLink1.xml file. If it exists, then the workbook contains external links:

import tkinter as tk
from tkinter import filedialog
from zipfile import ZipFile
import xml.etree.ElementTree

# Hide the empty Tk root window and show only the file picker
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()

with ZipFile(file_path) as myzip:
    try:
        # ZipFile.open raises KeyError when the archive has no such member
        my_file = myzip.open('xl/externalLinks/externalLink1.xml')
        e = xml.etree.ElementTree.parse(my_file).getroot()
        print('Has external references')
    except KeyError:
        print('No external references')

Once I have the XML file, I can proceed to identify the cell address, value and other information by running through the XML tree using ElementTree.
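
As a rough sketch of that last step, the function below reads every `externalLinkN.xml` part in the archive and collects the sheet names and cached cells it lists. The function name `read_external_links` and the shape of the returned dictionary are my own choices for illustration. It assumes the parts use the standard SpreadsheetML namespace with `sheetName`, `sheetData`, `cell` and `v` elements. As far as I can tell, the cells recorded there are the cached cells of the linked workbook, not the cells in the current workbook that point to them.

```python
import re
import xml.etree.ElementTree as ET
from zipfile import ZipFile

# Main SpreadsheetML namespace used by the externalLink parts
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
LINK_PART = re.compile(r"xl/externalLinks/externalLink\d+\.xml$")


def read_external_links(xlsx):
    """Map each externalLinkN.xml part to its sheet names and cached cells."""
    result = {}
    with ZipFile(xlsx) as archive:
        for name in archive.namelist():
            if not LINK_PART.match(name):
                continue
            root = ET.fromstring(archive.read(name))
            sheets = [s.get("val") for s in root.iterfind(".//m:sheetName", NS)]
            cells = []
            for data in root.iterfind(".//m:sheetData", NS):
                sheet_id = data.get("sheetId")  # kept as the raw attribute value
                for cell in data.iterfind(".//m:cell", NS):
                    # (sheetId, cell address, cached value or None)
                    value = cell.findtext("m:v", default=None, namespaces=NS)
                    cells.append((sheet_id, cell.get("r"), value))
            result[name] = {"sheets": sheets, "cells": cells}
    return result
```

An empty dictionary means no external link parts were found, which also covers the has/has-not check above without relying on a part named exactly `externalLink1.xml`.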

artifex_knowledge answered Sep 28 '22 17:09
