
Extracting Hyperlinks From Excel (.xlsx) with Python

I have been looking mostly at the xlrd and openpyxl libraries for Excel file manipulation. However, xlrd currently does not support formatting_info=True for .xlsx files, so I cannot use the xlrd hyperlink_map function. I then turned to openpyxl, but have had no luck extracting a hyperlink from an Excel file with it either. Test code below (the test file contains a simple hyperlink to Google with the hyperlink text set to "test"):

import openpyxl

wb = openpyxl.load_workbook('testFile.xlsx')

ws = wb.get_sheet_by_name('Sheet1')

r = 0
c = 0

print(ws.cell(row=r, column=c).value)
print(ws.cell(row=r, column=c).hyperlink)
print(ws.cell(row=r, column=c).hyperlink_rel_id)

Output:

test

None

I guess openpyxl does not currently support formatting completely either? Is there some other library I can use to extract hyperlink information from Excel (.xlsx) files?

LucasS asked May 21 '13 18:05



2 Answers

This is possible with openpyxl:

import openpyxl

wb = openpyxl.load_workbook('yourfile.xlsm')
ws = wb['Sheet1']
# Raises AttributeError if the cell has no hyperlink (cell.hyperlink is None)
print(ws.cell(row=2, column=1).hyperlink.target)
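
Since the direct `.hyperlink.target` access fails on cells without a link, a guarded loop may be more convenient when you want every link on a sheet. The following is only a minimal sketch, assuming a reasonably recent openpyxl; `extract_hyperlinks` is a hypothetical helper name, not part of the library:

```python
import openpyxl


def extract_hyperlinks(ws):
    """Return {cell coordinate: hyperlink target} for every linked cell in ws.

    Hypothetical helper for illustration; not an openpyxl function.
    """
    links = {}
    for row in ws.iter_rows():
        for cell in row:
            # cell.hyperlink is None for cells that carry no link
            if cell.hyperlink is not None:
                links[cell.coordinate] = cell.hyperlink.target
    return links


# Usage, with the file and sheet names from the question:
# wb = openpyxl.load_workbook("testFile.xlsx")
# print(extract_hyperlinks(wb["Sheet1"]))
```

Note that `target` holds the external address; for links that point to a place inside the workbook it may be None, and the `location` attribute of the Hyperlink object is the one to check instead.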

wordsforthewise answered Oct 27 '22 09:10


Starting from at least version openpyxl-2.4.0b1, this bug https://bitbucket.org/openpyxl/openpyxl/issue/152/hyperlink-returns-empty-string-instead-of has been fixed. A cell's hyperlink attribute now returns a Hyperlink object:

# r and c are 1-based row and column indexes in current openpyxl versions
hl_obj = ws.cell(row=r, column=c).hyperlink  # Hyperlink object, or None if the cell has no link
if hl_obj is not None:
    print(hl_obj.display)
    print(hl_obj.target)
    print(hl_obj.tooltip)  # shown when hovering the mouse over the hyperlink in Excel
    print(hl_obj)  # prints the remaining attributes, if you need them

Hellohowdododo answered Oct 27 '22 10:10