Extract hyperlink from Excel file in R

Tags:

r

How do I take a cell in Excel, which has text that is hyperlinked, and extract the hyperlink part?

asked Jun 10 '14 19:06 by rrs



1 Answer

I found a super convoluted way to extract the hyperlinks:

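library(XML)

# my.excel.file is assumed to hold the path to the .xlsx workbook

# an .xlsx file is a zip archive, so copy it with a .zip extension
# (anchor the pattern so only the file extension is replaced)
my.zip.file <- sub("\\.xlsx$", ".zip", my.excel.file)
file.copy(from = my.excel.file, to = my.zip.file, overwrite = TRUE)

# unzip the archive into the working directory
unzip(my.zip.file)

# unzipping produces a bunch of XML files which we can read using the XML package
# assume sheet1 has our data
xml <- xmlParse("xl/worksheets/sheet1.xml")

# finally grab the hyperlinks; namespaces = "x" binds the prefix x to the
# document's default namespace. Note that @display is the link's display
# string, which is not always the target URL.
hyperlinks <- xpathApply(xml, "//x:hyperlink/@display", namespaces = "x")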

Derived from this blogpost.
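
If the display string is missing or differs from the real address, the target URL of an external link can usually be recovered from the sheet's relationships part instead. The following is a minimal sketch, not a tested recipe for every workbook: it assumes the workbook has already been unzipped, that the sheet of interest is sheet1, and that xl/worksheets/_rels/sheet1.xml.rels exists (it is only written when the sheet has external relationships). The helper name sheet_hyperlinks and its arguments are made up for illustration.

```r
library(XML)

# Namespace URIs used by Office Open XML worksheets and relationship parts.
ns_sheet <- c(
  x = "http://schemas.openxmlformats.org/spreadsheetml/2006/main",
  r = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
)
ns_rels <- c(p = "http://schemas.openxmlformats.org/package/2006/relationships")

# Hypothetical helper: `dir` is the folder the workbook was unzipped into.
sheet_hyperlinks <- function(dir, sheet = "sheet1") {
  ws_dir    <- file.path(dir, "xl", "worksheets")
  sheet_xml <- xmlParse(file.path(ws_dir, paste0(sheet, ".xml")))
  rels_xml  <- xmlParse(file.path(ws_dir, "_rels", paste0(sheet, ".xml.rels")))

  # Cell reference and relationship id of each hyperlink that carries an r:id.
  # Both queries use the same predicate, so the results line up one to one.
  cells <- as.character(xpathSApply(sheet_xml, "//x:hyperlink[@r:id]/@ref",
                                    namespaces = ns_sheet))
  ids   <- as.character(xpathSApply(sheet_xml, "//x:hyperlink[@r:id]/@r:id",
                                    namespaces = ns_sheet))

  # Relationship id -> target lookup from the .rels part.
  rel_ids <- as.character(xpathSApply(rels_xml, "//p:Relationship/@Id",
                                      namespaces = ns_rels))
  targets <- as.character(xpathSApply(rels_xml, "//p:Relationship/@Target",
                                      namespaces = ns_rels))

  data.frame(cell = cells,
             url  = targets[match(ids, rel_ids)],
             stringsAsFactors = FALSE)
}
```

Usage would look something like unzip(my.excel.file, exdir = tmp) followed by sheet_hyperlinks(tmp), where tmp is a scratch directory. Unzipping into a scratch directory also avoids the rename step, since unzip() reads the .xlsx directly. Links that point inside the workbook (those with a location attribute and no r:id) are skipped by this sketch.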

answered Oct 20 '22 16:10 by rrs