Read Excel file from a URL using the readxl package

Consider a file on the internet, such as this one (note the s in https): https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls

How can the sheet 2 of the file be read into R?

The following code is an approximation of what is desired (but it fails):

url1 <- 'https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls'
p1f <- tempfile()
download.file(url1, p1f, mode = "wb")
p1 <- read_excel(path = p1f, sheet = 2)
userJT asked Dec 28 '16 19:12



2 Answers

This works for me on Windows:

library(readxl)
library(httr)
packageVersion("readxl")
# [1] ‘0.1.1’

GET(url1, write_disk(tf <- tempfile(fileext = ".xls")))
df <- read_excel(tf, 2L)
str(df)
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 20131 obs. of  8 variables:
#  $ Code                        : chr  "C115388" "C115800" "C115801" "C115802" ...
#  $ Codelist Code               : chr  NA "C115388" "C115388" "C115388" ...
#  $ Codelist Extensible (Yes/No): chr  "No" NA NA NA ...
#  $ Codelist Name               : chr  "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" ...
#  $ CDISC Submission Value      : chr  "SIXMW1TC" "SIXMW101" "SIXMW102" "SIXMW103" ...
#  $ CDISC Synonym(s)            : chr  "6 Minute Walk Functional Test Test Code" "SIXMW1-Distance at 1 Minute" "SIXMW1-Distance at 2 Minutes" "SIXMW1-Distance at 3 Minutes" ...
#  $ CDISC Definition            : chr  "6 Minute Walk Test test code." "6 Minute Walk Test - Distance at 1 minute." "6 Minute Walk Test - Distance at 2 minutes." "6 Minute Walk Test - Distance at 3 minutes." ...
#  $ NCI Preferred Term          : chr  "CDISC Functional Test 6MWT Test Code Terminology" "6MWT - Distance at 1 Minute" "6MWT - Distance at 2 Minutes" "6MWT - Distance at 3 Minutes" ...
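
A likely reason this works where the question's code fails is that the temporary file here carries a ".xls" extension, which older readxl releases such as 0.1.1 appear to rely on to pick the file format. The same idea can be sketched with base R's download.file() only. The helper names temp_path_for_url() and read_excel_url() below are hypothetical, not part of readxl, and the sketch assumes the URL path ends in the real file extension.

```r
# Hypothetical helper: build a temp file path that keeps the URL's extension,
# so read_excel() can recognise the format from the file name.
temp_path_for_url <- function(url) {
  # Drop any query string or fragment before inspecting the extension.
  clean <- sub("[?#].*$", "", url)
  ext <- tools::file_ext(clean)
  tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")
}

# Hypothetical wrapper: download the workbook, read one sheet, then clean up.
read_excel_url <- function(url, sheet = 1, ...) {
  tf <- temp_path_for_url(url)
  on.exit(unlink(tf), add = TRUE)
  # mode = "wb" keeps the binary workbook intact on Windows.
  download.file(url, tf, mode = "wb", quiet = TRUE)
  readxl::read_excel(tf, sheet = sheet, ...)
}
```

With these definitions, read_excel_url(url1, sheet = 2) should return the second sheet, provided the readxl package is installed and the URL is reachable.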
lukeA answered Oct 14 '22 18:10


From this issue on GitHub (#278):

some functionality for supporting more general inputs will be pulled out of readr, at which point readxl can exploit that.

So we should be able to pass URLs directly to read_excel() in the (hopefully near) future.

Aurèle answered Oct 14 '22 18:10