How to download an .xlsx file in R and load the data into a dataframe?

I'm trying to download an .xlsx file from the EIA website, but reading it fails with the following error.

The error is: "Error: ZipException (Java): invalid entry size (expected 2385 but got 2390 bytes)"

I have tried the following code:

library(XLConnect)
tmp = tempfile(fileext = ".xlsx")
download.file(url = "http://www.eia.gov/petroleum/drilling/xls/dpr-data.xlsx", destfile = tmp)
readWorksheetFromFile(file = tmp, sheet = "Eagle Ford Region", header = FALSE, startRow = 9, endRow = 151)

I have also tried the other recommendations in the question "Read Excel file into R with XLConnect package from URL".

user2946746 asked Mar 04 '15 17:03



2 Answers

You should use mode = "wb" (binary mode) when downloading files that are not plain text:

download.file(url = "http://www.eia.gov/petroleum/drilling/xls/dpr-data.xlsx", destfile = tmp, mode="wb")

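This should solve the issue. An .xlsx file is a ZIP archive, and on Windows the default text mode (mode = "w") can alter bytes as they are written, which most likely explains the size mismatch reported by the ZipException.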
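
As a rough sketch of how the binary download could be wrapped and sanity-checked before reading, the helper below downloads with mode = "wb" and confirms that the file starts with the ZIP signature "PK". The function name download_xlsx is hypothetical and not part of the original answer or of any package. The check is only an assumption about what is worth verifying: it catches cases such as an HTML error page saved under an .xlsx name, but it does not detect every kind of corruption.

```r
# Hypothetical helper (not from the original answer): download a binary file
# and confirm it looks like a ZIP container before handing it to a reader.
download_xlsx <- function(url, destfile = tempfile(fileext = ".xlsx")) {
  # mode = "wb" writes the downloaded bytes unchanged on every platform
  status <- download.file(url = url, destfile = destfile, mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)

  # .xlsx files are ZIP archives, which begin with the bytes 50 4B ("PK")
  signature <- readBin(destfile, what = "raw", n = 2)
  if (!identical(signature, as.raw(c(0x50, 0x4b)))) {
    stop("downloaded file does not start with the ZIP signature 'PK'")
  }
  destfile
}

# Usage with the URL from the question (requires network access and XLConnect):
# tmp <- download_xlsx("http://www.eia.gov/petroleum/drilling/xls/dpr-data.xlsx")
# readWorksheetFromFile(file = tmp, sheet = "Eagle Ford Region", header = FALSE,
#                       startRow = 9, endRow = 151)
```

The helper returns the path of the downloaded file, so it can be passed straight to readWorksheetFromFile as in the question.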

m0nhawk answered Oct 08 '22 18:10


I'm really late to the party, but I spent a lot of time stuck on this same error, and the mode = "wb" fix above didn't work for me. If you're only downloading the file so that you can load it from disk with read_xlsx, a better solution is to skip the manual download step and let rio fetch and read the file for you:

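# install.packages("rio")
library(rio)

# URL of the workbook, taken from the question
url <- "http://www.eia.gov/petroleum/drilling/xls/dpr-data.xlsx"
data <- rio::import(url)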

Cheers

user2844936 answered Oct 08 '22 20:10