Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: reading a binary file that is zipped

Tags:

r

I am trying to parse some online weather data with R. The data is a binary file that has been gzipped. An example file is:

ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/V1.0/2005/PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.20050101.gz

If I download the file to my computer and manually unzip it, I can easily do the following:

  myFile <- ( "/tmp/PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.20050101" )
  to.read = file( myFile, "rb")
  myPoints <- readBin(to.read, real(), n=1e6, size = 4, endian = "little")

What I would prefer to do is automate both the download/unzip along with the read. So I thought that would be as simple as the following:

p <- gzcon( url( "ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/V1.0/2005/PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.20050101.gz" ) )
myPoints <- readBin(p, real(), n=1e6, size = 4, endian = "little")

This seems to work just dandy, but in the manual step the vector myPoints has length 518400, which is accurate. However if R handles the download and read as in the second example, I get a different length vector every time I run the code. Seriously. I'm not smoking anything. I swear. I run it multiple time and each time the vector is a different length, always less than the expected 518400.

I also tried getting R to download the gzip file using the following:

temp <- tempfile()
myFile <- download.file("ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/V1.0/2005/PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.20050101.gz",temp)

I found that often that would return a Warning about the file not being the expected size. Like the following:

Warning message:
In download.file("ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/V1.0/2005/PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.20050101.gz",  :
  downloaded length 162176 != reported length 179058

Any tips you can throw my way that might help me solve this?

-J

like image 204
JD Long Avatar asked May 27 '11 15:05

JD Long


1 Answers

Try this:

R> remfname <- "ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/V1.0/2005/PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.20050101.gz"
R> locfname <- "/tmp/data.gz"
R> download.file(remfname, locfname)
trying URL 'ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/V1.0/2005/PRCP_CU_GAUGE_V1.0GLB_0.50deg.lnx.20050101.gz'
ftp data connection made, file length 179058 bytes
opened URL
==================================================
downloaded 174 Kb

R> con <- gzcon(file(locfname, "rb"))
R> myPoints <- readBin(con, real(), n=1e6, size = 4, endian = "little")
R> close(con)
R> str(myPoints)
 num [1:518400] 0 0 0 0 0 0 0 0 0 0 ...
R> 
like image 71
Dirk Eddelbuettel Avatar answered Oct 05 '22 00:10

Dirk Eddelbuettel