
How can I use fread to read gz files in R?

I am on a Windows machine, trying to speed up the read.table step. My files are all .gz.

x=paste("gzip -c ",filename,sep="")
phi_raw = fread(x)

Error in fread(x) : 

I cannot understand the error; it's a bit too cryptic for me.

This is not a duplicate, as suggested by zx8754: I am asking specifically in the context of fread. And while fread does not have native support for gzip, this paradigm should work. See http://www.molpopgen.org/coding/datatable.html

Update

Per the suggestion below, using system yields a longer error message, though I am still stuck.

Error in fread(system(x)) : 

  'input' must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself

In addition: Warning message:


running command 'gzip -c D:/x_.gz' had status 1

Update

Running with gunzip as pointed out below:

Error in fread(system(x)) : 

  'input' must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself

In addition: Warning message:

running command 'gunzip -c D:/XX_.gz' had status 127

Note the different exit status (127 rather than 1).

pythOnometrist asked Jun 09 '16 13:06



2 Answers

data.table now supports reading .gz files directly with the fread function, provided that the R.utils package is installed.

As suggested in this answer to a similar question, you can simply run the following:

library(data.table)
phi_raw <- fread("filename.gz")
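
A minimal sketch of the same idea with a fallback, in case R.utils is not installed. It assumes a data.table version recent enough to read .gz files directly; the temporary file and the variable names are made up for illustration. The fallback uses base R's gzfile() connection, which is slower than fread but needs no extra packages.

```r
library(data.table)

# Write a small tab-separated, gzip-compressed file using base R only.
# gz_path is a hypothetical example file, not the asker's data.
df <- data.frame(x = 1:10, y = letters[1:10])
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "w")
write.table(df, con, row.names = FALSE, quote = FALSE, sep = "\t")
close(con)

if (requireNamespace("R.utils", quietly = TRUE)) {
  # Direct read: fread decompresses the file itself via R.utils.
  phi_raw <- fread(gz_path)
} else {
  # Fallback: read through a base R gzfile() connection, then convert.
  phi_raw <- as.data.table(
    read.delim(gzfile(gz_path), stringsAsFactors = FALSE)
  )
}
```

Either branch should give a 10-row table with columns x and y.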
Lino Ferreira answered Nov 01 '22 11:11



I often use gzip with fread on Windows. fread reads the decompressed output of the command directly, so no decompressed copy of the file is written to disk. I would try adding the -d option to the gzip command: without it, gzip -c compresses its input instead of decompressing it. Specifically, in your code, try x=paste("gzip -dc ",filename,sep=""). Here is a reproducible example that works on my machine:

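df <- data.frame(x = 1:10, y = letters[1:10])
write.table(df, 'df.txt', row.names = FALSE, quote = FALSE, sep = '\t')
system("which gzip")   # show which gzip executable is on the PATH
system("gzip df.txt")  # compresses in place, leaving df.txt.gz
data.table::fread("gzip -dc df.txt.gz")  # decompress to stdout and parse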
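
If the file path might contain spaces, it may help to quote it when building the command. The following is only a sketch: gz_command is a hypothetical helper, not part of data.table, and the cmd argument assumes a newer data.table than the 1.9.6 I am running (with an older version, pass the command string as the first argument, as above).

```r
library(data.table)

# Hypothetical helper: build a "decompress to stdout" command for one file.
# shQuote() quotes the path so that spaces do not split it into two arguments.
gz_command <- function(path) {
  paste("gzip -dc", shQuote(path))
}

filename <- "df.txt.gz"  # assumed to exist, e.g. from the example above

# Only run the command if gzip is on the PATH and the file is present.
if (nzchar(Sys.which("gzip")) && file.exists(filename)) {
  phi_raw <- fread(cmd = gz_command(filename))
}
```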

And here is my sessionInfo().

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] rsconnect_0.4.3  tools_3.3.1      data.table_1.9.6 chron_2.3-47 

I have successfully used gzip on Windows without adding a decompressed file to my hard drive using both Rtools (https://cran.r-project.org/bin/windows/Rtools/) and Gow (https://github.com/bmatzelle/gow/wiki). If my reproducible example above does not work for you, use the which gzip and which gunzip commands to see the exact .exe that is running. If it is not Rtools or Gow, perhaps try installing one of those two and trying the reproducible example again.
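
The same check can be done from inside R. This is a rough sketch, not a definitive diagnosis. It relies on two documented behaviours of base R: Sys.which() returns an empty string when a program is not found, and system() returns the command's exit status (an integer) unless intern = TRUE. The second point would explain the error from fread(system(x)): fread receives a number, not a command or a file name. According to the system() documentation, status 127 means the command could not be run at all, which would be consistent with gunzip not being on the PATH.

```r
# Locate the gzip executable that R would run; "" means it is not on the PATH.
gzip_path <- Sys.which("gzip")
if (nzchar(gzip_path)) {
  message("Using gzip at: ", gzip_path)
} else {
  message("gzip not found on PATH; consider installing Rtools or Gow")
}

# system() returns the exit status, not the command's output.
status <- system("gzip --version", ignore.stdout = TRUE, ignore.stderr = TRUE)
message("Exit status: ", status)  # 0 on success; non-zero indicates a problem
```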

jmuhlenkamp answered Nov 01 '22 11:11
