Decompress gz file using R

Tags: r, gzip

I have used ?unzip in the past to get at the contents of a zipped file using R. This time, I am having a hard time extracting the files from a .gz file, which can be found here.

I have tried ?gzfile and ?gzcon but have not been able to get it to work. Any help you can provide will be greatly appreciated.

Btibert3 asked Apr 23 '11 13:04



1 Answer

Here is a worked example that may help illustrate what gzfile() and gzcon() are for:

    foo <- data.frame(a=LETTERS[1:3], b=rnorm(3))
    foo
    #  a        b
    #1 A 0.586882
    #2 B 0.218608
    #3 C 1.290776
    write.table(foo, file="/tmp/foo.csv")
    system("gzip /tmp/foo.csv")             # being very explicit

Now that the file is written, instead of implicit use of file(), use gzfile():

    read.table(gzfile("/tmp/foo.csv.gz"))
    #  a        b
    #1 A 0.586882
    #2 B 0.218608
    #3 C 1.290776
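The answer names gzcon() but only demonstrates gzfile(), so here is a minimal sketch of both on the same file. It assumes a temporary file instead of /tmp/foo.csv and fixed numbers instead of rnorm(), and it writes the compressed file from R rather than calling the external gzip; the variable names are made up for illustration.

```r
# Hypothetical temp file standing in for /tmp/foo.csv.gz
path <- tempfile(fileext = ".csv.gz")
foo <- data.frame(a = LETTERS[1:3], b = c(0.5, 0.25, 1.5))

# gzfile() can also be used for writing, which avoids the external gzip call.
# write.table() opens the unopened connection and closes it when done.
write.table(foo, file = gzfile(path))

# Reading through gzfile(): decompression happens on the fly
from_gzfile <- read.table(gzfile(path))

# gzcon() wraps an existing connection (a file here, but it could be a url())
# and decompresses whatever passes through it
con <- gzcon(file(path, "rb"))
txt <- readLines(con)
close(con)
from_gzcon <- read.table(text = txt)
```

In short, gzfile() fits when the compressed data is a file on disk, while gzcon() fits when the compressed bytes arrive through some other connection.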

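The file you point to is a compressed tar archive, so gzfile() alone will not get at its contents: it only undoes the gzip layer and leaves the tar container. For the tar layer, the utils package that ships with R provides untar(). Tar archives are commonly used to distribute source code, for example R packages and the R sources.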
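For a compressed tar archive, a sketch along the following lines may be closer to what the question needs. It uses tar() and untar() from the utils package and builds its own small archive so that it is self-contained; the file and directory names are hypothetical and are not the file linked in the question.

```r
# Build a small .tar.gz to work with (hypothetical names throughout)
work <- tempfile("tar_demo_")
dir.create(work)
old <- setwd(work)
write.csv(data.frame(a = LETTERS[1:3], b = 1:3), "foo.csv", row.names = FALSE)
tar("foo.tar.gz", files = "foo.csv", compression = "gzip")
setwd(old)

archive <- file.path(work, "foo.tar.gz")

# List the archive members without extracting anything
members <- untar(archive, list = TRUE)

# Extract into a separate directory, then read the file as usual
out <- file.path(work, "extracted")
dir.create(out)
untar(archive, exdir = out)
bar <- read.csv(file.path(out, "foo.csv"))
```

For a downloaded archive, the same two untar() calls would apply with archive pointing at the local copy of the file.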

Dirk Eddelbuettel answered Sep 21 '22 12:09