Read and write csv.gz file in R

Tags: r, gzip

There are very similar questions on this topic, but none deals with it specifically in R.

I have a csv.gz file and I would like to "unzip" it and have it as an ordinary *.csv file. I suppose one would first read the csv.gz file and then create the csv file with the write.csv command.

Here is what I have tried, among other things:

gz.file <- read.csv(gzfile(file.choose()), as.is = TRUE)

gives:

  head(gz.file)
        farmNo.milk.energy.vet.cows
  1     1;862533;117894;21186;121
  2     2;605764;72049;43910;80
  3     3;865658;158466;54583;95
  4     4;662331;66783;45469;87
  5     5;1003444;101714;81625;125
  6     6;923512;252408;96807;135

The object claims to be a data.frame but doesn't behave like one (everything ends up in a single column). What am I missing here?

 class(gz.file)
 [1] "data.frame"

Once it is read into memory I would like to save it as a plain csv file. Would write.csv be the solution?

write.csv(gz.file, file="PATH")
Maximilian, asked Dec 16 '13 11:12


2 Answers

In recent versions of data.table, the fast csv reader fread supports csv.gz files. It detects from the file name whether it needs to decompress, so there is not much new to learn. The following should work:

library(data.table)
dt = fread("data.csv.gz")

This feature requires an extra, fortunately lightweight, dependency, as you can read in the ?fread manual:

Compressed files ending .gz and .bz2 are supported if the R.utils package is installed.

To write a compressed file, use fwrite with the argument compress = "gzip".
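As a rough sketch of the full round trip with data.table (read the compressed file, write it back as plain csv, and optionally as gzip again): the file names and the small sample data below are made up for illustration, and the example assumes that a data.table version with the compress argument and the R.utils package are both installed. fread guesses the separator, so the semicolon-separated layout from the question should come out as five columns rather than one.

```r
library(data.table)

# Build a small semicolon-separated, gzip-compressed example file
# (hypothetical file name and values, used only for this sketch).
tmp_gz <- file.path(tempdir(), "farms.csv.gz")
con <- gzfile(tmp_gz, "w")
writeLines(c("farmNo;milk;energy;vet;cows",
             "1;862533;117894;21186;121",
             "2;605764;72049;43910;80"), con)
close(con)

# fread decompresses based on the .gz extension (needs R.utils)
# and detects the ";" separator on its own.
dt <- fread(tmp_gz)

# Write an ordinary, uncompressed csv file.
out_csv <- file.path(tempdir(), "farms.csv")
fwrite(dt, out_csv)

# Write a gzip-compressed copy.
out_gz <- file.path(tempdir(), "farms_copy.csv.gz")
fwrite(dt, out_gz, compress = "gzip")
```

Note that fwrite uses a comma as its default separator, so the plain csv written here is comma-separated even though the input used semicolons; pass sep = ";" if the original layout should be kept.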

jangorecki, answered Sep 21 '22 13:09


The tidyverse, particularly the readr package, transparently supports gzip-compressed files (and a few other formats):

library(readr)

read_csv("file.csv.gz") -> d

# write uncompressed data
# (called directly, since library(readr) alone does not attach %>%)
write_csv(d, "file.csv")
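For comparison, here is a minimal base R sketch that needs no extra packages. It assumes, based on the head() output in the question, that the file is semicolon-separated; read.csv expects commas by default, which would explain why everything landed in a single column. The file names and sample rows are invented for the example.

```r
# Build a small semicolon-separated, gzip-compressed example file
# (hypothetical file name and values, used only for this sketch).
tmp_gz <- file.path(tempdir(), "farms.csv.gz")
con <- gzfile(tmp_gz, "w")
writeLines(c("farmNo;milk;energy;vet;cows",
             "1;862533;117894;21186;121",
             "2;605764;72049;43910;80"), con)
close(con)

# read.csv defaults to sep = ","; state the semicolon explicitly
# so the line is split into separate columns.
farms <- read.csv(gzfile(tmp_gz), sep = ";", as.is = TRUE)

# Write an ordinary csv file (comma-separated, no row names).
out_csv <- file.path(tempdir(), "farms.csv")
write.csv(farms, out_csv, row.names = FALSE)

# Alternative: decompress without parsing, copying the text line by line.
con <- gzfile(tmp_gz, "r")
raw_lines <- readLines(con)
close(con)
out_raw <- file.path(tempdir(), "farms_raw.csv")
writeLines(raw_lines, out_raw)
```

The line-by-line copy keeps the file exactly as it was (semicolons included), whereas the read.csv/write.csv route re-serializes the data with commas.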
liborm, answered Sep 19 '22 13:09