
fread() of file from archive

Tags: r, data.table

I would like to know the recommended way to read a data.table from an archived file (a zip archive in my case). One obvious option is to unzip it to a temporary file and then fread() it as usual. I would rather not create a new file, so instead I use read.table() on an unz() connection and then convert the result with data.table():

mydt <- data.table(read.table(unz(myzipfilename, myfilename)))

This works fine, but read.table() is slow for big files, while fread() can't read from an unz() connection directly. Is there a better solution?

Vasily A asked Oct 26 '15 08:10




1 Answer

See also: "Read Ziped CSV File with fread". To avoid creating temporary files yourself, you can call unzip with the -p option, which extracts the files to a pipe (stdout) and prints no messages.

fread() accepts a shell command like this and reads the command's output:

x = fread('unzip -p test/allRequests.csv.zip')

Or with gunzip

x = fread('gunzip -cq test/allRequests.csv.gz')
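
If the archive holds several files, as in the question, unzip -p also accepts the name of the member to extract. The following is a minimal sketch, assuming a Unix-like shell with Info-ZIP's unzip on the PATH and a data.table version that has the cmd= argument of fread(). The helper names unzip_cmd() and fread_zip_member() are hypothetical and only used for illustration.

```r
# Build the shell command that streams one archive member to stdout.
# shQuote() protects paths that contain spaces or shell metacharacters.
unzip_cmd <- function(zipfile, member) {
  paste("unzip -p", shQuote(zipfile), shQuote(member))
}

# Hypothetical wrapper: read a single member of a zip archive with fread().
# Passing the command through cmd= states explicitly that it is a shell
# command and not a file name; extra arguments are forwarded to fread().
fread_zip_member <- function(zipfile, member, ...) {
  data.table::fread(cmd = unzip_cmd(zipfile, member), ...)
}

# Usage with the variable names from the question:
# mydt <- fread_zip_member(myzipfilename, myfilename)
```

Building the command in a separate function makes it easy to print and inspect the exact string before fread() runs it.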

You can also use grep or other tools.
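
As a rough illustration of combining tools, the extracted stream can be piped through grep before fread() parses it. This sketch assumes a Unix-like shell with unzip and grep available. The helper names zip_grep_cmd() and fread_zip_grep() are hypothetical, and so is the search pattern in the usage line.

```r
# Build a pipeline: extract the archive to stdout, then keep only the lines
# that contain a fixed string (grep -F does no regex interpretation).
zip_grep_cmd <- function(zipfile, pattern) {
  paste("unzip -p", shQuote(zipfile), "| grep -F", shQuote(pattern))
}

# Hypothetical wrapper around fread(cmd = ...).
# Note: grep also drops the header line unless the header itself matches,
# so column names may have to be supplied, e.g. via col.names = ...
fread_zip_grep <- function(zipfile, pattern, ...) {
  data.table::fread(cmd = zip_grep_cmd(zipfile, pattern), ...)
}

# Usage (the pattern "GET" is only an example):
# x <- fread_zip_grep("test/allRequests.csv.zip", "GET")
```

Filtering in the shell means fread() only has to parse the matching rows, which can help when just a small part of a large file is needed.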

Mirko Ebert answered Oct 15 '22 16:10