
Is there a write function that corresponds to fread() in the data.table package? [duplicate]

I have a data.table that is not very big (2 GB) but for some reason write.csv takes an extremely long time to write it out (I've never actually finished waiting) and seems to use a ton of RAM to do it.

I tried converting the data.table to a data.frame, although this shouldn't really change anything since data.table extends data.frame. Has anyone run into this?

More importantly, if you stop it with Ctrl-C, R does not seem to give memory back.

Alex asked Aug 17 '12 22:08


1 Answer

UPDATE 2019.01.07:

fwrite has been on CRAN since 2016-11-25.

install.packages("data.table")
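As a minimal sketch of how fwrite pairs with fread, the round trip below writes a small table to a temporary file and reads it back. The table contents and the names dt, path and dt_back are illustrative assumptions, not part of the question.

```r
library(data.table)

# Small illustrative table (hypothetical data, not from the question)
dt <- data.table(
  id    = 1:5,
  value = c(0.5, 1.5, 2.5, 3.5, 4.5),
  label = letters[1:5]
)

# Write to a temporary CSV file, then read it back with fread()
path <- tempfile(fileext = ".csv")
fwrite(dt, path)
dt_back <- fread(path)

# Compare the original table with the one read back from disk
all.equal(dt, dt_back)
```

fwrite also accepts a plain data.frame, so converting to data.table first is not required.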

UPDATE 08.04.2016:

fwrite was recently added to the development version of the data.table package. It is also multithreaded by default, so no extra setup is needed to write in parallel.

# Install development version of data.table
install.packages("data.table", 
                  repos = "https://Rdatatable.github.io/data.table", type = "source")

# Load package
library(data.table)

# Load data        
data(USArrests)

# Write CSV
fwrite(USArrests, "USArrests_fwrite.csv")

According to the detailed benchmarks in the linked question "speeding up the performance of write.table", fwrite is ~17x faster than write.csv there (YMMV).
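To check the speed-up on your own machine, a rough timing comparison could look like the sketch below. The synthetic data and its size are arbitrary assumptions. The resulting ratio depends on hardware, data types and thread count, so it will not necessarily match the figure quoted above.

```r
library(data.table)

# Synthetic data; the size is an arbitrary assumption, scale it up as needed
set.seed(1)
n <- 1e5
dt <- data.table(
  id = seq_len(n),
  x  = rnorm(n),
  g  = sample(letters, n, replace = TRUE)
)

csv_base <- tempfile(fileext = ".csv")
csv_fast <- tempfile(fileext = ".csv")

# Elapsed wall-clock time for each writer
t_base <- system.time(write.csv(dt, csv_base, row.names = FALSE))[["elapsed"]]
t_fast <- system.time(fwrite(dt, csv_fast))[["elapsed"]]

cat("write.csv:", t_base, "s; fwrite:", t_fast, "s\n")

# fwrite uses getDTthreads() threads by default; nThread overrides it per call
fwrite(dt, csv_fast, nThread = 1L)
```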


UPDATE 15.12.2015:

In the future there might be an fwrite function in the data.table package; see https://github.com/Rdatatable/data.table/issues/580. That thread links to a gist with a prototype of such a function, which speeds up writing by a factor of 2 according to its author (https://gist.github.com/oseiskar/15c4a3fd9b6ec5856c89).

ORIGINAL ANSWER:

I had the same problem (trying to write even larger CSV files) and finally decided against using CSV files.

I would recommend using SQLite, as it is much faster than dealing with CSV files:

library(DBI)
library(RSQLite)
# Set up database
con <- dbConnect(SQLite(), dbname = "test.db")
# Load example data
data(USArrests)
# Write data "USArrests" to table "arrests" in database "test.db";
# row.names = TRUE stores the state names in a "row_names" column
dbWriteTable(con, "arrests", USArrests, row.names = TRUE)

# Test if the data was correctly stored in the database, i.e. 
# run an exemplary query on the newly created database 
dbGetQuery(con, "SELECT * FROM arrests WHERE Murder > 10")       
# row_names Murder Assault UrbanPop Rape
# 1         Alabama   13.2     236       58 21.2
# 2         Florida   15.4     335       80 31.9
# 3         Georgia   17.4     211       60 25.8
# 4        Illinois   10.4     249       83 24.0
# 5       Louisiana   15.4     249       66 22.2
# 6        Maryland   11.3     300       67 27.8
# 7        Michigan   12.1     255       74 35.1
# 8     Mississippi   16.1     259       44 17.1
# 9          Nevada   12.2     252       81 46.0
# 10     New Mexico   11.4     285       70 32.1
# 11       New York   11.1     254       86 26.1
# 12 North Carolina   13.0     337       45 16.1
# 13 South Carolina   14.4     279       48 22.5
# 14      Tennessee   13.2     188       59 26.9
# 15          Texas   12.7     201       80 25.5

# Close the connection to the database
dbDisconnect(con)

For further information, see http://cran.r-project.org/web/packages/RSQLite/RSQLite.pdf

You can also use software such as http://sqliteadmin.orbmu2k.de/ to access the database and export it to CSV, etc.
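If a CSV file is still needed at the end, the export can also be done from R without an external tool. The sketch below assumes a temporary database file and uses fwrite for the export. The names db_path, res and csv_path are hypothetical.

```r
library(DBI)
library(RSQLite)
library(data.table)

# Temporary database file so the sketch does not touch "test.db"
db_path <- tempfile(fileext = ".db")
con <- dbConnect(RSQLite::SQLite(), dbname = db_path)

# Store the example data; row.names = TRUE keeps the state names
# in a "row_names" column
data(USArrests)
dbWriteTable(con, "arrests", USArrests, row.names = TRUE)

# Pull a query result back into R and write it out as CSV
res <- dbGetQuery(con, "SELECT * FROM arrests WHERE Murder > 10")
csv_path <- tempfile(fileext = ".csv")
fwrite(res, csv_path)

# Close the connection to the database
dbDisconnect(con)
```

Querying only the rows you need before exporting keeps the data pulled into R small, which is the main advantage over writing the full table to CSV.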

majom answered Sep 17 '22 17:09