I have a fairly large dataframe in R that I would like to export to SPSS.
This file has caused me hours of headaches trying to import it to R in the first place, however I got successful using read.fwf()
using the options comment.char="%"
(a character not appearing in the file) and fill= TRUE
(it was a fixed-width ASCII file with some rows lacking all variables, causing error messages).
Anyway, my data frame currently consists of 3,9 mill observations and 48 variables (all character). I can write it to file fairly quickly by splitting it into 4 x 1 mill obs sets with df2 <- df[1:1000000,]
followed by write.table(df2)
etc., but can't write the entire file in one sweep without the computer locking up and needing a hard reset to come back up.
After hearing anecdotal stories about how R is unsuited for large datasets for years, this is the first time I have actually encountered a problem of this kind. I wonder whether there are other approaches(low-level "dumping" the file directly to disk?) or whether there is some package unknown to me that can handle export of large files of this type efficiently?
To save data as an RData object, use the save function. To save data as a RDS object, use the saveRDS function. In each case, the first argument should be the name of the R object you wish to save. You should then include a file argument that has the file name or file path you want to save the data set to.
1) If your file is all character strings, then it saves using write.table()
much faster if you first change it to a matrix
.
2) also write it out in chunks of, say 1000000 rows, but always to the same file, and using the argument append = TRUE
.
After extensive work by Matt Dowle parallelizing and adding other efficiency improvements, fread
is now as much as 15x faster than write.csv
. See linked answer for more.
Now data.table
has an fwrite
function contributed by Otto Seiskari which seems to be about twice as fast as write.csv
in general. See here for some benchmarks.
library(data.table)
fwrite(DF, "output.csv")
Note that row names are excluded, since the data.table
type makes no use of them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With