What is the fastest way to write a vector to a file? I have a character vector that is ~2 million rows and that has rather large values (200 characters). I am currently doing
write(myVector, "myFile.txt")
But this is extremely slow. I have searched around for solutions but the fast writing functions (such as fwrite) only take a data frame/matrix as input. Thanks!
In R, we can write data frames easily to a file using the write.table() command. The first argument is the data frame to be written, the second is the name of the output file. By default R surrounds each character entry in the output file with quotes, so we use quote=FALSE to suppress them.
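As a minimal sketch of the write.table() call described above: the data frame `df` and the temporary output path are made up for illustration, and row.names = FALSE is an extra choice here so that only the data columns are written.

```r
# Hypothetical two-column data frame, used only for this sketch
df <- data.frame(id = 1:2, label = c("a", "b"))

# Temporary output path so the sketch does not touch real files
out_table <- tempfile(fileext = ".txt")

# quote = FALSE suppresses the quotes around character entries;
# row.names = FALSE drops the row-name column
write.table(df, out_table, quote = FALSE, row.names = FALSE)

# Read the file back to see what was written
table_lines <- readLines(out_table)
print(table_lines)

unlink(out_table)
```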
First, create a vector. Then use the write.csv function to save the vector to a CSV file.
To turn the values of a data frame into a vector row by row, we can apply c after transposing the data frame with t. For example, if df is a data frame with many columns, c(t(df)) returns its values as a single vector, ordered row by row.
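The row-by-row ordering of c(t(df)) can be checked on a tiny example. The data frame below is hypothetical and uses numeric columns only, since transposing mixed-type columns would coerce everything to character.

```r
# Hypothetical data frame: row 1 is (1, 3), row 2 is (2, 4)
df <- data.frame(a = c(1, 2), b = c(3, 4))

# t() turns each row of df into a column of a matrix;
# c() then flattens that matrix column by column, i.e. row by row of df
v <- c(t(df))
print(v)
```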
After trying several options I found the fastest to be data.table::fwrite. Like @Gregor says in his first comment, it is faster by an order of magnitude, which is worth the extra package loaded. It is also one of the ones that produces bigger files. (The other one is readr::write_lines. Thanks to the comment by Calum You, I had forgotten this one.)
library(data.table)
library(readr)
set.seed(1) # make the results reproducible
n <- 1e6
x <- rnorm(n)
t1 <- system.time({
  sink(file = "test_sink.txt")
  cat(x, "\n")
  sink()
})
t2 <- system.time({
  cat(x, "\n", file = "test_cat.txt")
})
t3 <- system.time({
  write(x, file = "test_write.txt")
})
t4 <- system.time({
  fwrite(list(x), file = "test_fwrite.txt")
})
t5 <- system.time({
  write_lines(x, "test_write_lines.txt")
})
rbind(sink = t1[1:3], cat = t2[1:3],
write = t3[1:3], fwrite = t4[1:3],
readr = t5[1:3])
#        user.self sys.self elapsed
# sink        4.18    11.64   15.96
# cat         3.70     4.80    8.57
# write       3.71     4.87    8.64
# fwrite      0.42     0.02    0.51
# readr       2.37     0.03    6.66
In his second comment, Gregor notes that as.list and list behave differently. The difference is important. The former writes the vector as one row and many columns, the latter writes one column and many rows.
The speed difference is also noticeable:
fw1 <- system.time({
fwrite(as.list(x), file = "test_fwrite.txt")
})
fw2 <- system.time({
fwrite(list(x), file = "test_fwrite2.txt")
})
rbind(as.list = fw1[1:3], list = fw2[1:3])
#         user.self sys.self elapsed
# as.list      0.67     0.00    0.75
# list         0.19     0.03    0.11
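The one-row versus one-column behaviour follows from the shape of the two lists, which can be inspected in base R without writing any file. This is only a sketch with a made-up three-element vector; each list element is what becomes a column when the list is passed to fwrite.

```r
# Small hypothetical vector standing in for x
x_small <- c(1.5, 2.5, 3.5)

one_col   <- list(x_small)     # one element holding the whole vector
many_cols <- as.list(x_small)  # one element per value

length(one_col)         # number of columns: 1
length(one_col[[1]])    # rows in that column: 3
length(many_cols)       # number of columns: 3
lengths(many_cols)      # rows in each column: 1 1 1
```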
Final clean up.
unlink(c("test_sink.txt", "test_cat.txt", "test_write.txt",
"test_fwrite.txt", "test_fwrite2.txt", "test_write_lines.txt"))
You could use data.table's fwrite:
library(data.table) # install if not installed already
fwrite(list(myVector), file = "myFile.csv")
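If loading an extra package is not an option, base R's writeLines also writes a character vector one element per line. The sketch below is not benchmarked here, so no speed claim is made for it; `myVector` is a small made-up stand-in for the vector in the question, and the output path is a temporary file.

```r
# Hypothetical stand-in for the large character vector in the question
myVector <- c("first long string", "second, with a comma", "third")

# Temporary output path, used only for this sketch
out_lines <- tempfile(fileext = ".txt")

# Each element of the vector becomes one line of the file
writeLines(myVector, out_lines)

# Round-trip check: reading the file back should give the same vector
back <- readLines(out_lines)
print(identical(back, myVector))

unlink(out_lines)
```

For a real run, replace the temporary path with the target file name, for example "myFile.txt".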