 

Quickly Write Vector to File in R

Tags:

r

What is the fastest way to write a vector to a file? I have a character vector with ~2 million elements, and each element is a fairly long string (~200 characters). I am currently doing

write(myVector, "myFile.txt")

But this is extremely slow. I have searched around for solutions but the fast writing functions (such as fwrite) only take a data frame/matrix as input. Thanks!
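To reproduce the setup without the original data, here is a minimal sketch. make_test_vector() is a hypothetical helper (not from the question) that cuts 200-character substrings out of one random pool string, so the strings overlap; that should be fine for timing purposes. It also times base R's writeLines(), which, like write() on a character vector, writes one element per line. No timings are claimed here; they depend on the machine and disk.

```r
# Hypothetical helper: n random strings of exactly `width` lowercase letters,
# taken as substrings of one random pool (assumes pool_size >= width).
make_test_vector <- function(n, width = 200, pool_size = 1e5) {
  pool <- paste(sample(letters, pool_size, replace = TRUE), collapse = "")
  starts <- sample.int(pool_size - width + 1, n, replace = TRUE)
  substring(pool, starts, starts + width - 1)
}

set.seed(1)
v <- make_test_vector(2e5)   # use 2e6 to match the question's size

f_write <- tempfile(fileext = ".txt")
f_lines <- tempfile(fileext = ".txt")

# write() on a character vector: ncolumns defaults to 1, one element per line
system.time(write(v, file = f_write))
# base R writeLines(): also one element per line
system.time(writeLines(v, f_lines))

unlink(c(f_write, f_lines))
```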

asked Mar 12 '18 at 22:03 by Walker in the City




2 Answers

After trying several options, I found data.table::fwrite to be the fastest. As @Gregor says in his first comment, it is faster by an order of magnitude, which is worth loading the extra package. It is also one of the two options that produce larger files; the other is readr::write_lines (thanks to Calum You's comment, I had forgotten this one).

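library(data.table)   # fwrite()
library(readr)        # write_lines()

set.seed(1)    # make the results reproducible
n <- 1e6
x <- rnorm(n)  # numeric test data; the question's vector is character

t1 <- system.time({
    sink(file = "test_sink.txt")
    cat(x, "\n")    # all values on one space-separated line
    sink()
})
t2 <- system.time({
    cat(x, "\n", file = "test_cat.txt")   # same single-line layout
})
t3 <- system.time({
    write(x, file = "test_write.txt")     # 5 values per line for numeric x
})
t4 <- system.time({
    fwrite(list(x), file = "test_fwrite.txt")   # one column, one value per row
})
t5 <- system.time({
    write_lines(x, "test_write_lines.txt")      # one value per line
})

# Keep user, system and elapsed times for each method
rbind(sink = t1[1:3], cat = t2[1:3],
      write = t3[1:3], fwrite = t4[1:3],
      readr = t5[1:3])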
#       user.self sys.self elapsed
#sink        4.18    11.64   15.96
#cat         3.70     4.80    8.57
#write       3.71     4.87    8.64
#fwrite      0.42     0.02    0.51
#readr       2.37     0.03    6.66
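My assumption (not something stated in the answer) is that much of the file-size difference comes from numeric precision: cat(), and write(), which calls cat(), format numbers using getOption("digits"), 7 significant digits by default, whereas as.character() keeps 15 significant digits. This sketch only checks the base-R half of that; it does not inspect fwrite's or write_lines' output. For the question's character data, precision plays no role.

```r
x <- c(pi, exp(1))

f_cat <- tempfile()
f_chr <- tempfile()

# cat() formats numbers with getOption("digits") significant digits (7 by default);
# the trailing "" element makes the file end with a newline
cat(x, "", file = f_cat, sep = "\n")
# as.character() keeps 15 significant digits
writeLines(as.character(x), f_chr)

lines_cat <- readLines(f_cat)   # "3.141593" "2.718282"
lines_chr <- readLines(f_chr)   # "3.14159265358979" "2.71828182845905"
chr_is_bigger <- file.size(f_chr) > file.size(f_cat)

unlink(c(f_cat, f_chr))
```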

In his second comment, Gregor notes that as.list and list behave differently here, and the difference matters: fwrite(as.list(x)) writes the vector as one row with many columns, while fwrite(list(x)) writes it as one column with many rows.
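A minimal sketch of the layout difference on a three-element vector. col.names = FALSE is a documented fwrite() argument, added here only so that the files contain nothing but the data.

```r
library(data.table)

x <- 1:3
f_list <- tempfile()
f_aslist <- tempfile()

fwrite(list(x), file = f_list, col.names = FALSE)       # one column, three rows
fwrite(as.list(x), file = f_aslist, col.names = FALSE)  # one row, three columns

out_list <- readLines(f_list)      # "1" "2" "3"
out_aslist <- readLines(f_aslist)  # "1,2,3"

unlink(c(f_list, f_aslist))
```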

The speed difference is also noticeable:

fw1 <- system.time({
    fwrite(as.list(x), file = "test_fwrite.txt")
})
fw2 <- system.time({
    fwrite(list(x), file = "test_fwrite2.txt")
})

rbind(as.list = fw1[1:3], list = fw2[1:3])
#        user.self sys.self elapsed
#as.list      0.67     0.00    0.75
#list         0.19     0.03    0.11

Finally, clean up the test files.

unlink(c("test_sink.txt", "test_cat.txt", "test_write.txt",
         "test_fwrite.txt", "test_fwrite2.txt", "test_write_lines.txt"))
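When rerunning the comparison, one way to avoid the manual unlink() step is to write each file to tempfile() and delete it with on.exit(). time_writer() below is a hypothetical helper; it also reports file.size(), which makes the file-size remark easy to check. The two writers in the usage example are only illustrations.

```r
# Hypothetical helper: times writer(x, path) on a temporary file, returns the
# elapsed seconds and the resulting file size in bytes, then deletes the file.
time_writer <- function(writer, x) {
  path <- tempfile(fileext = ".txt")
  on.exit(unlink(path), add = TRUE)
  elapsed <- system.time(writer(x, path))[["elapsed"]]
  c(elapsed = elapsed, bytes = file.size(path))
}

set.seed(1)
x <- rnorm(1e5)
rbind(
  write      = time_writer(function(v, f) write(v, file = f), x),
  writeLines = time_writer(function(v, f) writeLines(as.character(v), f), x)
)
```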
answered Sep 30 '22 at 18:09 by Rui Barradas



You could use data.table's fwrite:

library(data.table) # install if not installed already
fwrite(list(myVector), file = "myFile.csv")
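For a character vector, fwrite's default quote = "auto" wraps a field in double quotes when it contains the separator, a line ending or a double quote. To get a plain one-string-per-line file like the one write() produces, one option (a sketch, using the documented col.names and quote arguments) is to turn off the header and the quoting. The sample strings are made up; elements that contain newlines would still span several lines.

```r
library(data.table)

myVector <- c("plain text", "text, with a comma", "text with \"quotes\"")
f <- tempfile(fileext = ".txt")

# col.names = FALSE: no header line; quote = FALSE: every field written verbatim
fwrite(list(myVector), file = f, col.names = FALSE, quote = FALSE)

out <- readLines(f)   # the three strings, unchanged
unlink(f)
```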
answered Sep 30 '22 at 19:09 by JeanVuda
