I was inspired by the fst package to try to write a C++ function to quickly serialize some data structures I have in R to disk.
But I am having trouble achieving the same write speed even on very simple objects. The code below is a simple example of writing a large 1 GB vector to disk.
Using custom C++ code, I achieve a write speed of 135 MB/s, which is the limit of my disk according to CrystalBench.
On the same data, write_fst achieves a write speed of 223 MB/s, which seems impossible since my disk can't write that fast. (Note: I am using the fst::threads_fst(1) and compress = 0 settings, and the files have the same data size.)
What am I missing?
How can I get the C++ function to write to disk faster?
C++ Code:
#include <Rcpp.h>
#include <fstream>
#include <cstring>
#include <iostream>

// [[Rcpp::plugins(cpp11)]]
using namespace Rcpp;

// [[Rcpp::export]]
void test(SEXP x) {
  // treat the payload of the numeric vector as a raw byte buffer
  char* d = reinterpret_cast<char*>(REAL(x));
  long dl = Rf_xlength(x) * 8;  // 8 bytes per double

  // write the whole buffer to disk in a single call
  std::ofstream OutFile;
  OutFile.open("/tmp/test.raw", std::ios::out | std::ios::binary);
  OutFile.write(d, dl);
  OutFile.close();
}
R Code:
library(microbenchmark)
library(Rcpp)
library(dplyr)
library(fst)
fst::threads_fst(1)
sourceCpp("test.cpp")
x <- runif(134217728) # 1 gigabyte
df <- data.frame(x)
microbenchmark(test(x), write_fst(df, "/tmp/test.fst", compress=0), times=3)
Unit: seconds
                                         expr      min       lq     mean   median       uq      max neval
                                      test(x) 6.549581 7.262408 7.559021 7.975235 8.063740 8.152246     3
 write_fst(df, "/tmp/test.fst", compress = 0) 4.548579 4.570346 4.592398 4.592114 4.614307 4.636501     3
file.info("/tmp/test.fst")$size/1e6
# [1] 1073.742
file.info("/tmp/test.raw")$size/1e6
# [1] 1073.742
Benchmarking SSD write and read performance is tricky and hard to do right. There are many effects to take into account.
For example, many SSDs use techniques to (intelligently) accelerate data speeds, such as DRAM caching. Those techniques can increase your write speed, especially in cases where an identical dataset is written to disk multiple times, as in your example. To avoid this effect, each iteration of the benchmark should write a unique dataset to disk.
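One way to do that is to time each write separately on freshly generated data; a minimal sketch in base R (the data is generated outside the timed section, so only the write itself is measured):
for (i in 1:3) {
  x <- runif(134217728)  # fresh 1 GB vector, so the SSD cache never sees repeated data
  print(system.time(test(x))["elapsed"])  # time only the write itself
}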
The block sizes of write and read operations are also important: the default physical sector size of SSDs is 4KB. Writing smaller blocks hampers performance, but with fst I found that writing blocks of data larger than a few MB also lowers performance, due to CPU cache effects. Because fst writes its data to disk in relatively small chunks, it's usually faster than alternatives that write data in a single large block.
To facilitate this block-wise writing to SSD, you could modify your code:
Rcpp::cppFunction('
  #include <fstream>
  #include <cstring>
  #include <iostream>

  #define BLOCKSIZE 262144  // 2^18 bytes per block

  long test_blocks(SEXP x, Rcpp::String path) {
    char* d = reinterpret_cast<char*>(REAL(x));

    std::ofstream outfile;
    outfile.open(path.get_cstring(), std::ios::out | std::ios::binary);

    long dl = Rf_xlength(x) * 8;  // 8 bytes per double
    long nr_of_blocks = dl / BLOCKSIZE;

    // write the bulk of the data in fixed-size blocks
    for (long block_nr = 0; block_nr < nr_of_blocks; block_nr++) {
      outfile.write(&d[block_nr * BLOCKSIZE], BLOCKSIZE);
    }

    // write the remainder that does not fill a complete block
    long remaining_bytes = dl % BLOCKSIZE;
    outfile.write(&d[nr_of_blocks * BLOCKSIZE], remaining_bytes);

    outfile.close();
    return dl;
  }
')
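The benchmark below also passes a file path to test, so the original method is assumed to be recompiled with a path argument; a minimal sketch of that variant (identical to the question's single-call write, just parameterized on the output path):
Rcpp::cppFunction('
  #include <fstream>

  // the single-call write from the question, now parameterized on the output path
  long test(SEXP x, Rcpp::String path) {
    char* d = reinterpret_cast<char*>(REAL(x));
    long dl = Rf_xlength(x) * 8;  // 8 bytes per double
    std::ofstream outfile;
    outfile.open(path.get_cstring(), std::ios::out | std::ios::binary);
    outfile.write(d, dl);
    outfile.close();
    return dl;
  }
')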
Now we can compare the methods test, test_blocks and fst::write_fst in a single benchmark:
x <- runif(134217728) # 1 gigabyte
df <- data.frame(X = x)
fst::threads_fst(1) # use fst in single threaded mode
microbenchmark::microbenchmark(
test(x, "test.bin"),
test_blocks(x, "test.bin"),
fst::write_fst(df, "test.fst", compress = 0),
times = 10)
#> Unit: seconds
#>                                          expr      min       lq     mean   median       uq      max neval
#>                           test(x, "test.bin") 1.473615 1.506019 1.590430 1.600055 1.635883 1.765512    10
#>                    test_blocks(x, "test.bin") 1.018082 1.062673 1.134956 1.131631 1.204373 1.264220    10
#>  fst::write_fst(df, "test.fst", compress = 0) 1.127446 1.144039 1.249864 1.261269 1.327304 1.343248    10
As you can see, the modified method test_blocks is about 40 percent faster than the original method, and even slightly faster than the fst package. This is expected, because fst has some overhead from storing column and table information, (possible) attributes, hashes and compression information.
Please note that the difference between fst and your initial test method is much less pronounced on my system, showing again the challenges of using benchmarks to optimize a system.