I have a struct that contains a vector of 2³¹ u32 values (total size about 8 GB). I followed the bincode example to write it to disk:
#[macro_use]
extern crate serde_derive;
extern crate bincode;

use std::fs::File;
use bincode::serialize_into;

#[derive(Serialize, Deserialize, PartialEq, Debug)]
pub struct MyStruct {
    counter: Vec<u32>,
    offset: usize,
}

impl MyStruct {
    // omitted for conciseness
}

fn main() {
    let m = MyStruct::new();
    // fill entries in the counter vector
    let mut f = File::create("/tmp/foo.bar").unwrap();
    serialize_into(&mut f, &m).unwrap();
}
To avoid allocating the memory twice, I used serialize_into to write directly into the file. However, writing is very slow (about half an hour). Is there a way to speed this up?
This is not an issue with serde or bincode. Unlike some other languages, Rust does not buffer file I/O by default (see this question for details), so every small write issued on a File goes straight to the operating system. Hence, the performance of this code can be significantly increased by wrapping the file in a buffered writer:
#[macro_use]
extern crate serde_derive;
extern crate bincode;

use std::fs::File;
use std::io::{BufWriter, Write};
use bincode::serialize_into;

#[derive(Serialize, Deserialize, PartialEq, Debug)]
pub struct MyStruct {
    counter: Vec<u32>,
    offset: usize,
}

impl MyStruct {
    // omitted for conciseness
}

fn main() {
    let m = MyStruct::new();
    // fill entries in the counter vector
    // BufWriter collects small writes in memory and passes them on in large chunks
    let mut f = BufWriter::new(File::create("/tmp/foo.bar").unwrap());
    serialize_into(&mut f, &m).unwrap();
    // flush explicitly: errors during the implicit flush on drop are ignored
    f.flush().unwrap();
}
For me, this cut the writing time from about half an hour to 40 seconds (a speedup of roughly 50x).
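To see why buffering helps, here is a minimal sketch that uses only the standard library and counts how many writes reach the underlying sink with and without a BufWriter. CountingWriter, write_values, unbuffered_calls and buffered_calls are hypothetical names made up for this illustration, and the one-small-write-per-value loop is an assumption standing in for how a serializer might emit the elements. It does not measure bincode itself.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical helper for illustration: forwards to `inner` and counts
/// how many `write` calls actually reach it.
struct CountingWriter<W: Write> {
    inner: W,
    calls: usize,
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Writes every value as 4 little-endian bytes, one small write per value
/// (an assumed stand-in for a serializer emitting elements one by one).
fn write_values<W: Write>(w: &mut W, values: &[u32]) -> io::Result<()> {
    for v in values {
        w.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}

/// Number of writes that reach the sink when nothing buffers them.
fn unbuffered_calls(values: &[u32]) -> io::Result<usize> {
    let mut sink = CountingWriter { inner: io::sink(), calls: 0 };
    write_values(&mut sink, values)?;
    Ok(sink.calls)
}

/// Number of writes that reach the sink when an 8 KiB BufWriter sits in front.
fn buffered_calls(values: &[u32]) -> io::Result<usize> {
    let sink = CountingWriter { inner: io::sink(), calls: 0 };
    let mut buffered = BufWriter::with_capacity(8 * 1024, sink);
    write_values(&mut buffered, values)?;
    // Push out whatever is still held in the buffer before reading the count.
    buffered.flush()?;
    Ok(buffered.get_ref().calls)
}

fn main() -> io::Result<()> {
    let values: Vec<u32> = (0..10_000).collect();
    println!("unbuffered: {} writes", unbuffered_calls(&values)?);
    println!("buffered:   {} writes", buffered_calls(&values)?);
    Ok(())
}
```

With a real File behind the writer, each of those underlying writes is a call into the operating system, so reducing their number is where the time is saved. A larger capacity passed to BufWriter::with_capacity lowers the count further, at the cost of more memory.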