
Serialization of large struct to disk with Serde and Bincode is slow [duplicate]

I have a struct that contains a vector of 2³¹ u32 values (total size about 8GB). I followed the bincode example to write it to disk:

#[macro_use]
extern crate serde_derive;
extern crate bincode;

use std::fs::File;
use bincode::serialize_into;

#[derive(Serialize, Deserialize, PartialEq, Debug)]
pub struct MyStruct {
    counter: Vec<u32>,
    offset: usize,
}

impl MyStruct {
    // omitted for conciseness
}


fn main() {
    let m = MyStruct::new();

    // fill entries in the counter vector

    let mut f = File::create("/tmp/foo.bar").unwrap();
    serialize_into(&mut f, &m).unwrap();
}

To avoid allocating the memory twice, I used serialize_into to directly write into the file. However, the writing process is really slow (about half an hour). Is there a way to speed this up?

asked Apr 23 '18 14:04 by m00am



1 Answer

This is not an issue with serde or bincode. Unlike some other languages, Rust does not buffer I/O by default (see this question for details), so every small write the serializer makes on a plain File is passed straight to the operating system. Hence, the performance of this code can be significantly increased by using a buffered writer:

#[macro_use]
extern crate serde_derive;
extern crate bincode;

use std::fs::File;
use bincode::serialize_into;
use std::io::BufWriter;

#[derive(Serialize, Deserialize, PartialEq, Debug)]
pub struct MyStruct {
    counter: Vec<u32>,
    offset: usize,
}

impl MyStruct {
    // omitted for conciseness
}


fn main() {
    let m = MyStruct::new();

    // fill entries in the counter vector

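    // Wrap the file so that many small writes are collected into larger ones.
    let mut f = BufWriter::new(File::create("/tmp/foo.bar").unwrap());
    serialize_into(&mut f, &m).unwrap();
    // Flush explicitly: errors from the implicit flush on drop are ignored.
    std::io::Write::flush(&mut f).unwrap();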
}

For me, this sped up the writing process from about half an hour to 40 seconds (roughly a 45x speedup).
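To see why the buffer helps, here is a minimal std-only sketch that counts how often the underlying writer is actually called. It assumes the serializer emits one small write per element, which is my reading of the slowdown rather than something I measured inside bincode. CountingWriter and write_values are hypothetical stand-ins for the file and the serializer, not part of any library:

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for a file: records how often `write` is called.
#[derive(Default)]
struct CountingWriter {
    calls: usize,
    bytes: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical stand-in for the serializer: one 4-byte write per value.
fn write_values<W: Write>(out: &mut W, values: &[u32]) -> io::Result<()> {
    for v in values {
        out.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}

/// Returns (write calls, bytes) when writing straight to the sink.
fn unbuffered_stats(values: &[u32]) -> io::Result<(usize, usize)> {
    let mut sink = CountingWriter::default();
    write_values(&mut sink, values)?;
    Ok((sink.calls, sink.bytes))
}

/// Returns (write calls, bytes) when the sink is wrapped in a BufWriter.
fn buffered_stats(values: &[u32]) -> io::Result<(usize, usize)> {
    let mut out = BufWriter::new(CountingWriter::default());
    write_values(&mut out, values)?;
    // Push whatever is still sitting in the buffer down to the sink.
    out.flush()?;
    let sink = out.get_ref();
    Ok((sink.calls, sink.bytes))
}

fn main() -> io::Result<()> {
    let values: Vec<u32> = (0..100_000).collect();

    let (calls, bytes) = unbuffered_stats(&values)?;
    println!("unbuffered: {} write calls, {} bytes", calls, bytes);

    let (calls, bytes) = buffered_stats(&values)?;
    println!("buffered:   {} write calls, {} bytes", calls, bytes);

    Ok(())
}
```

Without the buffer, the sink sees one call per value. With it, the same bytes arrive in far fewer, larger calls. On a real File each of those calls is a system call, which is where the time goes.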

answered Sep 20 '22 14:09 by m00am
