
Rust way to design a storage struct and readonly struct users

Tags:

rust

tl;dr What is the best "Rust way" to create some byte storage (in this case a Vec<u8>), store each Vec<u8> in a struct field where it can be looked up by key (like a BTreeMap<usize, &Vec<u8>>), and later read those Vec<u8> from other structs?
      Can this be extrapolated to a generally good Rust design for similar structs that act as storage and cache for blobs of bytes (Vec<u8>, [u8; 16384], etc.) accessible by a key (a usize offset, a u32 index, a String file path, etc.)?

Goal

I'm trying to create a byte storage struct and impl functions that:

  1. stores 16384 bytes read from disk on demand into "blocks" of Vec<u8> with capacity 16384
  2. lets other structs analyze the various Vec<u8> and, if needed, store their own references to those "blocks"
  3. is efficient: keeps only one copy of a "block" in memory and avoids unnecessary copies, clones, etc.

Unfortunately, with each implementation attempt I run into difficult problems of borrowing, lifetime elision, mutability, copying, or something else.

Reduced Code example

I created a struct BlockReader that

  1. creates a Vec<u8> (Vec<u8>::with_capacity(16384)) typed as Block
  2. reads from a file (using File::seek and File::take::read_to_end) and stores 16384 of u8 into a Vec<u8>
  3. stores a reference to the Vec<u8> within a BTreeMap typed as Blocks

(playground code)

use std::io::Seek;
use std::io::SeekFrom;
use std::io::Read;
use std::fs::File;
use std::collections::BTreeMap;

type Block = Vec<u8>;
type Blocks<'a> = BTreeMap<usize, &'a Block>;

pub struct BlockReader<'a> {
    blocks: Blocks<'a>,
    file: File,
}

impl<'a> BlockReader<'a> {
    /// read a "block" of 16384 `u8` at file offset 
    /// `offset` which is multiple of 16384
    /// if the "block" at the `offset` is cached in
    /// `self.blocks` then return a reference to that
    /// XXX: assume `self.file` is already `open`ed file
    ///      handle
    fn readblock(& mut self, offset: usize) -> Result<&Block, std::io::Error> {
        // the data at this offset is the "cache"
        // return reference to that
        if self.blocks.contains_key(&offset) {
            return Ok(&self.blocks[&offset]);
        }
        // have not read data at this offset so read
        // the "block" of data from the file, store it,
        // return a reference
        let mut buffer = Block::with_capacity(16384);
        self.file.seek(SeekFrom::Start(offset as u64))?;
        self.file.read_to_end(&mut buffer);
        self.blocks.insert(offset, & buffer);
        Ok(&self.blocks[&offset])
    }
}

example use-case problem

There have been many problems with each implementation. For example, two calls to BlockReader.readblock by a struct BlockAnalyzer1 have caused endless difficulties:

pub struct BlockAnalyzer1<'b> {
   pub blockreader: BlockReader<'b>,
}

impl<'b> BlockAnalyzer1<'b> {
    /// contrived example function
    pub fn doStuff(&mut self) -> Result<bool, std::io::Error> {
        let mut b: &Block;
        match self.blockreader.readblock(3 * 16384) {
            Ok(val) => {
                b = val;
            },
            Err(err) => {
                return Err(err);
            }
        }
        match self.blockreader.readblock(5 * 16384) {
            Ok(val) => {
                b = val;
            },
            Err(err) => {
                return Err(err);
            }
        }
        Ok(true)
    }
}

results in

error[E0597]: `buffer` does not live long enough
  --> src/lib.rs:34:36
   |
15 | impl<'a> BlockReader<'a> {
   |      -- lifetime `'a` defined here
...
34 |         self.blocks.insert(offset, & buffer);
   |         ---------------------------^^^^^^^^-
   |         |                          |
   |         |                          borrowed value does not live long enough
   |         argument requires that `buffer` is borrowed for `'a`
35 |         Ok(&self.blocks[&offset])
36 |     }
   |     - `buffer` dropped here while still borrowed

However, I ran into many other errors with different permutations of this design. For example:

error[E0499]: cannot borrow `self.blockreader` as mutable more than once at a time
   --> src/main.rs:543:23
    |
463 | impl<'a> BlockUser1<'a> {
    |      ----------- lifetime `'a` defined here
...
505 |             match self.blockreader.readblock(3 * 16384) {
    |                   ---------------------------------------
    |                   |
    |                   first mutable borrow occurs here
    |                   argument requires that `self.blockreader` is borrowed for `'a`
...
543 |                 match self.blockreader.readblock(5 * 16384) {
    |                       ^^^^^^^^^^^^^^^^ second mutable borrow occurs here

In BlockReader, I've tried permutations of "Block" storage using Vec<u8>, &Vec<u8>, Box<Vec<u8>>, Box<&Vec<u8>>, &Box<&Vec<u8>>, &Pin<&Box<&Vec<u8>>>, etc. However, each permutation runs into various confounding problems with borrowing, lifetimes, and mutability.

Again, I'm not looking for the specific fix. I'm looking for a generally good Rust-oriented design approach to this general problem: store blobs of bytes managed by some struct, let other structs get references (or pointers, etc.) to a blob of bytes, and read that blob in loops (while possibly storing new blobs of bytes).

The Question For Rust Experts

How would a Rust expert approach this problem?
How should I store the Vec<u8> (Block) in BlockReader.blocks, and also allow other structs to store their own references (or pointers, or references to pointers, or pinned Box pointers, etc.) to a Block?
Should the other structs copy or clone a Box<Block> or a Pin<Box<Block>> or something else?
Would a different storage type, such as a fixed-size array (type Block = [u8; 16384];), be easier to pass references to?
Should other structs like BlockUser1 be given &Block, or Box<Block>, or &Pin<&Box<&Block>>, or something else?

Again, each Vec<u8> (Block) is written once (during BlockReader.readblock) and may be read many times by other structs, which call BlockReader.readblock and later save their own reference/pointer/etc. to that Block (ideally, though maybe that's not ideal?).

JamesThomasMoon Avatar asked Oct 14 '22 20:10

1 Answer

You can put the Vec<u8> behind an Rc<RefCell<...>>, or simply an Rc<...> if the blocks are never mutated after they are created.
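Since the question says each block is written once and then only read, the plain Rc variant may be enough. The following is a minimal sketch of that idea, not the code from the playground: the names RcBlockReader, BLOCK_SIZE and source are made up for illustration, the reader is made generic over Read + Seek (an assumption, so it also works with an in-memory Cursor), and the key is a u64 instead of a usize to match SeekFrom::Start.

```rust
use std::collections::BTreeMap;
use std::io::{self, Read, Seek, SeekFrom};
use std::rc::Rc;

/// Block size in bytes (hypothetical constant name).
const BLOCK_SIZE: u64 = 16384;

type Block = Vec<u8>;

/// Hypothetical variant of `BlockReader` that hands out `Rc<Block>`.
/// Generic over the source so it works with `File` or `Cursor`.
pub struct RcBlockReader<R: Read + Seek> {
    blocks: BTreeMap<u64, Rc<Block>>,
    source: R,
}

impl<R: Read + Seek> RcBlockReader<R> {
    pub fn new(source: R) -> Self {
        Self { blocks: BTreeMap::new(), source }
    }

    /// Return a shared handle to the block at `offset`,
    /// reading it from the source on a cache miss.
    pub fn readblock(&mut self, offset: u64) -> io::Result<Rc<Block>> {
        if let Some(block) = self.blocks.get(&offset) {
            // Cloning an `Rc` bumps a reference count; the bytes are not copied.
            return Ok(Rc::clone(block));
        }
        let mut buffer = Block::with_capacity(BLOCK_SIZE as usize);
        self.source.seek(SeekFrom::Start(offset))?;
        // `take` caps the read at one block.
        self.source.by_ref().take(BLOCK_SIZE).read_to_end(&mut buffer)?;
        let block = Rc::new(buffer);
        self.blocks.insert(offset, Rc::clone(&block));
        Ok(block)
    }
}
```

Because readblock returns an owned Rc rather than a reference tied to &mut self, a caller can hold several blocks at once and still call readblock again. The cached block stays alive as long as either the map or any caller holds a handle to it.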

If you need thread-safe access you'll need to use an Arc<Mutex<...>> or Arc<RwLock<...>> instead.
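As a rough illustration of the thread-safe case: if the blocks really are read-only after creation, a bare Arc<Block> without a lock might be sufficient, with Mutex or RwLock only needed when something mutates shared state. The sketch below rests on that assumption; parallel_sum is a hypothetical helper, not part of the original code.

```rust
use std::sync::Arc;
use std::thread;

type Block = Vec<u8>;

/// Hypothetical helper: sum the bytes of one block on several threads.
/// Every thread reads a different slice of the same shared allocation.
pub fn parallel_sum(block: Arc<Block>, workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so the last chunk covers any remainder.
    let chunk = (block.len() + workers - 1) / workers;
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            // Each thread gets its own handle; the bytes are not copied.
            let block = Arc::clone(&block);
            thread::spawn(move || {
                let start = (i * chunk).min(block.len());
                let end = (start + chunk).min(block.len());
                block[start..end].iter().map(|&b| u64::from(b)).sum::<u64>()
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .sum()
}
```

If the cache itself (the map plus the file handle) has to be shared between threads, that part would still need a lock around it, since inserting a newly read block is a mutation.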

Here's a converted version of your code. (There were a few typos and bits that needed changing to get it to compile. You should really fix those in your example and give us something that nearly compiles...) You can also see this in the playground.

use std::io::Seek;
use std::io::SeekFrom;
use std::io::Read;
use std::fs::File;
use std::cell::RefCell;
use std::rc::Rc;
use std::collections::BTreeMap;

type Block = Vec<u8>;
type Blocks = BTreeMap<usize, Rc<RefCell<Block>>>;

pub struct BlockReader {
    blocks: Blocks,
    file: File,
}

impl BlockReader {
    /// read a "block" of 16384 `u8` at file offset 
    /// `offset` which is multiple of 16384
    /// if the "block" at the `offset` is cached in
    /// `self.blocks` then return a reference to that
    /// XXX: assume `self.file` is already `open`ed file
    ///      handle
    fn readblock(& mut self, offset: usize) -> Result<Rc<RefCell<Block>>,std::io::Error> {
        // the data at this offset is the "cache"
        // return reference to that
        if self.blocks.contains_key(&offset) {
            return Ok(self.blocks[&offset].clone());
        }
        // have not read data at this offset so read
        // the "block" of data from the file, store it,
        // return a reference
        let mut buffer = Block::with_capacity(16384);
        self.file.seek(SeekFrom::Start(offset as u64))?;
        // cap the read at one block (a bare `read_to_end` reads to end of file)
        self.file.by_ref().take(16384).read_to_end(&mut buffer)?;
        self.blocks.insert(offset, Rc::new(RefCell::new(buffer)));
        Ok(self.blocks[&offset].clone())
    }
}

pub struct BlockAnalyzer1 {
   pub blockreader: BlockReader,
}

impl BlockAnalyzer1 {
    /// contrived example function
    pub fn doStuff(&mut self) -> Result<bool,std::io::Error> {
        let mut b: Rc<RefCell<Block>>;
        match self.blockreader.readblock(3 * 16384) {
            Ok(val) => {
                b = val;
            },
            Err(err) => {
                return Err(err);
            }
        }
        match self.blockreader.readblock(5 * 16384) {
            Ok(val) => {
                b = val;
            },
            Err(err) => {
                return Err(err);
            }
        }
        Ok(true)
    }
}
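To actually look at the bytes behind an Rc<RefCell<Block>>, a caller goes through borrow() (or borrow_mut() to modify). A small sketch of what that might look like follows; count_byte is a hypothetical helper added here for illustration only.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Block = Vec<u8>;

/// Hypothetical helper: count how many bytes of `block` equal `needle`.
pub fn count_byte(block: &Rc<RefCell<Block>>, needle: u8) -> usize {
    // `borrow()` gives a shared view of the bytes, checked at run time.
    let bytes = block.borrow();
    bytes.iter().filter(|&&b| b == needle).count()
}
```

The borrow rules are still enforced, just at run time instead of compile time: calling borrow_mut() while another borrow of the same block is alive panics, and try_borrow_mut() returns an Err instead. If nothing ever mutates a block after it is read, dropping the RefCell and using Rc<Block> avoids that check entirely.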
Michael Anderson Avatar answered Oct 20 '22 02:10