Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Splitting a `Vec`

I'm trying to write a little buffer-thing for parsing so I can pull records off the front of as I parse them out, ideally without making any copies and just transferring ownership of chunks of the front of the buffer off as I run. Here's my implementation:

struct BufferThing {
    buf: Vec<u8>,
}

impl BufferThing {
    fn extract(&mut self, size: usize) -> Vec<u8> {
        assert!(size <= self.buf.len());
        let remaining: usize = self.buf.len() - size;
        let ptr: *mut u8 = self.buf.as_mut_ptr();

        unsafe {
            self.buf = Vec::from_raw_parts(ptr.offset(size as isize), remaining, remaining);
            Vec::from_raw_parts(ptr, size, size)
        }
    }
}

This compiles, but panics with a signal: 11, SIGSEGV: invalid memory reference as it starts running. This is mostly the same code as the example in the Nomicon, but I'm trying to do it on Vec's and I'm trying to split a field instead of the object itself.

Is it possible to do this without copying out one of the Vecs? And is there some section of the Nomicon or other documentation that explains why I'm blowing everything up in the unsafe block?

like image 745
Roderick Avatar asked Feb 10 '17 16:02

Roderick


1 Answers

Unfortunately, that's not how memory allocators work. It might have been possible in the past, when memory was at a premium, but today's allocators are geared for speed rather than memory preservation.

A common implementation of memory allocators is to use slabs. Basically, it's:

struct Allocator {
    less_than_32_bytes: List<[u8; 32]>,
    less_than_64_bytes: List<[u8; 64]>,
    less_than_128_bytes: List<[u8; 128]>,
    less_than_256_bytes: List<[u8; 256]>,
    less_than_512_bytes: List<[u8; 512]>,
    ...
}

When you request 96 bytes, it takes an element from less_than_128_bytes.

When you free that element, it frees all of it, not just the first N bytes, and the whole block is now re-usable. Any pointer inside the block is now dangling and should NOT be dereferenced.

Furthermore, trying to free a pointer in the middle of a block will only confuse the allocator: it won't find it, because the contract is that you address blocks by their first byte.

You violated the contract using unsafe code, BOOM.


The solution I propose is simple:

  • use a single Vec<u8> containing the whole buffer to parse
  • use slices into this Vec for parsing

Rust will check the lifetimes, so your slices cannot outlive the buffer, and slicing a slice further (s[..offset], s[offset..]) does not allocate.


If you don't mind one allocation, there's Vec::split_off which allocates a new Vec big enough for the split part.

like image 101
Matthieu M. Avatar answered Oct 24 '22 01:10

Matthieu M.