I'm trying to write a little buffer-thing for parsing so I can pull records off the front of it as I parse them out, ideally without making any copies, just transferring ownership of chunks of the front of the buffer as I go. Here's my implementation:
struct BufferThing {
    buf: Vec<u8>,
}

impl BufferThing {
    fn extract(&mut self, size: usize) -> Vec<u8> {
        assert!(size <= self.buf.len());
        let remaining: usize = self.buf.len() - size;
        let ptr: *mut u8 = self.buf.as_mut_ptr();
        unsafe {
            self.buf = Vec::from_raw_parts(ptr.offset(size as isize), remaining, remaining);
            Vec::from_raw_parts(ptr, size, size)
        }
    }
}
This compiles, but crashes with "signal: 11, SIGSEGV: invalid memory reference" as soon as it starts running. This is mostly the same code as the example in the Nomicon, but I'm trying to do it on Vecs and I'm trying to split a field instead of the object itself.
Is it possible to do this without copying out one of the Vecs? And is there some section of the Nomicon or other documentation that explains why I'm blowing everything up in the unsafe block?
Unfortunately, that's not how memory allocators work. It might have been possible in the past, when memory was at a premium, but today's allocators are geared for speed rather than memory preservation.
A common implementation of memory allocators is to use slabs. Basically, it's:
struct Allocator {
    less_than_32_bytes: List<[u8; 32]>,
    less_than_64_bytes: List<[u8; 64]>,
    less_than_128_bytes: List<[u8; 128]>,
    less_than_256_bytes: List<[u8; 256]>,
    less_than_512_bytes: List<[u8; 512]>,
    ...
}
When you request 96 bytes, it takes an element from less_than_128_bytes.
When you free that element, it frees all of it, not just the first N bytes, and the whole block is now re-usable. Any pointer inside the block is now dangling and should NOT be dereferenced.
Furthermore, trying to free a pointer in the middle of a block will only confuse the allocator: it won't find it, because the contract is that you address blocks by their first byte.
You violated the contract using unsafe code, BOOM.
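To make that contract concrete, here is a minimal sketch that talks to the global allocator directly through std::alloc. The function name alloc_roundtrip is made up for illustration. The point it shows is that a block is released with the same pointer and the same layout it was allocated with. The documentation of Vec::from_raw_parts states the same requirements for a Vec: the pointer must come from the allocator and the capacity must be the one it was allocated with, and the code in the question breaks both.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Hypothetical helper: allocates `len` bytes, fills them with 1s,
/// sums them, and frees the block. Returns the sum.
fn alloc_roundtrip(len: usize) -> u32 {
    assert!(len > 0, "zero-sized layouts must not be passed to alloc");
    let layout = Layout::array::<u8>(len).expect("layout overflow");
    unsafe {
        let ptr = alloc(layout);
        assert!(!ptr.is_null(), "allocation failed");

        // Initialize every byte before reading it back.
        for i in 0..len {
            ptr.add(i).write(1);
        }
        let sum: u32 = (0..len).map(|i| u32::from(ptr.add(i).read())).sum();

        // Correct: the block is addressed by its first byte and freed
        // with the layout it was allocated with.
        dealloc(ptr, layout);

        // Undefined behavior (do not do this): freeing a pointer into the
        // middle of the block, or with a different size, e.g.
        // dealloc(ptr.add(32), Layout::array::<u8>(len - 32).unwrap());
        sum
    }
}
```

Dropping either Vec built in extract amounts to the commented-out call: one frees from the middle of the block, the other frees with the wrong size.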
The solution I propose is simple:
- keep a single Vec<u8> containing the whole buffer to parse
- use slices (&[u8]) into this Vec for parsing

Rust will check the lifetimes, so your slices cannot outlive the buffer, and slicing a slice further (s[..offset], s[offset..]) does not allocate.
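A minimal sketch of the slice-based approach, assuming the records can be borrowed rather than owned. SliceParser is a hypothetical name, not something from the question. It holds the unparsed remainder as a slice and hands out the front with split_at, which neither copies nor allocates.

```rust
/// Hypothetical replacement for `BufferThing`: borrows the buffer
/// instead of owning it.
struct SliceParser<'a> {
    rest: &'a [u8],
}

impl<'a> SliceParser<'a> {
    fn new(buf: &'a [u8]) -> Self {
        SliceParser { rest: buf }
    }

    /// Returns the first `size` bytes and advances past them.
    /// The returned slice borrows from the original buffer.
    fn extract(&mut self, size: usize) -> &'a [u8] {
        assert!(size <= self.rest.len());
        let rest: &'a [u8] = self.rest;
        let (head, tail) = rest.split_at(size);
        self.rest = tail;
        head
    }
}
```

The Vec<u8> that owns the data stays alive and untouched for as long as any extracted slice is in use; the borrow checker rejects code that tries to drop or mutate it earlier.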
If you don't mind one allocation, there's Vec::split_off, which allocates a new Vec big enough for the split part.
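As a sketch of how that could fit the original BufferThing: split_off(size) keeps the first size bytes in place and returns the rest in a new Vec, so to hand out the front you can swap the two with std::mem::replace. This keeps the extract signature from the question and needs no unsafe, at the cost of copying the remaining bytes on each call.

```rust
struct BufferThing {
    buf: Vec<u8>,
}

impl BufferThing {
    /// Removes and returns the first `size` bytes as an owned `Vec<u8>`.
    fn extract(&mut self, size: usize) -> Vec<u8> {
        assert!(size <= self.buf.len());
        // `split_off(size)` leaves `[0, size)` in `self.buf` and returns
        // `[size, len)` as a separate Vec (this is the one copy).
        let tail = self.buf.split_off(size);
        // Swap: the tail becomes the remaining buffer and the front
        // is handed to the caller.
        std::mem::replace(&mut self.buf, tail)
    }
}
```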