Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rust - Why such a big difference in memory usage between malloc/alloc and more 'idiomatic' approaches

I've been using and modifying this library https://github.com/sile/patricia_tree

One thing that bothered a bit was how much unsafe was used in node.rs, particularly, it's defined as just a pointer to some heap location. When doing the first benchmark listed on the readme page (wikipedia inputs), the PatriciaSet uses ~700mb (PatriciaSet is just holding a Node at it's root)

pub struct Node<V> {
    // layout:
    // all these fields accessed with ptr.offset
    //   - flags: u8
    //   - label_len: u8
    //   - label: [u8; label_len]
    //   - value: Option<V>
    //   - child: Option<Node<V>>
    //   - sibling: Option<Node<V>>
    ptr: *mut u8,
    _value: PhantomData<V>,
}

and uses malloc for allocation:

        let ptr = unsafe { libc::malloc(block_size) } as *mut u8;

I was told this memory is not aligned properly, so I tried to add the new alloc api and use Layout/alloc, this is also still not aligning properly just seems to 'work'. full pr

       let layout = Layout::array::<u8>(block_size).expect("Failed to get layout");
       let ptr = unsafe { alloc::alloc(layout) as *mut u8 };

This single change, which also holds the layout in the block of memory pointed to by ptr, caused memory consumption to go up 40% under the performance tests for very large trees. The layout type is just 2 words wide, so this was unexpected. For the same tests this uses closer to ~1000mb (compared to previous 700)

In another attempt, I tried to remove most of the unsafe and go with something a bit more rust-y full pr here

pub struct Node<V> {
    value: Option<V>,
    child: Option<*mut Node<V>>,
    sibling: Option<*mut Node<V>>,
    label: SmallVec<[u8; 10]>,
    _value: PhantomData<V>,
}

creating the node in a manner you may expect in rust

        let child = child.map(|c| Box::into_raw(Box::new(c)));
        let sibling = sibling.map(|c| Box::into_raw(Box::new(c)));
        Node {
            value,
            child,
            sibling,
            label: SmallVec::from_slice(label),
            _value: PhantomData,
        }

Performance wise, it's about equivalent to the original unmodified library, but it's memory consumption appears to be not much better than just inserting every single item in a HashSet, using around ~1700mb for the first benchmark.

The final data structure which uses node is a compressed trie or a 'patricia tree'. No other code was changed other than the structure of the Node and the implementations of some of its methods that idiomatically fall out of those changes.

I was hoping someone could tip me off of a what exactly is causing such a drastic difference in memory consumption between these implementations. In my mind, they should be about equivalent. They all allocate around the same number of fields with about the same widths. The unsafe first one is able to store a dynamic label length in-line, so that could be one reason. But smallvec should be able to do something similar with smaller label sizes (using just Vec was even worse).

Looking for any suggestions or help on why the ending results are so different. If curious, the code to run these is here though it is spread across the original authors and my own repo

Tools for how to investigate the differences between these would be open to hearing that as well!

like image 296
leshow Avatar asked Jul 04 '20 03:07

leshow


1 Answers

You're seeing increased memory use for a couple of reasons. I'll assume a standard 64-bit Unix system.

First, a pointer is 8 bytes. An Option<*mut Node<V>> is 16 bytes because pointers aren't subject to the nullable optimization that happens with references. References can never be null, so the compiler can convert an Option<&'a V> into a null pointer if the value is None and a regular pointer if it's Some, but pointers can be null so that can't happen here. Rust makes the size of the enum field the same size as the size of the data type, so you use 16 bytes per pointer here.

The easiest and most type-safe way to deal with this is just to use Option<NonNull<Node<V>>>. Doing that drops your structure by 16 bytes total.

Second, your SmallVec is 32 bytes in size. They avoid needing a heap allocation in some cases, but they aren't, despite the name, necessarily small. You can use a regular Vec or a boxed slice, which will likely result in lower memory usage at the cost of an additional allocation.

With those changes and using a Vec, your structure will be 48 bytes in size. With a boxed slice, it will be 40. The original used 72. How much savings you see will depend on how big your labels are, since you'll need to allocate space for them.

The required alignment for this structure is 8 bytes because the largest alignment of any type (the pointer) is 8 bytes. Even on architectures like x86-64 where alignment is not required for all types, it is still faster, and sometimes significantly so, so the compiler always does it.

The original code was not properly aligned at all and will either outright fail (on SPARC), perform badly (on PowerPC), or require an alignment trap into the kernel if they're enabled (on MIPS) or fail if they're not. An alignment trap into the kernel for unaligned access performs terribly because you have to do a full context switch just to load and shift two words, so most people turn them off.

The reason that this is not properly aligned is because Node contains a pointer and it appears in the structure at an offset which is not guaranteed to be a multiple of 8. If it were rewritten such that the child and sibling attributes came first, then it would be properly aligned provided the memory were suitably aligned (which malloc guarantees but your Rust allocation does not). You could create a suitable Layout with Layout::from_size_align(block_size, std::mem::align_of::<*mut Node>()).

So while the original code worked on x86-64 and saved a bunch of memory, it performed badly and was not portable.

The code I used for this example is simply the following, plus some knowledge about how Rust does nullable types and knowledge about C and memory allocation:

extern crate smallvec;
use smallvec::SmallVec;
use std::marker::PhantomData;
use std::ptr::NonNull;

pub struct Node<V> {
    value: Option<V>,
    child: Option<NonNull<Node<V>>>,
    sibling: Option<NonNull<Node<V>>>,
    label: Vec<u8>,
    _value: PhantomData<V>,
}

fn main() {
    println!("size: {}", std::mem::size_of::<Node<()>>());
}
like image 152
bk2204 Avatar answered Oct 10 '22 14:10

bk2204