Structure containing fields that know each other

Tags:

rust

I have a set of objects that need to know each other to cooperate. These objects are stored in a container. I'm trying to get a very simple idea of how to architect my code in Rust.

Let's use an analogy. A Computer contains:

  • 1 Mmu
  • 1 Ram
  • 1 Processor

In Rust:

struct Computer {
    mmu: Mmu,
    ram: Ram,
    cpu: Cpu,
}

For anything to work, the Cpu needs to know about the Mmu it is linked to, and the Mmu needs to know the Ram it is linked to.

I do not want the Cpu to aggregate the Mmu by value. Their lifetimes differ: the Mmu can live its own life by itself. It just happens that I can plug it into the Cpu. However, there is no sense in creating a Cpu without an Mmu attached to it, since it would not be able to do its job. The same relation exists between Mmu and Ram.

Therefore:

  • A Ram can live by itself.
  • An Mmu needs a Ram.
  • A Cpu needs an Mmu.

How can I model that kind of design in Rust, one with a struct whose fields know about each other?

In C++, it would be along the lines of:

struct Ram
{
};

struct Mmu
{
  Ram& ram;
  Mmu(Ram& r) : ram(r) {}
};

struct Cpu
{
  Mmu& mmu;
  Cpu(Mmu& m) : mmu(m) {}
};

struct Computer
{
    Ram ram;
    Mmu mmu;
    Cpu cpu;
    Computer() : ram(), mmu(ram), cpu(mmu) {}
};

Here is how I started translating that in Rust:

struct Ram;

struct Mmu<'a> {
    ram: &'a Ram,
}

struct Cpu<'a> {
    mmu: &'a Mmu<'a>,
}

impl Ram {
    fn new() -> Ram {
        Ram
    }
}

impl<'a> Mmu<'a> {
    fn new(ram: &'a Ram) -> Mmu<'a> {
        Mmu {
            ram: ram
        }
    }
}

impl<'a> Cpu<'a> {
    fn new(mmu: &'a Mmu) -> Cpu<'a> {
        Cpu {
            mmu: mmu,
        }
    }
}

fn main() {
    let ram = Ram::new();
    let mmu = Mmu::new(&ram);
    let cpu = Cpu::new(&mmu);
}

That is fine and all, but now I just can't find a way to create the Computer struct.

I started with:

struct Computer<'a> {
    ram: Ram,
    mmu: Mmu<'a>,
    cpu: Cpu<'a>,
}

impl<'a> Computer<'a> {
    fn new() -> Computer<'a> {
        // Cannot do that, since struct fields are not accessible from the initializer
        Computer {
            ram: Ram::new(),
            mmu: Mmu::new(&ram),
            cpu: Cpu::new(&mmu),
        }

        // Of course cannot do that, since local variables won't live long enough
        let ram = Ram::new();
        let mmu = Mmu::new(&ram);
        let cpu = Cpu::new(&mmu);
        Computer {
            ram: ram,
            mmu: mmu,
            cpu: cpu,
        }
    }
}

Okay, whatever, I won't be able to find a way for structure fields to reference each other. I thought I could come up with something by creating the Ram, Mmu and Cpu on the heap and putting them inside the struct:

struct Computer<'a> {
    ram: Box<Ram>,
    mmu: Box<Mmu<'a>>,
    cpu: Box<Cpu<'a>>,
}

impl<'a> Computer<'a> {
    fn new() -> Computer<'a> {
        let ram = Box::new(Ram::new());
        // V-- ERROR: reference must be valid for the lifetime 'a
        let mmu = Box::new(Mmu::new(&*ram));
        let cpu = Box::new(Cpu::new(&*mmu));
        Computer {
            ram: ram,
            mmu: mmu,
            cpu: cpu,
        }
    }
}

Yeah that's right, at this point in time Rust has no way to know that I'm going to transfer ownership of let ram = Box::new(Ram::new()) to the Computer, so it will get a lifetime of 'a.

I've been trying various more or less hackish ways to get that right, but I just can't come up with a clean solution. The closest I've come is to drop the reference and use an Option, but then all my methods have to check whether the Option is Some or None, which is rather ugly.

I think I'm just on the wrong track here, trying to map what I would do in C++ in Rust, but that doesn't work. That's why I would need help finding out what is the idiomatic Rust way of creating this architecture.

asked Jan 23 '15 at 15:01 by NewbiZ



1 Answer

In this answer I will discuss two approaches to solving this problem: one in safe Rust with zero dynamic allocation and very little runtime cost, but which can be restrictive, and one with dynamic allocation that relies on invariants upheld by unsafe code.

The Safe Way (Cell<Option<&'a T>>)

use std::cell::Cell;

#[derive(Debug)]
struct Computer<'a> {
    ram: Ram,
    mmu: Mmu<'a>,
    cpu: Cpu<'a>,
}

#[derive(Debug)]
struct Ram;

#[derive(Debug)]
struct Cpu<'a> {
    mmu: Cell<Option<&'a Mmu<'a>>>,
}

#[derive(Debug)]
struct Mmu<'a> {
    ram: Cell<Option<&'a Ram>>,
}

impl<'a> Computer<'a> {
    fn new() -> Computer<'a> {
        Computer {
            ram: Ram,
            cpu: Cpu {
                mmu: Cell::new(None),
            },
            mmu: Mmu {
                ram: Cell::new(None),
            },
        }
    }

    fn freeze(&'a self) {
        self.mmu.ram.set(Some(&self.ram));
        self.cpu.mmu.set(Some(&self.mmu));
    }
}

fn main() {
    let computer = Computer::new();
    computer.freeze();

    println!("{:?}, {:?}, {:?}", computer.ram, computer.mmu, computer.cpu);
}


Contrary to popular belief, self-references are in fact possible in safe Rust, and even better, when you use them Rust will continue to enforce memory safety for you.

The main "hack" needed to get self, recursive, or cyclical references using &'a T is the use of a Cell<Option<&'a T>> to contain the reference. You won't be able to do this without the Cell<Option<T>> wrapper.

The clever bit of this solution is splitting initial creation of the struct from proper initialization. This has the unfortunate downside that it's possible to use this struct incorrectly by initializing it and using it before calling freeze, but it can't result in memory unsafety without further usage of unsafe.

The initial creation of the struct only sets the stage for our later hackery - it creates the Ram, which has no dependencies, and sets the Cpu and Mmu to their unusable state, containing Cell::new(None) instead of the references they need.

Then, we call the freeze method, which deliberately holds a borrow of self with lifetime 'a, that is, for the full lifetime of the struct. After we call this method, the compiler will prevent us from getting mutable references to the Computer or moving the Computer, as either could invalidate the references that we are holding. The freeze method then sets up the Cpu and Mmu appropriately by setting their Cells to contain Some(&self.mmu) and Some(&self.ram) respectively.

After freeze is called our struct is ready to be used, but only immutably.
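As a rough sketch of what "ready to be used, but only immutably" can look like in practice, the version below adds read-only accessors on top of the same Cell<Option<&'a T>> layout. The size field on Ram, the accessors ram(), mmu() and ram_size(), and the helper ram_size_before_and_after_freeze are assumptions made up for this illustration; they are not part of the code above. The accessors return an Option, so using the struct before freeze yields None instead of anything unsafe.

```rust
use std::cell::Cell;

// Hypothetical payload so there is something to read through the links.
struct Ram {
    size: usize,
}

struct Mmu<'a> {
    ram: Cell<Option<&'a Ram>>,
}

struct Cpu<'a> {
    mmu: Cell<Option<&'a Mmu<'a>>>,
}

struct Computer<'a> {
    ram: Ram,
    mmu: Mmu<'a>,
    cpu: Cpu<'a>,
}

impl<'a> Mmu<'a> {
    // Hypothetical accessor: None until `freeze` has been called.
    fn ram(&self) -> Option<&'a Ram> {
        self.ram.get()
    }
}

impl<'a> Cpu<'a> {
    // Hypothetical accessor: None until `freeze` has been called.
    fn mmu(&self) -> Option<&'a Mmu<'a>> {
        self.mmu.get()
    }

    // Walks Cpu -> Mmu -> Ram through the stored references.
    fn ram_size(&self) -> Option<usize> {
        self.mmu()?.ram().map(|ram| ram.size)
    }
}

impl<'a> Computer<'a> {
    fn new(size: usize) -> Computer<'a> {
        Computer {
            ram: Ram { size },
            mmu: Mmu { ram: Cell::new(None) },
            cpu: Cpu { mmu: Cell::new(None) },
        }
    }

    // Borrows `self` for 'a and wires the components together.
    fn freeze(&'a self) {
        self.mmu.ram.set(Some(&self.ram));
        self.cpu.mmu.set(Some(&self.mmu));
    }
}

// Reads the RAM size through the Cpu before and after `freeze`.
fn ram_size_before_and_after_freeze(size: usize) -> (Option<usize>, Option<usize>) {
    let computer = Computer::new(size);
    let before = computer.cpu.ram_size();
    computer.freeze();
    let after = computer.cpu.ram_size();
    (before, after)
}

fn main() {
    let (before, after) = ram_size_before_and_after_freeze(64);
    println!("before freeze: {:?}, after freeze: {:?}", before, after);
}
```

Returning Option pushes the "was freeze called?" check onto every caller, which is the ugliness the question complains about; an accessor could instead panic with expect if a missing freeze should count as a programming error.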

The Unsafe Way (Box<T> never moves T)

#![allow(dead_code)]

use std::mem;

// CRUCIAL INFO:
//
// In order for this scheme to be safe, Computer *must not*
// expose any functionality that allows setting the ram or
// mmu to a different Box with a different memory location.
//
// Care must also be taken to prevent aliasing of &mut references
// to mmu and ram. This is not a completely safe interface,
// and its use must be restricted.
struct Computer {
    ram: Box<Ram>,
    cpu: Cpu,
    mmu: Box<Mmu>,
}

struct Ram;

// Cpu and Mmu are unsafe to use directly, and *must only*
// be exposed when properly set up inside a Computer
struct Cpu {
    mmu: *mut Mmu,
}
struct Mmu {
    ram: *mut Ram,
}

impl Cpu {
    // Safe if we uphold the invariant that Cpu must be
    // constructed in a Computer.
    fn mmu(&self) -> &Mmu {
        // Dereferencing the raw pointer is enough; no transmute needed.
        unsafe { &*self.mmu }
    }
}

impl Mmu {
    // Safe if we uphold the invariant that Mmu must be
    // constructed in a Computer.
    fn ram(&self) -> &Ram {
        // Dereferencing the raw pointer is enough; no transmute needed.
        unsafe { &*self.ram }
    }
}

impl Computer {
    fn new() -> Computer {
        let ram = Box::new(Ram);

        let mmu = Box::new(Mmu {
            ram: unsafe { mem::transmute(&*ram) },
        });
        let cpu = Cpu {
            mmu: unsafe { mem::transmute(&*mmu) },
        };

        // Safe to move the components in here because all the
        // references are references to data behind a Box, so the
        // data will not move.
        Computer {
            ram: ram,
            mmu: mmu,
            cpu: cpu,
        }
    }
}

fn main() {}


NOTE: This solution is not completely safe given an unrestricted interface to Computer - care must be taken to not allow aliasing or removal of the Mmu or Ram in the public interface of Computer.

This solution instead uses the invariant that data stored inside of a Box will never move (its address will never change) as long as the Box remains alive. Rust doesn't allow you to depend on this in safe code, since moving a Box can cause the memory behind it to be deallocated, thereby leaving a dangling pointer, but we can rely on it in unsafe code.
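The address stability described above can be observed from safe code by comparing addresses only, without ever dereferencing a stale pointer. The following is a minimal sketch; Holder, heap_addr, make_holder and address_survives_moves are hypothetical names introduced for this illustration.

```rust
// Hypothetical container that owns a heap-allocated value.
struct Holder {
    value: Box<u64>,
}

// Address of the value as an integer. Passing `&Box<u64>` works here
// through deref coercion, so this reports the heap address.
fn heap_addr(value: &u64) -> usize {
    value as *const u64 as usize
}

// Creates the Box, records its heap address, then moves the Box
// into a struct and returns that struct by value.
fn make_holder() -> (Holder, usize) {
    let value = Box::new(7u64);
    let addr = heap_addr(&value);
    (Holder { value }, addr)
}

// Moves the Holder once more and checks that the heap address of the
// boxed value is still the one recorded at creation time.
fn address_survives_moves() -> bool {
    let (holder, addr_at_creation) = make_holder();
    let moved = holder;
    heap_addr(&moved.value) == addr_at_creation
}

fn main() {
    println!("address unchanged: {}", address_survives_moves());
}
```

Only the Box itself (a pointer) is copied when the Holder moves; the u64 stays where it was allocated, which is the property the raw pointers in Cpu and Mmu depend on.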

The main trick in this solution is to use raw pointers into the contents of the Box<Mmu> and Box<Ram> to store references into them in the Cpu and Mmu respectively. This gets you a mostly safe interface, and doesn't prevent you from moving the Computer around or even mutating it in restricted cases.

An Ending Note

All of this said, I don't think either of these should really be the way you approach this problem. Ownership is a central concept in Rust, and it permeates the design choices of almost all code. If the Mmu owns the Ram and the Cpu owns the Mmu, that's the relationship you should have in your code. If you use Rc, you can even maintain the ability to share the underlying pieces, albeit immutably.
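For illustration, here is one way the Rc variant could look. This is a sketch under assumptions, not a definitive design: the size field, the ram_size method and the choice to keep extra Rc handles in Computer are all invented for the example. If no sharing is needed, plain nesting (Cpu { mmu: Mmu { ram: Ram } }) expresses the same ownership chain without Rc.

```rust
use std::rc::Rc;

// Hypothetical payload so there is something to read through the links.
struct Ram {
    size: usize,
}

// The Mmu holds a shared, immutable handle to its Ram.
struct Mmu {
    ram: Rc<Ram>,
}

// The Cpu holds a shared, immutable handle to its Mmu.
struct Cpu {
    mmu: Rc<Mmu>,
}

// The Computer keeps its own handles, so each part stays reachable
// directly as well as through the Cpu.
struct Computer {
    ram: Rc<Ram>,
    mmu: Rc<Mmu>,
    cpu: Cpu,
}

impl Cpu {
    // Walks Cpu -> Mmu -> Ram; Rc derefs to the inner value.
    fn ram_size(&self) -> usize {
        self.mmu.ram.size
    }
}

impl Computer {
    fn new(size: usize) -> Computer {
        let ram = Rc::new(Ram { size });
        let mmu = Rc::new(Mmu { ram: Rc::clone(&ram) });
        let cpu = Cpu { mmu: Rc::clone(&mmu) };
        Computer { ram, mmu, cpu }
    }
}

fn main() {
    let computer = Computer::new(1024);
    // Unlike the frozen, borrow-based version, this value can be moved freely.
    let moved = computer;
    println!(
        "ram size: {}, ram owners: {}, mmu owners: {}",
        moved.cpu.ram_size(),
        Rc::strong_count(&moved.ram),
        Rc::strong_count(&moved.mmu)
    );
}
```

Rc only hands out shared references, so mutating the Ram through these handles would need interior mutability such as RefCell, at the cost of runtime borrow checks.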

answered Nov 05 '22 at 01:11 by reem
