
In memory database design

Tags:

rust

I am trying to create an in-memory database using HashMap. I have a struct Person:

struct Person {
    id: i64,
    name: String,
}

impl Person {
    pub fn new(id: i64, name: &str) -> Person {
        Person {
            id: id,
            name: name.to_string(),
        }
    }

    pub fn set_name(&mut self, name: &str) {
        self.name = name.to_string();
    }
}

And I have struct Database:

use std::collections::HashMap;
use std::sync::Arc;
use std::sync::Mutex;

struct Database {
    db: Arc<Mutex<HashMap<i64, Person>>>,
}

impl Database {
    pub fn new() -> Database {
        Database {
            db: Arc::new(Mutex::new(HashMap::new())),
        }
    }

    pub fn add_person(&mut self, id: i64, person: Person) {
        self.db.lock().unwrap().insert(id, person);
    }

    pub fn get_person(&self, id: i64) -> Option<&mut Person> {
        self.db.lock().unwrap().get_mut(&id)
    }
}

And code to use this database:

let mut db = Database::new();
db.add_person(1, Person::new(1, "Bob"));

I want to change a person's name:

let mut person = db.get_person(1).unwrap();
person.set_name("Bill");

The complete code is in the Rust playground.

When compiling, I get a problem with Rust lifetimes:

error[E0597]: borrowed value does not live long enough
  --> src/main.rs:39:9
   |
39 |         self.db.lock().unwrap().get_mut(&id)
   |         ^^^^^^^^^^^^^^^^^^^^^^^ temporary value does not live long enough
40 |     }
   |     - temporary value only lives until here
   |
note: borrowed value must be valid for the anonymous lifetime #1 defined on the method body at 38:5...
  --> src/main.rs:38:5
   |
38 | /     pub fn get_person(&self, id: i64) -> Option<&mut Person> {
39 | |         self.db.lock().unwrap().get_mut(&id)
40 | |     }
   | |_____^

How can I implement this approach?

crocus asked Dec 07 '22 20:12


2 Answers

The compiler rejects your code because it violates the correctness model enforced by Rust and could cause crashes. For one, if get_person() were allowed to compile, one might call it from two threads and modify the underlying object without the protection of the mutex, causing data races on the String object inside. Worse, one could wreak havoc even in a single-threaded scenario by doing something like:

let mut ref1 = db.get_person(1).unwrap();
let mut ref2 = db.get_person(1).unwrap();
// ERROR - two mutable references to the same object!

let mut vec: Vec<Person> = vec![];
vec.push(*ref1);  // move referenced object to the vector
println!("{}", ref2.name);  // CRASH - object already moved

To correct the code, you need to adjust your design to satisfy the following constraints:

  • No reference can be allowed to outlive the referred-to object;
  • During the lifetime of a mutable reference, no other reference (mutable or immutable) to the object may exist.

The add_person method already complies with both rules because it eats the object you pass it, moving it to the database.

What if we modified get_person() to return an immutable reference?

pub fn get_person(&self, id: i64) -> Option<&Person> {
    self.db.lock().unwrap().get(&id)
}

Even this seemingly innocent version still doesn't compile! That is because it violates the first rule. The returned reference borrows from the MutexGuard produced by lock(), and that guard is a temporary that is dropped (unlocking the mutex) when get_person() returns, so the reference would outlive the value it borrows from. But even if it were possible to somehow explicitly declare the lifetime of the reference to one that provably couldn't outlive the database, retaining the reference after unlocking the mutex would allow data races. There is simply no way to implement a get_person() that returns a plain reference and still retain thread safety.
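To make the tie between a reference and the lock concrete, here is a minimal sketch of an alternative that this answer does not pursue: handing the caller the lock guard itself, so that any reference into the map can only exist while the mutex is held. The helper name lock_map is hypothetical, and exposing the whole map this way is a design trade-off rather than a recommendation.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, MutexGuard};

struct Person {
    id: i64,
    name: String,
}

struct Database {
    db: Arc<Mutex<HashMap<i64, Person>>>,
}

impl Database {
    fn new() -> Database {
        Database { db: Arc::new(Mutex::new(HashMap::new())) }
    }

    // Hypothetical helper: the returned guard borrows from `self`,
    // and the mutex stays locked for as long as the guard is alive.
    fn lock_map(&self) -> MutexGuard<'_, HashMap<i64, Person>> {
        self.db.lock().unwrap()
    }
}

fn main() {
    let db = Database::new();
    db.lock_map().insert(1, Person { id: 1, name: "Bob".to_string() });

    {
        let mut guard = db.lock_map();
        // This reference borrows from `guard`, so it cannot outlive the lock.
        if let Some(person) = guard.get_mut(&1) {
            person.name = "Bill".to_string();
        }
    } // guard dropped here, mutex unlocked

    let guard = db.lock_map();
    let person = guard.get(&1).unwrap();
    assert_eq!((person.id, person.name.as_str()), (1, "Bill"));
}
```

The borrow checker rejects any attempt to keep `person` around after `guard` goes out of scope, which is the same rule that makes the reference-returning get_person() fail.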

A thread-safe implementation of a read can opt to return a copy of the data. Person can implement the clone() method and get_person() can invoke it like this:

#[derive(Clone)]
struct Person {
    id: i64,
    name: String
}
// ...

pub fn get_person(&self, id: i64) -> Option<Person> {
    self.db.lock().unwrap().get(&id).cloned()
}
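As a self-contained sketch of the cloning approach, the following program shows that the value returned by get_person() is an independent copy: changing it leaves the stored entry untouched. The extra Debug derive and the use of &self in add_person are choices made for this sketch, not part of the question's code.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Clone, Debug)]
struct Person {
    id: i64,
    name: String,
}

struct Database {
    db: Arc<Mutex<HashMap<i64, Person>>>,
}

impl Database {
    fn new() -> Database {
        Database { db: Arc::new(Mutex::new(HashMap::new())) }
    }

    fn add_person(&self, id: i64, person: Person) {
        self.db.lock().unwrap().insert(id, person);
    }

    // Returns an owned copy, so nothing borrowed from the map escapes the lock.
    fn get_person(&self, id: i64) -> Option<Person> {
        self.db.lock().unwrap().get(&id).cloned()
    }
}

fn main() {
    let db = Database::new();
    db.add_person(1, Person { id: 1, name: "Bob".to_string() });

    let mut copy = db.get_person(1).unwrap();
    copy.name = "Bill".to_string(); // changes only the local copy
    assert_eq!(copy.id, 1);

    // The entry in the database still has the original name.
    assert_eq!(db.get_person(1).unwrap().name, "Bob");
    assert!(db.get_person(2).is_none());
}
```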

This kind of change won't work for the other use case of get_person(), where the method is used for the express purpose of obtaining a mutable reference to change the person in the database. Obtaining a mutable reference to a shared resource violates the second rule and could lead to crashes as shown above. There are several ways to make it safe. One is by providing a proxy in the database for setting each Person field:

pub fn set_person_name(&self, id: i64, new_name: String) -> bool {
    match self.db.lock().unwrap().get_mut(&id) {
        Some(person) => {
            person.name = new_name;
            true
        }
        None => false
    }
}

As the number of fields on Person grows, this would quickly get tedious. It could also get slow, as a separate mutex lock would have to be acquired for each access.
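For completeness, here is a small runnable sketch of the proxy setter in use, showing the boolean result for an existing and a missing id. The get_name helper is hypothetical and exists only so the sketch can check the outcome.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

struct Person {
    id: i64,
    name: String,
}

struct Database {
    db: Arc<Mutex<HashMap<i64, Person>>>,
}

impl Database {
    fn new() -> Database {
        Database { db: Arc::new(Mutex::new(HashMap::new())) }
    }

    fn add_person(&self, id: i64, person: Person) {
        self.db.lock().unwrap().insert(id, person);
    }

    // Proxy setter: the mutable reference never leaves this method.
    fn set_person_name(&self, id: i64, new_name: String) -> bool {
        match self.db.lock().unwrap().get_mut(&id) {
            Some(person) => {
                person.name = new_name;
                true
            }
            None => false,
        }
    }

    // Hypothetical helper used only to inspect the result.
    fn get_name(&self, id: i64) -> Option<String> {
        let guard = self.db.lock().unwrap();
        guard.get(&id).map(|p| {
            debug_assert_eq!(p.id, id);
            p.name.clone()
        })
    }
}

fn main() {
    let db = Database::new();
    db.add_person(1, Person { id: 1, name: "Bob".to_string() });

    assert!(db.set_person_name(1, "Bill".to_string()));
    assert!(!db.set_person_name(42, "Nobody".to_string()));
    assert_eq!(db.get_name(1), Some("Bill".to_string()));
}
```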

There is fortunately a better way to implement modification of the entry. Remember that using a mutable reference violates the rules unless Rust can prove that the reference won't "escape" the block where it is being used. This can be ensured by inverting the control - instead of a get_person() that returns the mutable reference, we can introduce a modify_person() that passes the mutable reference to a callable, which can do whatever it likes with it. For example:

pub fn modify_person<F>(&self, id: i64, f: F) where F: FnOnce(Option<&mut Person>) {
    f(self.db.lock().unwrap().get_mut(&id))
}

The usage would look like this:

fn main() {
    let mut db = Database::new();

    db.add_person(1, Person::new(1, "Bob"));
    assert!(db.get_person(1).unwrap().name == "Bob");

    db.modify_person(1, |person| {
        person.unwrap().set_name("Bill");
    });
}
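The following sketch exercises modify_person() from several threads to illustrate that each closure runs entirely under the lock. It rests on assumptions not in the original code: a hypothetical visits counter on Person, a Clone derive on Database so that each thread gets its own handle to the shared map, and add_person taking &self.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

struct Person {
    id: i64,
    name: String,
    visits: u32, // hypothetical field, added for this sketch
}

// Cloning the Database clones the Arc, not the map itself.
#[derive(Clone)]
struct Database {
    db: Arc<Mutex<HashMap<i64, Person>>>,
}

impl Database {
    fn new() -> Database {
        Database { db: Arc::new(Mutex::new(HashMap::new())) }
    }

    fn add_person(&self, id: i64, person: Person) {
        self.db.lock().unwrap().insert(id, person);
    }

    fn modify_person<F>(&self, id: i64, f: F)
    where
        F: FnOnce(Option<&mut Person>),
    {
        let mut guard = self.db.lock().unwrap();
        f(guard.get_mut(&id)) // the closure runs while the lock is held
    }
}

fn main() {
    let db = Database::new();
    db.add_person(1, Person { id: 1, name: "Bob".to_string(), visits: 0 });

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let db = db.clone();
            thread::spawn(move || {
                for _ in 0..100 {
                    db.modify_person(1, |person| {
                        if let Some(p) = person {
                            p.visits += 1;
                        }
                    });
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // 4 threads x 100 increments, none lost.
    db.modify_person(1, |person| {
        let p = person.unwrap();
        assert_eq!((p.id, p.name.as_str(), p.visits), (1, "Bob", 400));
    });
}
```

A closure can also update several fields in one call, which costs a single lock acquisition instead of one per field.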

Finally, if you're worried about the performance of get_person() cloning Person for the sole reason of inspecting it, it is trivial to create an immutable version of modify_person that serves as a non-copying alternative to get_person():

pub fn read_person<F, R>(&self, id: i64, f: F) -> R
    where F: FnOnce(Option<&Person>) -> R {
    f(self.db.lock().unwrap().get(&id))
}

Besides taking a shared reference to Person, read_person also allows the closure to return a value, typically something it picks up from the object it receives. Its usage is similar to that of modify_person, with the added possibility of returning a value:

// if Person had an "age" field, we could obtain it like this:
let person_age = db.read_person(1, |person| person.unwrap().age);

// equivalent to the copying definition of db.get_person():
let person_copy = db.read_person(1, |person| person.cloned());
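
A self-contained sketch of read_person() might look as follows. The age field is hypothetical (as in the snippet above), and the closures here use map() rather than unwrap() so that a missing id does not panic.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Clone)]
struct Person {
    id: i64,
    name: String,
    age: u32, // hypothetical field, added for this sketch
}

struct Database {
    db: Arc<Mutex<HashMap<i64, Person>>>,
}

impl Database {
    fn new() -> Database {
        Database { db: Arc::new(Mutex::new(HashMap::new())) }
    }

    fn add_person(&self, id: i64, person: Person) {
        self.db.lock().unwrap().insert(id, person);
    }

    fn read_person<F, R>(&self, id: i64, f: F) -> R
    where
        F: FnOnce(Option<&Person>) -> R,
    {
        let guard = self.db.lock().unwrap();
        f(guard.get(&id)) // only the closure's return value leaves the lock
    }
}

fn main() {
    let db = Database::new();
    db.add_person(1, Person { id: 1, name: "Bob".to_string(), age: 42 });

    // Pick out individual fields without cloning the whole Person.
    let id_and_age = db.read_person(1, |person| person.map(|p| (p.id, p.age)));
    assert_eq!(id_and_age, Some((1, 42)));

    let name_len = db.read_person(1, |person| person.map_or(0, |p| p.name.len()));
    assert_eq!(name_len, 3);

    // A missing id simply yields None inside the closure.
    assert!(db.read_person(7, |person| person.is_none()));

    // Cloning remains available when an owned copy is wanted.
    let copy = db.read_person(1, |person| person.cloned());
    assert_eq!(copy.unwrap().name, "Bob");
}
```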
user4815162342 answered Jan 21 '23 14:01


This post uses the pattern referred to as "inversion of control" in the other, well-explained answer and merely adds some syntactic sugar to demonstrate another API for an in-memory db.

With a macro rule it is possible to expose a db client API like this:

fn main() {
    let db = Database::new();

    let person_id = 1234;

    // duplicating person_id is probably not the best design choice,
    // but it does not matter for this demo
    db.add_person(person_id, Person::new(person_id, "Bob"));

    db_update!(db #person_id => set_name("Gambadilegno"));

    println!("your new name is {}",  db.get_person(person_id).unwrap().name);
}

My opinionated macro has the format:

<database_instance> #<object_key> => <method_name>(<args>)

Below are the macro implementation and the full demo code:

use std::collections::HashMap;
use std::sync::Arc;
use std::sync::Mutex;

macro_rules! db_update {
    ($db:ident # $id:expr => $meth:tt($($args:tt)*)) => {
        $db.modify_person($id, |person| {
            person.unwrap().$meth($($args)*);
        });
    };
}

#[derive(Clone)]
struct Person {
    id: u64,
    name: String,
}

impl Person {
    pub fn new(id: u64, name: &str) -> Person {
        Person {
            id: id,
            name: name.to_string(),
        }
    }

    fn set_name(&mut self, value: &str) {
        self.name = value.to_string();
    }
}

struct Database {
    db: Arc<Mutex<HashMap<u64, Person>>>, // access from different threads
}

impl Database {
    pub fn new() -> Database {
        Database {
            db: Arc::new(Mutex::new(HashMap::new())),
        }
    }

    pub fn add_person(&self, id: u64, person: Person) {
        self.db.lock().unwrap().insert(id, person);
    }

    pub fn modify_person<F>(&self, id: u64, f: F)
    where
        F: FnOnce(Option<&mut Person>),
    {
        f(self.db.lock().unwrap().get_mut(&id));
    }

    pub fn get_person(&self, id: u64) -> Option<Person> {
        self.db.lock().unwrap().get(&id).cloned()
    }
}
attdona answered Jan 21 '23 13:01