Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to store Linked Tree with children in Relational Database?

I have a custom LinkedTree with children (nodes), and nodes has an adjacency relationship, i.e. each node linked with previous and next one. This LinkedTree is very heavy, big, it may consist of millions of nodes.

And here is a code sample:

package tree;

import java.io.Serializable;

public class LinkedTree<E> implements Serializable {

    private int size = 0;
    private Node<E> first;
    private Node<E> last;
    private LinkedTree<E> children;

    public LinkedTree() {
        children = new LinkedTree<>();
    }

    public LinkedTree(LinkedTree<E> children) {
        this.children = children;
    }

    public void add(E element) {

        final Node<E> l = last;
        final Node<E> newNode = new Node<>(l, element, null);
        last = newNode;
        if (l == null)
            first = newNode;
        else
            l.next = newNode;
        size++;
    }

    public void remove(E element) {

        ...
    }

    public int size() {

         return size;
    }

    public static class Node<E> implements Serializable {

        E item;
        Node<E> next;
        Node<E> prev;

        public Node(Node<E> prev, E item, Node<E> next) {

            this.item = item;
            this.next = next;
            this.prev = prev;
        }

        public E item() {

            return item;
        }

        public boolean hasPrevious() {

            return prev != null;
        }

        public Node<E> previous() {

            return prev;
        }

        public Node<E> previous(int target) {

            Node<E> n = this;
            int i = target;
            while (i-- > 0 && n != null)
                n = n.prev;
            return n;
        }

        public boolean hasNext() {

            return next != null;
        }

        public Node<E> next() {

            return next;
        }

        public E nextItem() {

            return next.item;
        }

        public E nextItem(int target) {

            return next(target).item();
        }

        public Node<E> next(int target) {

            Node<E> n = this;
            int i = 0;
            while (i++ < target && n != null)
                n = n.next;
            return n;
        }

        @Override
        public int hashCode() {

            return item != null ? item.hashCode() : 0;
        }

        @Override
        public boolean equals(Object o) {

            if (this == o)
                return true;
            if (o == null || getClass() != o.getClass())
                return false;

            Node<?> node = (Node<?>) o;

            return item != null ? item.equals(node.item) : node.item == null;
        }

        @Override
        public String toString() {

            return item.toString();
        }
    }
}

I wanted to serialize it and save it in file in order to make it persistent, but loading and writing some data may cost too expensive. So I decided to save it in MySQL, so that I could load datas from anywhere I want. I mean from the end, middle or beginning of this hierarchy.

I suppose the relationship of the row should be in adjacency and parent-child relationship simultaneously. But I have no idea how to do it.

like image 868
Ste7 Avatar asked May 28 '18 14:05

Ste7


1 Answers

i would've commented asking for more information (particularly sample data & the rules of your hierarchy) so forgive this first cut for being overly general

i've made the following assumptions:

  • that your structure is greater than three levels deep - if this is not the case then i wouldn't do it this way, because it wouldn't be worth it

  • your load is read-heavy and writes to portions of the tree are concurrent but not conflicting, while conflicting writes are rare or non-existent

  • you don't require parallel, partial and shared-nothing or lock-free access to the tree to build or process it (in this case you need to signal deletes, which you could do by pointing to the node you replaced ie. superseded)

my proposed data model would look something like:

create table treemodel (
 `node` int not null 
 , `parent` int not null 
 , `principal` int not null 
 , `state` smallint unsigned not null 
 , ...
 , `supersedes` int    /*version instead of lossy update or delete*/
 , `supersededby` int
) engine = innodb;
alter table treemodel add primary key (`principal`, `node`) using btree; 
  1. the treemodel table would hold structural identifiers only: i would hold node data in a separate table but i would not join to get at it, rather, i would perform a second select ... where node in (...) - this is basically saying that 'the structure of my data is independent of my data'

  2. this model is designed to limit the number of round-trips to the database and may seem counter-intuitive, but will allow you to read or lock portions of the tree in a single database instruction with no joins

  3. this would run contrary to your data model, as you don't store an additional 'principal' node with your nested children - but if you can change this structure then you could take advantage of this proposal to avoid queries within loops, ie. multiple selects, or reentrant querying or self/ unary joins

  4. ... but your business logic would need to support the notion of what i have called the 'principal' node

  5. it's up to your use case as to what gets put in the principal node - i have used this model to store the causal relationship between observation records and their derivations, regardless of the parent child relationships below this point - another example might be: 1) a client raises a support case, or 2) a new message is sent, or 3) a new file is created, ...

  6. it would make no sense to store the principal as the actual root node in your tree structure (ie. node id '1', or 'data directory', or whatever) - instead you would store the 'next level down' for argument's sake, ie. 'user root directory' or 'first level below user root directory' - this is where knowing your use case & business rules will help

  7. ... and your java code will need to be updated to find or copy and store this concept - it will always be a copy from the parent node on any insert within a given branch of the tree - and if you are moving a branch and your logic requires a change to this number it is an update ... set principal=y where principal=x and update ... set parent=newid where node=current_principal

  8. ... and having said all of that, i wouldn't update the rows per se, only insert on change and recreate the whole tree branch (which explains the state field, ie. CURRENT, DELETED, ... where the root node of a deleted branch still points to its current parent node, eg. for 'deleted items')

  9. you still keep your pointers to each adjacent node in prev & next - ordered inserts of nodes into a branch of the tree would, in the worst case, require a select ... where principal=x for update, but probably only a select ... where (node=x or prev=x or next=x) for update

edits:

  • primary key/ clustered index must be unique - you could also partition on principal to ensure fast parallel access

  • recreating instead of updating branches

like image 60
brynk Avatar answered Oct 01 '22 12:10

brynk