Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

DataBase (datamodel) to build a folder structure

Planning on building a Folder based structure in Java.

I will be using a jquery plugin for the GUI, so I don't need need information on how to display the folder structure.

I am looking for the backend logic on how folder information is stored, such that it can be retrieved in a quick and efficient manner.

Each folder will have multiple subfolders. From a leaf folder, we should be able access the root quickly and efficiently

Example:

+Folder1
  |__SubFolder1_1
  |__SubFolder1_2
        |_SubSubFolder1_2_1
        |_
+Folder2
  |__SubFolder2_1
        |_SubFolder2_1_1
        |_SubFolder2_1_2
             |_SubFolder2_1_2_1

New folders could be added randomly. Folder can be renamed. Folders can be deleted.

My Question is:

How would these folder details be stored in the database?

Again, I am looking for a fast and efficient means of storing and retrieving this information.

like image 738
kensen john Avatar asked Sep 23 '11 03:09

kensen john


1 Answers

This is good question, but without a lot of specifics, it is hard to talk about the 'best' solution.

You can map this to the abstract question of how to store an n-ary tree in a relational database.

Here are some of the variables that affect the problem:

  1. What is the total size of the directory structure?
  2. How many separate VMs perform writes to the structure?
  3. Are move operations frequent?
  4. Is faulting an entire subtree an important operation too?
  5. Does your database support tree walks, or do you need a solution that works with any reasonable relational database?

The following assumes your database doesn't have special provisions for performing tree walks.

There are two pure persistence models for n-ary trees.

The first is to simply write each node with a parent reference:

| NodeId | ParentId | Name       | ....
|--------|----------|------------|-----

This approach simplifies the move of a folder, but deletes, queries for all nested subfolders and finding root become expensive.

The second pure model is to persist every ancestral relationship separate from the folder details

| NodeId | Name     | ....
|--------|----------|------
...


| NodeId | AncestorId | Distance | 
|--------|------------|----------|
...

Here, the folder /food/dairy/cheese/cheddar would produce

| NodeId | Name     |
|--------|----------|
| #0     | (root)   |
| #1     | food     |
| #2     | dairy    |
| #3     | cheese   |
| #4     | cheddar  |


| NodeId | AncestorId | Distance |
|--------|------------|----------|
| #1     | #0         | 1        |
| #2     | #0         | 2        |
| #2     | #1         | 1        |
| #3     | #0         | 3        |
| #3     | #1         | 2        |
| #3     | #2         | 1        |
| #4     | #0         | 4        |
| #4     | #1         | 3        |
| #4     | #2         | 2        |
| #4     | #3         | 1        |

This approach is very expensive for moves, and a new directory causes d inserts, where d is the distance from the root. But a subtree listing is a single query. The ancestry path is also a single query; an order by Distance desc will let you get to the root and first folder quickly.

But narrowly reading your question, a variant of the first approach, simply adding root as well might be the right approach for you:

| NodeId | ParentId | RootId | Name       | ....
|--------|----------|--------|------------|-----

Note that moving a folder will be expensive, because you need to determine all nested subfolders, and update all those records' RootId.

like image 138
Dilum Ranatunga Avatar answered Oct 31 '22 08:10

Dilum Ranatunga