Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Storing Directory Hierarchy in a Key-Value Data store

What is a clean/efficient method for storing the directory Hierarchy/tree in a Key-Value database (in my case MongoDB but any of them)?

For example a tree structure

- Cars     + Audi     + BMW       - M5    + Ford - Color    + Red       - Apple       - Cherry    + Purple - Funny 

The method I am using now, each object links to it's parent

{    dir: "red"   parent-dir: "color" } 

This makes it very efficient/fast to insert and reorder any aspect of the tree (for example if I want to move Red and all it's children to the Cars directory).

But this method sucks when I want to all subdirectories and their children for a given directory recursively. To make it efficient to parse I can have a structure for example

{    dir: "red"   children: "audi, bmw, ford" }  {    dir: "bmw"   children: "m5" } 

But if I want to modify the tree, a whole bunch of objects need to touched and modified.

Are there any other methods to storing a directory structure in a KV store?

like image 308
The Unknown Avatar asked Oct 24 '09 20:10

The Unknown


People also ask

Which type of database is best for hierarchical data storage?

Document based database like MongoDB, and Redis are great for small scale, hierarchical data with a relatively small amount of children for each entry.

What kinds of data can you store in a key-value database?

A key-value database is a type of nonrelational database that uses a simple key-value method to store data. A key-value database stores data as a collection of key-value pairs in which a key serves as a unique identifier. Both keys and values can be anything, ranging from simple objects to complex compound objects.

What is the data structure used for store key-value pair?

A key–value database, or key–value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, and a data structure more commonly known today as a dictionary or hash table.


2 Answers

The method you currently use now is called adjacency list model.

Another model to store hierarchical data in a (relational) database is the nested set model. Its implementation in SQL databases is well known. Also see this article for the modified preorder tree traversal algorithm.

A very simple method: you could store a path per object - with those it should be easy to query trees in NOSQL databases:

{ path: "Color", ... } { path: "Color.Red", ... } { path: "Color.Red.Apple", ... } { path: "Color.Red.Cherry", ... } 

When nodes will be removed or renamed some paths must be updated. But in general, this method looks promising. You just have to reserve a special character as separator. The storage space overhead should be negligible.

edit: this method is called materialized path

Finally, here is a comparison of different methods for hierarchical data in NOSQL databases.

like image 149
Frunsi Avatar answered Sep 22 '22 05:09

Frunsi


I don't have a huge amount of NOSQL experience, so this isn't a definitive answer, but here's how I'd approach it:

I would likely use your first approach, where you have:

{   dir: 'dir_name',   parent_dir: 'parent_dir_name' } 

And then set up a map-reduce to quickly query the children of a directory. MongoDB's map-reduce functionality is still only available in the development branch and I haven't worked with it yet, but in CouchDB (and I assume, with a few modification, in MongoDB) you could do something like:

map: function(doc) {   emit( doc.parent_dir, doc.dir ); }  reduce: function(key, values) {   return( values ); } 

Which would give you the list of sub-directories for each parent directory.

like image 43
Emily Avatar answered Sep 21 '22 05:09

Emily