Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the best way to inherit properties in a tree-based structure?

I have a simple CMS system, that has a simple tree hierarchy:

We have pages A through E that has the following hierarchy: A -> B -> C -> D -> E

All the pages are the same class, and have a parent-child relationship.

Now, let's say I have a property I want inherited among the pages. Let's say A is red: A (red) -> B -> C -> D -> E

In this case, B through E would inherit "red".

Or a more complex scenarios: A (red) -> B -> C (blue) -> D -> E

B would inherit red, and D/E would both be blue.

What would be the best way to solve something like this? I have a tree structure with over 6,000 leafs and about 100 of those leaves have inheritable properties. Those 100 or so leaves have their properties saved in the database. For leaves without explicit properties, I look up the ancestors and use memcached to save the properties. Then there are very overly-complex algorithms to handle expiring those caches. It's terribly convoluted and I'd like refactor to a more cleaner solution / data structure.

Does anybody have any ideas?

Thanks!

like image 607
chuboy Avatar asked Mar 11 '11 06:03

chuboy


1 Answers

There is a data model that allows you to express this kind of information perfectly, that is RDF/RDFS . RDF is a W3C standard to model data based on triples (subject, predicate, object) and URIs; and RDFS , among other things, allows you to describe Class hierarchies and Property hierarchies. And the good thing is that there are many libraries out there that help you to create and query this type of data.

For instance if I want to say that a specific document Lion is of class Animal and programmer is of class Geek , I could say:

doc:lion rdf:type class:mamal .
doc:programmer rdf:type class:Geek .

Now I could declare a hierarchy of classes, and say that every mamal is an animal and every animal is a living thing.

class:mamal rdfs:subClassOf class:animal .
class:animal rdfs:subClassOf class:LivingThing .

And, that every geek is a human and that every human is living thing:

class:geek rdfs:subClassOf class:human .
class:human rdfs:subClassOf class:LivingThing .

There is a language , similar to SQL, called SPARQL to query this kind of data, so for instance if I issue the query:

SELECT * WHERE {
       ?doc rdf:type class:LivingThing .
}

Where ?doc is a variable that will bind things type of class:LivingThing. I would get as result of this query doc:lion and doc:programmer because the database technology will follow the semantics of RDFS and therefore by computing the closure of classes it'll know that doc:lion and doc:programmer are class:LivingThing.

In the same way the query:

SELECT * WHERE {
       doc:lion rdf:type ?class .
}

Will tell me that doc:lion is rdf:type of class:mamal class:animal and class:LivingThing.

In the same way that as I just explained, with RDFS, you can create hierarchies of properties, and say:

doc:programmer doc:studies doc:computerscience .
doc:lion doc:instint doc:hunting .

And we can say that both properties doc:skill and doc:instint are sub-properties of doc:knows:

doc:studies rdfs:subPropertyOf doc:knows .
doc:instint rdfs:subPropertyOf doc:knows .

With the query:

SELECT * WHERE {
       ?s doc:knows ?o .
}

We will get that a lion knows how to hunt and programmers know computer science.

Most RDF/RDFS databases can easily deal with the numbers of elements you mentioned in your question, and there are many choices to start. If you are a Java person you could have a look at Jena, there are also frameworks for .Net lije this one or Python with RDFLIB

But most importantly, have a look at the documentation of your CMS, because maybe there are plugins to export metadata as RDF. Drupal, for instance, is quite advance in this case (see http://drupal.org/project/rdf

like image 198
Manuel Salvadores Avatar answered Oct 20 '22 03:10

Manuel Salvadores