I have a simple CMS system, that has a simple tree hierarchy:
We have pages A through E that has the following hierarchy: A -> B -> C -> D -> E
All the pages are the same class, and have a parent-child relationship.
Now, let's say I have a property I want inherited among the pages. Let's say A is red: A (red) -> B -> C -> D -> E
In this case, B through E would inherit "red".
Or a more complex scenarios: A (red) -> B -> C (blue) -> D -> E
B would inherit red, and D/E would both be blue.
What would be the best way to solve something like this? I have a tree structure with over 6,000 leafs and about 100 of those leaves have inheritable properties. Those 100 or so leaves have their properties saved in the database. For leaves without explicit properties, I look up the ancestors and use memcached to save the properties. Then there are very overly-complex algorithms to handle expiring those caches. It's terribly convoluted and I'd like refactor to a more cleaner solution / data structure.
Does anybody have any ideas?
Thanks!
There is a data model that allows you to express this kind of information perfectly, that is RDF/RDFS . RDF is a W3C standard to model data based on triples (subject, predicate, object) and URIs; and RDFS , among other things, allows you to describe Class hierarchies and Property hierarchies. And the good thing is that there are many libraries out there that help you to create and query this type of data.
For instance if I want to say that a specific document Lion
is of class Animal
and programmer
is of class Geek
, I could say:
doc:lion rdf:type class:mamal .
doc:programmer rdf:type class:Geek .
Now I could declare a hierarchy of classes, and say that every mamal is an animal and every animal is a living thing.
class:mamal rdfs:subClassOf class:animal .
class:animal rdfs:subClassOf class:LivingThing .
And, that every geek is a human and that every human is living thing:
class:geek rdfs:subClassOf class:human .
class:human rdfs:subClassOf class:LivingThing .
There is a language , similar to SQL
, called SPARQL to query this kind of data, so for instance if I issue the query:
SELECT * WHERE {
?doc rdf:type class:LivingThing .
}
Where ?doc
is a variable that will bind things type of class:LivingThing
. I would get as result of this query doc:lion
and doc:programmer
because the database technology will follow the semantics of RDFS and therefore by computing the closure of classes it'll know that doc:lion
and doc:programmer
are class:LivingThing
.
In the same way the query:
SELECT * WHERE {
doc:lion rdf:type ?class .
}
Will tell me that doc:lion
is rdf:type
of class:mamal
class:animal
and class:LivingThing
.
In the same way that as I just explained, with RDFS, you can create hierarchies of properties, and say:
doc:programmer doc:studies doc:computerscience .
doc:lion doc:instint doc:hunting .
And we can say that both properties doc:skill
and doc:instint
are sub-properties of doc:knows
:
doc:studies rdfs:subPropertyOf doc:knows .
doc:instint rdfs:subPropertyOf doc:knows .
With the query:
SELECT * WHERE {
?s doc:knows ?o .
}
We will get that a lion knows how to hunt and programmers know computer science.
Most RDF/RDFS databases can easily deal with the numbers of elements you mentioned in your question, and there are many choices to start. If you are a Java person you could have a look at Jena, there are also frameworks for .Net lije this one or Python with RDFLIB
But most importantly, have a look at the documentation of your CMS, because maybe there are plugins to export metadata as RDF. Drupal, for instance, is quite advance in this case (see http://drupal.org/project/rdf
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With