 

How would you model a generic Schema.org storage?

I'm looking for the best way to model an application around the whole of schema.org. The Schema.org hierarchy now contains around 500 different types that can be used to mark up microdata on a website: http://schema.org/docs/full.html

The goal is to build a generic system around all of those Things without modeling 500+ different tables in a standard SQL database.

As a starting example, JobPosting seems quite simple to model, as it has just a few fields and only two links, to Organization and Place objects: see http://schema.org/JobPosting

Which database system (SQL, MongoDB, Cassandra, Neo4j, Sesame, ...) would you suggest for modeling this kind of data? There are also specialized graph and RDF databases, which may be another option.

Bonus question: another problem that is blowing my mind at the moment is the multiple inheritance that some types are based on. For example, http://schema.org/Dentist is a LocalBusiness and an Organization, but also a Place, so it has fields from several different parents.

So I'm looking for a system with:

  • Variable columns, as I don't want to model those zillions of attributes using SQL DDL
  • Multiple inheritance or something like it (mixins)
  • Useful links between records (for example, a JobPosting points to the Organization and the Place it belongs to)
  • Simple queries (for example, getting all JobPostings for a given Organization)

Please let me know what kind of information would help to find a better answer.

Severin Ulrich asked Jan 13 '12 10:01


1 Answer

I think MongoDB can be a good fit, because its documents make it easier to represent the individual schemas, which solves the variable-column problem.

To solve the linking, it makes sense to store references only. For example, in the JobPosting, you probably want to store an OrganizationId and a PlaceId, because these are fairly complex documents. This also makes querying a certain organization's JobPostings trivial.
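As a rough illustration of the reference approach, the sketch below uses plain Python dictionaries as stand-ins for document collections, so it runs without a database. The field names `organizationId` and `placeId` and all sample data are assumptions made for this example, not something schema.org or MongoDB prescribes.

```python
# In-memory stand-ins for document collections (hypothetical sample data).
organizations = {
    "org1": {"_id": "org1", "@type": "Organization", "name": "ACME, Inc."},
}
places = {
    "place1": {"_id": "place1", "@type": "Place", "name": "Head office"},
}
# Each JobPosting stores only the ids of the documents it links to.
job_postings = [
    {"_id": "job1", "@type": "JobPosting", "title": "Developer",
     "organizationId": "org1", "placeId": "place1"},
    {"_id": "job2", "@type": "JobPosting", "title": "Designer",
     "organizationId": "org2", "placeId": "place1"},
]


def postings_for_organization(postings, organization_id):
    """Return all job postings that reference the given organization id."""
    return [p for p in postings if p["organizationId"] == organization_id]


acme_jobs = postings_for_organization(job_postings, "org1")
print([p["title"] for p in acme_jobs])
```

Against a real MongoDB collection, the same lookup would presumably be a single filter on the `organizationId` field, which is why storing the reference keeps this query simple.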

Note: sometimes embedding might be more appropriate, but that depends heavily on the way your documents are updated. In particular, many objects might refer to the same address, so a change in address should be reflected everywhere. In other cases the opposite is true, and a change should not propagate to the other documents. This is a key question that only you can answer, because it depends on how the system is used.

In any case, the linking means that a single lookup might have to traverse a tree of references. Again, this depends greatly on the use case:

Suppose you want to display a JobPosting. Now you could display a list of properties, and for "Organization" all you print is "ACME, Inc." with a link. That link will send you to the details page of "ACME, Inc." In this case, your queries are very straightforward. The only thing you need to do is to copy the organization name to the JobPosting (de-normalization), so it's easier to display.
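A minimal sketch of that de-normalization, again using plain Python dictionaries instead of a real database. The field `organizationName`, the URL pattern, and the sample data are hypothetical choices for this example.

```python
def denormalize_posting(posting, organizations):
    """Return a copy of the posting with the organization's name copied in."""
    organization = organizations[posting["organizationId"]]
    enriched = dict(posting)  # shallow copy, the original stays unchanged
    enriched["organizationName"] = organization["name"]
    return enriched


def render_posting(posting):
    """Build one display line using only fields stored on the posting itself."""
    link = "/organizations/" + posting["organizationId"]  # hypothetical URL scheme
    return f'{posting["title"]} at {posting["organizationName"]} ({link})'


organizations = {"org1": {"_id": "org1", "name": "ACME, Inc."}}
posting = {"_id": "job1", "title": "Developer", "organizationId": "org1"}

stored = denormalize_posting(posting, organizations)
print(render_posting(stored))
```

The trade-off is that the copied name has to be refreshed whenever the organization is renamed, which ties back to the question of how your documents are updated.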

If, on the other hand, you want to display everything in-place, you will have to perform more queries and build the domain model object in code. This is not a big deal, but requires additional care in case of circular references and the like.
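One possible way to build the full object in code while guarding against circular references is sketched below. It assumes a convention that is not part of the original setup: every field whose name ends in `Id` (other than `_id`) holds a reference. The `store` dictionary stands in for the database, and `featuredJobId` is a made-up field that exists only to create a cycle.

```python
def resolve(doc_id, store, seen=frozenset()):
    """Expand a document and, recursively, every document it references.

    `seen` holds the ids on the current path; meeting one of them again
    means a circular reference, which is replaced by a small stub.
    """
    if doc_id in seen:
        return {"_ref": doc_id}
    seen = seen | {doc_id}
    result = {}
    for key, value in store[doc_id].items():  # one lookup per referenced document
        if key != "_id" and key.endswith("Id"):
            # "organizationId" becomes an embedded "organization" object.
            result[key[:-2]] = resolve(value, store, seen)
        else:
            result[key] = value
    return result


store = {
    "job1": {"_id": "job1", "title": "Developer", "organizationId": "org1"},
    "org1": {"_id": "org1", "name": "ACME, Inc.", "featuredJobId": "job1"},
}

full_posting = resolve("job1", store)
print(full_posting)
```

In a real system each `store[...]` lookup would be a separate query, so deep reference trees cost more round trips than the link-only display.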

I guess the best approach is to use each object's most specific type as its collection name (so a ContactPoint ends up in the ContactPoint collection, a PostalAddress in the PostalAddress collection, etc.).
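The routing itself can be very small. The sketch below assumes each document carries its most specific type in an `@type` field (a naming choice made for this example) and groups documents into in-memory lists that stand in for collections.

```python
from collections import defaultdict


def group_by_type(documents):
    """Place each document in the collection named after its "@type" field."""
    collections = defaultdict(list)
    for document in documents:
        collections[document["@type"]].append(document)
    return dict(collections)


# Hypothetical sample documents.
documents = [
    {"@type": "ContactPoint", "telephone": "+1-555-0100"},
    {"@type": "PostalAddress", "addressLocality": "Springfield"},
    {"@type": "ContactPoint", "telephone": "+1-555-0101"},
]

collections = group_by_type(documents)
print(sorted(collections))
```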

The only remaining problem is multiple inheritance or mixins. I haven't used Ruby before, but I guess the MongoDB Ruby driver supports mixins.
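To make the inheritance problem concrete, here is a sketch in Python rather than Ruby, using the language's built-in multiple inheritance and not any MongoDB driver feature. The hierarchy is a heavily abbreviated version of the schema.org one, and the `properties` tuples list only a few fields per type for illustration.

```python
class Thing:
    properties = ("name", "url")


class Organization(Thing):
    properties = ("email", "telephone")


class Place(Thing):
    properties = ("address", "geo")


class LocalBusiness(Organization, Place):  # inherits from two parents
    properties = ("openingHours",)


class Dentist(LocalBusiness):
    properties = ()


def all_properties(cls):
    """Collect the properties declared on cls and on every ancestor."""
    collected = []
    for klass in cls.__mro__:  # method resolution order, most specific first
        # Read only what this class declares itself, not inherited values.
        for prop in klass.__dict__.get("properties", ()):
            if prop not in collected:
                collected.append(prop)
    return collected


dentist_fields = all_properties(Dentist)
print(dentist_fields)
```

A field list computed this way could be used to validate a Dentist document before it is stored, while the document itself stays a flat set of fields.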

You will still have to cope with indexing and the like, but again, this depends greatly on the use cases. You probably want to index most foreign keys, but additional indexes will need manual care.

mnemosyn answered Sep 19 '22 10:09