
An alternative to hierarchical data model

Problem domain

I'm working on a rather big application that uses a hierarchical data model. It takes images, extracts features from them, and creates analysis objects on top of these. So the basic model is Object-(1:N)-Image_features-(1:1)-Image. But the same set of images may be used to create multiple analysis objects (with different options).

Both analysis objects and images can also have many other connected objects: for example, an analysis object can be refined with additional data, or complex conclusions (solutions) can be based on the analysis object and other data.

Current solution

This is a sketch of the solution. Stacks represent sets of objects, and arrows represent pointers (i.e. image features link to their images, but not vice versa). Some parts (images, image features, additional data) may be included in multiple analysis objects, because the user wants to run analyses on different sets of objects, combined differently.

Current solution simplified sketch

Images, features, additional data and analysis objects are stored in global storage (god-object). Solutions are stored inside analysis objects by means of composition (and contain solution features in turn).

All the entities (images, image features, analysis objects, solutions, additional data) are instances of corresponding classes (like IImage, ...). Almost all the parts are optional (i.e., we may want to discard images after we have a solution).

Current solution drawbacks

  1. Navigating this structure is painful when you need connections like the dotted one in the sketch. To display an image with a couple of solution features on top, you first have to iterate through the analysis objects to find which of them are based on this image, and then iterate through their solutions to display them.
  2. If, to solve 1, you choose to store the dotted links explicitly (i.e. the image class holds pointers to the solution features related to it), you'll put very much effort maintaining consistency of these pointers and constantly updating the links when something changes.
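To make drawback 1 concrete, here is a minimal sketch of the lookup it describes. All type and function names (Image, ImageFeatures, Solution, AnalysisObject, solution_features_for) are hypothetical stand-ins chosen for illustration, not the application's real classes.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the entities in the sketch.
struct Image { int id; };
struct ImageFeatures { int id; const Image* image; };  // features point to their image
struct SolutionFeature { std::string label; };
struct Solution { std::vector<SolutionFeature> features; };
struct AnalysisObject {
    std::vector<const ImageFeatures*> features;  // may be shared between objects
    std::vector<Solution> solutions;             // owned by composition
};

// Follows the "dotted link" from an image to its solution features the slow way:
// scan every analysis object, check whether it uses the image, then walk its solutions.
std::vector<const SolutionFeature*> solution_features_for(
    const Image& image, const std::vector<AnalysisObject>& objects) {
    std::vector<const SolutionFeature*> result;
    for (const AnalysisObject& obj : objects) {
        bool uses_image = false;
        for (const ImageFeatures* f : obj.features) {
            if (f->image == &image) { uses_image = true; break; }
        }
        if (!uses_image) continue;
        for (const Solution& s : obj.solutions)
            for (const SolutionFeature& sf : s.features)
                result.push_back(&sf);
    }
    return result;
}
```

Every such query costs a full pass over the analysis objects, which is the pain point that storing or indexing the reverse link is meant to remove.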

My idea

I'd like to build a more extensible (2) and flexible (1) data model. My first idea was to use a relational model, separating objects from their relations. And why not use an RDBMS here? SQLite seems an appropriate engine to me. Complex relations would then be accessible through simple (LEFT) JOINs on the database (pseudocode: "images JOIN images_to_image_features JOIN image_features JOIN image_features_to_objects JOIN objects JOIN solutions JOIN solution_features"), followed by fetching the actual C++ objects for the solution features from the global storage by ID.
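The shape of that join chain can be tried out in plain C++ before committing to a database. The sketch below is only an in-memory stand-in under stated assumptions: link tables are vectors of ID pairs, the table and function names are hypothetical, and a real RDBMS would replace the linear scans with indexed joins.

```cpp
#include <set>
#include <utility>
#include <vector>

using Id = int;
using LinkTable = std::vector<std::pair<Id, Id>>;  // rows of (left_id, right_id)

// One join step: collect the right_ids of all rows whose left_id is in `keys`.
std::set<Id> join(const std::set<Id>& keys, const LinkTable& table) {
    std::set<Id> out;
    for (const auto& row : table)
        if (keys.count(row.first)) out.insert(row.second);
    return out;
}

// Hypothetical link tables, each oriented in the direction of the traversal.
struct Relations {
    LinkTable image_to_features;
    LinkTable features_to_object;
    LinkTable object_to_solution;
    LinkTable solution_to_solution_feature;
};

// Mirrors "images JOIN ... JOIN solution_features" for a single image.
std::set<Id> solution_feature_ids_for_image(Id image, const Relations& r) {
    std::set<Id> ids{image};
    ids = join(ids, r.image_to_features);
    ids = join(ids, r.features_to_object);
    ids = join(ids, r.object_to_solution);
    return join(ids, r.solution_to_solution_feature);
}
```

The returned IDs would then be used to fetch the actual C++ objects from the global storage, as described above.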

The question

So my primary question is

  • Is using an RDBMS an appropriate solution for the problems I described, or is it not worth it, and are there better ways to organize information in my app?

If an RDBMS is OK, I'd appreciate any advice on using an RDBMS and the relational approach to store relationships between C++ objects.

Steed asked Aug 20 '12 12:08



4 Answers

You may want to look at Semantic Web technologies, such as RDF, RDFS and OWL that provide an alternative, extensible way of modeling the world. There are some open-source triple stores available, and some of the mainstream RDBMS also have triple store capabilities.

In particular, take a look at the University of Manchester's Protege/OWL tutorial: http://owl.cs.manchester.ac.uk/tutorials/protegeowltutorial/

And if you decide this direction is worth looking at further, I can recommend "Semantic Web for the Working Ontologist".

Seb Rose answered Oct 27 '22 00:10



Just based on the diagram, I would suggest that an RDBMS solution would indeed work. It has been years since I was a developer on an RDBMS (called RDM, of course!), but I was able to refresh my knowledge and gain many valuable insights into data structures and layouts very similar to what you describe by reading the fabulous book "The Art of SQL" by Stephane Faroult. His book will go a long way toward answering your questions.

I've included a link to it on Amazon, to ensure accuracy: http://www.amazon.com/The-Art-SQL-Stephane-Faroult/dp/0596008945

You will not go wrong by reading it, even if in the end it does not solve your problem fully, because the author does such a great job of breaking down a relation in clear terms and presenting elegant solutions. The book is not a manual for SQL, but an in-depth analysis of how to think about data and how it interrelates. Check it out!

Using an RDBMS to track the links between data can be an efficient way to store and think about the analysis you are seeking, and the links are "soft": that is, they go away when the hard objects they link are deleted. This ensures data integrity, and Mr. Faroult's book covers what to do to ensure that remains true.

shipr answered Oct 27 '22 01:10



I don't recommend an RDBMS, given your requirement for an extensible and flexible model.

  1. Whenever you change your data model, you will have to change the DB schema, and that can involve more work than the change in code.
  2. Any problems with DB queries are discovered only at runtime. This can make a big difference to the cost of maintenance.

I strongly recommend using standard C++ OO programming with STL.

  1. You can make use of encapsulation to ensure any data change is done properly, with updates to related objects and indexes.
  2. You can use STL containers to build highly efficient indexes on the data.
  3. You can create facades that get you the information easily, rather than having to go to multiple objects/collections. This will be one-time work.
  4. You can write unit test cases to ensure correctness (much less complicated than unit testing with databases).
  5. You can make use of polymorphism to build different kinds of objects, different types of analysis, etc.
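Points 1 to 3 can be sketched together: a small class that owns the links, keeps a reverse index in step on every change, and exposes a facade query. This is a minimal illustration under assumed names (LinkRegistry, link, unlink_object, objects_using), not code from the question.

```cpp
#include <map>
#include <set>

using ImageId = int;
using ObjectId = int;

class LinkRegistry {
public:
    // Records that an analysis object uses an image; both directions are updated together.
    void link(ObjectId object, ImageId image) {
        images_of_[object].insert(image);
        objects_of_[image].insert(object);
    }

    // Removes an analysis object and every reverse entry that mentions it.
    void unlink_object(ObjectId object) {
        auto it = images_of_.find(object);
        if (it == images_of_.end()) return;
        for (ImageId image : it->second) {
            auto rev = objects_of_.find(image);
            rev->second.erase(object);
            if (rev->second.empty()) objects_of_.erase(rev);
        }
        images_of_.erase(it);
    }

    // Facade query: which analysis objects are based on this image?
    std::set<ObjectId> objects_using(ImageId image) const {
        auto it = objects_of_.find(image);
        return it == objects_of_.end() ? std::set<ObjectId>{} : it->second;
    }

private:
    std::map<ObjectId, std::set<ImageId>> images_of_;   // forward links
    std::map<ImageId, std::set<ObjectId>> objects_of_;  // reverse index
};
```

Because callers can only change links through link and unlink_object, the two maps cannot drift apart, which is the consistency burden the question worries about.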

All very basic points, but I reckon your effort would be best spent improving the current solution rather than looking for a DB-based solution.

Sameer answered Oct 27 '22 01:10



http://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/index.html

"you'll put very much effort maintaining consistency of these pointers and constantly updating the links when something changes."

With the help of Boost.MultiIndex you can create almost every kind of index on a "table". I think the quoted problem is not so serious, so the original solution is manageable.

Industrial-antidepressant answered Oct 26 '22 23:10
