I'm starting a personal project that involves storing a large database of objects and the relationships between them. I chose Hadoop and HBase because the project will need to run on multiple nodes and much of the data is sparse.
Coming from the RDBMS world, I have spent a lot of time reading about HBase's column-oriented structure, but even with the current documentation I'm having trouble figuring out how to store objects and the relationships between them.
The objects themselves can have an unlimited number of relationships with other objects and an unlimited number of arbitrary attributes. Relationships can also have attributes. For example, I want to have two "Person" objects linked by a "Married" relationship, where the Married relationship has a "Date" attribute. In the future I would like to be able to write a MapReduce job to quickly find all persons married between x and y.
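To make the requirement concrete, here is a minimal sketch of one possible layout, written in plain Python rather than against the HBase API. Everything in it is an assumption made for illustration: the row keys (`person:alice`), the `attr:` and `rel:` column prefixes, and the idea of storing the relationship's Date as the cell value. The dictionary only imitates a sparse table where each row holds just the columns it needs, and `map_married` plays the role of the map step.

```python
from datetime import date

# Hypothetical in-memory stand-in for a sparse, column-oriented table:
# row key -> {"family:qualifier": value}. This is NOT the HBase API.
table = {
    "person:alice": {
        "attr:name": "Alice",
        "rel:Married:person:bob": "2009-06-14",
    },
    "person:bob": {
        "attr:name": "Bob",
        "attr:hometown": "Leeds",  # arbitrary attribute only this row has
        "rel:Married:person:alice": "2009-06-14",
    },
    "person:carol": {
        "attr:name": "Carol",
        "rel:Married:person:dave": "2015-02-01",
    },
    "person:dave": {
        "attr:name": "Dave",
        "rel:Married:person:carol": "2015-02-01",
    },
}

MARRIED_PREFIX = "rel:Married:"


def map_married(row_key, columns, start, end):
    """Map step: emit (person, (spouse, date)) for marriages in [start, end]."""
    for column, value in columns.items():
        if column.startswith(MARRIED_PREFIX):
            married_on = date.fromisoformat(value)
            if start <= married_on <= end:
                spouse = column[len(MARRIED_PREFIX):]
                yield row_key, (spouse, married_on)


def married_between(rows, start, end):
    """Run the map step over every row and collect the emitted pairs."""
    results = {}
    for row_key, columns in rows.items():
        for person, match in map_married(row_key, columns, start, end):
            results[person] = match
    return results


found = married_between(table, date(2009, 1, 1), date(2010, 1, 1))
```

With the sample rows above, `found` contains only the Alice and Bob rows, each pointing at the other. The relationship is stored on both rows so that either person can be looked up directly, at the cost of writing it twice.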
There are 2 steps to follow (according to me).
If the search can wait for a MapReduce job to finish, that's fine. If you need faster results, I would use (and actually do use) a separate tool for all searching, e.g. Elasticsearch, Apache Solr, or Apache Lucene. Range queries are easy in search tools such as Solr, and results come back faster than from a MapReduce job. Another reason to choose a search tool is that you can get results in whatever sort order you need.
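As a rough sketch of what such a range query could look like in Solr, the snippet below only builds the request parameters and does not contact a server. The field name `married_date`, the core name `relationships`, and the localhost URL are assumptions for illustration; the `field:[start TO end]` range syntax and the `sort` parameter are standard Solr query features.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def solr_date(moment):
    """Format a datetime the way Solr date fields expect (UTC, trailing Z)."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def married_range_params(start, end, field="married_date"):
    """Build select parameters for an inclusive date range, oldest first.

    `field` is a hypothetical field name; use whatever your schema defines.
    """
    return {
        "q": f"{field}:[{solr_date(start)} TO {solr_date(end)}]",
        "sort": f"{field} asc",
    }


start = datetime(2009, 1, 1, tzinfo=timezone.utc)
end = datetime(2010, 1, 1, tzinfo=timezone.utc)
params = married_range_params(start, end)

# Hypothetical host and core name; adjust to your own installation.
url = "http://localhost:8983/solr/relationships/select?" + urlencode(params)
```

The idea is that HBase stays the system of record, while the search index holds just the fields you need to query and sort on.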