At work, we recently started a project using CouchDB (a document-oriented database). I've been having a hard time un-learning all of my relational db knowledge.
I was wondering how some of you overcame this obstacle? How did you stop thinking relationally and start think documentally (I apologise for making up that word).
Any suggestions? Helpful hints?
Edit: If it makes any difference, we're using Ruby & CouchPotato to connect to the database.
Edit 2: SO was hassling me to accept an answer. I chose the one that helped me learn the most, I think. However, there's no real "correct" answer, I suppose.
I think, after perusing about on a couple of pages on this subject, it all depends upon the types of data you are dealing with.
RDBMSes represent a top-down approach, where you, the database designer, assert the structure of all data that will exist in the database. You define that a Person has a First,Last,Middle Name and a Home Address, etc. You can enforce this using a RDBMS. If you don't have a column for a Person's HomePlanet, tough luck wanna-be-Person that has a different HomePlanet than Earth; you'll have to add a column in at a later date or the data can't be stored in the RDBMS. Most programmers make assumptions like this in their apps anyway, so this isn't a dumb thing to assume and enforce. Defining things can be good. But if you need to log additional attributes in the future, you'll have to add them in. The relation model assumes that your data attributes won't change much.
"Cloud" type databases using something like MapReduce, in your case CouchDB, do not make the above assumption, and instead look at data from the bottom-up. Data is input in documents, which could have any number of varying attributes. It assumes that your data, by its very definition, is diverse in the types of attributes it could have. It says, "I just know that I have this document in database Person that has a HomePlanet attribute of "Eternium" and a FirstName of "Lord Nibbler" but no LastName." This model fits webpages: all webpages are a document, but the actual contents/tags/keys of the document vary soo widely that you can't fit them into the rigid structure that the DBMS pontificates from upon high. This is why Google thinks the MapReduce model roxors soxors, because Google's data set is so diverse it needs to build in for ambiguity from the get-go, and due to the massive data sets be able to utilize parallel processing (which MapReduce makes trivial). The document-database model assumes that your data's attributes may/will change a lot or be very diverse with "gaps" and lots of sparsely populated columns that one might find if the data was stored in a relational database. While you could use an RDBMS to store data like this, it would get ugly really fast.
To answer your question then: you can't think "relationally" at all when looking at a database that uses the MapReduce paradigm. Because, it doesn't actually have an enforced relation. It's a conceptual hump you'll just have to get over.
A good article I ran into that compares and contrasts the two databases pretty well is MapReduce: A Major Step Back, which argues that MapReduce paradigm databases are a technological step backwards, and are inferior to RDBMSes. I have to disagree with the thesis of the author and would submit that the database designer would simply have to select the right one for his/her situation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With