I have my mind firmly wrapped around relational databases and how to code efficiently against them. Most of my experience is with MySQL and SQL. I like many of the things I'm hearing about document-based databases, especially when someone in a recent podcast mentioned huge performance benefits. So, if I'm going to go down that road, what are some of the mental steps I must take to shift from SQL to NO-SQL?
If it makes any difference in your answer, I'm a C# developer primarily (today, anyhow). I'm used to ORM's like EF and Linq to SQL. Before ORMs, I rolled my own objects with generics and datareaders. Maybe that matters, maybe it doesn't.
Here are some more specific:
(feel free to add questions of your own here)
SQL databases are efficient at processing queries and joining data across tables, making it easier to perform complex queries against structured data, including ad hoc requests. NoSQL databases lack consistency across products and typically require more work to query data, particular as query complexity increases.
It is clear that there is something happening in the database market and is hugely driven by the Big Data adoption. While other databases are not going to business anytime soon, there is a very bright future for NoSQL.
In short, using NoSQL databases is not difficult. The difficulty comes in using it for the right places in the right way. First of all, it is important to understand that NoSQL doesn't follow the same principles as Relational Databases such as fixed schemas, normalization, support for expressive queries like SQL.
NoSQL databases have become popular because they store data in simple straightforward forms that can be easier to understand than the type of data models used in SQL databases. In addition, NoSQL databases often allow developers to directly change the structure of the data.
Firstly, each NoSQL store is different. So it's not like choosing between Oracle or Sql Server or MySQL. The differences between them can be vast.
For example, with CouchDB you cannot execute ad-hoc queries (dynamic queries if you like). It is very good at online - offline scenarios, and is small enough to run on most devices. It has a RESTful interface, so no drivers, no ADO.NET libraries. To query it you use MapReduce (now this is very common across the NoSQL space, but not ubiquitous) to create views, and these are written in a number of languages, though most of the documentation is for Javascript. CouchDB is also designed to crash, which is to say if something goes wrong, it just restarts the process (the Erlang process, or group of linked processes that is, not the entire CouchDB instance typically).
MongoDB is designed to be highly performant, has drivers, and seems like less of a leap for a lot of people in the .NET world because of this. I believe though that in crash situations it is possible to lose data (it doesn't offer the same level of transactional guarantees around writes that CouchDB does).
Now both of these are document databases, and as such they share in common that their data is unstructured. There are no tables, no defined schema - they are schemaless. They are not like a key-value store though, as they do insist that the data you persist is intelligible to them. With CouchDB this means the use of JSON, and with MongoDB this means the use of BSON.
There are many other differences between MongoDB and CouchDB and these are considered in the NoSQL space to be very close in their design!
Other than document databases, their are network oriented solutions like Neo4J, columnar stores (column oriented rather than row oriented in how they persist data), and many others.
Something which is common across most NoSQL solutions, other than MapReduce, is that they are not relational databases, and that the majority do not make use of SQL style syntax. Typcially querying follows an imperative mode of programming rather than the declarative style of SQL.
Another typically common trait is that absolute consistency, as typically provided by relational databases, is traded for eventual models of consistency.
My advice to anyone looking to use a NoSQL solution would be to first really understand the requirements they have, understand the SLAs (what level of latency is required; how consistent must that latency remain as the solutions scales; what scale of load is anticipated; is the load consistent or will it spike; how consistent does a users view of the data need to be, should they always see their own writes when they query, should their writes be immediately visible to all other users; etc...). Understand that you can't have it all, read up on Brewers CAP theorum, which basically says you can't have absolute consistence, 100% availability, and be partition tolerant (cope when nodes can't communicate). Then look into the various NoSQL solutions and start to eliminate those which are not designed to meet your requirements, understand that the move from a relational database is not trivial and has a cost associated with it (I have found the cost of moving an organisation in that direction, in terms of meetings, discussions, etc... itself is very high, preventing focus on other areas of potential benefit). Most of the time you will not need an ORM (the R part of that equation just went missing), sometimes just binary serialisation may be ok (with something like DB4O for example, or a key-value store), things like the Newtonsoft JSON/BSON library may help out, as may automapper. I do find that working with C#3 theere is a definite cost compared to working with a dynamic language like, say Python. With C#4 this may improve a little with things like the ExpandoObject and Dynamic from the DLR.
To look at your 3 specific questions, with all it depends on the NoSQL solution you adopt, so no one answer is possible, however with that caveat, in very general terms:
If persisting the object (or aggregate more likely) as a whole, your joins will typically be in code, though you can do some of this through MapReduce.
Again, it depends, but with Couch you would execute a GET over HTTP against either a specific resource, or against a MapReduce view.
Most likely nothing. Just keep an eye-out for the serialisation, deserialisation scenarios. The difficulty I have found comes in how you manage versions of your code. If the property is purely for pushing to an interface (GUI, web service) then it tends to be less of an issue. If the property is a form of internal state which behaviour will rely on, then this can get more tricky.
Hope it helps, good luck!
Just stop thinking about the database.
Think about modeling your domain. Build your objects to solve the problem at hand following good patterns and practices and don't worry about persistence.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With