Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Non-Relational Database Design [closed]

Tags:

database

nosql

I'm interested in hearing about design strategies you have used with non-relational "nosql" databases - that is, the (mostly new) class of data stores that don't use traditional relational design or SQL (such as Hypertable, CouchDB, SimpleDB, Google App Engine datastore, Voldemort, Cassandra, SQL Data Services, etc.). They're also often referred to as "key/value stores", and at base they act like giant distributed persistent hash tables.

Specifically, I want to learn about the differences in conceptual data design with these new databases. What's easier, what's harder, what can't be done at all?

  • Have you come up with alternate designs that work much better in the non-relational world?

  • Have you hit your head against anything that seems impossible?

  • Have you bridged the gap with any design patterns, e.g. to translate from one to the other?

  • Do you even do explicit data models at all now (e.g. in UML) or have you chucked them entirely in favor of semi-structured / document-oriented data blobs?

  • Do you miss any of the major extra services that RDBMSes provide, like relational integrity, arbitrarily complex transaction support, triggers, etc?

I come from a SQL relational DB background, so normalization is in my blood. That said, I get the advantages of non-relational databases for simplicity and scaling, and my gut tells me that there has to be a richer overlap of design capabilities. What have you done?

FYI, there have been StackOverflow discussions on similar topics here:

  • the next generation of databases
  • changing schemas to work with Google App Engine
  • choosing a document-oriented database
like image 599
Ian Varley Avatar asked Jul 27 '09 18:07

Ian Varley


People also ask

What is a characteristic of a non-relational database?

A non-relational database is a database that does not use the tabular schema of rows and columns found in most traditional database systems. Instead, non-relational databases use a storage model that is optimized for the specific requirements of the type of data being stored.

What is NoSQL database design?

NoSQL data modeling is an application-centric approach that is more concerned with how the application will query the data than the relationships within the data. NoSQL database design principles emphasize the flexibility of the data, as opposed to a rigid relational schema.

What is an example of a non-relational database?

NoSQL or non-relational databases examples:MongoDB, Apache Cassandra, Redis, Couchbase and Apache HBase. They are best for Rapid Application Development. NoSQL is the best selection for flexible data storage with little to no structure limitations.


2 Answers

I've only just started with non-relational DBs, and I am still trying to wrap my head around it and figure out what the best model would be. And I can only speak for CouchDB.

Still, I have some preliminary conclusions:

Have you come up with alternate designs that work much better in the non-relational world?

The design focus shifts: The design of the document model (corresponding to DB tables) becomes almost irrelevant, while everything hinges on designing the views (corresponding to queries).

The document DB sort of swaps the complexities: SQL has inflexible data and flexible queries, document DBs are the other way around.

The CouchDB model is a collection of "JSON documents" (basically nested hash tables). Each document has a unique ID, and can be trivially retrieved by ID. For any other query, you write "views", which are named sets of map/reduce functions. The views return a result set as a list of key/value pairs.

The trick is that you don't query the database in the sense you query an SQL database: The results of running the view functions are stored in an index, and only the index can be queried. (As "get everything", "get key" or "get key range".)

The closest analogy in the SQL world would be if you could only query the DB using stored procedures - every query you want to support must be predefined.

The design of the documents is enormously flexible. I have found only two constrains:

  • Keep related data together in the same document, since there is nothing corresponding to a join.
  • Don't make the documents so big that they are updated too frequently (like putting all company sales for the year in the same document), since every document update triggers a re-indexing.

But everything hinges on designing the views.

The alternate designs I have found that work orders of magnitude better with CouchDB than any SQL database are at the system level rather than the storage level. If you have some data and want to serve them to a web page, the complexity of the total system is reduced by at least 50%:

  • no designing DB tables (minor issue)
  • no ODBC/JDBC intermediate layer, all queries and transactions over http (moderate issue)
  • simple DB-to-object mapping from JSON, which is almost trivial compared to the same in SQL (important!)
  • you can potentially skip the entire application server, as you can design your documents to be retrieved directly by the browser using AJAX and add a little bit of JavaScript polishing before they are displayed as HTML. (HUGE!!)

For normal webapps, document/JSON-based DBs are a massive win, and the drawbacks of less flexible queries and some extra code for data validation seems a small price to pay.

Have you hit your head against anything that seems impossible?

Not yet. Map/reduce as a means of querying a database is unfamiliar, and requires a lot more thinking than writing SQL. There is a fairly small number of primitives, so getting the results you need is primarily a question of being creative with how you specify the keys.

There is a limitation in that queries cannot look at two or more documents at the same time - no joins or other kinds of multi-document relationships, but nothing so far has been insurmountable.

As an example limitation, counts and sums are easy but averages cannot be calculated by a CouchDB view/query. Fix: Return sum and count separately and compute the average on the client.

Have you bridged the gap with any design patterns, e.g. to translate from one to the other?

I'm not sure that's feasible. It's more of a complete redesign, like translating a functional style program to an object-oriented style. In general, there are far fewer document types than there are SQL tables and more data in each document.

One way to think of it is to look at your SQL for inserts and common queries: Which tables and columns are updated when a customer places an order, for instance? And which ones for monthly sales reports? That info should probably go in the same document.

That is: One document for Order, containing customer ID and product IDs, with replicated fields as necessary to simplify the queries. Anything within a document can be queried easily, anything that requires cross-referencing between say Order and Customer has to be done by the client. So if you want a report on sales by region, you should probably put a region code into the order.

Do you even do explicit data models at all now (e.g. in UML)?

Sorry, never did much UML before document DBs either :)

But you need some sort of model saying which fields belong in which documents and what kinds of values they contain. Both for your own reference later and to make sure that everybod using the DB knows the conventions. Since you no longer get an error if you store a date in a text field, for example, and anyone can add or remove any field they feel like, you need both validation code and conventions to pick up the slack. Especially if you work with external resources.

Do you miss any of the major extra services that RDBMSes provide?

Nope. But my background is web application developer, we deal with databases only to the extent that we must :)

A company I used to work for made a product (a webapp) that was designed to run across SQL databases from multiple vendors, and the "extra services" are so different from DB to DB that they had to be implemented separately for each DB. So it was less work for us to move the functionality out of the RDBMS. This even extended to fulltext search.

So whatever I am giving up is something I never really had in the first place. Obviously, your experience may differ.


A caveat: What I am working on now is a webapp for financial data, stock quotes and the like. This is a very good match for a document DB, from my point of view I get all the benefits of a DB (persistence and queries) without any of the hassle.

But these data are fairly independent of each other, there are no complex relational queries. Get latest quotes by ticker, get quotes by ticker and date range, get company meta-info, that's pretty much all of it. Another example I saw was a blog application, and blogs are not characterized by massively complicated database schemas either.

What I am trying to say is that all the successful applications of document DBs I know of have been with data that didn't have much interrelations in the first place: Documents (as in Google search), blog posts, news articles, financial data.

I expect that there are datasets that map better to SQL than to the document model, so I imagine SQL will survive.

But for those of us that just want a simple way to store and retrieve data - and I suspect that there are many of us - document databases (as in CouchDB) are a godsend.

like image 166
j-g-faustus Avatar answered Sep 23 '22 15:09

j-g-faustus


I think you have to consider that the non-relational DBMS differ a lot regarding their data model and therefore the conceptual data design will also differ a lot. In the thread Data Design in Non-Relational Databases of the NOSQL Google group the different paradigms are categorized like this:

  1. Bigtable-like systems (HBase, Hypertable, etc)
  2. Key-value stores (Tokyo, Voldemort, etc)
  3. Document databases (CouchDB, MongoDB, etc)
  4. Graph databases (AllegroGraph, Neo4j, Sesame, etc)

I'm mostly into graph databases, and the elegance of data design using this paradigm was what brought me there, tired of the shortcomings of RDBMS. I have put a few examples of data design using a graph database on this wiki page and there's an example of how to model the basic IMDB movie/actor/role data too.

The presentation slides (slideshare) Graph Databases and the Future of Large-Scale Knowledge Management by Marko Rodriguez contains a very nice introduction to data design using a graph database as well.

Answering the specific questions from a graphdb point of view:

Alternate design: adding relationships between many different kinds of entities without any worries or a need to predefine which entities can get connected.

Bridging the gap: I tend to do this different for every case, based on the domain itself, as I don't want a "table-oriented graph" and the like. However, here's some information on automatic translation from RDBMS to graphdb.

Explicit data models: I do these all the time (whiteboard style), and then use the model as it is in the DB as well.

Miss from RDBMS world: easy ways to create reports. Update: maybe it's not that hard to create reports from a graph database, see Creating a Report for a Neo4J Sample Database.

like image 44
nawroth Avatar answered Sep 19 '22 15:09

nawroth