Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Patterns for Schema Changes in Document Databases

before I start I'd like to apologize for the rather generic type of my questions - I am sure a whole book could be written on that particular topic.

Lets assume you have a big document database with multiple document schemas and millions of documents for each of these schemas. During the life time of the application the need arises to change the schema (and content) of the already stored documents frequently.

Such changes could be

  • adding new fields
  • recalculating field values (split Gross into Net and VAT)
  • drop fields
  • move fields into an embedded document

I my last project where we used a SQL DB we had some very similar challanges which resulted in some significant offline time (for a 24/7 product) when the changes became to drastic as SQL DBs usually do a LOCK on a table when changes occur. I want to avoid such a scenario.

Another related question is how to handle schema changes from within the used programming language environment. Usually schema changes happen by changing the Class definition (I will be using Mongoid a OR-Mapper for MongoDB and Ruby). How do I handle old versions of documents that do not conform any more to my latest Class definition.

like image 511
robkuz Avatar asked Feb 17 '11 13:02

robkuz


People also ask

What are database schema changes?

A schema change is an alteration made to a collection of logical structures (or schema objects) in a database. Schema changes are generally made using structured query language (SQL) and are typically implemented during maintenance windows.

What are the 3 types of schema in the database?

Schema is of three types: Logical Schema, Physical Schema and view Schema. Logical Schema – It describes the database designed at logical level. Physical Schema – It describes the database designed at physical level. View Schema – It defines the design of the database at the view level.

Does database schema changes very frequently?

Database schema changes are not popular among DBAs, not when you are operating production databases and cannot afford to switch off the service during a maintenance window. These are unfortunately frequent and necessary, especially when introducing new features to existing applications.

How do you edit a database schema?

To change the schema of a table by using SQL Server Management Studio, in Object Explorer, right-click on the table and then click Design. Press F4 to open the Properties window. In the Schema box, select a new schema. ALTER SCHEMA uses a schema level lock.


1 Answers

That is a very good question.

The good part of document oriented databases as MongoDB is that documents from the same collection doesn't need to have the same fields. Having different fields do not raise an error, per se. It's called flexibility. It also a bad part, for the same reasons.

So the problem and also the solution comes from the logic of your application.

Let say we have a model Person and we want to add a field. Currently in the database we have 5.000.000 people saved. The problem is: How do we add that field and have the less downtime?

Possible solution:

  1. Change the logic of the application so that it can cope with both a person with that field and a person without that field.

  2. Write a task that add that field to each person in the database.

  3. Update the production deployment with the new logic.

  4. Run the script.

So the only downtime is the few seconds that it takes to redeploy. Nonetheless, we need to spend time with the logic.

So basically we need to choose which is more valuable the uptime or our time.

Now let say we want to recalculate a field such as the VAT value. We can not do the same as before, because having some products with VAT A and other with VAT B doesn't make sense.

So, a possible solution would be:

  1. Change the logic of the application so that it shows that the VAT values are being updated and disable the operations that could use it, such as buys.

  2. Write the script to update all the VAT values.

  3. Redeploy with the new code.

  4. Run the script. When it finish:

  5. Redeploy with the full operation code.

So there is not absolute downtime, but just partial shutdown of some specifics part. The user could keep seeing the description of products and using the other parts of the application.

Now let say, that we want to drop a field. The process would be pretty much the same as the first one.

Now, moving fields into embed documents; that's is a good one! The process would be similar to the first one. But instead of checking the existence of the field we need to check if it is a embedded document or a field.

The conclusion is that with a document oriented database you have a lot of flexibility. And so you have elegant options at your hands. Whether you use it or not depends or whether you value more you development time or your client's time.

like image 140
Nerian Avatar answered Nov 03 '22 04:11

Nerian