I am writing a NoSQL database engine and I want to provide features that help developers upgrade their application to a new version without stopping the website, i.e., zero downtime during the upgrade. So my question is: what methods or general designs do web applications use when they run 24/7 and change their database structure very often? Any examples or success stories would be greatly appreciated.
With NoSQL - and specifically a document-oriented database - you can accomplish this with versioning.
Consider MongoDB, which stores everything as documents.
MongoDB allows you to have a collection (a group of documents) where the schema for every document can be different.
Let's say you have this document for a user:
{
    "_id" : 100,
    "firstName" : "John",
    "lastName" : "Smith"
}
You could also have this as a document in the same collection:
{
    "_id" : 123,
    "firstName" : "John",
    "lastName" : "Smith",
    "hasFoo" : false
}
Different schemas, but both in the same collection. Obviously this is very different from a traditional relational database.
The solution, then, is to add a field to every document that records its schema version. Your application then filters on that version in every query.
A MongoDB query might look like this:
users.find({ "version" : 3 }).limit(10);
That returns up to 10 users whose documents use schema version 3. You can insert documents with newer schemas without affecting the existing site and slowly delete documents with old schema versions that aren't useful anymore.
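As a rough sketch of how old schema versions could be retired gradually, here is one possible variation: instead of deleting old documents, the application upgrades them step by step when it reads them. This is plain JavaScript with no database involved. The names `CURRENT_VERSION`, `migrations`, and `upgradeUser` are made up for illustration, and it assumes that documents without a `version` field are version 1 and that version 2 is the one that adds `hasFoo`.

```javascript
// Hypothetical migration table: migrations[n] upgrades a document
// from schema version n to schema version n + 1.
const CURRENT_VERSION = 2;
const migrations = {
  // v1 -> v2: add the "hasFoo" field with a default value (assumed default).
  1: (doc) => ({ ...doc, hasFoo: false, version: 2 }),
};

// Bring a single document up to CURRENT_VERSION, one step at a time.
// Documents without a "version" field are assumed to be version 1.
function upgradeUser(doc) {
  let current = { version: 1, ...doc };
  while (current.version < CURRENT_VERSION) {
    const step = migrations[current.version];
    if (!step) {
      throw new Error(`no migration from version ${current.version}`);
    }
    current = step(current);
  }
  return current;
}

const oldUser = { _id: 100, firstName: "John", lastName: "Smith" };
const newUser = { _id: 123, firstName: "John", lastName: "Smith", hasFoo: false, version: 2 };

console.log(upgradeUser(oldUser)); // gains "hasFoo" and "version"
console.log(upgradeUser(newUser)); // already current, returned unchanged in content
```

With something like this in place, new application code only ever sees the current shape, while old documents can stay in the collection until they are next read and rewritten.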
You're going to be building a distributed system. There's no way around this, as you'll need multiple machines involved to deal with things like reboots.
Building a distributed system means you're making some choices. Pick 2 of availability, partition tolerance, and consistency.
Systems like S3 have chosen the first two and paid the price by sacrificing consistency in favor of "eventual consistency". There's a great paper on S3 you can read. Other database solutions, like DynamoDB, have chosen different trade-offs.
You're going to need load balancers. Otherwise you're stuck with customers connecting directly to your service, which is rough for a variety of reasons. A load balancer lets you reboot a machine in your fleet without incurring downtime. Reboots, as we all know, are a fact of life.
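To make the reboot point concrete, here is a minimal in-memory sketch of a round-robin balancer that skips machines taken out of rotation. It is not a real load balancer, and `RoundRobinBalancer`, `setInRotation`, `pick`, and the `db-a`/`db-b`/`db-c` names are all invented for the example.

```javascript
// Hypothetical round-robin balancer that only hands out in-rotation backends.
class RoundRobinBalancer {
  constructor(backends) {
    this.backends = backends.map((name) => ({ name, inRotation: true }));
    this.cursor = 0;
  }

  // Take a backend out of rotation before a reboot, or put it back afterwards.
  setInRotation(name, inRotation) {
    const backend = this.backends.find((b) => b.name === name);
    if (!backend) throw new Error(`unknown backend ${name}`);
    backend.inRotation = inRotation;
  }

  // Return the next in-rotation backend, skipping drained ones.
  pick() {
    for (let i = 0; i < this.backends.length; i++) {
      const backend = this.backends[this.cursor];
      this.cursor = (this.cursor + 1) % this.backends.length;
      if (backend.inRotation) return backend.name;
    }
    throw new Error("no backends in rotation");
  }
}

const lb = new RoundRobinBalancer(["db-a", "db-b", "db-c"]);
lb.setInRotation("db-b", false); // drain db-b, then reboot it
const served = [lb.pick(), lb.pick(), lb.pick(), lb.pick()];
console.log(served); // requests keep flowing to db-a and db-c only
lb.setInRotation("db-b", true); // reboot finished, back in rotation
```

Clients only ever talk to the balancer, so draining one machine at a time lets the whole fleet be rebooted or upgraded without a visible outage.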
Doing what you're describing is very tough. In fact, I would say it's a nearly impossible problem for a single developer to tackle.
You are far, far, far more likely to get better results using an existing NoSQL database and spending all your time working on your product....
Option 1: the enterprise can invest in geographically distributing the database, as it would for failover tolerance. It sounds traditional, but with data replication (or datastore replication) in place, routing traffic between sites wouldn't be an issue.
Option 2: use caching (custom-developed) and cycling. For example, from 1 a.m. to 2 a.m., serve traffic from snapshot 1 of the database (say, server 1 in data center 1). By 1:59 a.m., server 2 in data center 2 holds the new database architecture (new fields, new tables, etc.), and at 2 a.m. all traffic is routed through data center 2.
Cycling based on snapshots may be a solution to consider.
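The cutover described in option 2 can be sketched as a router holding a single "active site" pointer that is flipped once the new site is ready. This is only an in-memory illustration: `SnapshotRouter`, `handle`, `cutOver`, and the `dc1`/`dc2` handlers are hypothetical names, and each site is just a function standing in for a real data center.

```javascript
// Hypothetical router: every request goes to whichever site is currently active.
class SnapshotRouter {
  constructor(sites, active) {
    if (!(active in sites)) throw new Error(`unknown site ${active}`);
    this.sites = sites;
    this.active = active;
  }

  // Serve a request from the active site.
  handle(userId) {
    return this.sites[this.active](userId);
  }

  // Flip all traffic to another site in one step.
  cutOver(next) {
    if (!(next in this.sites)) throw new Error(`unknown site ${next}`);
    this.active = next;
  }
}

const sites = {
  // Data center 1 serves snapshot 1 with the old structure.
  dc1: (id) => ({ _id: id, firstName: "John", lastName: "Smith" }),
  // Data center 2 has been prepared with the new structure (extra field).
  dc2: (id) => ({ _id: id, firstName: "John", lastName: "Smith", hasFoo: false }),
};

const router = new SnapshotRouter(sites, "dc1");
const before = router.handle(100); // served by dc1, old structure
router.cutOver("dc2");             // the "2 a.m." switch
const after = router.handle(100);  // served by dc2, new structure
console.log(before, after);
```

The sketch leaves out the hard part, which is getting writes made against the old snapshot into the new site before the switch.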