Updating a production IIS website and SQL Server database without downtime

We have a testing server that uses a testing database, and we test the website there. If everything looks good, we deploy the website and the database schema changes from the testing server to the production server. But this process is very painful and risky.

First, we have to redirect users to a maintenance page, so the website is unavailable for a while.

Second, if something goes wrong during the update, we have to roll back to the old website, because we can't keep the website in maintenance mode for long.

So I'm looking for a solid way to update an IIS website and a SQL Server database without data loss and without using maintenance mode. Is there any way to do this? How do big websites do this without data loss or downtime?

We've considered using a release candidate (RC) website as a temporary site. First we update the RC site, then we swap the bindings between the RC and production websites. But then the database is the problem, because the schema may change and the old site can't work with the new database. If the temp site uses a temp database, anything written to production in the meantime is lost. And if the temp site uses the old production database, the updated website won't work with it. So I need a solid and practical solution to this problem.
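Here is a minimal Python sketch that makes the data-loss problem concrete. In-memory SQLite databases stand in for SQL Server, and the `orders` table is hypothetical. The RC database is a point-in-time copy, so any write that reaches production after the copy is missing once traffic is swapped to the RC site.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Create a tiny stand-in 'production' database (hypothetical orders table)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    db.execute("INSERT INTO orders (item) VALUES ('book')")
    db.commit()
    return db

def copy_db(src: sqlite3.Connection) -> sqlite3.Connection:
    """Take a point-in-time copy, standing in for the temp/RC database."""
    dst = sqlite3.connect(":memory:")
    src.backup(dst)
    return dst

production = make_db()
rc = copy_db(production)  # the RC site gets a snapshot of the data...
rc.execute("ALTER TABLE orders ADD COLUMN qty INTEGER DEFAULT 1")  # ...and the new schema

# Users keep writing to production while the RC site is being prepared.
production.execute("INSERT INTO orders (item) VALUES ('pen')")
production.commit()

# Swap the bindings: the RC site and its database now serve all traffic.
live = rc
lost = {r[0] for r in production.execute("SELECT item FROM orders")} - {
    r[0] for r in live.execute("SELECT item FROM orders")
}
print(sorted(lost))  # ['pen']
```

Swapping the site bindings is the easy part. The rows written between the copy and the swap are what gets lost.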

asked Jul 26 '12 12:07 by oruchreis


1 Answer

This is orders of magnitude more complicated than you imagine. It is specifically not about HA (high availability), nor about continuous integration. Neither of those will provide what you need; they're only pieces of a much more complex puzzle.

It simply isn't possible to write code changes in a manner that is transparent to (oblivious of) schema changes as they occur. At best you can write the code so that it supports the schema at version N and at version N+1, which in itself is a big challenge. But it is impossible to write the code so that it supports the schema while it transitions from version N to version N+1. The schema change induced by a deployment has to be atomic for the code operating on the schema. Since the schema change itself cannot be atomic, it follows that the upgrade has two possible avenues:
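Here is a minimal Python sketch of what "supports the schema at version N and at version N+1" can look like. SQLite stands in for SQL Server. The `customers` table and the N to N+1 change (splitting `name` into `first_name` and `last_name`) are hypothetical. The data access code inspects the schema and picks the query for whichever version it finds.

```python
import sqlite3

def column_names(db: sqlite3.Connection, table: str) -> set:
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return {row[1] for row in db.execute(f"PRAGMA table_info({table})")}

def load_customer_names(db: sqlite3.Connection) -> list:
    """Return full names whether the (hypothetical) schema is at version N or N+1."""
    cols = column_names(db, "customers")
    if {"first_name", "last_name"} <= cols:  # version N+1
        rows = db.execute("SELECT first_name || ' ' || last_name FROM customers ORDER BY id")
    elif "name" in cols:  # version N
        rows = db.execute("SELECT name FROM customers ORDER BY id")
    else:
        raise RuntimeError("unrecognized customers schema")
    return [r[0] for r in rows]

v_n = sqlite3.connect(":memory:")
v_n.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
v_n.execute("INSERT INTO customers (name) VALUES ('Ada Lovelace')")

v_n1 = sqlite3.connect(":memory:")
v_n1.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)")
v_n1.execute("INSERT INTO customers (first_name, last_name) VALUES ('Ada', 'Lovelace')")

print(load_customer_names(v_n))   # ['Ada Lovelace']
print(load_customer_names(v_n1))  # ['Ada Lovelace']
```

This code handles both stable versions but not the transition between them. Suppose a migration has added `first_name` and `last_name` but has not yet backfilled them. The first branch is then taken and returns `None` for every row, because `NULL || ' '` is `NULL` in SQL. That in-between state is exactly the one code cannot be written against.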

  • Take the code offline during the schema change. This is what you're doing now, and it is the safest approach. Of course, it implies service downtime and runs the risks you have already experienced (rolling back a failed upgrade, a lengthy upgrade, etc.). A variant of this approach is to redirect the service to a read-only copy of the data and offer a degraded experience (no changes are possible during the downtime), which may or may not be acceptable depending on the business specifics.

  • Standby upgrade. You take a snapshot of the service data (various HA solutions may provide a standby snapshot out of the box, e.g. log shipping) and upgrade the snapshot. Then you apply to the upgraded snapshot all the transactions that occurred on the live service data in the meantime. This is always tricky. It requires a technology to detect, capture and apply the changes (e.g. change tracking, replication, or a custom solution), and each change must be transformed to the new, upgraded schema. Once the upgraded database has caught up with the changes from the main service, the service can be redirected to it. This redirection is also much more complex than it sounds. For one, choosing the moment to cut off the old schema and stop accepting new changes, while making sure all changes were applied to the upgraded database, is a challenge in itself. Another challenge is reconciling code that understands the pre-upgrade schema with code that understands the post-upgrade schema. Developing code that handles both is, as I said, problematic and error prone. One solution is, again, to take the service offline for a short period and replace the code. Another is to run a standby service, with code that handles the post-upgrade schema and a connection to the post-upgrade database, and to redirect the live requests to this upgraded standby service.
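The standby-upgrade flow can be sketched in a few lines of Python. Plain dicts stand in for the databases, and a list of `Change` records stands in for whatever capture mechanism you actually use (change tracking, replication, or something custom). The `lsn` counter, the `upgrade_row` transform and the row layout are all hypothetical. The point is the ordering: upgrade the snapshot, replay the captured changes through the transform, and cut over only once the standby has applied everything up to the cut-off point.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Change:
    lsn: int              # position in the captured change stream (hypothetical)
    op: str               # "upsert" or "delete"
    key: int
    row: Optional[dict]   # row in the version N schema (None for deletes)

def upgrade_row(row: dict) -> dict:
    """Hypothetical N -> N+1 transform: split 'name' into first/last name."""
    first, _, last = row["name"].partition(" ")
    return {"first_name": first, "last_name": last}

class StandbyCopy:
    """Upgraded snapshot kept in sync with the live (version N) data."""

    def __init__(self, snapshot: Dict[int, dict], snapshot_lsn: int):
        # Upgrade the snapshot to the new schema once, up front.
        self.table = {k: upgrade_row(r) for k, r in snapshot.items()}
        self.applied_lsn = snapshot_lsn

    def apply(self, changes: List[Change]) -> None:
        # Replay captured changes in order, transforming each one to the new schema.
        for ch in changes:
            if ch.lsn <= self.applied_lsn:
                continue  # already reflected in the snapshot or applied earlier
            if ch.op == "delete":
                self.table.pop(ch.key, None)
            else:
                self.table[ch.key] = upgrade_row(ch.row)
            self.applied_lsn = ch.lsn

# Live version N data plus the change stream captured from it.
live = {1: {"name": "Ada Lovelace"}, 2: {"name": "Alan Turing"}}
log: List[Change] = []
lsn = 0

def write(op: str, key: int, row: Optional[dict] = None) -> None:
    """Apply a write to the live data and capture it in the change stream."""
    global lsn
    lsn += 1
    if op == "delete":
        live.pop(key, None)
    else:
        live[key] = row
    log.append(Change(lsn, op, key, row))

standby = StandbyCopy(dict(live), snapshot_lsn=lsn)  # snapshot taken at lsn 0

# The live service keeps taking writes while the standby catches up.
write("upsert", 3, {"name": "Grace Hopper"})
standby.apply(log)
write("delete", 2)

# Cut-over: once writes to the old database have stopped (no further write()
# calls here), drain the remaining changes and redirect only when fully caught up.
cutoff_lsn = lsn
standby.apply(log)
assert standby.applied_lsn == cutoff_lsn
print(sorted(standby.table))  # [1, 3]
```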

And I haven't even touched the thorny subject of service interaction, when one service in a much larger deployed solution has to be upgraded. That is where backward compatibility of the service API protocol plays the major role, allowing the post-upgrade service to keep working with its peer services.
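One common way to get that backward compatibility is the "tolerant reader" pattern: ignore fields you don't recognize, and supply defaults for fields an older peer doesn't send. This is a technique I'm naming here, not something the answer prescribes. Below is a minimal Python sketch. The `OrderMessage` type and its fields are hypothetical, and JSON stands in for whatever wire format the services actually use.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class OrderMessage:
    order_id: int
    item: str
    quantity: int = 1  # hypothetical field added in version N+1; defaulted for older peers

KNOWN_FIELDS = {f.name for f in fields(OrderMessage)}

def parse_order(payload: str) -> OrderMessage:
    """Tolerant reader: accept messages from both older and newer peers."""
    data = json.loads(payload)
    # Drop fields this version does not know about (sent by newer peers).
    known = {k: v for k, v in data.items() if k in KNOWN_FIELDS}
    # Fields missing from older peers fall back to the dataclass defaults.
    return OrderMessage(**known)

older_peer = '{"order_id": 7, "item": "book"}'
newer_peer = '{"order_id": 8, "item": "pen", "quantity": 3, "gift_wrap": true}'
print(parse_order(older_peer))  # OrderMessage(order_id=7, item='book', quantity=1)
print(parse_order(newer_peer))  # OrderMessage(order_id=8, item='pen', quantity=3)
```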

Ultimately, there just isn't any silver bullet. I've witnessed large single-machine DB deployments that took weeks to roll out version N+1, with transactional replication continuously feeding the post-upgrade DB schema with changes from the pre-upgrade DB. And I've witnessed deployments across thousands of machines rolling out version N+1 in stages. That was a complicated dance of enabling code and data changes over the course of several days to reach the full post-upgrade functionality. This problem is just plain hard.

answered Oct 02 '22 01:10 by Remus Rusanu