I have a number of VMs on Windows Azure (IaaS) hosting a website. There are several load-balanced front-end VMs, all connecting to a single VM running SQL Express. It works well.
However!
I'm getting random restarts across all the VMs. As for the front-end VMs (with IIS), since they are load balanced, the site is not affected and the load balancer adjusts accordingly. But when the VM hosting the database is restarted, the site is down until the DB is up again. It takes < 3min to boot up, but that's still unacceptable if it happens frequently enough. Although the restarts are relatively rare (2 a month per VM), sometimes we get a week with 4 restarts per VM, which gets frustratingly annoying. Not all VMs restart as frequently and I cannot figure out a pattern. Restarts are also unexpected (pull-the-power-cable type of restarts, and not shutdowns). Datacenter is West Europe.
Microsoft emphasises that the SLA only covers two or more VMs in an availability set, which I can't have for the database VM (and the enterprise SQL edition costs an arm and three legs). Also, SQL Azure isn't an option, as the application is very chatty and the SQL Azure database was being throttled during peak times (though it works super smooth with SQL Express on a Medium VM!).
My question(s): Is it normal to have so many restarts? Are there other people having the same problem? What is your experience with such an environment on Azure? What can I do to minimise this downtime?
Thanks all!
If there is a disruption to the availability or connectivity between the VM and storage for more than 120 seconds, VMs will shut down to avoid data corruption. VMs will automatically power back on after the connection has been restored, which can take 5 minutes or significantly longer.
Whenever the availability or connectivity between the VM and the associated virtual disks is affected for more than 120 seconds, the Azure platform performs a forced shutdown of the VMs to avoid data corruption. The VMs are automatically powered back on after storage connectivity has been restored.
A VM may be automatically shut down or suspended if: The environment was not in use. The environment-wide auto-shutdown options shut down or suspend all of the VMs in an environment after a period of inactivity.
Is it normal to have so many restarts?
Yes, this can happen in a given month. You need to stand up SQL Server in high availability mode to really get this to work.
Yes, it does cost an arm and a leg. ;(
What is your experience with such an environment on Azure?
Some months are really good and some months are bad; it depends on your cluster and which datacenter you are in. MS has a mixed range of hardware in its datacenters. That does not mean they are running on old laptops in some datacenters, but in my experience the newer datacenters tend to have better kit in them and thus fewer restarts. For example, we use US East.
What can I do to minimise this downtime?
High availability with a witness is the only way to get real availability for a database on a VM, and yes, it costs an arm and a leg.
Other serious options:
Cache: You should use a local cache on the machine as well as Azure Cache, and try to minimize your calls to the database. This might make your app less chatty and allow you to step back to SQL Azure, or it might at least buy you enough time for the database to recover.
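To make the caching idea concrete, here is a minimal sketch of an in-process read-through cache that also serves a stale value when the database cannot be reached. The `ReadThroughCache` class, its `ttl_seconds` parameter and the `loader` callback are hypothetical names for illustration; they are not part of Azure Cache or any SQL client library, and a real app would plug its own query function in as the loader.

```python
import time


class ReadThroughCache:
    """Tiny in-process cache (illustrative only, not an Azure API)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get(self, key, loader):
        now = self.clock()
        entry = self._entries.get(key)
        # Fresh hit: no database call at all.
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        try:
            value = loader(key)  # hypothetical callback that queries the database
        except Exception:
            # Database unreachable (e.g. the VM is rebooting):
            # fall back to the stale value if we have one.
            if entry is not None:
                return entry[1]
            raise
        self._entries[key] = (now, value)
        return value
```

Whether serving stale data for a few minutes is acceptable depends on the page; it tends to suit read-mostly content such as catalogues or settings rather than balances or stock levels.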
Queues: Queues would help your application recover and let you show your users a "we are working on it" message.
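As a rough illustration of the queue idea, the sketch below buffers writes while the database is down and replays them in order once it is back. `WriteBuffer` and its `write` callback are hypothetical names; the queue here lives in memory only to keep the example short, whereas a real deployment would presumably use a durable queue so that pending writes survive a front-end restart.

```python
from collections import deque


class WriteBuffer:
    """Queues writes while the database is unavailable (illustrative only)."""

    def __init__(self, write):
        self._write = write      # hypothetical callback that performs the DB write
        self._pending = deque()  # writes waiting for the database to come back

    def submit(self, item):
        """Queue the item and try to flush.

        Returns True if everything, including this item, reached the database,
        and False if it is still queued (show "we are working on it").
        """
        self._pending.append(item)
        self.drain()
        return not self._pending

    def drain(self):
        """Replay queued writes in order; stop at the first connection failure."""
        while self._pending:
            try:
                self._write(self._pending[0])
            except ConnectionError:
                return  # database still down; keep the items for later
            self._pending.popleft()
```

Calling `drain()` from a periodic timer as well would flush the backlog even when no new writes arrive.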
Use SQL Azure as failover: Sync the data from your SQL Server to SQL Azure using SQL Azure Data Sync (not sure this works with Express), and write code in your app to catch the connection error and fail over.
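The "catch the connection error and fail over" part could look roughly like the sketch below. `run_with_failover` and the `backends` list are hypothetical; each `connect` callable stands in for whatever your database driver uses to open a connection, and the example assumes the driver signals an outage with `ConnectionError` (real drivers raise their own exception types, which you would list in `retryable`).

```python
def run_with_failover(operation, backends, retryable=(ConnectionError,)):
    """Try each backend in order and return (backend_name, result).

    backends: list of (name, connect) pairs, primary first.
    operation: function that takes an open connection and returns a result.
    """
    last_error = None
    for name, connect in backends:
        try:
            return name, operation(connect())
        except retryable as exc:
            last_error = exc  # remember why this backend failed, try the next one
    raise RuntimeError("all backends failed") from last_error
```

Reads fail over cleanly this way. Writes accepted by the secondary during an outage still have to be reconciled with the primary afterwards, which this sketch does not attempt.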
Look at using other parts of Azure for parts of your app to reduce the number of calls coming into SQL, e.g. can you move some data to Table storage?
Hope this gives you some ideas.