
What to do when Azure Websites and SQL Azure suffer a significant outage, like today?

We use Windows Azure Web Sites (WAWS) and SQL Azure. This morning the Northern Europe data centre suffered an outage lasting 1 hr 50 min. Basically, we could not access our websites or databases. Service is now back, although problems are still rumbling on.

I have to admit that I felt a little helpless.

  • When will it reappear?
  • What caused it?
  • Who do I contact?

I have a feeling the cause is network related. Maybe the load balancer?

So what can we do when this happens? MS engineers usually know about these "events" very quickly and are already acting on them.

Some ideas I have had are:

1) Put up a polite error page if the domain times out. I am not sure how to do this: through an auto-ping service like Pingdom, or at the domain service where one defines the CNAMEs that route through to Azure? This communication is key to reassuring customers that the issue is being sorted, and to preventing blank Azure 503 pages from appearing.

2) Better information from the Azure team, to reduce the act of faith about when service will be resumed.

3) Other actions required when this "event" happens.

I am sure this has impacted other Azure customers, and indeed other cloud customers. I suspect some are fellow Northern Europe users who were impacted this morning like me. So what measures do you put in place to manage this issue, particularly around customer notice web pages that appear automatically?
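To make idea 1 a little more concrete, here is a minimal sketch of the decision an external monitor might make before switching visitors to a notice page. It is only an illustration: the function names, the threshold of three consecutive failures, and the rule "no response or any 5xx counts as an outage" are all assumptions, and the actual switch (a DNS change, a monitoring-service hook, etc.) is deliberately left out.

```python
import urllib.error
import urllib.request
from typing import List, Optional


def probe(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if it could not be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 502 or 503.
        return exc.code
    except OSError:
        # Covers URLError, timeouts, DNS failures and refused connections.
        return None


def is_outage(status: Optional[int]) -> bool:
    """Assumption: no response at all, or any 5xx status, counts as an outage."""
    return status is None or status >= 500


def should_show_notice(recent: List[Optional[int]], threshold: int = 3) -> bool:
    """True only when the last `threshold` probes all looked like an outage.

    Requiring several consecutive failures avoids flipping to the notice
    page because of a single slow response.
    """
    if len(recent) < threshold:
        return False
    return all(is_outage(s) for s in recent[-threshold:])
```

A scheduler would call `probe` every minute or so, append the result to a list, and act once `should_show_notice` returns True. The probe has to run somewhere outside the affected data centre, otherwise it goes down with the site.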

EDIT1

Update from MS.

++++++++++++++++++++++++++++++++++++++++++++

SQL Databases - North Europe - Partial Performance Degradation

49 mins ago

Starting at 8/6/2014 6:56 UTC a subset of SQL customers may have experienced difficulty accessing their resources. A significant number of these SQL customers have already seen improvement. We have identified a potential root cause, and are working to restore service. The next update will be provided within two hours.

+++++++++++++++++++++++++++++++++++++++++++++

Partial Performance Degradation = no websites, no databases for us!

SamJolly asked Mar 18 '23 20:03



2 Answers

I'm still suffering from the SQL Azure outage.

No external resources are able to connect to the SQL Azure service; however, internal resources on our account (e.g. WorkerRoles, WCFRoles, etc.) are unaffected.

I don't know what the fix could be; it depends on your setup. I also host several self-hosted WordPress websites on Azure, and some are affected and some are not. The ones that are affected will not load and display an HTTP 502 error.

All I can suggest is a custom HTTP 502 page for your websites hosted on Azure, plus gracefully trapping and handling any communication-level exceptions (e.g. .NET's System.Data.SqlClient.SqlException) in your hybrid applications that remotely access your SQL Azure database. (shrugs)
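As a rough illustration of "gracefully trap and handle", the sketch below retries a database call a few times with exponential backoff and then falls back to something harmless, such as cached data or a friendly message. It is written in Python rather than .NET, and `DatabaseUnavailable` is a hypothetical stand-in for whatever connection-level exception your driver raises; the attempt count and delays are likewise arbitrary assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DatabaseUnavailable(Exception):
    """Hypothetical stand-in for the driver's connection-level error."""


def call_with_retry(
    operation: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on DatabaseUnavailable, then use fallback.

    The delay doubles after each failed attempt (0.5s, 1s, ...). No delay
    is added after the final attempt, because nothing is retried after it.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except DatabaseUnavailable:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Every attempt failed: degrade gracefully instead of crashing the page.
    return fallback()
```

Only the connection-level error is caught here on purpose. Other exceptions, such as a bad query, still propagate so that real bugs are not hidden behind the fallback.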

Mike Wilson answered Apr 27 '23 11:04



This is not a good situation, and it is something I always worry about too. There is a solution; it is not particularly cheap, but I guess that's what you pay for uptime.

a) Make sure you use Traffic Manager with a failover website in a totally different region. For example, if your main site is in Northern Europe, then have the other site in West Europe. The chances of both data centers being down at the same time are low. You could add more failovers depending on your budget.

b) For your database you need to enable Geo-replication. If you are using Premium, then you can make the replica a read-only online database. The failover website should point to this database. This will mean that your site is read-only for the period of the disruption, but at least you're not dead. You can make this failover database your primary one if you want, so that it is no longer read-only. If you only have a Standard database, like most of us poor souls, it works similarly but the backup database is 'offline'. I am not sure what this means, but I think it means you have to wait for MS to decide when things are bad enough to let you connect to the secondary database, rather than it being always on.

Some info: http://azure.microsoft.com/blog/2014/07/12/spotlight-on-sql-database-active-geo-replication/
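The failover idea in (a) and (b) boils down to "use the first healthy endpoint in priority order, and remember whether it is read-only". The sketch below shows only that selection logic. It is not the Traffic Manager or SQL Azure API; the `Endpoint` class, the endpoint names, and the health-check callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    name: str        # hypothetical label for a site and its database
    region: str
    read_only: bool  # True for a geo-replicated secondary


def pick_endpoint(
    endpoints: Sequence[Endpoint],
    is_healthy: Callable[[Endpoint], bool],
) -> Optional[Endpoint]:
    """Return the first healthy endpoint in priority order, or None."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


# Priority order: primary first, read-only secondary in another region next.
ENDPOINTS = [
    Endpoint("primary", "Northern Europe", read_only=False),
    Endpoint("secondary", "West Europe", read_only=True),
]
```

If the chosen endpoint has `read_only` set, the application would disable writes (forms, checkouts and so on) and show a notice instead of failing on the first INSERT.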

Craig answered Apr 27 '23 11:04
