Which is worse, the outage or the poor communication? Office 365 Exchange was offline today for 7+ hours, cutting off oxygen for millions of email users. The 100+ posts at the admin portal suggest that the outage itself wasn’t as disruptive as the limited visibility into the remedy.
The good news is that this is the first service disruption in more than two years, at a cost that’s an order of magnitude less than rolling our own. I recall, 5 years ago, the rack of Exchange servers, the BlackBerry Enterprise Server and the 24-hour battery backup in our data center. Rebooting the server at 4 am, on weekends. No thanks.
The bad news is that business was disrupted unexpectedly today and the fix was out of my hands. How hard would it be for enterprise cloud infrastructure and app vendors to increase the incident communications budget? While enterprise customers don’t want to be on pager duty, I’d rather get a text from MSFT saying that service is disrupted, followed by updates every 10 minutes on what’s being done to fix it.
It may seem like a tall order to broadcast detailed updates every 10 minutes, but when the service is down, 100% of your customers are carefully reading every word you say.