Data Center Incident
Incident Report for VPSBlocks Pty Ltd
Postmortem

VPSBlocks Incident Report

At 8:36PM on Friday 3rd April 2020, the data center (OMNIConnect DC) in which our equipment is located suffered a major power outage. Their incident report can be downloaded at https://www.vpsblocks.com.au/files/IncidentReport-3-4-2020INC14161.pdf

VPSBlocks staff members were onsite at OMNIConnect by 8:45PM, and by 8:55PM we had staff in our office working to bring services back online.

By 9:07PM some services were back online, and restoration continued throughout the evening. By 10PM more than 70% of services were back online. At that time there were issues with 2 SANs. One we managed to resolve by 2AM. The other SAN had a critical, unrecoverable failure: the power outage had damaged the controllers in multiple drives, meaning the drives were not detectable by any system. We contacted 4 independent data recovery specialist companies and were advised that recovery was not possible. This unfortunately was confirmed today.

As a result, we started restorations from our emergency SAN volume backups as well as from managed backup repositories. We tried to balance these restorations to get clients up and running as quickly as possible; however, given the many terabytes of data requiring restoration, it took most of the weekend. By Monday morning all services were back online.

We deeply apologise for any loss that occurred to any client as a result of this outage. We will be discussing the data center incident report in detail with OMNIConnect. Please note that VPSBlocks has no interest, financial or otherwise, in OMNIConnect.

From a VPSBlocks point of view, we feel this was a unique situation: in 8 years of operation, this is the first time we have had to restore data from emergency backups for anything other than routine testing. Our staff did a fantastic job of answering nearly every call, and every ticket was responded to as quickly as possible. We tried to be as communicative as possible throughout the outage and the subsequent restoration period.

This has, of course, brought up some issues that we will be investigating solutions to in the near future. Primarily, the backup restoration time was longer than we would have liked, and there was an issue with one of the volume backups that affected a very small number of clients.

VPSBlocks prides itself on offering great service. We rarely see any ticket or request go unanswered for more than 10 minutes, and we have always been willing to help clients well beyond, we believe, what most hosting companies will do. We appreciate each and every customer, and we hope you will continue to support us well into the future, as we look forward to serving you.

If you have any questions relating to the outage, please email support@vpsblocks.com.au and we will reply as quickly as possible.

Thank you,
Will Kruss
Technical Director
VPSBlocks Pty Ltd

Posted Apr 08, 2020 - 13:25 AEST

Resolved
This incident has been resolved.
Posted Apr 06, 2020 - 00:10 AEST
Update
Work is continuing to restore all services. We believe all services will be restored by this evening. Most services are online and we apologise if you are in the last group of restorations.
Posted Apr 05, 2020 - 09:53 AEST
Update
We are continuing to bring services online. Restoration is very slow because a large number of terabytes must be restored. Please be assured we are working as fast as possible to bring all services that are still affected back online.
Posted Apr 04, 2020 - 17:51 AEDT
Update
Most services are back online; however, 1 SAN has had a critical failure. As a result we are currently restoring from backup. This will mean some loss of data for affected clients. We do apologise for this and are working very hard to get the backups restored and all services back online.
Posted Apr 04, 2020 - 03:33 AEDT
Update
We are continuing to work on a fix for this issue.
Posted Apr 03, 2020 - 23:48 AEDT
Identified
We are still bringing services online. There are some issues with storage as a result of the data center failure. We will be working through until this is resolved.
Posted Apr 03, 2020 - 23:33 AEDT
Update
About 60% of services are back online. We are still working on getting the rest of the services back and will update as we make progress.
Posted Apr 03, 2020 - 21:48 AEDT
Update
There has been an incident at the data center. We are working to bring services online in conjunction with data center staff. It may take a few hours to bring all services back online.
Posted Apr 03, 2020 - 21:02 AEDT
Investigating
We are currently investigating this issue.
Posted Apr 03, 2020 - 20:39 AEDT
This incident affected: VPSBlocks Website, Management Portal, High Availability VPS Services, Regular VPS Services, and Network.