This issue was caused by a failure in our DC Power Plant. This is the source that feeds all our Core Routing gear. The problem was identified as a faulty Capacitor and a shorted failover switch. This should never have happened, and we still do not fully understand what series of events caused the DC generating plant to fail. The system is designed with dual redundant plants, with UPS Battery Backup as a 3rd failover.
It appears that when the Capacitor blew, it sent a spike across to the failover switch, and nothing failed over. What we really do not yet understand is why the UPS Battery system did not take over the load. During the entire outage the UPS bank never showed more than a 13% load; it was sitting in ready mode but never deployed.
The reason our entire network did not go down all at once is that certain routing gear requires 20V, while other gear needs 30V, 40V, or 40-48V. When the generation of DC power stopped, the internal batteries stopped getting a continuous charge. So we watched in horror as the battery voltage started dropping from 48V. When we were able to restore DC operations, our internal DC power plant was producing 18V.
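For anyone curious why the outage rolled through the network in stages, here is a rough sketch in Python. It treats the voltages mentioned above as minimum operating voltages, which is an assumption, and the gear class names and the sample readings other than 48V and 18V are made up for illustration. It is not a model of our actual equipment.

```python
# Illustrative only: gear class names are hypothetical, and the figures from
# the post (20V, 30V, 40V) are ASSUMED to be minimum operating voltages.
MIN_VOLTS = {
    "gear needing 40V": 40.0,
    "gear needing 30V": 30.0,
    "gear needing 20V": 20.0,
}


def gear_still_up(bus_volts, min_volts=MIN_VOLTS):
    """Return the gear classes whose assumed minimum voltage is still met."""
    return [name for name, vmin in min_volts.items() if bus_volts >= vmin]


if __name__ == "__main__":
    # Walk the bus voltage down from 48V to the 18V seen at restoration.
    for volts in (48, 40, 35, 25, 18):
        up = gear_still_up(volts)
        print(f"{volts}V -> still up: {', '.join(up) if up else 'nothing'}")
```

As the battery voltage sags, each class of gear drops off when the voltage passes below its own threshold, so the network fails piece by piece rather than all at once.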
I am thankful we were able to restore services so quickly. This was a very large issue that my team handled with passion and safety. When we are dealing with 408V of power, it gets a little scary. But the electrical team did a great job.
We do not expect this to occur again; however, we have contracted with an outside, independent DC power plant specialty company to review the plant.
Any upcoming maintenance on this unit will be announced on our forums.
Thank you for your understanding and I am really sorry this has happened.
In regard to uptime, I would like to offer this. I have been using this reply to clients about this outage and felt it appropriate to include here.
" We contract with an outside, 3rd party and independent monitoring company to track our uptime.
You can review their full report for our network uptime at the following link :
If you take a close look you can see we have never ended our yearly uptime with anything less than 99.99%. You can also note that we have never fallen below 99.95% uptime. Most likely 2019 will be another 99.99+% uptime track record for us. Please note we purge uptime records every 3 years.
Downtime is an unfortunate part of this business and we strive to do our best. However, every host on every platform will have downtime. Facebook, Amazon and all the other large companies have outages. Yes, even Google and Yahoo.
I am truly sorry this outage occurred. Please know that I was on site at the Data Center within 30 minutes of the outage start, stayed on site, and guided all operations to restore service as quickly as possible. "
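To put the uptime percentages quoted above in context, here is a small Python sketch that converts an uptime percentage into the downtime it allows over a year. It assumes a 365-day year and simple arithmetic; the function name is made up for this example, and this is not necessarily how the monitoring company calculates its figures.

```python
# Illustrative arithmetic only; assumes a 365-day (non-leap) year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(uptime_percent, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime implied by an uptime percentage over a period."""
    return period_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for pct in (99.99, 99.95):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

On those assumptions, 99.99% uptime works out to roughly 53 minutes of downtime in a year, and 99.95% to roughly 263 minutes, or a little under four and a half hours.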