
Emergency Issue At Data Center



Greetings,

 

At approximately 10:40 AM EST we received a fire alarm at the TCH NOC. All staff were evacuated per emergency procedures, and there was some power loss. There does not appear to be an actual fire, but the fire department is on scene to investigate. Power is now slowly being restored and services are coming back online.

 

We appreciate your patience during this matter and we will keep you updated here.


We are back in the building and there was no fire. A breaker overheated and tripped, causing the fire alarm sensors to trip as well.

 

We lost power to all our servers and are now going through them one at a time to make sure we have everything back online.

 

Please hold off on submitting tickets. We are fully aware of this issue, and Ryan and I are on site working as quickly as possible to restore service.

 

We just got the last of the shared/reseller servers back online and are now working on dedicated servers that are not pinging.

 

Please be patient. I am sorry for this.

 

Bill


Welcome to the forum, DateSafeProject. :shocking:

 

 

When a remote server sends an e-mail message and cannot deliver it because the destination host is unavailable, it queues the message for up to 2 days and reattempts delivery every 60-240 minutes, depending on that server's configuration.
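To make the numbers concrete, here is a rough sketch of that retry behaviour in Python. It assumes a fixed retry interval and a 2-day queue lifetime, as described above; real mail servers often lengthen the interval between attempts, and the function names here (`retry_offsets`, `first_successful_retry`) are made up for illustration, not taken from any mail server.

```python
from datetime import timedelta

# Assumption: the sending server gives up after 2 days in the queue.
MAX_QUEUE_AGE = timedelta(days=2)


def retry_offsets(interval, max_age=MAX_QUEUE_AGE):
    """List the times (measured from the first failed attempt) at which
    the sending server would retry, using a fixed interval."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    offsets = []
    t = interval
    while t <= max_age:
        offsets.append(t)
        t += interval
    return offsets


def first_successful_retry(outage, interval, max_age=MAX_QUEUE_AGE):
    """Return the offset of the first retry that lands after the outage
    has ended, or None if the message expires from the queue first.

    Worst case is assumed: the first attempt fails right as the outage
    begins, so the destination is reachable again at offset `outage`."""
    for t in retry_offsets(interval, max_age):
        if t >= outage:
            return t
    return None


if __name__ == "__main__":
    for minutes in (60, 240):
        interval = timedelta(minutes=minutes)
        attempts = len(retry_offsets(interval))
        print(f"{minutes}-minute interval: {attempts} retries before giving up")
```

Under these assumptions, a message that first failed during a hypothetical 45-minute outage would go through on the first retry, whether the sender retries every 60 or every 240 minutes. Only an outage longer than the queue lifetime would cause the message to bounce.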


All servers are back online and everything is a-ok.

 

We may have lost a supervisor card in our Cisco 6509 and are checking that now. Nothing to worry about: we run redundant 6509 chassis with 4 failover supervisor cards.

 

Updates to follow...


Final Update -

 

At around 10:40 AM EST a breaker in our main PDU tripped causing power loss to nearly 60% of the data center. This power loss caused our fire alarm to trip due to low voltage over the circuit. This of course caused a chain reaction. The Data Center was evacuated and we started to investigate the issue. Within several minutes it was determined that no fire was in progress and we started to repopulate the data center with our staff.

 

At approximately 11:25 AM EST full power was restored to the data center, and servers started to come back online without any physical intervention.

 

Many other servers, however, required physical intervention to get them back online. This is what took the majority of the time.

 

Several servers required file system checks, and several simply needed another hard reboot.

 

There was no physical damage to any of the servers or to the Data Center itself.

 

We have electricians on site determining exactly why the breakers tripped, and I will post a further update when I have their report. In the worst case, we may need to replace a breaker or two, which could lead to a very short scheduled maintenance window. Of course, I will update everyone once we know whether this is needed.

 

Thank you for your understanding during this emergency. We hope you understand that we worked as quickly as possible to restore service. I hope we can have another 12-month run of 100% uptime like the one we just had.

 

Thanks again.


Well, that explains it. I was reading the forums at 10:30 and got an email asking me to post a file to the website. FTP failed to connect, and as I tried to find out what was wrong I couldn't reach any of TCH: the forums were not found, my sites were gone and the front page was missing. I ran a couple of traceroutes and got really strange and different results.

 

I wasn't worried; I know you guys are always on top of things. So I went to the store, shopped a little, and by the time I got home everything was back up.

 

Good job, guys. Let's shoot for 13 months next time ;)
