
Emergency Network Maintenance



This morning at approximately 2:40 AM EST, our edge router suffered a failure in one of its gigabit line cards, which serves traffic to our rack-mounted distribution switches. The issue was quickly identified and the line card was power cycled in an attempt to recover it. This worked, and service to the line card was restored after roughly 10 minutes of downtime, but it quickly became apparent that the card was reporting a series of errors indicating that another failure was possible, if not highly likely. After some preparation, at 3:30 AM EST we removed the faulty line card and replaced it with a spare. The swap took 5 minutes, and service was quickly restored, with the new line card appearing to operate as intended and no further errors being reported on the device.

 

I should stress that we keep multiple spare parts on hand for our networking equipment, including a spare router configured identically to the primary unit. Had there been the need, we could have and would have immediately switched service over to the standby router. This is the first hardware failure on our router in the 3 years since it went into production, a record we are certainly more than pleased with, and we will continue to strive for that same standard or better going forward in our network infrastructure.

 

We apologize for any inconvenience these outages may have caused and thank you for your understanding.
