Site slowdown / Intermittent Unavailability
Incident Report for 17hats
Resolved
During scheduled maintenance at around 4am Pacific this morning, we ran a simple database update to prepare for the increase in traffic over the remainder of the year. An unintended side effect of this update drove the database load to 100%, which made the site very slow and often prevented members from logging in.

While we rectified the issue immediately, it took the database a little over 90 minutes to catch up, during which time the site was only intermittently available.

We did not expect downtime and had not seen any during our testing. Even so, we run these updates in the middle of the night on a weekend to minimize the impact if there are issues. When we do expect downtime, we announce it beforehand. In this case none was expected, but despite our best efforts downtime did occur.

An additional problem was that the support team was not notified that there was an issue that needed to be addressed and communicated. We are reviewing our procedures to make sure that doesn’t happen again!

We apologize for any disruption this may have caused you. We are glad to report that all systems are fully working again, and have been for over an hour.

Donovan
Posted Sep 07, 2019 - 07:28 PDT
This incident affected: Web App.