Password Reset Emails Gone Wild!
Incident Report for 17hats
Postmortem

As a follow-up to yesterday’s incident, here is a more detailed description of what happened.

Last year, Google asked us to undergo an in-depth security assessment and audit, due to our heavy usage of the Gmail and Google Calendar APIs. We hired the well-known security firm Bishop Fox to handle this audit. This included heavy penetration testing (yes, that’s what it is called), meaning they were actively looking for security issues. We are happy to report that we passed this audit.

As part of this audit, Google recommended (but didn’t require) that we look into using HackerOne. This is a community of “white hat” hackers who actively try to find security issues. Because we care deeply about security, and are only human, this is a good way to have the community help us find potential issues. This means that 17hats has been under constant friendly fire since last year. Other than occasionally stressing our servers (which allowed us to optimize certain areas), there have been no issues with this. Until yesterday.

Around 12:35 p.m. PST, we ourselves started receiving password reset notices. Like you, we were puzzled. They were legitimate emails too, but we certainly didn’t make the requests. The application servers were relatively quiet, and it doesn’t make sense for somebody trying to hack into 17hats to send out these notices (you don’t want to tip people off!). Then we received a notice from HackerOne that somebody was trying to break the login page, using fake data. That gave us a clue.

Digging deeper, we discovered that this person sent a test request, using empty objects. That should not be an issue, and typically isn’t. Except these empty data objects now caused the database to match all logins. Which still should not have been an issue if it simply had sent an email to the first account. But – and this was the real bug – the code then looped over all results (expecting only one result) and sent out an email to each record.
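To illustrate the kind of guard that prevents this class of bug, here is a minimal sketch in Python. It is not the actual 17hats code: the in-memory `USERS` list and the `find_users` and `reset_recipients` functions are hypothetical stand-ins. The idea is to reject anything that is not a non-empty string before querying, and to refuse to send when the lookup returns anything other than exactly one record.

```python
from typing import Any, Dict, List

# Hypothetical in-memory stand-in for the user table.
USERS = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": "bob@example.com"},
]


def find_users(filters: Dict[str, Any]) -> List[dict]:
    """Naive lookup: an empty filter set matches every record."""
    return [u for u in USERS if all(u.get(k) == v for k, v in filters.items())]


def reset_recipients(email: Any) -> List[str]:
    """Return the addresses that should get a reset email (zero or one)."""
    # Reject empty objects, None, blank strings, etc. before querying.
    if not isinstance(email, str) or not email.strip():
        return []
    matches = find_users({"email": email.strip().lower()})
    # Expect exactly one match; never loop over an unexpected result set.
    if len(matches) != 1:
        return []
    return [matches[0]["email"]]
```

With this shape, a request carrying an empty object yields no recipients at all, even though an unfiltered query would match every account.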

Now, this overloaded the process, which caused it to stop. Ironically, one of the suggestions from the security audit was to make sending password reset emails a back-end process.* In general, our back-end processes will self-heal and try again. In this case, that meant the process restarted and resent the emails until it was overloaded and stopped again. Then it retried, and so on.
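One common way to keep a self-healing job from resending on every restart is to record what has already been delivered and to cap the number of attempts. The sketch below assumes a hypothetical `send` callable and a `sent` set that survives restarts (in a real system this would live in a database or similar store). It is an illustration of the idea, not a description of our infrastructure.

```python
from typing import Callable, Iterable, Set


def run_with_retries(
    recipients: Iterable[str],
    send: Callable[[str], None],
    sent: Set[str],
    max_attempts: int = 3,
) -> bool:
    """Send to each recipient at most once, retrying a bounded number of times."""
    recipients = list(recipients)
    for _ in range(max_attempts):
        try:
            for addr in recipients:
                if addr in sent:
                    continue  # Already delivered on an earlier attempt.
                send(addr)
                sent.add(addr)  # Record success before moving on.
            return True
        except RuntimeError:
            # Simulated worker failure: retry, skipping recorded addresses.
            continue
    return False
```

Because delivered addresses are skipped, a retry picks up where the failed attempt stopped instead of starting the whole batch over.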

The downside of fast servers is that when something like this happens, it goes wrong fast. By the time we had diagnosed the issue, written a fix, and updated the servers, hundreds of thousands of emails had already gone out. This was about an hour after we received the first email. What made matters worse is that our email host, SendGrid, understandably queued the emails. That meant people were still receiving them many hours after the issue was resolved, which further fueled the “what is going on?!” reaction.

We completely understand the frustration of receiving multiple emails, and the fear that your account has been compromised. We can assure you this was not the case. It is because of our commitment to security that we joined HackerOne, and we look forward to working with its community to ensure your data is safe.

Thank you for your understanding, and your patience yesterday!

*The reason for using a back-end process was so that hackers could not use the timing of a request to determine whether they had entered a real email address that is in our system, versus one that is not. A back-end process means it is handed off to a different server in a queue, so the application server response time is unaffected. Had this remained on our application servers, it would have immediately failed, in all likelihood. It was this “improvement” that caused the above bugs to be exposed yesterday.
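As a rough illustration of the hand-off described in this footnote, the Python sketch below enqueues every request without looking up the address, so the request path does the same work whether or not the account exists. The `reset_queue`, `KNOWN_EMAILS`, `handle_reset_request`, and `worker_step` names are hypothetical, and a real deployment would use a separate queue server rather than an in-process `queue.Queue`.

```python
import queue
from typing import List

# Hypothetical stand-ins for the job queue and the account table.
reset_queue: "queue.Queue[str]" = queue.Queue()
KNOWN_EMAILS = {"ann@example.com"}


def handle_reset_request(email: str) -> str:
    """Request path: no account lookup, identical response either way."""
    reset_queue.put(email)
    return "If that address is registered, a reset email is on its way."


def worker_step(outbox: List[str]) -> None:
    """Back-end path: the lookup happens here, away from the response time."""
    email = reset_queue.get()
    if email in KNOWN_EMAILS:
        outbox.append(email)
```

The caller sees the same message and roughly the same latency for a real and a made-up address; only the worker knows the difference.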

Posted Jan 17, 2020 - 14:27 PST

Resolved
Earlier today our password reset email server went a little crazy after an invalid request was made. The result was that it tried to email almost our entire user base with a password reset email. Yikes!

Because this overloaded the server, our self-healing infrastructure then, not so helpfully, decided to restart this process a few times, resulting in people getting duplicate emails, just to add insult to injury. Of course, in the meantime we were still trying to find out what was going on. Like many of you, our first thought was that somebody was trying to break into 17hats. That turned out not to be the case, and we were able to fix the bug within 10 minutes. However, by that time the emails had already gone out.

Because our email provider (SendGrid) processes emails in a queue, and because we sent a TON of emails, email delivery on these resets has been delayed in some cases. That means you may still receive an email, even though the issue has already been fixed.

You can safely ignore the emails, and breathe easy. 17hats was not compromised, and neither was your account. We are so sorry for the inconvenience and confusion this has caused, and any sinking feelings you may have experienced!
Posted Jan 16, 2020 - 16:19 PST
This incident affected: SendGrid SMTP.