Reduced availability

Incident Report for Hello Retail

Resolved

At around 1:30, a combination of maintenance tasks caused our secondary database cluster to run so slowly that virtually all queries ended up in the slow query log. As a result, all servers quickly ran out of disk space and crashed almost simultaneously at 1:34. Because the servers failed at nearly the same time, our automated recovery procedure could not keep up, and manual recovery was needed. This was completed at 1:48.
In the meantime, all newsletter tiles and some recoms (depending on the algorithm) timed out. No data was lost in the incident.
We apologize for the inconvenience.

Posted Feb 15, 2023 - 01:30 CET