Yesterday, from 11:11, we experienced degraded performance, and the system was down from 11:20 to 11:28.
A fault in our migration system mistakenly removed an important index. This index is related to all our products, and recreating it overloaded a database cluster. The overload unfortunately cascaded to other services and brought them down as well. We aborted the index rebuild and brought the system back up without the index, which meant we ran with slightly degraded response times for the rest of the day.
That night, during a period of lower traffic, we attempted to recreate the index to bring performance back to normal. Unfortunately, this overloaded the cluster again, and the issue recurred from 00:37 to 00:47.
The index has now been recreated, and we have changed our internal processes for how and when we modify indexes to prevent a similar issue in the future.
We know the impact a disturbance like this has on your business, and we will do everything in our power to prevent it going forward.