Yesterday, from 11:11, we experienced degraded performance, and the system was down from 11:20 to 11:28.
A fault in our migration system mistakenly removed an important index. This index is related to all our products, and recreating it overloaded a database cluster. The overload unfortunately cascaded to other services and brought them down as well. We aborted the index rebuild and brought the system back up without the index, which meant we ran with slightly degraded response times for the rest of the day.
That night, during a period of lower traffic, we attempted to recreate the index to bring performance back to normal. Unfortunately, this overloaded the cluster again, and the issue recurred from 00:37 to 00:47.
The index has now been recreated, and we have changed our internal processes for how and when we modify indexes to prevent a similar issue in the future.
We know the impact a disturbance like this has on your business, and we will do everything in our power to prevent it going forward.