AWS issues in the US-EAST-1 region (US-USE1-AZ4)

Incident Report for Instaclustr

Resolved

As of 4:22 PM PST, AWS has resolved the outage on their side. AWS noted that instances that have not yet recovered will need to be recovered manually. Over the past few hours, Instaclustr has recovered over 90% of the impacted nodes, and only a few remain. We are resolving this incident and will continue communicating through the respective customer tickets.

Posted Dec 23, 2021 - 01:06 UTC

Update

According to the most recent AWS update, some nodes will not recover on their own. Instaclustr has started recovery procedures on the nodes that have not yet recovered, and the affected customers have been notified through support tickets.

Posted Dec 22, 2021 - 23:48 UTC

Update

AWS is still working on recovering the instances affected by this outage in the impacted region. Some of the instances have recovered as part of the AWS recovery process. Instaclustr Support is currently reviewing the appropriate course of action for instances that have not yet recovered.

Posted Dec 22, 2021 - 21:59 UTC

Update

As per the latest update from AWS (9:28 AM PST), impacted nodes are still recovering from this incident. The majority of the affected EC2 instances and EBS volumes have recovered, but AWS is still working through full recovery at the host level.

Posted Dec 22, 2021 - 17:59 UTC

Identified

At approximately 4:35 AM PST today, AWS identified a power outage that caused issues launching EC2 instances in the US-EAST-1 region (US-USE1-AZ4). This issue impacts existing clusters as well as new node provisioning in that region (both instance store and EBS backed). Other availability zones in US-EAST-1 remain unaffected. AWS has been working through this issue for the past few hours, and power was restored at approximately 6:51 AM PST. The latest AWS update (as of 8:02 AM PST) says that instances will continue recovering. Instaclustr Support has identified the nodes impacted by this incident and will take remediation measures as necessary to recover them. This incident should have no impact on your application if you have proper fault-tolerance mechanisms in place (e.g., a replication factor of 3 (RF3) and querying at quorum).
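
As a rough illustration of why RF3 with quorum queries tolerates the loss of one availability zone, the arithmetic can be sketched as below. This is a minimal sketch, not Instaclustr tooling: it assumes Cassandra-style quorum (a majority of replicas) and that replicas are spread one per availability zone. The function names are hypothetical.

```python
# Sketch only: assumes Cassandra-style QUORUM (majority of replicas)
# and one replica per availability zone. Function names are hypothetical.

def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a quorum read or write."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """How many replicas can be down while quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


def quorum_available(replication_factor: int, replicas_down: int) -> bool:
    """True if enough replicas remain up to satisfy a quorum request."""
    return replication_factor - replicas_down >= quorum(replication_factor)


if __name__ == "__main__":
    rf = 3
    print("quorum size:", quorum(rf))                        # 2
    print("tolerated failures:", tolerated_replica_failures(rf))  # 1
    print("one AZ down, quorum ok:", quorum_available(rf, 1))     # True
    print("two AZs down, quorum ok:", quorum_available(rf, 2))    # False
```

Under these assumptions, losing the single replica hosted in the affected zone still leaves two of three replicas, which meets the quorum of two. Clusters with a lower replication factor, or with more than one replica of the same data in the affected zone, would not have the same margin.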

We will provide further updates shortly.

Posted Dec 22, 2021 - 17:24 UTC

This incident affected: AWS EC2 Regions (AWS ec2-us-east-1).