Flexpa - Notice history

api.flexpa.com - Operational

100% - uptime
Mar 2026 · 100.0% · Apr 2026 · 100.0% · May 2026 · 100.0%

link.flexpa.com - Operational

100% - uptime
Mar 2026 · 100.0% · Apr 2026 · 100.0% · May 2026 · 100.0%

my.flexpa.com - Operational

100% - uptime
Mar 2026 · 100.0% · Apr 2026 · 100.0% · May 2026 · 100.0%

flexpa.com/docs - Operational

100% - uptime
Mar 2026 · 100.0% · Apr 2026 · 100.0% · May 2026 · 100.0%

portal.flexpa.com - Operational

100% - uptime
Mar 2026 · 100.0% · Apr 2026 · 100.0% · May 2026 · 100.0%

Notice history

May 2026

Degraded Performance
  • Postmortem

    Summary

    On May 7 between approximately 7:51 PM and 8:08 PM ET, Flexpa customers experienced elevated response times and request timeouts on hot-path API endpoints, including the Flexpa OAuth Authorization URL. The root cause was an AWS infrastructure failure in us-east-1, which AWS acknowledged on the AWS Health Dashboard (AWS event).

    Timeline (ET)

    • 7:20 PM — AWS data center experienced a thermal event in us-east-1 Availability Zone use1-az4 causing instance impairments due to loss of power.

    • 7:36 PM — Internal monitoring detected increased Redis command latency on background job workers.

    • 7:51 PM — Customer-facing API requests began hitting timeouts as the impaired Redis primary became progressively unresponsive. Status page updated to "Monitoring."

    • 8:06 PM — Amazon ElastiCache's automatic Multi-AZ failover completed, promoting a replica in an unaffected Availability Zone to primary.

    • 8:08 PM — Full service recovery confirmed; latency and error rates returned to baseline.

    • 8:25 PM — AWS publicly acknowledged the AZ-level impairment.

    What happened

    Flexpa runs Redis as a Multi-AZ ElastiCache replication group, with a primary node and replicas distributed across multiple Availability Zones for exactly this kind of failure. When AWS lost power in use1-az4, our Redis primary was hosted in that AZ and progressively degraded over ~15 minutes before AWS's automatic failover triggered. During that window, requests that depended on Redis (rate limiting, session state, queue operations) stalled and eventually timed out at the request layer. Once the failover completed and traffic shifted to a replica in a healthy AZ, the service recovered immediately.

    Customer impact

    • Window of degraded service: approximately 17 minutes (7:51 PM – 8:08 PM ET).

    • Affected: hot-path API endpoints, OAuth authorization redirect, and queued background jobs.

    • No data was lost. Failed requests are safe to retry.
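    The "safe to retry" guidance above can be sketched as a client-side pattern. The function below is an illustrative, generic example (not a Flexpa SDK API): it retries a timed-out call with capped exponential backoff and full jitter, which is appropriate when, as here, no data was lost server-side.

```python
import random
import time

def call_with_retry(fn, max_attempts=4, base_delay=0.5, max_delay=8.0):
    """Retry a zero-argument callable (e.g. a wrapped API request) on timeout.

    Uses capped exponential backoff with full jitter so that many clients
    retrying at once do not synchronize their retries. Parameter names and
    defaults are illustrative, not Flexpa's actual configuration.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # Out of attempts; surface the timeout to the caller.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))
```

    A call that timed out during the incident window would succeed on a later attempt once the failover completed, with no duplicate side effects to worry about given the no-data-loss guarantee above.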

    What we're doing

    • We are reviewing our Redis client retry behavior to shorten the recovery window in future Multi-AZ failovers from ~90 seconds to under 30 seconds.

    • We are reviewing our direct alerting for this failure scenario so that on-call paging occurs sooner.
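    As a rough sketch of why client retry settings drive the recovery-window target in the first bullet: the worst case a single request can stall on an unresponsive Redis primary is bounded by the per-attempt timeouts plus the backoff sleeps between attempts. The function below illustrates that budget arithmetic; the parameter names are hypothetical, not Flexpa's actual configuration.

```python
def worst_case_stall(connect_timeout_s, command_timeout_s, retries, backoff_s):
    """Upper bound (seconds) on how long one request can block before the
    Redis client gives up: each attempt may burn a full connect timeout plus
    a full command timeout, and a backoff sleep sits between attempts.
    """
    attempts = retries + 1
    per_attempt = connect_timeout_s + command_timeout_s
    return attempts * per_attempt + sum(backoff_s(i) for i in range(retries))

# Example budgets (illustrative numbers only):
# generous settings -> worst_case_stall(10, 10, 3, lambda i: 0) == 80 seconds
# tightened settings -> worst_case_stall(5, 5, 1, lambda i: 2) == 22 seconds
```

    Shrinking either the per-attempt timeouts or the retry count tightens this bound, which is the lever behind the ~90-second to sub-30-second target above.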

  • Resolved
    This incident has been resolved.
  • Monitoring

    We observed degraded response times on several hot-path API endpoints, including the Flexpa OAuth Authorization URL, and a Redis failover at 00:06 UTC.

    The service is currently stable and we are investigating.

Apr 2026

No notices reported this month

Mar 2026

No notices reported this month
