Flexpa - Notice history

api.flexpa.com - Operational

100% - uptime
Mar 2026 · 100.0% · Apr 2026 · 100.0% · May 2026 · 100.0%

link.flexpa.com - Operational

100% - uptime
Mar 2026 · 100.0% · Apr 2026 · 100.0% · May 2026 · 100.0%

my.flexpa.com - Operational

100% - uptime
Mar 2026 · 100.0% · Apr 2026 · 100.0% · May 2026 · 100.0%

flexpa.com/docs - Operational

100% - uptime
Mar 2026 · 100.0% · Apr 2026 · 100.0% · May 2026 · 100.0%

portal.flexpa.com - Operational

100% - uptime
Mar 2026 · 100.0% · Apr 2026 · 100.0% · May 2026 · 100.0%

Notice history

May 2026

Degraded Performance
  • Postmortem

    Summary

    On May 7 between approximately 7:51 PM and 8:08 PM ET, Flexpa customers experienced elevated response times and request timeouts on hot-path API endpoints, including the Flexpa OAuth Authorization URL. The root cause was an AWS infrastructure failure in us-east-1, which AWS acknowledged on the AWS Health Dashboard (AWS event).

    Timeline (ET)

    • 7:20 PM — AWS data center experienced a thermal event in us-east-1 Availability Zone use1-az4 causing instance impairments due to loss of power.

    • 7:36 PM — Internal monitoring detected increased Redis command latency on background job workers.

    • 7:51 PM — Customer-facing API requests began hitting timeouts as the impaired Redis primary became progressively unresponsive. Status page updated to "Monitoring."

    • 8:06 PM — Amazon ElastiCache's automatic Multi-AZ failover completed, promoting a replica in an unaffected Availability Zone to primary.

    • 8:08 PM — Full service recovery confirmed; latency and error rates returned to baseline.

    • 8:25 PM — AWS publicly acknowledged the AZ-level impairment.

    What happened

    Flexpa runs Redis as a Multi-AZ ElastiCache replication group, with a primary node and replicas distributed across multiple Availability Zones for exactly this kind of failure. When AWS lost power in use1-az4, our Redis primary was hosted in that AZ and progressively degraded over ~15 minutes before AWS's automatic failover triggered. During that window, requests that depended on Redis (rate limiting, session state, queue operations) stalled and eventually timed out at the request layer. Once the failover completed and traffic shifted to a replica in a healthy AZ, the service recovered immediately.

    Customer impact

    • Window of degraded service: approximately 17 minutes (7:51 PM – 8:08 PM ET).

    • Affected: hot-path API endpoints, OAuth authorization redirect, and queued background jobs.

    • No data was lost. Failed requests are safe to retry.
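    The "safe to retry" guidance above can be sketched as a client-side pattern. The function below is an illustrative, generic example (not a Flexpa SDK API): it retries a timed-out call with capped exponential backoff and full jitter, which is appropriate when, as here, no data was lost server-side.

```python
import random
import time

def call_with_retry(fn, max_attempts=4, base_delay=0.5, max_delay=8.0):
    """Retry a zero-argument callable (e.g. a wrapped API request) on timeout.

    Uses capped exponential backoff with full jitter so that many clients
    retrying at once do not synchronize their retries. Parameter names and
    defaults are illustrative, not Flexpa's actual configuration.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # Out of attempts; surface the timeout to the caller.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))
```

    A call that timed out during the incident window would succeed on a later attempt once the failover completed, with no duplicate side effects to worry about given the no-data-loss guarantee above.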

    What we're doing

    • We are reviewing our Redis client retry behavior to shorten the recovery window in future Multi-AZ failovers from ~90 seconds to under 30 seconds.

    • We are reviewing our direct alerting for this failure scenario so that on-call paging occurs sooner.
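    As a rough sketch of why client retry settings drive the recovery-window target in the first bullet: the worst case a single request can stall on an unresponsive Redis primary is bounded by the per-attempt timeouts plus the backoff sleeps between attempts. The function below illustrates that budget arithmetic; the parameter names are hypothetical, not Flexpa's actual configuration.

```python
def worst_case_stall(connect_timeout_s, command_timeout_s, retries, backoff_s):
    """Upper bound (seconds) on how long one request can block before the
    Redis client gives up: each attempt may burn a full connect timeout plus
    a full command timeout, and a backoff sleep sits between attempts.
    """
    attempts = retries + 1
    per_attempt = connect_timeout_s + command_timeout_s
    return attempts * per_attempt + sum(backoff_s(i) for i in range(retries))

# Example budgets (illustrative numbers only):
# generous settings -> worst_case_stall(10, 10, 3, lambda i: 0) == 80 seconds
# tightened settings -> worst_case_stall(5, 5, 1, lambda i: 2) == 22 seconds
```

    Shrinking either the per-attempt timeouts or the retry count tightens this bound, which is the lever behind the ~90-second to sub-30-second target above.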

  • Resolved
    This incident has been resolved.
  • Monitoring

    We observed degraded response times on several hot-path API endpoints, including the Flexpa OAuth Authorization URL, and a Redis failover at 00:06 UTC.

    The service is currently stable and we are investigating.

Apr 2026

No notices reported this month

Mar 2026

No notices reported this month
