Jan 22, 13:51:01 GMT+0
Investigating -
We are currently investigating this incident.
Jan 22, 14:15:00 GMT+0
Monitoring -
We implemented a fix and are currently monitoring the result.
Jan 22, 17:39:52 GMT+0
Resolved -
This incident has been resolved.
Jan 10, 13:44:27 GMT+0
Investigating -
We are currently investigating reports of stuck authorizations on Flexpa Link.
Jan 10, 13:57:58 GMT+0
Monitoring -
We implemented a fix and are currently monitoring the result.
Jan 10, 14:10:33 GMT+0
Resolved -
This incident has been resolved. The incident impacted successful Flexpa Link authorizations.
Why did it happen?
We shipped a bug to production in the code that generates system Patient IDs after a successful Flexpa Link authorization. Because a test passed incorrectly, the faulty logic was not caught in development.
How to prevent in future?
By creating reusable test fixtures, particularly for our patient authorization model, we can make it easier for developers to avoid writing false-positive test cases.
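As a rough illustration of what a reusable fixture could look like, the sketch below uses a small factory that always returns a complete, valid record and lets each test override only the fields it cares about. The `PatientAuthorization` shape, its field names, and `makePatientAuthorization` are hypothetical; the real patient authorization model is not shown in this report.

```typescript
// Hypothetical model for illustration only; not the actual Flexpa schema.
interface PatientAuthorization {
  id: string;
  patientId: string | null;
  status: "pending" | "authorized" | "error";
}

// Fixture factory: starts from a valid default so tests do not hand-build
// partial objects that accidentally satisfy an assertion.
function makePatientAuthorization(
  overrides: Partial<PatientAuthorization> = {}
): PatientAuthorization {
  return {
    id: "auth-1",
    patientId: "patient-1",
    status: "authorized",
    ...overrides,
  };
}

// A test states only the difference from the default that it is exercising.
const withoutPatient = makePatientAuthorization({ patientId: null });
```

Because every test begins from the same known-good record, an assertion that passes is more likely to reflect the behavior under test than an incomplete hand-written object.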
Dec 6, 19:00:03 GMT+0
Investigating -
We are currently investigating this incident.
Dec 6, 19:23:41 GMT+0
Resolved -
This incident has been resolved.
This did not affect any customer implementations or integrations with Flexpa Link.
Why did it happen?
We promoted an environment change in MyFlexpa's dev mode, and had to complete unexpected manual steps to redeploy the MyFlexpa application.
How to prevent in future?
Improve our internal change process for updating MyFlexpa's environment configuration by removing unnecessary manual steps.
Nov 13, 18:54:38 GMT+0
Investigating -
We are currently investigating this incident.
Nov 13, 18:59:38 GMT+0
Identified -
We are continuing to work on a fix for this incident.
Nov 13, 19:04:43 GMT+0
Monitoring -
We implemented a fix and are currently monitoring the result.
Nov 13, 19:20:41 GMT+0
Resolved -
This incident has been resolved.
Why did it happen?
Flexpa released a fix for a scenario where some payer authentication flows were incompatible with Safari. However, the fix, which uses cookies to handle communication between Flexpa Link and Flexpa API, introduced a bug that was undetectable in the development environment because that environment shares the same domain across services.
How do we prevent it in the future?
- Broaden the test suite across browsers (Safari, Chrome, Firefox, etc.).
- Test across multiple domains in the development environment.
Oct 26, 18:25:46 GMT+0
Investigating -
We are currently investigating this incident.
Oct 26, 18:40:57 GMT+0
Resolved -
This incident has been resolved.
Why did it happen?
A new parameter in Flexpa Link was not checked for being defined before use, so the string "undefined" was serialized into the Flexpa Link application state, and the application crashed on this unexpected value.
How do we prevent it in the future?
- Include Flexpa Link (package) in our CI/CD test suites.
- Implement a UX Monitoring alarm on MyFlexpa to test that Link is loading successfully.
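The failure mode described in this post-mortem can be sketched as follows. This is a minimal illustration and not the actual Flexpa Link code: the `LinkState` shape, its field names, and both serializer functions are assumptions made for the example.

```typescript
// Hypothetical state shape; `newParam` stands in for the new optional parameter.
interface LinkState {
  publishableKey: string;
  newParam?: string;
}

// Problematic: interpolating a missing value produces the literal text "undefined".
function serializeNaive(state: LinkState): string {
  return `key=${state.publishableKey}&newParam=${state.newParam}`;
}

// Safer: emit the optional parameter only when it is actually defined.
function serializeChecked(state: LinkState): string {
  const parts = [`key=${encodeURIComponent(state.publishableKey)}`];
  if (state.newParam !== undefined) {
    parts.push(`newParam=${encodeURIComponent(state.newParam)}`);
  }
  return parts.join("&");
}

const state: LinkState = { publishableKey: "pk_test" };
console.log(serializeNaive(state));   // key=pk_test&newParam=undefined
console.log(serializeChecked(state)); // key=pk_test
```

A test in the CI/CD suite that serializes state with the optional parameter omitted would catch this class of bug before release.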
Oct 3, 19:14:27 GMT+0
Investigating -
We are currently experiencing an issue processing patients. Patients will see an error during a patient link: "Patient Init".
Oct 3, 19:58:24 GMT+0
Resolved -
We have resolved the issue with our API.
The database cache our apps connect to was destroyed in the course of a configuration change today.
This prevented a service from being online.
Because that service was offline, every new Patient Access Token transitioned to an error state. The error state was visible to patients and our customers, and Links did not complete successfully.
This impacted all customers (except those opted out of our data cache) because every new Patient Access Token requires this step to complete successfully.
We completed resolution of this event once we had all applications pointed at the new data cache.
Flexpa takes downtime such as this event seriously. We have completed a post-mortem procedure and will be introducing changes to our process to prevent this issue from recurring.
Aug 18, 14:35:14 GMT+0
Investigating -
We are currently investigating this incident.
Aug 18, 14:38:46 GMT+0
Resolved -
This incident has been resolved. Post mortem summary:
**What happened?**
api.flexpa.com was unavailable and the load balancer was returning 503s.
**Root cause analysis**
We tried to deploy a Task Definition that was pointing to secrets that were not in the parameter store, because those secrets were removed after the deploy was initiated.
**Action items to prevent its reoccurrence**
Ensure that removing a secret from Doppler regenerates the task definition and triggers a redeploy.
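A complementary safeguard, sketched here under stated assumptions, is a pre-deploy check that refuses to roll out a task definition whose secret references are missing from the parameter store. The `TaskDefinition` and `SecretRef` shapes, the `findMissingSecrets` helper, and the example keys are hypothetical. The list of available keys is passed in as plain data, so the sketch makes no assumptions about any particular cloud SDK.

```typescript
// Hypothetical, simplified task definition: only the secret references matter here.
interface SecretRef {
  name: string;      // environment variable name inside the container
  valueFrom: string; // parameter store key the value is read from
}

interface TaskDefinition {
  family: string;
  secrets: SecretRef[];
}

// Returns every parameter key the task definition references that is absent
// from the given snapshot of parameter store keys.
function findMissingSecrets(
  task: TaskDefinition,
  availableKeys: readonly string[]
): string[] {
  return task.secrets
    .map((secret) => secret.valueFrom)
    .filter((key) => availableKeys.indexOf(key) === -1);
}

// Example: API_KEY was removed from the store after the deploy was initiated.
const task: TaskDefinition = {
  family: "api",
  secrets: [
    { name: "DATABASE_URL", valueFrom: "/api/DATABASE_URL" },
    { name: "API_KEY", valueFrom: "/api/API_KEY" },
  ],
};

const missing = findMissingSecrets(task, ["/api/DATABASE_URL"]);
if (missing.length > 0) {
  console.log(`Refusing to deploy ${task.family}: missing secrets ${missing.join(", ")}`);
}
```

Running such a check immediately before the rollout step, rather than when the deploy is queued, narrows the window in which a secret can disappear between the two.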
Jul 27, 22:22:12 GMT+0
Investigating -
We are currently investigating an issue with our API.
Jul 27, 22:23:00 GMT+0
Resolved -
This incident has been resolved. We believe this was a false alarm.
Jul 21, 20:58:05 GMT+0
Monitoring -
We have identified the source of the issue. The API is operational and available during this time. We are working on a definitive resolution.
Jul 21, 21:43:55 GMT+0
Resolved -
This incident has been resolved.
We discovered an uncaught error in responses to our /fhir endpoint and implemented a fix.
Jul 21, 20:39:42 GMT+0
Investigating -
We are currently investigating an issue with our API.
Jul 21, 20:40:42 GMT+0
Resolved -
We have resolved the issue with our API.
Jun 14, 08:14:42 GMT+0
Resolved -
We have resolved the issue with our API.
Jun 14, 08:14:12 GMT+0
Investigating -
We are currently investigating an issue with our API.
Jun 6, 23:25:42 GMT+0
Investigating -
We are currently investigating an issue with our API.
Jun 6, 23:28:42 GMT+0
Resolved -
We have resolved the issue with our API.
Jun 12, 03:46:08 GMT+0
Resolved -
### Post Incident Summary
Availability of Flexpa is critically important to our customers. As part of our normal incident response, we have conducted a post-incident summary as to the source of this incident.
On June, we deployed new logging mechanisms in order to more effectively troubleshoot customer issues. The deployment was completed successfully and internal testing showed systems were operationally normal.
On June 6, Flexpa experienced three brief API outages cumulatively lasting 6 minutes. During these outages, Flexpa was completely unavailable as services restarted.
At 6:31 PM EST, it was observed that the MyFlexpa application was behaving erratically.
At 7:22 PM EST, a crash of Flexpa applications was observed and noted internally.
At 7:23 PM EST, Flexpa's applications had automatically restarted and recovered.
At 8:05 PM EST, a root cause had been identified and a fix was deployed to update logging to prevent future crashes. We have been operationally normal since this time.
At 8:26 PM EST, an update was applied to ensure future alerting and logging was routed to the appropriate location.
Moving forward from this incident we are:
* Improving the testing environment for logging updates to detect issues in advance of production deployment.
* Continuing to investigate the logging libraries Flexpa depends on in order to understand the failure more deeply and prevent similar issues in the future.
* Re-affirming our commitment to operational excellence.
Jun 6, 23:22:12 GMT+0
Investigating -
We are currently investigating an issue with our API.
Jun 6, 23:23:12 GMT+0
Resolved -
We have resolved the issue with our API.
Jun 6, 22:44:12 GMT+0
Investigating -
We are currently investigating an issue with our API.
Jun 6, 22:44:42 GMT+0
Resolved -
We have resolved the issue with our API.
Jun 3, 16:32:12 GMT+0
Investigating -
We are currently investigating an issue with our API.
Jun 3, 16:32:42 GMT+0
Resolved -
We have resolved the issue with our API.
Apr 20, 21:30:00 GMT+0
Investigating -
We are currently investigating this incident.
Apr 20, 21:39:00 GMT+0
Resolved -
This incident has been resolved.
Apr 26, 19:50:38 GMT+0
Resolved -
### Post Incident Summary
Availability of Flexpa is critically important to our customers. As part of our normal incident response, we have conducted a post-incident summary as to the source of this incident.
On April 19, we deployed a new database proxy / bouncer (PG Bouncer) as a new piece of infrastructure to assist in a migration effort. The deployment was completed successfully and internal testing showed systems were operationally normal.
On April 20, Flexpa experienced two brief API outages cumulatively lasting 17 minutes. We continued to experience high latency throughout the day.
At 12:17 PM EST it was observed that an internal application was not available.
At 12:22 PM EST our active database client connections began to drop dramatically from normal levels. At the same time, an internal responder was not able to successfully connect to the new database proxy infrastructure to investigate.
At 12:43 PM EST an availability monitor on api.flexpa.com was triggered and automatically posted a public status alarm. The internal responder escalated and an incident response call began.
At 12:51 PM EST the responding team restarted the database proxy and client connections successfully resumed.
While external availability to api.flexpa.com was restored at this time, internal applications continued to be unavailable due to database connectivity. Investigation continued with a review of the proxy logs. The investigating team was able to determine that the application code used to connect to the database proxy had been misconfigured.
At 5:30 PM EST the incident response team attempted to deploy a configuration change. This configuration change was not successful and resulted in an additional 9 minutes of downtime before being reverted.
As a result, over the course of the evening of April 20, the incident response team executed work to remove the database proxy from our infrastructure, completing their work at 11:00 PM EST. We have been operationally normal since this time.
Moving forward from this incident we are:
* Committed to additional testing of any database proxy infrastructure we add to our system in the future
* Improving alarms for internal services that can serve as a warning sign for external availability issues
* Re-affirming our commitment to operational excellence.
Apr 20, 16:51:00 GMT+0
Resolved -
This incident has been resolved.
Apr 20, 16:43:00 GMT+0
Investigating -
We are currently investigating this incident.
Apr 20, 03:30:01 GMT+0
Identified -
Maintenance is now in progress.
Apr 20, 03:42:34 GMT+0
Completed -
Maintenance has completed successfully.
Apr 20, 03:30:00 GMT+0
Identified -
We are planning for a scheduled maintenance during this window. We expect the window to last 15 minutes.
During the window, new authorizations via Flexpa Link will not be possible, nor will data retrieval requests from Flexpa API.
Jan 31, 15:39:15 GMT+0
Resolved -
The test is complete.
Jan 31, 15:37:15 GMT+0
Investigating -
We are testing the integration of our logging provider and this status page.
Jan 20, 23:57:31 GMT+0
Investigating -
We are currently investigating an outage in our infrastructure provider impacting all services.
Jan 21, 00:02:48 GMT+0
Resolved -
The issue has been resolved.