500 Server Errors, unresponsive UI, API and Alerting / Mitigation outage
Incident Report for Kentik SaaS US Cluster
Resolved
This incident has been resolved.
Posted Nov 03, 2020 - 01:57 UTC
Monitoring
We have restored services and are now working on bringing up the remaining capacity for each service. At this time, we believe that full operability is restored, and we will be monitoring the situation for the next 30-60 minutes.
Posted Nov 03, 2020 - 01:21 UTC
Update
We have traced the root cause to metadata database processing. We continue to restore services and work toward a full resolution.
Posted Nov 03, 2020 - 01:13 UTC
Update
We have traced the root cause to metadata database processing. We have partially restored services and are continuing to work on a full resolution. We will provide an update again in 10 minutes or sooner.
Posted Nov 03, 2020 - 01:02 UTC
Identified
We have traced the root cause to metadata database processing. We have partially restored services and are continuing to work on a full resolution. We will provide an update again in 10 minutes or sooner.
Posted Nov 03, 2020 - 00:52 UTC
Update
We are continuing to investigate this issue. We will provide an update again in 10 minutes or sooner.
Posted Nov 03, 2020 - 00:42 UTC
Update
We are currently investigating this issue. A potential workaround is being tested. We have determined that the scope of this outage also affects Flow Ingest and BGP Telemetry enrichment as of 00:22 UTC. We will provide an update again in 10 minutes or sooner.
Posted Nov 03, 2020 - 00:33 UTC
Update
We are continuing to investigate this issue. A potential workaround is being tested. We will provide an update again in 10 minutes or sooner.
Posted Nov 03, 2020 - 00:30 UTC
Update
We are continuing to investigate this issue. We will provide an update again in 10 minutes or sooner.
Posted Nov 03, 2020 - 00:21 UTC
Update
We are currently investigating this issue. We have determined that the scope of this outage also affects our API and Alerting and Mitigation Services. We will provide an update again in 10 minutes or sooner.
Posted Nov 03, 2020 - 00:11 UTC
Update
We are currently investigating this issue. We will provide an update again in 10 minutes or sooner.
Posted Nov 03, 2020 - 00:07 UTC
Update
We are continuing to investigate this issue.
Posted Nov 02, 2020 - 23:57 UTC
Investigating
We are currently investigating this issue.
Posted Nov 02, 2020 - 23:40 UTC
This incident affected: BGP (BGP Peering and Enrichment), Web Portal, REST API, Flow Ingest, Alerting and Mitigation Services, and Cloud Ingest (AWS Ingest, GCP Ingest, Azure Ingest).