APIv6 and some Kentik agent APIs down
Incident Report for Kentik SaaS US Cluster
Postmortem

ROOT CAUSE

Our inbound proxy/load balancer was configured with a small concurrent connection pool for these API paths. At approximately 13:30 UTC, several high-volume paths were brought online; these filled the pool and caused requests to queue faster than they could be drained, producing intermittent 503 and 429 responses from our API.
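
For illustration, here is a minimal Go sketch of this failure mode, assuming a proxy that gates each upstream path behind a fixed-size connection pool implemented as a semaphore. The handler names, pool size, and timeout below are hypothetical and not Kentik's actual configuration; the point is only that once the pool is saturated, queued requests time out and the proxy sheds load with a 503.

```go
package main

import (
	"net/http"
	"time"
)

// pooledHandler gates an upstream handler behind a fixed-size
// "connection pool" modeled as a buffered-channel semaphore.
// poolSize and waitTimeout are illustrative values only.
func pooledHandler(upstream http.Handler, poolSize int, waitTimeout time.Duration) http.Handler {
	pool := make(chan struct{}, poolSize)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case pool <- struct{}{}: // acquired a pool slot
			defer func() { <-pool }() // release the slot when done
			upstream.ServeHTTP(w, r)
		case <-time.After(waitTimeout):
			// Pool exhausted and the queue never drained in time:
			// shed load rather than queuing indefinitely.
			http.Error(w, "service unavailable", http.StatusServiceUnavailable)
		}
	})
}

func main() {
	api := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(200 * time.Millisecond) // simulated upstream latency
		w.Write([]byte("ok\n"))
	})
	// A pool this small fills as soon as a few high-volume paths
	// come online, and waiting requests start failing with 503s.
	http.Handle("/api/", pooledHandler(api, 8, time.Second))
	http.ListenAndServe(":8080", nil)
}
```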

RESOLUTION

At approximately 17:45 UTC, the connection pool size was increased to address this issue.
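
In terms of the sketch above, the remediation amounts to a one-line configuration change: raising the pool bound to cover the peak concurrency of the new high-volume paths rather than the old baseline. The value shown is hypothetical.

```go
// Hypothetical fix: size the pool for the new high-volume paths.
http.Handle("/api/", pooledHandler(api, 512, time.Second))
```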

We have raised the severity of the internal alerts that monitor these metrics so that similar events are identified and resolved more quickly in the future.

Posted Oct 25, 2022 - 16:48 UTC

Resolved
This incident has been resolved.
Posted Oct 20, 2022 - 21:15 UTC
Monitoring
The issue has been resolved, and we will continue to monitor.
Posted Oct 20, 2022 - 20:30 UTC
Update
A fix has been deployed, and engineering continues to monitor the API.
Posted Oct 20, 2022 - 20:05 UTC
Identified
The root cause has been identified and a fix is being put into place.
Posted Oct 20, 2022 - 19:50 UTC
Investigating
At approximately 13:30 UTC, we began seeing elevated error rates for APIv6 (synthetics, cloud, etc.) and Kentik agent (kbgp, kproxy) APIs. We have determined these APIs to be non-operational and are investigating the root cause.
Posted Oct 20, 2022 - 13:30 UTC
This incident affected: REST API.