API intermittent failures

Incident Report for Kentik SaaS US Cluster

Resolved

This incident has been resolved.
Posted Jun 04, 2020 - 21:06 UTC

Monitoring

An API node booted up with an erroneous static route, leading most requests going to it to timeout for the last hour. Kentik engineers detected the issue, removed the node from the pool and fixed the configuration, adding the node back to the pool.

Due to connections migrating between nodes to the next during the depooling and repooling events, any long-lasting connections might have ended abruptly and had to reconnect to the API.

Kentik engineers are now monitoring the implemented fix.

We apologize for any inconvenience.
Posted Jun 04, 2020 - 20:44 UTC
This incident affected: REST API.