Flow ingest outage for agent-based sources
Incident Report for Kentik SaaS US Cluster
Postmortem

ROOT CAUSE

A flow ingest code update was applied in this cluster around 23:00 UTC that caused our hot restart logic to put all agent-based ingest over https into a 10 minute hold down period. We immediately rolled back when we saw the drops in flow, which caused an additional 10 minute hold down period, after which our ingest capabilities returned to nominal state.

RESOLUTION

The offending bug in our hot restart logic has been resolved and will not recur in future ingest updates.

Posted Oct 25, 2022 - 16:55 UTC

Resolved
This incident has been resolved.
Posted Oct 25, 2022 - 00:00 UTC
Monitoring
The rollback is complete and we are continuing to monitor.
Posted Oct 24, 2022 - 23:30 UTC
Identified
Beginning at approximately 23:00 UTC, Kentik engineering rolled out a new ingest layer update and all agent-based ingest was adversely affected. We are rolling back.
Posted Oct 24, 2022 - 23:00 UTC
This incident affected: Flow Ingest.