Issues
Quickly understand what requires your attention and drive your investigations
The Issues page is a useful starting point for a troubleshooting or investigation flow. It gathers all active issues found in your Kubernetes environment.
Issue Types
HTTP / gRPC Failures
Capturing failed HTTP calls with Response Status Codes of:
5XX: Server errors (e.g. 500 Internal Server Error)
429: Too Many Requests
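As a minimal sketch of the rule above, assuming the status code is available as an integer, a failure check could look like the following. The function name `is_failed_http_status` is hypothetical and only illustrates the rule; it is not the product's implementation.

```python
def is_failed_http_status(status: int) -> bool:
    """Return True for the status codes listed above: any 5XX, or 429."""
    return 500 <= status <= 599 or status == 429


# A 404 is a client error and is not in the list above, so it is not flagged.
for code in (200, 404, 429, 500, 503):
    print(code, is_failed_http_status(code))
```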
MySQL / PostgreSQL Failures
Capturing failed SQL statement executions with Response Error Codes such as:
1146: No Such Table
1040: Too Many Connections
1064: Syntax Error
Redis Failures
Capturing any error reported by the Redis serialization protocol (RESP), such as:
ERR unknown command
Container Restarts
Capturing all container restart events across the cluster, with Exit Codes such as:
0: Completed
137: OOMKilled
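Exit codes above 128 conventionally mean the process was ended by a signal (128 plus the signal number), so 137 is 128 + 9, i.e. SIGKILL, which is the signal delivered when a container is killed for running out of memory. The sketch below decodes an exit code along those lines. The helper `describe_exit_code` is a hypothetical name used for illustration, and a SIGKILL can have causes other than an OOM kill.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Give a rough, human-readable reading of a container exit code."""
    if code == 0:
        return "Completed"
    if code > 128:
        # By convention, 128 + N means the process was ended by signal N.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The signal number is not known on this platform.
            name = f"signal {signum}"
        return f"Terminated by {name}"
    return f"Error (exit code {code})"


print(describe_exit_code(0))
print(describe_exit_code(137))
```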
Deployment Failures
Capturing events such as:
MinimumReplicasUnavailable: Deployment does not have minimum availability
Issue Aggregation
Issues are auto-detected and aggregated, with each issue representing many identical, repeating incidents. Aggregation helps cut through the noise quickly and surface insights such as when a new type of issue first appeared and when it was last seen.
Issues are grouped by:
Type (HTTP, gRPC, Container Restart, etc.)
Status Code / Error Code (e.g. HTTP 500, gRPC 13)
Workload name
Namespace
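The grouping described above can be sketched as folding individual incidents into one record per (type, code, workload, namespace) key, while tracking the count and the first-seen and last-seen times. This is a minimal illustration under assumed data shapes: `IssueKey`, `Issue`, `aggregate`, and the sample incidents are hypothetical and not taken from the product.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import NamedTuple


class IssueKey(NamedTuple):
    issue_type: str  # e.g. "HTTP", "gRPC", "Container Restart"
    code: str        # status code / error code
    workload: str
    namespace: str


@dataclass
class Issue:
    count: int = 0
    first_seen: float = float("inf")
    last_seen: float = float("-inf")


def aggregate(incidents):
    """Fold (timestamp, IssueKey) incidents into one Issue per key."""
    issues = defaultdict(Issue)
    for ts, key in incidents:
        issue = issues[key]
        issue.count += 1
        issue.first_seen = min(issue.first_seen, ts)
        issue.last_seen = max(issue.last_seen, ts)
    return dict(issues)


# Hypothetical incidents: two identical HTTP 500s and one gRPC 13.
incidents = [
    (100.0, IssueKey("HTTP", "500", "checkout", "prod")),
    (160.0, IssueKey("HTTP", "500", "checkout", "prod")),
    (130.0, IssueKey("gRPC", "13", "billing", "prod")),
]
issues = aggregate(incidents)
for key, issue in issues.items():
    print(key, issue)
```

Three incidents collapse into two issues here, since the two HTTP 500 incidents share every field of the key.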
The smart aggregation mechanism will also identify query parameters, remove them, and group the stripped queries / API URIs into patterns. This allows users to easily identify and isolate the root cause of a problem.
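To illustrate the idea of stripping parameters and grouping by pattern, here is a rough sketch. It assumes that "query parameters" covers both the query string of an API URI and the literal values in a SQL statement. The function names and the regular expressions are assumptions made for this example; the actual mechanism may work differently.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def uri_pattern(uri: str) -> str:
    """Drop the query string (and fragment), keeping only the path."""
    return urlsplit(uri).path


def sql_pattern(query: str) -> str:
    """Replace string and numeric literals with '?' placeholders."""
    query = re.sub(r"'(?:[^']|'')*'", "?", query)      # quoted strings
    query = re.sub(r"\b\d+(?:\.\d+)?\b", "?", query)   # bare numbers
    return re.sub(r"\s+", " ", query).strip()          # collapse whitespace


# Hypothetical raw URIs captured from failing requests.
uris = ["/api/users?id=1", "/api/users?id=2", "/api/orders?page=3"]
patterns = Counter(uri_pattern(u) for u in uris)
print(patterns)

print(sql_pattern("SELECT * FROM orders WHERE id = 42 AND status = 'open'"))
```

Requests that differ only in their parameter values end up under the same pattern, which is what lets many incidents be counted as one issue.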
Troubleshooting with Issues
Each issue is assigned a velocity graph showing its behavior over time (such as when it was first seen) and a live counter of its number of incidents.
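One plausible way to derive the data behind such a graph is to bucket incident timestamps into fixed time windows and count the incidents in each. The sketch below assumes timestamps in seconds and a 60-second bucket. The `velocity` function and its parameters are hypothetical and only illustrate the concept.

```python
from collections import Counter


def velocity(timestamps, bucket_seconds=60):
    """Count incidents per time bucket, keyed by the bucket's start time."""
    counts = Counter(
        int(ts // bucket_seconds) * bucket_seconds for ts in timestamps
    )
    return sorted(counts.items())


# Hypothetical incident times, in seconds since some reference point.
series = velocity([5, 30, 65, 200])
print(series)
```

The earliest bucket in the series corresponds to when the issue was first seen, and the sum of all counts matches the incident counter.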
By clicking on an issue, users can access the specific traces captured around that issue. Each trace shows the exact resource that was used (e.g. the raw API URI or SQL query), its latency, and its Status Code / Error Code.
Clicking on a captured trace lets the user investigate the root cause of the issue using the entire payload (body and headers) of the request and response, information about the participating container, the application logs from around the time of the incident, and the full context of the metrics around the incident.