Application Metrics
The groundcover platform generates 100% of its metrics from the actual data. There are no sample rates or complex interpolations to make up for partial coverage. Our measurements represent the real, complete flow of data in your environment.
Stream processing allows us to construct the majority of the metrics on the very node where the raw transactions are recorded. This means the raw data is turned into numbers as soon as it becomes possible, removing the need to store it or send it elsewhere.
Metrics are stored in groundcover's victoria-metrics deployment, ensuring top-notch performance at every scale.
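Because VictoriaMetrics exposes a Prometheus-compatible querying API (`/api/v1/query`), stored metrics can in principle be read with any HTTP client. The sketch below only builds the request URL. The base address is a hypothetical placeholder, and whether the endpoint is reachable from outside the cluster depends on your deployment.

```python
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Return an instant-query URL for a Prometheus-compatible HTTP API."""
    # urlencode percent-escapes the PromQL expression so it is safe in a URL.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical address: replace with the one used in your environment.
url = build_instant_query_url(
    "http://victoria-metrics.example:8428",
    "sum(rate(groundcover_workload_total_counter[5m]))",
)
print(url)
```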
In the world of excessive data, it's important to have a rule of thumb for knowing where to start looking. For application metrics, we rely on our golden signals.
The following metrics are generated for each resource being aggregated:
Requests per second (RPS)
Error rate
Latencies (p50 and p95)
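As a rough illustration of how the first two signals relate to counters, the sketch below derives RPS and error rate from two readings of a total counter and an error counter taken some seconds apart. The function name and the sample values are hypothetical, and this is not a description of groundcover's internal computation.

```python
def golden_signals(total_start, total_end, errors_start, errors_end, window_seconds):
    """Derive (requests per second, error rate) from two counter readings."""
    requests = total_end - total_start
    errors = errors_end - errors_start
    rps = requests / window_seconds
    # Guard against division by zero when no requests arrived in the window.
    error_rate = errors / requests if requests else 0.0
    return rps, error_rate


# Hypothetical readings taken 60 seconds apart.
rps, error_rate = golden_signals(1000, 1600, 10, 40, 60)
print(rps, error_rate)
```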
The golden signals are then displayed in two important ways: Workload and Resource aggregations.
See below for the full list of generated workload and resource golden metrics.
Resource aggregations are highly granular metrics, providing insights into individual APIs.
Workload aggregations are designed to show an overview of each service, enabling a higher level inspection. These are constructed using all of the resources recorded for each service.
groundcover allows full control over the retention of your metrics. Learn more here.
Below you will find the full list of our APM metrics, as well as the labels we export for each. These labels are designed with high granularity in mind for maximal insight depth. All of the metrics listed are available out of the box after installing groundcover, without any further setup.
We fully support the ingestion of custom metrics to further expand the visibility into your environment.
We also allow for building custom dashboards, giving you full freedom in deciding how to display your metrics. Dashboards can build on groundcover's metrics below plus every custom metric ingested.
Label name | Description | Relevant types |
---|---|---|
clusterId | Name identifier of the K8s cluster | |
region | Cloud provider region name | |
namespace | K8s namespace | |
workload_name | K8s workload (or service) name | |
pod_name | K8s pod name | |
container_name | K8s container name | |
container_image | K8s container image name | |
remote_namespace | Remote K8s namespace (other side of the communication) | |
remote_service_name | Remote K8s service name (other side of the communication) | |
remote_container_name | Remote K8s container name (other side of the communication) | |
type | The protocol in use (HTTP, gRPC, Kafka, DNS etc.) | |
role | Role in the communication (client or server) | |
clustered_path | HTTP / gRPC aggregated resource path (e.g. /metrics/*) | http, grpc |
method | HTTP / gRPC method (e.g. GET) | http, grpc |
response_status_code | Return status code of an HTTP / gRPC request (e.g. 200 in HTTP) | http, grpc |
dialect | SQL dialect (MySQL or PostgreSQL) | mysql, postgresql |
response_status | Return status code of a SQL query (e.g. 42P01 for undefined table) | mysql, postgresql |
client_type | Kafka client type (Fetcher / Producer) | kafka |
topic | Kafka topic name | kafka |
partition | Kafka partition identifier | kafka |
error_code | Kafka return status code | kafka |
query_type | Type of DNS query (e.g. AAAA) | dns |
response_return_code | Return status code of a DNS resolution request (e.g. Name Error) | dns |
method_name, method_class_name | Method code for the operation | amqp |
response_method_name, response_method_class_name | Method code for the operation's response | amqp |
exit_code | K8s container termination exit code | container_state, container_crash |
state | K8s container current state (Running, Waiting or Terminated) | container_state |
state_reason | K8s container state transition reason (e.g. CrashLoopBackOff or OOMKilled) | container_state |
crash_reason | K8s container crash reason (e.g. Error, OOMKilled) | container_crash |
pvc_name | K8s PVC name | storage |
Summary-based metrics have an additional quantile label, representing the percentile. Available values: ["0.5", "0.95", "0.99"].
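To make the quantile label concrete, here is a minimal sketch that builds a PromQL selector for one percentile of a summary metric. The helper name is hypothetical; the metric and label names come from the tables on this page, and the allowed values are the ones listed above.

```python
ALLOWED_QUANTILES = ("0.5", "0.95", "0.99")


def latency_selector(metric: str, quantile: str, **labels: str) -> str:
    """Build a PromQL selector that picks one quantile of a summary metric."""
    if quantile not in ALLOWED_QUANTILES:
        raise ValueError(f"unsupported quantile: {quantile}")
    pairs = {"quantile": quantile, **labels}
    # Sort the label matchers so the output is deterministic.
    body = ", ".join(f'{name}="{value}"' for name, value in sorted(pairs.items()))
    return f"{metric}{{{body}}}"


selector = latency_selector(
    "groundcover_workload_latency_seconds", "0.95", namespace="prod"
)
print(selector)
```

The `namespace="prod"` value is only an example; any label from the table above could be used as a matcher for the metric types it applies to.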
groundcover uses a set of internal labels which are not relevant in most use-cases. Find them interesting? Let us know over Slack!
issue_id
entity_id
resource_id
query_id
aggregation_id
parent_entity_id
perspective_entity_id
perspective_entity_is_external
perspective_entity_issue_id
perspective_entity_name
perspective_entity_namespace
perspective_entity_resource_id
In the lists below, we describe error and issue counters. Every issue flagged by groundcover is an error, but not every error is flagged as an issue.
Name | Description | Type |
---|---|---|
groundcover_resource_total_counter | total amount of resource requests | Counter |
groundcover_resource_error_counter | total amount of requests with error status codes | Counter |
groundcover_resource_issue_counter | total amount of requests which were flagged as issues | Counter |
groundcover_resource_success_counter | total amount of resource requests with OK status codes | Counter |
groundcover_resource_latency_seconds | resource latency [sec] | Summary |
Name | Description | Type |
---|---|---|
groundcover_workload_total_counter | total amount of requests handled by the workload | Counter |
groundcover_workload_error_counter | total amount of requests handled by the workload with error status codes | Counter |
groundcover_workload_issue_counter | total amount of requests handled by the workload which were flagged as issues | Counter |
groundcover_workload_success_counter | total amount of requests handled by the workload with OK status codes | Counter |
groundcover_workload_latency_seconds | resource latency across all of the workload APIs [sec] | Summary |
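One common way to use these counters is an error-ratio query. The sketch below assembles such a PromQL expression for a single workload. The helper name, the `checkout` workload, and the 5-minute window are assumptions chosen for illustration, not values prescribed by groundcover.

```python
def workload_error_ratio_query(workload: str, window: str = "5m") -> str:
    """Build a PromQL expression for the share of requests that ended in errors."""
    selector = f'{{workload_name="{workload}"}}'
    errors = f"sum(rate(groundcover_workload_error_counter{selector}[{window}]))"
    total = f"sum(rate(groundcover_workload_total_counter{selector}[{window}]))"
    return f"{errors} / {total}"


# "checkout" is a hypothetical workload name.
query = workload_error_ratio_query("checkout")
print(query)
```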
Name | Description | Type |
---|---|---|
groundcover_pvc_read_bytes_total | total amount of bytes read by the workload from the PVC | Counter |
groundcover_pvc_write_bytes_total | total amount of bytes written by the workload to the PVC | Counter |
groundcover_pvc_reads_total | total amount of read operations done by the workload from the PVC | Counter |
groundcover_pvc_writes_total | total amount of write operations done by the workload to the PVC | Counter |
groundcover_pvc_read_latency | latency of read operations by the workload from the PVC, in microseconds | Summary |
groundcover_pvc_write_latency | latency of write operations by the workload to the PVC, in microseconds | Summary |
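Note that the PVC latency metrics are reported in microseconds, while the request latency metrics are in seconds. The sketch below shows the unit conversion and one derived value, the average bytes per read operation, computed from counter increases. Both helper names and the sample numbers are hypothetical.

```python
def micros_to_seconds(microseconds: float) -> float:
    """Convert a PVC latency value from microseconds to seconds."""
    return microseconds / 1_000_000


def avg_bytes_per_op(bytes_delta: float, ops_delta: float) -> float:
    """Average I/O size over a window, from increases of the byte and op counters."""
    # No operations in the window means there is no meaningful average.
    return bytes_delta / ops_delta if ops_delta else 0.0


# Hypothetical values: 2500 microseconds of latency; 8192 bytes over 2 reads.
print(micros_to_seconds(2500))
print(avg_bytes_per_op(8192, 2))
```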
Name | Description | Type |
---|---|---|
groundcover_client_offset | client last message offset (for producer the last offset produced, for consumer the last requested offset) | Gauge |
groundcover_workload_client_offset | client last message offset (for producer the last offset produced, for consumer the last requested offset), aggregated by workload | Gauge |
groundcover_calc_lagged_messages | current lag in messages | Gauge |
groundcover_workload_calc_lagged_messages | current lag in messages, aggregated by workload | Gauge |
groundcover_calc_lag_seconds | current lag in time [sec] | Gauge |
groundcover_workload_calc_lag_seconds | current lag in time, aggregated by workload [sec] | Gauge |
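For intuition about the lag gauges, a simple model is the distance between the last offset produced and the last offset a consumer requested. The sketch below implements that model only. It is an assumption made for illustration, and the way groundcover actually derives `groundcover_calc_lagged_messages` is not documented here.

```python
def lagged_messages(producer_offset: int, consumer_offset: int) -> int:
    """Assumed model: lag is how far the consumer trails the producer, in messages."""
    # Clamp at zero so a consumer reading ahead of a stale producer
    # sample does not produce a negative lag.
    return max(producer_offset - consumer_offset, 0)


# Hypothetical offsets for one topic partition.
print(lagged_messages(1500, 1200))
```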