Application Metrics

Our metrics philosophy

The groundcover platform generates 100% of its metrics from the actual data. There are no sample rates or complex interpolations to make up for partial coverage. Our measurements represent the real, complete flow of data in your environment.

Stream processing allows us to construct most of the metrics on the very node where the raw transactions are recorded. Raw data is turned into numbers as soon as possible, removing the need to store it or send it elsewhere.
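To illustrate the idea of turning transactions into numbers on arrival, here is a minimal sketch of a streaming aggregator. It is not groundcover's implementation: the `Transaction` shape, the `StreamAggregator` class and the rule that 5xx status codes count as errors are all hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical shape of one recorded request; field names are illustrative.
    resource: str
    status_code: int
    latency_seconds: float


class StreamAggregator:
    """Folds each transaction into running per-resource numbers as it arrives."""

    def __init__(self) -> None:
        self.total = defaultdict(int)
        self.errors = defaultdict(int)
        self.latency_sum = defaultdict(float)

    def observe(self, tx: Transaction) -> None:
        # Update the counters immediately; the transaction is not kept afterwards.
        self.total[tx.resource] += 1
        self.latency_sum[tx.resource] += tx.latency_seconds
        if tx.status_code >= 500:  # assumption: 5xx responses count as errors
            self.errors[tx.resource] += 1


agg = StreamAggregator()
for tx in (Transaction("GET /users", 200, 0.012), Transaction("GET /users", 503, 0.250)):
    agg.observe(tx)
print(dict(agg.total), dict(agg.errors))  # {'GET /users': 2} {'GET /users': 1}
```

Only the aggregated numbers survive each call to `observe`, which is what makes it unnecessary to ship or store the raw records.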

Metrics are stored in groundcover's VictoriaMetrics deployment, ensuring high performance at every scale.

Golden signals

In a world of excessive data, it's important to have a rule of thumb for knowing where to start looking. For application metrics, we rely on our golden signals.

The following metrics are generated for each resource being aggregated:

  • Requests per second (RPS)

  • Error rate

  • Latencies (p50 and p95)
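The first two golden signals can be derived from two readings of cumulative counters. The sketch below assumes "error rate" means the fraction of requests that errored over the interval; the `CounterSnapshot` structure and `golden_rates` function are hypothetical, and latency quantiles are left out because they come from Summary metrics rather than counters.

```python
from dataclasses import dataclass


@dataclass
class CounterSnapshot:
    # Cumulative counter values read at time t (seconds); hypothetical structure.
    t: float
    total: int
    errors: int


def golden_rates(prev: CounterSnapshot, curr: CounterSnapshot) -> tuple[float, float]:
    """Return (requests per second, error rate) over the interval prev -> curr."""
    elapsed = curr.t - prev.t
    if elapsed <= 0:
        raise ValueError("snapshots must be in increasing time order")
    requests = curr.total - prev.total
    errors = curr.errors - prev.errors
    if requests < 0 or errors < 0:
        raise ValueError("counter reset detected; this sketch does not handle resets")
    rps = requests / elapsed
    error_rate = errors / requests if requests else 0.0
    return rps, error_rate


rps, err = golden_rates(CounterSnapshot(0, 1000, 10), CounterSnapshot(60, 1600, 22))
print(f"{rps:.1f} req/s, {err:.1%} errors")  # 10.0 req/s, 2.0% errors
```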

The golden signals are then displayed in two important ways: Workload and Resource aggregations.

See below for the full list of generated workload and resource golden metrics.

Resource aggregations are highly granular metrics, providing insights into individual APIs.

Workload aggregations are designed to show an overview of each service, enabling a higher level inspection. These are constructed using all of the resources recorded for each service.
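Because workload aggregations are built from all resources of a service, request counts roll up by simple addition. A minimal sketch, assuming per-resource counts keyed by a hypothetical `(workload_name, resource)` pair:

```python
from collections import Counter


def workload_totals(resource_counts: dict[tuple[str, str], int]) -> Counter:
    """Sum per-resource request counts into per-workload totals.

    Keys are (workload_name, resource) pairs; this layout is hypothetical.
    """
    totals: Counter = Counter()
    for (workload, _resource), count in resource_counts.items():
        totals[workload] += count
    return totals


counts = {
    ("checkout", "POST /orders"): 120,
    ("checkout", "GET /orders/*"): 380,
    ("users", "GET /users/*"): 900,
}
print(workload_totals(counts))  # Counter({'users': 900, 'checkout': 500})
```

Counts add up cleanly, but latency percentiles do not: a workload's p95 cannot be obtained by summing or averaging its resources' p95 values, which is one reason to compute workload latency from all of the workload's recorded requests.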

Controlling retention

groundcover allows full control over the retention of your metrics. Learn more here.

List of available metrics

Below you will find the full list of our APM metrics, as well as the labels we export for each. These labels are designed with high granularity in mind for maximal insight depth. All of the metrics listed are available out of the box after installing groundcover, without any further setup.

We fully support the ingestion of custom metrics to further expand the visibility into your environment.

We also support custom dashboards, giving you full freedom in how to display both the groundcover metrics listed below and any custom metrics you ingest.

Our labels

| Label name | Description | Relevant types |
|---|---|---|
| clusterId | Name identifier of the K8s cluster | |
| region | Cloud provider region name | |
| namespace | K8s namespace | |
| workload_name | K8s workload (or service) name | |
| pod_name | K8s pod name | |
| container_name | K8s container name | |
| container_image | K8s container image name | |
| remote_namespace | Remote K8s namespace (other side of the communication) | |
| remote_service_name | Remote K8s service name (other side of the communication) | |
| remote_container_name | Remote K8s container name (other side of the communication) | |
| type | The protocol in use (HTTP, gRPC, Kafka, DNS, etc.) | |
| role | Role in the communication (client or server) | |
| clustered_path | HTTP / gRPC aggregated resource path (e.g. /metrics/*) | http, grpc |
| method | HTTP / gRPC method (e.g. GET) | http, grpc |
| response_status_code | Return status code of an HTTP / gRPC request (e.g. 200 in HTTP) | http, grpc |
| dialect | SQL dialect (MySQL or PostgreSQL) | mysql, postgresql |
| response_status | Return status code of a SQL query (e.g. 42P01 for undefined table) | mysql, postgresql |
| client_type | Kafka client type (Fetcher / Producer) | kafka |
| topic | Kafka topic name | kafka |
| partition | Kafka partition identifier | kafka |
| error_code | Kafka return status code | kafka |
| query_type | Type of DNS query (e.g. AAAA) | dns |
| response_return_code | Return status code of a DNS resolution request (e.g. Name Error) | dns |
| method_name, method_class_name | Method code for the operation | amqp |
| response_method_name, response_method_class_name | Method code for the operation's response | amqp |
| exit_code | K8s container termination exit code | container_state, container_crash |
| state | Current state of the K8s container (Running, Waiting or Terminated) | container_state |
| state_reason | K8s container state transition reason (e.g. CrashLoopBackOff or OOMKilled) | container_state |
| crash_reason | K8s container crash reason (e.g. Error, OOMKilled) | container_crash |
| pvc_name | K8s PVC name | storage |

Summary-based metrics have an additional quantile label representing the percentile. Available values: ["0.5", "0.95", "0.99"].
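Since each quantile of a Summary is exported as its own series, a query result typically contains one sample per quantile value. The sketch below picks those apart; the `(labels, value)` input format and the `latency_by_quantile` helper are hypothetical, and quantile values are kept as strings because label values are strings.

```python
VALID_QUANTILES = {"0.5", "0.95", "0.99"}


def latency_by_quantile(samples: list[tuple[dict[str, str], float]]) -> dict[str, float]:
    """Map each quantile label value to its latency for a single series group.

    `samples` is a hypothetical list of (labels, value) pairs, e.g. the result of
    querying a Summary metric such as groundcover_resource_latency_seconds.
    """
    result: dict[str, float] = {}
    for labels, value in samples:
        q = labels.get("quantile")
        if q in VALID_QUANTILES:
            result[q] = value
    return result


samples = [
    ({"quantile": "0.5", "workload_name": "checkout"}, 0.012),
    ({"quantile": "0.95", "workload_name": "checkout"}, 0.087),
    ({"quantile": "0.99", "workload_name": "checkout"}, 0.240),
]
print(latency_by_quantile(samples))  # {'0.5': 0.012, '0.95': 0.087, '0.99': 0.24}
```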

groundcover uses a set of internal labels which are not relevant in most use-cases. Find them interesting? Let us know over Slack!

issue_id, entity_id, resource_id, query_id, aggregation_id, parent_entity_id, perspective_entity_id, perspective_entity_is_external, perspective_entity_issue_id, perspective_entity_name, perspective_entity_namespace, perspective_entity_resource_id
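The labels listed earlier in this section can be combined into series selectors to narrow a metric down. A minimal sketch, assuming you query with PromQL-compatible syntax (VictoriaMetrics supports PromQL); the `selector` helper is hypothetical and only builds the query string.

```python
def selector(metric: str, **labels: str) -> str:
    """Build a PromQL series selector such as metric{label="value",...}."""

    def quote(value: str) -> str:
        # Escape characters that are special inside a double-quoted PromQL string.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'

    # Sort label names so the same filters always produce the same string.
    matchers = ",".join(f"{name}={quote(value)}" for name, value in sorted(labels.items()))
    return f"{metric}{{{matchers}}}" if matchers else metric


print(selector("groundcover_resource_total_counter",
               namespace="prod", type="http", role="server"))
# groundcover_resource_total_counter{namespace="prod",role="server",type="http"}
```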

Golden Signals Metrics

In the lists below, we describe error and issue counters. Every issue flagged by groundcover is an error, but not every error is flagged as an issue.
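The relationship above implies that, for counts taken over the same time window, issues never exceed errors and errors never exceed total requests. A small sketch that checks this and reports how many errors were flagged as issues; `issue_share` is a hypothetical helper, and in practice counters read at slightly different times may briefly disagree.

```python
def issue_share(total: int, errors: int, issues: int) -> float:
    """Fraction of errored requests that were also flagged as issues.

    Relies on every issue being an error, so issues <= errors <= total for
    counts taken over the same time window.
    """
    if not 0 <= issues <= errors <= total:
        raise ValueError(f"inconsistent counts: total={total}, errors={errors}, issues={issues}")
    return issues / errors if errors else 0.0


print(issue_share(total=5000, errors=40, issues=10))  # 0.25
```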

Resource metrics

| Name | Description | Type |
|---|---|---|
| groundcover_resource_total_counter | Total number of resource requests | Counter |
| groundcover_resource_error_counter | Total number of requests with error status codes | Counter |
| groundcover_resource_issue_counter | Total number of requests flagged as issues | Counter |
| groundcover_resource_success_counter | Total number of resource requests with OK status codes | Counter |
| groundcover_resource_latency_seconds | Resource latency (seconds) | Summary |
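Combining the error and total counters gives a per-API error ratio. The sketch below builds such a query string, assuming PromQL-compatible querying; `resource_error_ratio_query` is a hypothetical helper, the 5-minute window is an arbitrary default, and `clustered_path` is only present for HTTP and gRPC traffic.

```python
def resource_error_ratio_query(workload: str, window: str = "5m") -> str:
    """Build a PromQL expression for the per-path error ratio of one workload.

    The workload name is inserted without escaping, so pass plain identifiers.
    """

    def per_path_rate(metric: str) -> str:
        # Per-second increase over the window, summed across all other labels
        # (pods, roles, ...) so that only clustered_path remains.
        return (f"sum by (clustered_path) "
                f'(rate({metric}{{workload_name="{workload}"}}[{window}]))')

    return (f"{per_path_rate('groundcover_resource_error_counter')} / "
            f"{per_path_rate('groundcover_resource_total_counter')}")


print(resource_error_ratio_query("checkout"))
```

Dividing two `sum by (clustered_path)` results matches series on the shared `clustered_path` label, so each output series is the error fraction for one path.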

Workload metrics

| Name | Description | Type |
|---|---|---|
| groundcover_workload_total_counter | Total number of requests handled by the workload | Counter |
| groundcover_workload_error_counter | Total number of requests handled by the workload with error status codes | Counter |
| groundcover_workload_issue_counter | Total number of requests handled by the workload that were flagged as issues | Counter |
| groundcover_workload_success_counter | Total number of requests handled by the workload with OK status codes | Counter |
| groundcover_workload_latency_seconds | Resource latency across all of the workload's APIs (seconds) | Summary |
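Because quantiles cannot be meaningfully summed or averaged, a workload-level percentile is best read from the workload latency Summary directly rather than derived from resource latencies. A minimal sketch that builds such a selector, assuming PromQL-compatible querying; `workload_latency_query` is hypothetical, and the result may still contain several series if other labels (for example pod_name) remain.

```python
ALLOWED_QUANTILES = ("0.5", "0.95", "0.99")


def workload_latency_query(workload: str, quantile: str = "0.95") -> str:
    """Select one pre-computed quantile of the workload-level latency Summary."""
    if quantile not in ALLOWED_QUANTILES:
        raise ValueError(f"quantile must be one of {ALLOWED_QUANTILES}")
    return (f"groundcover_workload_latency_seconds"
            f'{{workload_name="{workload}",quantile="{quantile}"}}')


print(workload_latency_query("checkout"))
# groundcover_workload_latency_seconds{workload_name="checkout",quantile="0.95"}
```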

Storage usage metrics

| Name | Description | Type |
|---|---|---|
| groundcover_pvc_read_bytes_total | Total number of bytes read by the workload from the PVC | Counter |
| groundcover_pvc_write_bytes_total | Total number of bytes written by the workload to the PVC | Counter |
| groundcover_pvc_reads_total | Total number of read operations performed by the workload on the PVC | Counter |
| groundcover_pvc_writes_total | Total number of write operations performed by the workload on the PVC | Counter |
| groundcover_pvc_read_latency | Latency of read operations by the workload from the PVC (microseconds) | Summary |
| groundcover_pvc_write_latency | Latency of write operations by the workload to the PVC (microseconds) | Summary |
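The storage counters combine naturally: bytes over time gives throughput, and bytes over operations gives average I/O size. Note that the PVC latency metrics are in microseconds, unlike the `_seconds` latency metrics above. A minimal sketch with a hypothetical `pvc_read_stats` helper that takes counter increases over a window plus one latency quantile value:

```python
def pvc_read_stats(bytes_delta: float, reads_delta: float, seconds: float,
                   read_latency_us: float) -> dict[str, float]:
    """Derive read throughput, average read size and read latency in seconds.

    bytes_delta and reads_delta are increases of groundcover_pvc_read_bytes_total
    and groundcover_pvc_reads_total over `seconds`; read_latency_us is one
    quantile of groundcover_pvc_read_latency, reported in microseconds.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {
        "bytes_per_second": bytes_delta / seconds,
        "avg_bytes_per_read": bytes_delta / reads_delta if reads_delta else 0.0,
        "read_latency_seconds": read_latency_us / 1_000_000,  # µs -> s
    }


stats = pvc_read_stats(bytes_delta=52_428_800, reads_delta=12_800, seconds=60,
                       read_latency_us=850)
print(stats)
```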

Kafka specific metrics

| Name | Description | Type |
|---|---|---|
| groundcover_client_offset | Client's last message offset (for a producer, the last offset produced; for a consumer, the last requested offset) | Gauge |
| groundcover_workload_client_offset | Client's last message offset (for a producer, the last offset produced; for a consumer, the last requested offset), aggregated by workload | Gauge |
| groundcover_calc_lagged_messages | Current lag in messages | Gauge |
| groundcover_workload_calc_lagged_messages | Current lag in messages, aggregated by workload | Gauge |
| groundcover_calc_lag_seconds | Current lag in time (seconds) | Gauge |
| groundcover_workload_calc_lag_seconds | Current lag in time, aggregated by workload (seconds) | Gauge |
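To build intuition for how producer and consumer offsets relate to lag in messages, here is a hypothetical sketch; groundcover's own lag calculation is not described on this page and may differ. It assumes the producer value is the offset of the newest produced message and the consumer value is the next offset requested in a fetch, so a caught-up consumer requests `last_produced + 1` and has zero lag. Summing partitions is shown as one plausible per-workload aggregation.

```python
Partition = tuple[str, int]  # (topic, partition)


def lagged_messages(last_produced: dict[Partition, int],
                    last_requested: dict[Partition, int]) -> dict[Partition, int]:
    """Estimate per-partition lag in messages from producer and consumer offsets."""
    lag: dict[Partition, int] = {}
    for key, produced in last_produced.items():
        requested = last_requested.get(key)
        if requested is None:
            continue  # no consumer observed on this partition yet
        # Messages at offsets requested..produced have not been fetched yet.
        lag[key] = max(produced + 1 - requested, 0)
    return lag


def workload_lag(lag: dict[Partition, int]) -> int:
    """Sum partition lags (one plausible way to aggregate by workload)."""
    return sum(lag.values())


lag = lagged_messages({("orders", 0): 1049, ("orders", 1): 2000},
                      {("orders", 0): 1000, ("orders", 1): 2001})
print(lag, workload_lag(lag))  # {('orders', 0): 50, ('orders', 1): 0} 50
```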
