Metrics & Labels
Infrastructure Metrics & Labels
Container CPU and Memory
Labels
type
clusterId
region
namespace
node_name
workload_name
pod_name
container_name
container_image
Metrics
Name | Description | Unit | Type |
---|---|---|---|
groundcover_container_cpu_usage_rate_millis | CPU usage in mCPU | mCPU | Gauge |
groundcover_container_cpu_request_m_cpu | K8s container CPU request | mCPU | Gauge |
groundcover_container_cpu_limit_m_cpu | K8s container CPU limit | mCPU | Gauge |
groundcover_container_memory_working_set_bytes | current memory working set | Bytes | Gauge |
groundcover_container_memory_rss_bytes | current memory RSS | Bytes | Gauge |
groundcover_container_memory_request_bytes | K8s container memory request | Bytes | Gauge |
groundcover_container_memory_limit_bytes | K8s container memory limit | Bytes | Gauge |
groundcover_container_cpu_delay_seconds | K8s container CPU delay | Seconds | Counter |
groundcover_container_disk_delay_seconds | K8s container disk delay | Seconds | Counter |
groundcover_container_cpu_throttled_seconds_total | K8s container total CPU throttling | Seconds | Counter |
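The Counter metrics above (for example groundcover_container_cpu_throttled_seconds_total) only become meaningful as a rate over time, while the Gauge metrics can be compared directly. The sketch below shows one way to do both by hand. The `Sample` type and both helper functions are hypothetical names, and treating a limit of 0 as "no limit set" is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One scraped value of a counter (hypothetical helper type)."""
    timestamp: float  # Unix time in seconds
    value: float      # cumulative counter value


def counter_rate(earlier: Sample, later: Sample) -> float:
    """Per-second increase of a counter between two samples."""
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = later.value - earlier.value
    if delta < 0:
        # A decreasing counter means it was reset (e.g. container restart),
        # so the increase since the reset is the later value itself.
        delta = later.value
    return delta / elapsed


def cpu_limit_utilization(usage_millis: float, limit_millis: float) -> Optional[float]:
    """Fraction of the CPU limit in use, from the two mCPU gauges.

    Assumption: a non-positive limit means no limit is set, in which
    case there is nothing to divide by and None is returned.
    """
    if limit_millis <= 0:
        return None
    return usage_millis / limit_millis
```

For instance, `counter_rate(Sample(0, 10), Sample(60, 40))` gives the average throttled seconds per second over a one-minute window.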
Node CPU, Memory and Disk
Labels
type
clusterId
region
node_name
Metrics
Name | Description | Unit | Type |
---|---|---|---|
groundcover_node_allocatable_cpum_cpu | amount of allocatable CPU in the current node | mCPU | Gauge |
groundcover_node_allocatable_mem_bytes | amount of allocatable memory in the current node | Bytes | Gauge |
groundcover_node_mem_used_percent | percent of used memory in current node | 0-100 | Gauge |
groundcover_node_used_disk_space | current used disk space in current node | Bytes | Gauge |
groundcover_node_free_disk_space | amount of free disk space in current node | Bytes | Gauge |
groundcover_node_total_disk_space | amount of total disk space in current node | Bytes | Gauge |
groundcover_node_used_percent_disk_space | percent of used disk space in current node | 0-100 | Gauge |
Storage Usage
Labels
type
clusterId
region
name
namespace
Metrics
Name | Description | Unit | Type |
---|---|---|---|
groundcover_pvc_usage_bytes | PVC usage | Bytes | Gauge |
groundcover_pvc_capacity_bytes | PVC capacity | Bytes | Gauge |
groundcover_pvc_available_bytes | PVC available | Bytes | Gauge |
groundcover_pvc_usage_percent | percent of used PVC storage | 0-100 | Gauge |
groundcover_pvc_read_bytes_total | total amount of bytes read by the workload from the PVC | Bytes | Counter |
groundcover_pvc_write_bytes_total | total amount of bytes written by the workload to the PVC | Bytes | Counter |
groundcover_pvc_reads_total | total amount of read operations done by the workload from the PVC | Number | Counter |
groundcover_pvc_writes_total | total amount of write operations done by the workload to the PVC | Number | Counter |
groundcover_pvc_read_latency | latency of read operation by the workload from the PVC | Seconds | Summary |
groundcover_pvc_write_latency | latency of write operation by the workload to the PVC | Seconds | Summary |
Network Usage
Labels
clusterId
workload_name
namespace
container_name
remote_service_name
remote_namespace
remote_is_external
availability_zone
region
remote_availability_zone
remote_region
is_cross_az
protocol
role
server_port
encryption
transport_protocol
is_loopback
Notes:
is_loopback and remote_is_external are special labels that indicate the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g. a managed service outside of the cluster (external). In both cases the remote_service_name and remote_namespace labels will be empty.
is_cross_az means the traffic was sent and/or received between two different availability zones. This is a helpful flag to quickly identify this special kind of communication. The actual zones are detailed in the availability_zone and remote_availability_zone labels.
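The rules in the notes above can be sketched as a small helper that interprets the label set of a single network series. Both function names are hypothetical, and the encoding of boolean labels as the string "true" is an assumption; check the actual label values in your environment before relying on it.

```python
from typing import Mapping, Optional, Tuple


def _flag(labels: Mapping[str, str], name: str) -> bool:
    # Assumption: boolean labels are exported as the string "true".
    return labels.get(name, "").lower() == "true"


def describe_remote(labels: Mapping[str, str]) -> str:
    """Name the remote side of a network series.

    Loopback and external traffic carry empty remote_service_name and
    remote_namespace labels, so they are reported by kind instead.
    """
    if _flag(labels, "is_loopback"):
        return "loopback"
    if _flag(labels, "remote_is_external"):
        return "external"
    namespace = labels.get("remote_namespace", "")
    service = labels.get("remote_service_name", "")
    return f"{namespace}/{service}"


def cross_az_pair(labels: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (local zone, remote zone) for cross-AZ traffic, else None."""
    if not _flag(labels, "is_cross_az"):
        return None
    return (
        labels.get("availability_zone", ""),
        labels.get("remote_availability_zone", ""),
    )
```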
Metrics
Name | Description | Unit | Type |
---|---|---|---|
groundcover_network_rx_bytes_total | Bytes received by the workload | Bytes | Counter |
groundcover_network_tx_bytes_total | Bytes sent by the workload | Bytes | Counter |
groundcover_network_connections_opened_total | Connections opened by the workload | Number | Counter |
groundcover_network_connections_closed_total | Connections closed by the workload | Number | Counter |
groundcover_network_connections_opened_failed_total | Connection attempts failed per workload (including refused connections) | Number | Counter |
groundcover_network_connections_refused_failed_total | Connection attempts refused per workload | Number | Counter |
Application Metrics & Labels
Label name | Description | Relevant types |
---|---|---|
clusterId | Name identifier of the K8s cluster | All |
region | Cloud provider region name | All |
namespace | K8s namespace | All |
workload_name | K8s workload (or service) name | All |
pod_name | K8s pod name | All |
container_name | K8s container name | All |
container_image | K8s container image name | All |
remote_namespace | Remote K8s namespace (other side of the communication) | All |
remote_service_name | Remote K8s service name (other side of the communication) | All |
remote_container_name | Remote K8s container name (other side of the communication) | All |
type | The protocol in use (HTTP, gRPC, Kafka, DNS etc.) | All |
sub_type | The sub type of the protocol (GET, POST, etc.) | All |
role | Role in the communication (client or server) | All |
clustered_resource_name | The clustered name of the resource, depends on the protocol | All |
status_code | "ok", "error" or "unset" | All |
server | The server workload/name | All |
client | The client workload/name | All |
server_namesapce | The server namespace | All |
client_namespace | The client namespace | All |
server_is_external | Indicate whether the server is external | All |
client_is_external | Indicate whether the client is external | All |
is_encrypted | Indicate whether the communication is encrypted | All |
is_cross_az | Indicate whether the communication is cross availability zone | All |
clustered_path | HTTP / gRPC aggregated resource path (e.g. /metrics/*) | http, grpc |
method | HTTP / gRPC method (e.g. GET) | http, grpc |
response_status_code | Return status code of an HTTP / gRPC request (e.g. 200 in HTTP) | http, grpc |
dialect | SQL dialect (MySQL or PostgreSQL) | mysql, postgresql |
response_status | Return status code of a SQL query (e.g. 42P01 for undefined table) | mysql, postgresql |
client_type | Kafka client type (Fetcher / Producer) | kafka |
topic | Kafka topic name | kafka |
partition | Kafka partition identifier | kafka |
error_code | Kafka return status code | kafka |
query_type | Type of DNS query (e.g. AAAA) | dns |
response_return_code | Return status code of a DNS resolution request (e.g. Name Error) | dns |
exit_code | K8s container termination exit code | container_state, container_crash |
state | K8s container current state (Running, Waiting or Terminated) | container_state |
state_reason | K8s container state transition reason (e.g. CrashLoopBackOff or OOMKilled) | container_state |
crash_reason | K8s container crash reason (e.g. Error, OOMKilled) | container_crash |
pvc_name | K8s PVC name | storage |
Summary-based metrics have an additional quantile label, representing the percentile. Available values: ["0.5", "0.95", "0.99"].
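As an illustration of how the quantile label is used, the sketch below builds a label selector for a Summary metric and rejects quantiles outside the three available values. It assumes the metrics are queried with Prometheus-style selectors; `summary_selector` and its helper are hypothetical names, not part of any groundcover API.

```python
# Quantile values listed in this document for Summary metrics.
QUANTILES = ("0.5", "0.95", "0.99")


def _escape(value: str) -> str:
    """Escape backslashes and double quotes inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def summary_selector(metric: str, quantile: str, **labels: str) -> str:
    """Build a selector such as metric{label="value", quantile="0.95"}.

    Assumption: the backend accepts Prometheus-style label matchers.
    """
    if quantile not in QUANTILES:
        raise ValueError(f"quantile must be one of {QUANTILES}")
    matchers = {"quantile": quantile, **labels}
    # Sort matchers by label name so the output is deterministic.
    body = ", ".join(
        f'{name}="{_escape(value)}"' for name, value in sorted(matchers.items())
    )
    return f"{metric}{{{body}}}"


print(summary_selector("groundcover_workload_latency_seconds", "0.95", namespace="prod"))
# prints: groundcover_workload_latency_seconds{namespace="prod", quantile="0.95"}
```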
We also use a set of internal labels which are not relevant in most use-cases. Find them interesting? Let us know over Slack!
issue_id
entity_id
resource_id
query_id
aggregation_id
parent_entity_id
perspective_entity_id
perspective_entity_is_external
perspective_entity_issue_id
perspective_entity_name
perspective_entity_namespace
perspective_entity_resource_id
Golden Signals (Errors & Issues)
The tables below describe error and issue counters. Every issue flagged by the platform is an error, but not every error is flagged as an issue.
Resource metrics
Name | Description | Unit | Type |
---|---|---|---|
groundcover_resource_total_counter | total amount of resource requests | Number | Counter |
groundcover_resource_error_counter | total amount of requests with error status codes | Number | Counter |
groundcover_resource_issue_counter | total amount of requests which were flagged as issues | Number | Counter |
groundcover_resource_success_counter | total amount of resource requests with OK status codes | Number | Counter |
groundcover_resource_latency_seconds | resource latency | Seconds | Summary |
Workload metrics
Name | Description | Unit | Type |
---|---|---|---|
groundcover_workload_total_counter | total amount of requests handled by the workload | Number | Counter |
groundcover_workload_error_counter | total amount of requests handled by the workload with error status codes | Number | Counter |
groundcover_workload_issue_counter | total amount of requests handled by the workload which were flagged as issues | Number | Counter |
groundcover_workload_success_counter | total amount of requests handled by the workload with OK status codes | Number | Counter |
groundcover_workload_latency_seconds | resource latency across all of the workload APIs | Seconds | Summary |
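The relationship between the total, error and issue counters can be sketched as follows. The inputs are meant to be counter increases over some time window (not raw cumulative values), and the `GoldenCounts` type and function names are hypothetical. The consistency check only encodes the statement that every issue is an error; it does not assume that errors and successes add up to the total, since status_code may also be "unset".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenCounts:
    """Counter increases over one time window (hypothetical helper type)."""
    total: float      # e.g. increase of groundcover_workload_total_counter
    errors: float     # e.g. increase of groundcover_workload_error_counter
    issues: float     # e.g. increase of groundcover_workload_issue_counter


def error_ratio(counts: GoldenCounts) -> float:
    """Share of requests that ended with an error status code."""
    if counts.total <= 0:
        return 0.0  # no traffic in the window
    return counts.errors / counts.total


def issue_share_of_errors(counts: GoldenCounts) -> float:
    """Share of errors that the platform also flagged as issues."""
    if counts.errors <= 0:
        return 0.0  # no errors, hence no issues
    return counts.issues / counts.errors


def is_consistent(counts: GoldenCounts) -> bool:
    """Every issue is an error, and every error is a request."""
    return 0 <= counts.issues <= counts.errors <= counts.total
```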
Kafka specific metrics
Name | Description | Unit | Type |
---|---|---|---|
groundcover_client_offset | client last message offset (for producer the last offset produced, for consumer the last requested offset) | Number | Gauge |
groundcover_workload_client_offset | client last message offset (for producer the last offset produced, for consumer the last requested offset), aggregated by workload | Number | Gauge |
groundcover_calc_lagged_messages | current lag in messages | Number | Gauge |
groundcover_workload_calc_lagged_messages | current lag in messages, aggregated by workload | Number | Gauge |
groundcover_calc_lag_seconds | current lag in time | Seconds | Gauge |
groundcover_workload_calc_lag_seconds | current lag in time, aggregated by workload | Seconds | Gauge |