Metrics & Labels

Kubernetes Infrastructure Metrics & Labels

Container CPU and Memory

Labels

type clusterId region namespace node_name workload_name pod_name container_name container_image

Metrics

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_container_cpu_usage_rate_millis | CPU usage in mCPU | mCPU | Gauge |
| groundcover_container_cpu_request_m_cpu | K8s container CPU request | mCPU | Gauge |
| groundcover_container_cpu_limit_m_cpu | K8s container CPU limit | mCPU | Gauge |
| groundcover_container_memory_working_set_bytes | current memory working set | Bytes | Gauge |
| groundcover_container_memory_rss_bytes | current memory RSS | Bytes | Gauge |
| groundcover_container_memory_request_bytes | K8s container memory request | Bytes | Gauge |
| groundcover_container_memory_limit_bytes | K8s container memory limit | Bytes | Gauge |
| groundcover_container_cpu_delay_seconds | K8s container CPU delay | Seconds | Counter |
| groundcover_container_disk_delay_seconds | K8s container disk delay | Seconds | Counter |
| groundcover_container_cpu_throttled_seconds_total | K8s container total CPU throttling | Seconds | Counter |
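As a rough illustration of how the gauges and counters listed above are commonly combined, the sketch below derives CPU utilization against the container limit and a per-second rate from two samples of a counter such as groundcover_container_cpu_throttled_seconds_total. The helper names and sample values are hypothetical; only the metric names come from this page.

```python
from typing import Optional


def cpu_utilization_percent(usage_millis: float, limit_millis: float) -> Optional[float]:
    """Share of the CPU limit currently in use, on a 0-100 scale.

    usage_millis: a sample of groundcover_container_cpu_usage_rate_millis
    limit_millis: a sample of groundcover_container_cpu_limit_m_cpu
    Returns None when no positive limit is available.
    """
    if limit_millis <= 0:
        return None
    return 100.0 * usage_millis / limit_millis


def counter_rate(prev_value: float, curr_value: float, interval_seconds: float) -> float:
    """Per-second increase of a monotonically increasing counter.

    A drop in value is treated as a counter reset, in which case the
    current value is taken as the increase since the reset.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if curr_value >= prev_value:
        increase = curr_value - prev_value
    else:
        increase = curr_value
    return increase / interval_seconds
```

Gauges can be compared directly, as in the first helper, while counters only become meaningful once differenced over a time window, as in the second.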

Node CPU, Memory and Disk

Labels

type clusterId region node_name

Metrics

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_node_allocatable_cpum_cpu | amount of allocatable CPU in the current node | mCPU | Gauge |
| groundcover_node_allocatable_mem_bytes | amount of allocatable memory in the current node | Bytes | Gauge |
| groundcover_node_mem_used_percent | percent of used memory in current node | 0-100 | Gauge |
| groundcover_node_used_disk_space | current used disk space in current node | Bytes | Gauge |
| groundcover_node_free_disk_space | amount of free disk space in current node | Bytes | Gauge |
| groundcover_node_total_disk_space | amount of total disk space in current node | Bytes | Gauge |
| groundcover_node_used_percent_disk_space | percent of used disk space in current node | 0-100 | Gauge |

Storage Usage

Labels

type clusterId region name namespace

Metrics

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_pvc_usage_bytes | PVC usage | Bytes | Gauge |
| groundcover_pvc_capacity_bytes | PVC capacity | Bytes | Gauge |
| groundcover_pvc_available_bytes | PVC available | Bytes | Gauge |
| groundcover_pvc_usage_percent | percent of used PVC storage | 0-100 | Gauge |
| groundcover_pvc_read_bytes_total | total amount of bytes read by the workload from the PVC | Bytes | Counter |
| groundcover_pvc_write_bytes_total | total amount of bytes written by the workload to the PVC | Bytes | Counter |
| groundcover_pvc_reads_total | total amount of read operations done by the workload from the PVC | Number | Counter |
| groundcover_pvc_writes_total | total amount of write operations done by the workload to the PVC | Number | Counter |
| groundcover_pvc_read_latency | latency of read operation by the workload from the PVC | Seconds | Summary |
| groundcover_pvc_write_latency | latency of write operation by the workload to the PVC | Seconds | Summary |

Network Usage

Labels

clusterId workload_name namespace container_name remote_service_name remote_namespace remote_is_external availability_zone region remote_availability_zone remote_region is_cross_az protocol role server_port encryption transport_protocol is_loopback

Notes:

  • is_loopback and remote_is_external are special labels that indicate the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g. a managed service outside of the cluster (external).

    • In both cases the remote_service_name and the remote_namespace labels will be empty

  • is_cross_az means the traffic was sent and/or received between two different availability zones. This is a helpful flag to quickly identify this special kind of communication.

    • The actual zones are detailed in the availability_zone and remote_availability_zone labels
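The notes above can be read as a small decision procedure. The sketch below is illustrative only: it assumes the boolean labels arrive as the strings "true" and "false", which this page does not state, and describe_remote is a hypothetical helper name.

```python
from typing import Mapping


def describe_remote(labels: Mapping[str, str]) -> str:
    """Summarize the remote side of a network metric sample from its labels."""
    # Assumption: boolean labels are encoded as the strings "true" / "false".
    if labels.get("is_loopback") == "true":
        remote = "loopback (same service)"
    elif labels.get("remote_is_external") == "true":
        remote = "external network"
    else:
        # Only in this case are remote_namespace and remote_service_name populated.
        namespace = labels.get("remote_namespace", "")
        service = labels.get("remote_service_name", "")
        remote = f"{namespace}/{service}"

    if labels.get("is_cross_az") == "true":
        # The actual zones live in availability_zone / remote_availability_zone.
        local_az = labels.get("availability_zone", "?")
        remote_az = labels.get("remote_availability_zone", "?")
        return f"{remote}, cross-AZ ({local_az} -> {remote_az})"
    return remote
```

Checking is_loopback and remote_is_external first avoids reporting an empty namespace and service name for loopback or external traffic.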

Metrics

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_network_rx_bytes_total | Bytes received by the workload | Bytes | Counter |
| groundcover_network_tx_bytes_total | Bytes sent by the workload | Bytes | Counter |
| groundcover_network_connections_opened_total | Connections opened by the workload | Number | Counter |
| groundcover_network_connections_closed_total | Connections closed by the workload | Number | Counter |
| groundcover_network_connections_opened_failed_total | Connection attempts that failed, per workload (including refused connections) | Number | Counter |
| groundcover_network_connections_refused_failed_total | Connection attempts that were refused, per workload | Number | Counter |

Host Resources Metrics & Labels

Host CPU

Labels

clusterId env region host_name cloud_provider env_type

Metrics

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_host_uptime_seconds | uptime of the current host | Seconds | Gauge |
| groundcover_host_cpu_capacity_m_cpu | CPU capacity in the current host | mCPU | Gauge |
| groundcover_host_cpu_usage_m_cpu | CPU usage in the current host | mCPU | Gauge |
| groundcover_host_cpu_usage_percent | percentage of used CPU in the current host | 0-100 | Gauge |
| groundcover_host_cpu_num_cores | number of CPU cores on the host | Number | Gauge |
| groundcover_host_cpu_user_spent_seconds_total | total time spent in user mode | Seconds | Counter |
| groundcover_host_cpu_user_spent_percent | percentage of CPU time spent in user mode | 0-100 | Gauge |
| groundcover_host_cpu_system_spent_seconds_total | total time spent in system mode | Seconds | Counter |
| groundcover_host_cpu_system_spent_percent | percentage of CPU time spent in system mode | 0-100 | Gauge |
| groundcover_host_cpu_idle_spent_seconds_total | total time spent idle | Seconds | Counter |
| groundcover_host_cpu_idle_spent_percent | percentage of CPU time spent idle | 0-100 | Gauge |
| groundcover_host_cpu_iowait_spent_seconds_total | total time spent waiting for I/O to complete | Seconds | Counter |
| groundcover_host_cpu_iowait_spent_percent | percentage of CPU time spent waiting for I/O | 0-100 | Gauge |
| groundcover_host_cpu_nice_spent_seconds_total | total time spent on niced processes | Seconds | Counter |
| groundcover_host_cpu_steal_spent_seconds_total | total time spent in involuntary wait (stolen by hypervisor) | Seconds | Counter |
| groundcover_host_cpu_stolen_spent_percent | percentage of CPU time stolen by the hypervisor | 0-100 | Gauge |
| groundcover_host_cpu_irq_spent_seconds_total | total time spent handling hardware interrupts | Seconds | Counter |
| groundcover_host_cpu_softirq_spent_seconds_total | total time spent handling software interrupts | Seconds | Counter |
| groundcover_host_cpu_interrupt_spent_percent | percentage of CPU time spent handling interrupts | 0-100 | Gauge |
| groundcover_host_cpu_guest_spent_seconds_total | total time spent running guest processes | Seconds | Counter |
| groundcover_host_cpu_guest_spent_percent | percentage of CPU time spent running guest processes | 0-100 | Gauge |
| groundcover_host_cpu_guest_nice_spent_seconds_total | total time spent running niced guest processes | Seconds | Counter |
| groundcover_host_cpu_context_switches_total | total number of context switches in the current host | Number | Counter |
| groundcover_host_cpu_load_avg1 | CPU load average over 1 minute | Number | Gauge |
| groundcover_host_cpu_load_avg5 | CPU load average over 5 minutes | Number | Gauge |
| groundcover_host_cpu_load_avg15 | CPU load average over 15 minutes | Number | Gauge |
| groundcover_host_cpu_load_norm1 | normalized CPU load over 1 minute | Number | Gauge |
| groundcover_host_cpu_load_norm5 | normalized CPU load over 5 minutes | Number | Gauge |
| groundcover_host_cpu_load_norm15 | normalized CPU load over 15 minutes | Number | Gauge |

Host Memory

Labels

clusterId env region host_name cloud_provider env_type

Metrics

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_host_mem_capacity_bytes | memory capacity in the current host | Bytes | Gauge |
| groundcover_host_mem_used_bytes | memory used in the current host | Bytes | Gauge |
| groundcover_host_mem_used_percent | percentage of used memory in the current host | 0-100 | Gauge |
| groundcover_host_mem_free_bytes | free memory in the current host | Bytes | Gauge |
| groundcover_host_mem_available_bytes | available memory in the current host | Bytes | Gauge |
| groundcover_host_mem_cached_bytes | cached memory in the current host | Bytes | Gauge |
| groundcover_host_mem_buffers_bytes | buffer memory in the current host | Bytes | Gauge |
| groundcover_host_mem_shared_bytes | shared memory in the current host | Bytes | Gauge |
| groundcover_host_mem_slab_bytes | slab memory in the current host | Bytes | Gauge |
| groundcover_host_mem_sreclaimable_bytes | reclaimable slab memory in the current host | Bytes | Gauge |
| groundcover_host_mem_page_tables_bytes | page tables memory in the current host | Bytes | Gauge |
| groundcover_host_mem_commit_limit_bytes | memory commit limit in the current host | Bytes | Gauge |
| groundcover_host_mem_committed_as_bytes | committed address space memory in the current host | Bytes | Gauge |
| groundcover_host_mem_swap_cached_bytes | cached swap memory in the current host | Bytes | Gauge |
| groundcover_host_mem_swap_total_bytes | total swap memory in the current host | Bytes | Gauge |
| groundcover_host_mem_swap_free_bytes | free swap memory in the current host | Bytes | Gauge |
| groundcover_host_mem_swap_used_bytes | used swap memory in the current host | Bytes | Gauge |
| groundcover_host_mem_swap_in_bytes_total | swap in bytes in the current host | Bytes | Counter |
| groundcover_host_mem_swap_out_bytes_total | swap out bytes in the current host | Bytes | Counter |

Host Disk

Labels

clusterId env region host_name cloud_provider env_type Optional: device_name

Metrics

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_host_disk_space_used_bytes | used disk space in the current host | Bytes | Gauge |
| groundcover_host_disk_space_free_bytes | free disk space in the current host | Bytes | Gauge |
| groundcover_host_disk_space_total_bytes | total disk space in the current host | Bytes | Gauge |
| groundcover_host_disk_space_used_percent | percentage of used disk space in the current host | 0-100 | Gauge |
| groundcover_host_disk_read_time_ms_total | total time spent reading from disk per device in the current host | Milliseconds | Counter |
| groundcover_host_disk_write_time_ms_total | total time spent writing to disk per device in the current host | Milliseconds | Counter |
| groundcover_host_disk_read_count_total | total number of disk reads per device in the current host | Number | Counter |
| groundcover_host_disk_write_count_total | total number of disk writes per device in the current host | Number | Counter |
| groundcover_host_disk_merged_read_count_total | total number of merged disk reads per device in the current host | Number | Counter |
| groundcover_host_disk_merged_write_count_total | total number of merged disk writes per device in the current host | Number | Counter |

Host I/O

Labels

clusterId env region host_name cloud_provider env_type Optional: device_name

Metrics

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_host_io_read_kb_per_sec | disk read throughput per device in the current host | KB/s | Gauge |
| groundcover_host_io_write_kb_per_sec | disk write throughput per device in the current host | KB/s | Gauge |
| groundcover_host_io_read_await_ms | average time for read requests to be served per device in the current host | Milliseconds | Gauge |
| groundcover_host_io_write_await_ms | average time for write requests to be served per device in the current host | Milliseconds | Gauge |
| groundcover_host_io_await_ms | average time for I/O requests to be served per device in the current host | Milliseconds | Gauge |
| groundcover_host_io_avg_request_size | average I/O request size per device in the current host | Kilobytes | Gauge |
| groundcover_host_io_service_time_ms | average service time for I/O requests per device in the current host | Milliseconds | Gauge |
| groundcover_host_io_avg_queue_size_kb | average I/O queue size per device in the current host | Kilobytes | Gauge |
| groundcover_host_io_utilization_percent | percentage of time the device was busy serving I/O requests in the current host | 0-100 | Gauge |
| groundcover_host_io_block_in_total | total number of blocks read in on the current host | Number | Counter |
| groundcover_host_io_block_out_total | total number of blocks written out on the current host | Number | Counter |

Host Filesystem

Labels

clusterId env region host_name cloud_provider env_type device_name file_system mountpoint

Metrics

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_host_fs_used_bytes | used filesystem space in the current host | Bytes | Gauge |
| groundcover_host_fs_free_bytes | free filesystem space in the current host | Bytes | Gauge |
| groundcover_host_fs_total_bytes | total filesystem space in the current host | Bytes | Gauge |
| groundcover_host_fs_used_percent | percentage of used filesystem space in the current host | 0-100 | Gauge |
| groundcover_host_fs_inodes_total | total inodes in the filesystem | Number | Gauge |
| groundcover_host_fs_inodes_used | used inodes in the filesystem | Number | Gauge |
| groundcover_host_fs_inodes_free | free inodes in the filesystem | Number | Gauge |
| groundcover_host_fs_inodes_used_percent | percentage of used inodes in the filesystem | 0-100 | Gauge |

Host File Handles

Labels

clusterId env region host_name cloud_provider env_type

Metrics

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_host_fs_file_handles_allocated | total number of file handles allocated in the current host | Number | Gauge |
| groundcover_host_fs_file_handles_allocated_unused | number of allocated but unused file handles in the current host | Number | Gauge |
| groundcover_host_fs_file_handles_in_use | number of file handles currently in use in the current host | Number | Gauge |
| groundcover_host_fs_file_handles_used_percent | percentage of file handles in use in the current host | 0-100 | Gauge |
| groundcover_host_fs_file_handles_max | maximum number of file handles available in the current host | Number | Gauge |

Host Network

Labels

clusterId env region host_name cloud_provider env_type device

Metrics

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_host_net_receive_bytes_total | total bytes received on network interface | Bytes | Counter |
| groundcover_host_net_transmit_bytes_total | total bytes transmitted on network interface | Bytes | Counter |
| groundcover_host_net_receive_packets_total | total packets received on network interface | Number | Counter |
| groundcover_host_net_transmit_packets_total | total packets transmitted on network interface | Number | Counter |
| groundcover_host_net_receive_dropped_total | total number of received packets dropped on network interface | Number | Counter |
| groundcover_host_net_receive_errors_total | total number of receive errors on network interface | Number | Counter |
| groundcover_host_net_transmit_dropped_total | total number of transmitted packets dropped on network interface | Number | Counter |
| groundcover_host_net_transmit_errors_total | total number of transmit errors on network interface | Number | Counter |

Application Metrics & Labels

| Label name | Description | Relevant types |
| --- | --- | --- |
| clusterId | Name identifier of the K8s cluster | All |
| region | Cloud provider region name | All |
| namespace | K8s namespace | All |
| workload_name | K8s workload (or service) name | All |
| pod_name | K8s pod name | All |
| container_name | K8s container name | All |
| container_image | K8s container image name | All |
| remote_namespace | Remote K8s namespace (other side of the communication) | All |
| remote_service_name | Remote K8s service name (other side of the communication) | All |
| remote_container_name | Remote K8s container name (other side of the communication) | All |
| type | The protocol in use (HTTP, gRPC, Kafka, DNS, etc.) | All |
| sub_type | The subtype of the protocol (GET, POST, etc.) | All |
| role | Role in the communication (client or server) | All |
| clustered_resource_name | The clustered name of the resource; depends on the protocol | All |
| status_code | "ok", "error" or "unset" | All |
| server | The server workload/name | All |
| client | The client workload/name | All |
| server_namesapce | The server namespace | All |
| client_namespace | The client namespace | All |
| server_is_external | Indicates whether the server is external | All |
| client_is_external | Indicates whether the client is external | All |
| is_encrypted | Indicates whether the communication is encrypted | All |
| is_cross_az | Indicates whether the communication crosses availability zones | All |
| clustered_path | HTTP / gRPC aggregated resource path (e.g. /metrics/*) | http, grpc |
| method | HTTP / gRPC method (e.g. GET) | http, grpc |
| response_status_code | Return status code of an HTTP / gRPC request (e.g. 200 in HTTP) | http, grpc |
| dialect | SQL dialect (MySQL or PostgreSQL) | mysql, postgresql |
| response_status | Return status code of a SQL query (e.g. 42P01 for undefined table) | mysql, postgresql |
| client_type | Kafka client type (Fetcher / Producer) | kafka |
| topic | Kafka topic name | kafka |
| partition | Kafka partition identifier | kafka |
| error_code | Kafka return status code | kafka |
| query_type | Type of DNS query (e.g. AAAA) | dns |
| response_return_code | Return status code of a DNS resolution request (e.g. Name Error) | dns |
| exit_code | K8s container termination exit code | container_state, container_crash |
| state | K8s container current state (Running, Waiting or Terminated) | container_state |
| state_reason | K8s container state transition reason (e.g. CrashLoopBackOff or OOMKilled) | container_state |
| crash_reason | K8s container crash reason (e.g. Error, OOMKilled) | container_crash |
| pvc_name | K8s PVC name | storage |

Summary-based metrics have an additional quantile label representing the percentile. Available values: ["0.5", "0.95", "0.99"].
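To make the quantile label concrete, the sketch below builds a label selector for a Summary metric. It assumes the metrics are queried through a Prometheus-compatible selector syntax, which this page does not state; build_selector and the workload name used in the usage note are hypothetical.

```python
ALLOWED_QUANTILES = ("0.5", "0.95", "0.99")


def build_selector(metric: str, quantile: str, **labels: str) -> str:
    """Return a selector string such as metric{quantile="0.95",label="value"}.

    Label values are assumed not to contain quotes or backslashes,
    so no escaping is performed here.
    """
    if quantile not in ALLOWED_QUANTILES:
        raise ValueError(f"quantile must be one of {ALLOWED_QUANTILES}, got {quantile!r}")
    pairs = {"quantile": quantile, **labels}
    # Sort the labels so the output is deterministic.
    body = ",".join(f'{name}="{value}"' for name, value in sorted(pairs.items()))
    return f"{metric}{{{body}}}"
```

For example, build_selector("groundcover_workload_latency_seconds", "0.95", workload_name="checkout") selects the 95th-percentile latency series for a workload named checkout.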

Golden Signals (Errors & Issues)

The lists below describe error and issue counters. Every issue flagged by the platform is an error, but not every error is flagged as an issue.

Resource metrics

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_resource_total_counter | total amount of resource requests | Number | Counter |
| groundcover_resource_error_counter | total amount of requests with error status codes | Number | Counter |
| groundcover_resource_issue_counter | total amount of requests which were flagged as issues | Number | Counter |
| groundcover_resource_success_counter | total amount of resource requests with OK status codes | Number | Counter |
| groundcover_resource_latency_seconds | resource latency | Seconds | Summary |

Workload metrics

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_workload_total_counter | total amount of requests handled by the workload | Number | Counter |
| groundcover_workload_error_counter | total amount of requests handled by the workload with error status codes | Number | Counter |
| groundcover_workload_issue_counter | total amount of requests handled by the workload which were flagged as issues | Number | Counter |
| groundcover_workload_success_counter | total amount of requests handled by the workload with OK status codes | Number | Counter |
| groundcover_workload_latency_seconds | resource latency across all of the workload APIs | Seconds | Summary |
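Because these are counters, the golden-signal ratios are usually derived from the increase between two samples rather than from raw values. The following is a minimal sketch under that assumption; SignalCounts, signal_ratios and the sample numbers are hypothetical, and counter resets are not handled.

```python
from typing import Dict, NamedTuple, Optional


class SignalCounts(NamedTuple):
    """One sample of the total, error and issue counters for a workload."""
    total: float   # e.g. groundcover_workload_total_counter
    errors: float  # e.g. groundcover_workload_error_counter
    issues: float  # e.g. groundcover_workload_issue_counter


def signal_ratios(prev: SignalCounts, curr: SignalCounts) -> Optional[Dict[str, float]]:
    """Error and issue ratios over the window between two samples.

    Returns None when no requests were observed in the window.
    """
    total_increase = curr.total - prev.total
    if total_increase <= 0:
        return None
    error_increase = curr.errors - prev.errors
    issue_increase = curr.issues - prev.issues
    return {
        "error_ratio": error_increase / total_increase,
        "issue_ratio": issue_increase / total_increase,
    }
```

Since every issue is also an error, the issue ratio for a window should not exceed the error ratio; a result that violates this suggests the two samples were not taken at the same time.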

Kafka specific metrics

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_client_offset | client last message offset (for producer the last offset produced, for consumer the last requested offset) | | Gauge |
| groundcover_workload_client_offset | client last message offset (for producer the last offset produced, for consumer the last requested offset), aggregated by workload | | Gauge |
| groundcover_calc_lagged_messages | current lag in messages | Number | Gauge |
| groundcover_workload_calc_lagged_messages | current lag in messages, aggregated by workload | Number | Gauge |
| groundcover_calc_lag_seconds | current lag in time | Seconds | Gauge |
| groundcover_workload_calc_lag_seconds | current lag in time, aggregated by workload | Seconds | Gauge |
