Metrics & Labels
Kubernetes Infrastructure Metrics & Labels
Container CPU and Memory
Labels
type clusterId region namespace node_name workload_name pod_name container_name container_image
Metrics
groundcover_container_cpu_usage_rate_millis
CPU usage in mCPU
mCPU
groundcover_container_cpu_request_m_cpu
K8s container CPU request
mCPU
groundcover_container_cpu_limit_m_cpu
K8s container CPU limit
mCPU
groundcover_container_memory_working_set_bytes
current memory working set
Bytes
groundcover_container_memory_rss_bytes
current memory RSS
Bytes
groundcover_container_memory_request_bytes
K8s container memory request
Bytes
groundcover_container_memory_limit_bytes
K8s container memory limit
Bytes
groundcover_container_cpu_delay_seconds
K8s container CPU delay
Seconds
groundcover_container_disk_delay_seconds
K8s container disk delay
Seconds
groundcover_container_cpu_throttled_seconds_total
K8s container total CPU throttling
Seconds
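The usage, request and limit metrics above share the same unit (mCPU), so they can be compared directly. The following is a minimal sketch of that comparison, not part of the platform itself: the function name `cpu_utilization_percent` and the sample values are hypothetical, and the values are assumed to have already been fetched from whatever query interface you use.

```python
from typing import Optional


def cpu_utilization_percent(usage_mcpu: float,
                            reference_mcpu: Optional[float]) -> Optional[float]:
    """Return usage as a percentage of a request or limit (all in mCPU).

    Returns None when the reference is missing or zero (for example, a
    container with no CPU limit set), since no ratio is defined then.
    """
    if not reference_mcpu:
        return None
    return 100.0 * usage_mcpu / reference_mcpu


# Hypothetical sample values, keyed by the metric names listed above.
sample = {
    "groundcover_container_cpu_usage_rate_millis": 250.0,
    "groundcover_container_cpu_request_m_cpu": 500.0,
    "groundcover_container_cpu_limit_m_cpu": 1000.0,
}

usage = sample["groundcover_container_cpu_usage_rate_millis"]
vs_request = cpu_utilization_percent(
    usage, sample["groundcover_container_cpu_request_m_cpu"])
vs_limit = cpu_utilization_percent(
    usage, sample["groundcover_container_cpu_limit_m_cpu"])

print(vs_request, vs_limit)  # 50.0 25.0
```

The same pattern applies to the memory metrics, where the working set or RSS is compared against the memory request or limit in bytes.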
Node CPU, Memory and Disk
Labels
type clusterId region node_name
Metrics
groundcover_node_allocatable_cpum_cpu
amount of allocatable CPU in the current node
mCPU
groundcover_node_allocatable_mem_bytes
amount of allocatable memory in the current node
Bytes
groundcover_node_mem_used_percent
percent of used memory in current node
0-100
groundcover_node_used_disk_space
current used disk space in current node
Bytes
groundcover_node_free_disk_space
amount of free disk space in current node
Bytes
groundcover_node_total_disk_space
amount of total disk space in current node
Bytes
groundcover_node_used_percent_disk_space
percent of used disk space in current node
0-100
Storage Usage
Labels
type clusterId region name namespace
Metrics
groundcover_pvc_usage_bytes
PVC usage
Bytes
groundcover_pvc_capacity_bytes
PVC capacity
Bytes
groundcover_pvc_available_bytes
PVC available
Bytes
groundcover_pvc_usage_percent
percent of used PVC storage
0-100
groundcover_pvc_read_bytes_total
total amount of bytes read by the workload from the PVC
Bytes
groundcover_pvc_write_bytes_total
total amount of bytes written by the workload to the PVC
Bytes
groundcover_pvc_reads_total
total amount of read operations done by the workload from the PVC
Number
groundcover_pvc_writes_total
total amount of write operations done by the workload to the PVC
Number
groundcover_pvc_read_latency
latency of read operation by the workload from the PVC
Seconds
groundcover_pvc_write_latency
latency of write operation by the workload to the PVC
Seconds
Network Usage
Labels
clusterId workload_name namespace container_name remote_service_name remote_namespace remote_is_external availability_zone region remote_availability_zone remote_region is_cross_az protocol role server_port encryption transport_protocol is_loopback
Notes:
is_loopback and remote_is_external are special labels that indicate the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g. a managed service outside of the cluster (external). In both cases the remote_service_name and remote_namespace labels will be empty.
is_cross_az means the traffic was sent and/or received between two different availability zones. This is a helpful flag to quickly identify this kind of communication. The actual zones are detailed in the availability_zone and remote_availability_zone labels.
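The notes above can be sketched as a small classifier over one series' label set. This is an illustration only: the helper names and the "in-cluster" category are hypothetical, and the code assumes boolean labels are serialized as the string "true", which this page does not specify.

```python
from typing import Mapping, Optional, Tuple


def _is_true(value: str) -> bool:
    # Assumption: boolean labels arrive as the string "true" (any case).
    return value.strip().lower() == "true"


def classify_remote(labels: Mapping[str, str]) -> str:
    """Classify the remote side of a network series from its labels."""
    if _is_true(labels.get("is_loopback", "")):
        return "loopback"
    if _is_true(labels.get("remote_is_external", "")):
        return "external"
    # Hypothetical category name for everything else.
    return "in-cluster"


def cross_az_pair(labels: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (local zone, remote zone) for cross-AZ traffic, else None."""
    if not _is_true(labels.get("is_cross_az", "")):
        return None
    return (labels.get("availability_zone", ""),
            labels.get("remote_availability_zone", ""))


# Hypothetical label set for a single series.
series_labels = {
    "workload_name": "checkout",
    "remote_service_name": "payments",
    "remote_namespace": "prod",
    "is_loopback": "false",
    "remote_is_external": "false",
    "is_cross_az": "true",
    "availability_zone": "us-east-1a",
    "remote_availability_zone": "us-east-1b",
}

print(classify_remote(series_labels))  # in-cluster
print(cross_az_pair(series_labels))    # ('us-east-1a', 'us-east-1b')
```

Because remote_service_name and remote_namespace are empty for loopback and external traffic, checking the two flags first avoids treating an empty remote name as missing data.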
Metrics
groundcover_network_rx_bytes_total
Bytes received by the workload
Bytes
groundcover_network_tx_bytes_total
Bytes sent by the workload
Bytes
groundcover_network_connections_opened_total
Connections opened by the workload
Number
groundcover_network_connections_closed_total
Connections closed by the workload
Number
groundcover_network_connections_opened_failed_total
Connection attempts that failed, per workload (including refused connections)
Number
groundcover_network_connections_refused_failed_total
Connection attempts that were refused, per workload
Number
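The metrics above that end in `_total` are named like cumulative counters, so their raw values are usually less useful than their rate of change. The sketch below assumes they follow the common counter convention (monotonically increasing, restarting from zero on a reset), which this page does not state explicitly. The function name and sample values are hypothetical.

```python
def per_second_rate(prev_value: float, prev_ts: float,
                    curr_value: float, curr_ts: float) -> float:
    """Return the average per-second increase between two counter samples.

    Timestamps are in seconds. If the counter went backwards, it is
    assumed to have reset, and the current value is taken as the
    increase since the reset.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        delta = curr_value  # assumed counter reset
    return delta / elapsed


# Hypothetical samples of groundcover_network_rx_bytes_total, 60 s apart.
rate = per_second_rate(1_000_000, 0.0, 1_600_000, 60.0)
print(rate)  # 10000.0
```

A query engine with a built-in rate function would normally do this for you. The sketch only shows what such a function computes.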
Host Resources Metrics & Labels
Host CPU
Labels
clusterId env region host_name cloud_provider env_type
Metrics
groundcover_host_uptime_seconds
uptime of the current host
Seconds
groundcover_host_cpu_capacity_m_cpu
CPU capacity in the current host
mCPU
groundcover_host_cpu_usage_m_cpu
CPU usage in the current host
mCPU
groundcover_host_cpu_usage_percent
percentage of used CPU in the current host
0-100
groundcover_host_cpu_num_cores
number of CPU cores on the host
Number
groundcover_host_cpu_user_spent_seconds_total
total time spent in user mode
Seconds
groundcover_host_cpu_user_spent_percent
percentage of CPU time spent in user mode
0-100
groundcover_host_cpu_system_spent_seconds_total
total time spent in system mode
Seconds
groundcover_host_cpu_system_spent_percent
percentage of CPU time spent in system mode
0-100
groundcover_host_cpu_idle_spent_seconds_total
total time spent idle
Seconds
groundcover_host_cpu_idle_spent_percent
percentage of CPU time spent idle
0-100
groundcover_host_cpu_iowait_spent_seconds_total
total time spent waiting for I/O to complete
Seconds
groundcover_host_cpu_iowait_spent_percent
percentage of CPU time spent waiting for I/O
0-100
groundcover_host_cpu_nice_spent_seconds_total
total time spent on niced processes
Seconds
groundcover_host_cpu_steal_spent_seconds_total
total time spent in involuntary wait (stolen by hypervisor)
Seconds
groundcover_host_cpu_stolen_spent_percent
percentage of CPU time stolen by the hypervisor
0-100
groundcover_host_cpu_irq_spent_seconds_total
total time spent handling hardware interrupts
Seconds
groundcover_host_cpu_softirq_spent_seconds_total
total time spent handling software interrupts
Seconds
groundcover_host_cpu_interrupt_spent_percent
percentage of CPU time spent handling interrupts
0-100
groundcover_host_cpu_guest_spent_seconds_total
total time spent running guest processes
Seconds
groundcover_host_cpu_guest_spent_percent
percentage of CPU time spent running guest processes
0-100
groundcover_host_cpu_guest_nice_spent_seconds_total
total time spent running niced guest processes
Seconds
groundcover_host_cpu_context_switches_total
total number of context switches in the current host
Number
groundcover_host_cpu_load_avg1
CPU load average over 1 minute
Number
groundcover_host_cpu_load_avg5
CPU load average over 5 minutes
Number
groundcover_host_cpu_load_avg15
CPU load average over 15 minutes
Number
groundcover_host_cpu_load_norm1
normalized CPU load over 1 minute
Number
groundcover_host_cpu_load_norm5
normalized CPU load over 5 minutes
Number
groundcover_host_cpu_load_norm15
normalized CPU load over 15 minutes
Number
Host Memory
Labels
clusterId env region host_name cloud_provider env_type
Metrics
groundcover_host_mem_capacity_bytes
memory capacity in the current host
Bytes
groundcover_host_mem_used_bytes
memory used in the current host
Bytes
groundcover_host_mem_used_percent
percentage of used memory in the current host
0-100
groundcover_host_mem_free_bytes
free memory in the current host
Bytes
groundcover_host_mem_available_bytes
available memory in the current host
Bytes
groundcover_host_mem_cached_bytes
cached memory in the current host
Bytes
groundcover_host_mem_buffers_bytes
buffer memory in the current host
Bytes
groundcover_host_mem_shared_bytes
shared memory in the current host
Bytes
groundcover_host_mem_slab_bytes
slab memory in the current host
Bytes
groundcover_host_mem_sreclaimable_bytes
reclaimable slab memory in the current host
Bytes
groundcover_host_mem_page_tables_bytes
page tables memory in the current host
Bytes
groundcover_host_mem_commit_limit_bytes
memory commit limit in the current host
Bytes
groundcover_host_mem_committed_as_bytes
committed address space memory in the current host
Bytes
groundcover_host_mem_swap_cached_bytes
cached swap memory in the current host
Bytes
groundcover_host_mem_swap_total_bytes
total swap memory in the current host
Bytes
groundcover_host_mem_swap_free_bytes
free swap memory in the current host
Bytes
groundcover_host_mem_swap_used_bytes
used swap memory in the current host
Bytes
groundcover_host_mem_swap_in_bytes_total
total bytes swapped in on the current host
Bytes
groundcover_host_mem_swap_out_bytes_total
total bytes swapped out on the current host
Bytes
Host Disk
Labels
clusterId env region host_name cloud_provider env_type Optional: device_name
Metrics
groundcover_host_disk_space_used_bytes
used disk space in the current host
Bytes
groundcover_host_disk_space_free_bytes
free disk space in the current host
Bytes
groundcover_host_disk_space_total_bytes
total disk space in the current host
Bytes
groundcover_host_disk_space_used_percent
percentage of used disk space in the current host
0-100
groundcover_host_disk_read_time_ms_total
total time spent reading from disk per device in the current host
Milliseconds
groundcover_host_disk_write_time_ms_total
total time spent writing to disk per device in the current host
Milliseconds
groundcover_host_disk_read_count_total
total number of disk reads per device in the current host
Number
groundcover_host_disk_write_count_total
total number of disk writes per device in the current host
Number
groundcover_host_disk_merged_read_count_total
total number of merged disk reads per device in the current host
Number
groundcover_host_disk_merged_write_count_total
total number of merged disk writes per device in the current host
Number
Host I/O
Labels
clusterId env region host_name cloud_provider env_type Optional: device_name
Metrics
groundcover_host_io_read_kb_per_sec
disk read throughput per device in the current host
KB/s
groundcover_host_io_write_kb_per_sec
disk write throughput per device in the current host
KB/s
groundcover_host_io_read_await_ms
average time for read requests to be served per device in the current host
Milliseconds
groundcover_host_io_write_await_ms
average time for write requests to be served per device in the current host
Milliseconds
groundcover_host_io_await_ms
average time for I/O requests to be served per device in the current host
Milliseconds
groundcover_host_io_avg_request_size
average I/O request size per device in the current host
Kilobytes
groundcover_host_io_service_time_ms
average service time for I/O requests per device in the current host
Milliseconds
groundcover_host_io_avg_queue_size_kb
average I/O queue size per device in the current host
Kilobytes
groundcover_host_io_utilization_percent
percentage of time the device was busy serving I/O requests in the current host
0-100
groundcover_host_io_block_in_total
total number of blocks read in on the current host
Number
groundcover_host_io_block_out_total
total number of blocks written out on the current host
Number
Host Filesystem
Labels
clusterId env region host_name cloud_provider env_type device_name file_system mountpoint
Metrics
groundcover_host_fs_used_bytes
used filesystem space in the current host
Bytes
groundcover_host_fs_free_bytes
free filesystem space in the current host
Bytes
groundcover_host_fs_total_bytes
total filesystem space in the current host
Bytes
groundcover_host_fs_used_percent
percentage of used filesystem space in the current host
0-100
groundcover_host_fs_inodes_total
total inodes in the filesystem
Number
groundcover_host_fs_inodes_used
used inodes in the filesystem
Number
groundcover_host_fs_inodes_free
free inodes in the filesystem
Number
groundcover_host_fs_inodes_used_percent
percentage of used inodes in the filesystem
0-100
Host File Handles
Labels
clusterId env region host_name cloud_provider env_type
Metrics
groundcover_host_fs_file_handles_allocated
total number of file handles allocated in the current host
Number
groundcover_host_fs_file_handles_allocated_unused
number of allocated but unused file handles in the current host
Number
groundcover_host_fs_file_handles_in_use
number of file handles currently in use in the current host
Number
groundcover_host_fs_file_handles_used_percent
percentage of file handles in use in the current host
0-100
groundcover_host_fs_file_handles_max
maximum number of file handles available in the current host
Number
Host Network
Labels
clusterId env region host_name cloud_provider env_type device
Metrics
groundcover_host_net_receive_bytes_total
total bytes received on network interface
Bytes
groundcover_host_net_transmit_bytes_total
total bytes transmitted on network interface
Bytes
groundcover_host_net_receive_packets_total
total packets received on network interface
Number
groundcover_host_net_transmit_packets_total
total packets transmitted on network interface
Number
groundcover_host_net_receive_dropped_total
total number of received packets dropped on network interface
Number
groundcover_host_net_receive_errors_total
total number of receive errors on network interface
Number
groundcover_host_net_transmit_dropped_total
total number of transmitted packets dropped on network interface
Number
groundcover_host_net_transmit_errors_total
total number of transmit errors on network interface
Number
Application Metrics & Labels
clusterId
Name identifier of the K8s cluster
All
region
Cloud provider region name
All
namespace
K8s namespace
All
workload_name
K8s workload (or service) name
All
pod_name
K8s pod name
All
container_name
K8s container name
All
container_image
K8s container image name
All
remote_namespace
Remote K8s namespace (other side of the communication)
All
remote_service_name
Remote K8s service name (other side of the communication)
All
remote_container_name
Remote K8s container name (other side of the communication)
All
type
The protocol in use (HTTP, gRPC, Kafka, DNS etc.)
All
sub_type
The sub type of the protocol (GET, POST, etc.)
All
role
Role in the communication (client or server)
All
clustered_resource_name
The clustered name of the resource, depends on the protocol
All
status_code
"ok", "error" or "unset"
All
server
The server workload/name
All
client
The client workload/name
All
server_namesapce
The server namespace
All
client_namespace
The client namespace
All
server_is_external
Indicates whether the server is external
All
client_is_external
Indicates whether the client is external
All
is_encrypted
Indicates whether the communication is encrypted
All
is_cross_az
Indicates whether the communication crosses availability zones
All
clustered_path
HTTP / gRPC aggregated resource path (e.g. /metrics/*)
http, grpc
method
HTTP / gRPC method (e.g. GET)
http, grpc
response_status_code
Return status code of an HTTP / gRPC request (e.g. 200 in HTTP)
http, grpc
dialect
SQL dialect (MySQL or PostgreSQL)
mysql, postgresql
response_status
Return status code of a SQL query (e.g. 42P01 for undefined table)
mysql, postgresql
client_type
Kafka client type (Fetcher / Producer)
kafka
topic
Kafka topic name
kafka
partition
Kafka partition identifier
kafka
error_code
Kafka return status code
kafka
query_type
type of DNS query (e.g. AAAA)
dns
response_return_code
Return status code of a DNS resolution request (e.g. Name Error)
dns
exit_code
K8s container termination exit code
container_state, container_crash
state
K8s container current state (Running, Waiting or Terminated)
container_state
state_reason
K8s container state transition reason (e.g. CrashLoopBackOff or OOMKilled)
container_state
crash_reason
K8s container crash reason (e.g. Error, OOMKilled)
container_crash
pvc_name
K8s PVC name
storage
We also use a set of internal labels which are not relevant in most use-cases. Find them interesting? Let us know over Slack!
issue_id entity_id resource_id query_id aggregation_id parent_entity_id perspective_entity_id perspective_entity_is_external perspective_entity_issue_id perspective_entity_name perspective_entity_namespace perspective_entity_resource_id
Golden Signals (Errors & Issues)
The lists below describe error and issue counters. Every issue flagged by the platform is an error, but not every error is flagged as an issue.
Resource metrics
groundcover_resource_total_counter
total amount of resource requests
Number
groundcover_resource_error_counter
total amount of requests with error status codes
Number
groundcover_resource_issue_counter
total amount of requests which were flagged as issues
Number
groundcover_resource_success_counter
total amount of resource requests with OK status codes
Number
groundcover_resource_latency_seconds
resource latency
Seconds
Workload metrics
groundcover_workload_total_counter
total amount of requests handled by the workload
Number
groundcover_workload_error_counter
total amount of requests handled by the workload with error status codes
Number
groundcover_workload_issue_counter
total amount of requests handled by the workload which were flagged as issues
Number
groundcover_workload_success_counter
total amount of requests handled by the workload with OK status codes
Number
groundcover_workload_latency_seconds
resource latency across all of the workload APIs
Seconds
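Since every issue is an error but not every error is an issue, the counters above should satisfy issue <= error <= total when compared over the same time window. The sketch below derives error and issue ratios under that reading. The class and function names are hypothetical, and the inputs are assumed to be counter increases over one shared window rather than raw counter values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterIncrease:
    """Increase of each counter over one shared time window."""
    total: float   # e.g. from groundcover_workload_total_counter
    error: float   # e.g. from groundcover_workload_error_counter
    issue: float   # e.g. from groundcover_workload_issue_counter


def ratio(part: float, whole: float) -> float:
    """Return part / whole, or 0.0 when no requests were observed."""
    return part / whole if whole else 0.0


def is_consistent(c: CounterIncrease) -> bool:
    # Issues are a subset of errors, and errors a subset of all requests.
    return 0 <= c.issue <= c.error <= c.total


# Hypothetical increases over one window.
window = CounterIncrease(total=200, error=10, issue=4)

print(ratio(window.error, window.total))  # 0.05
print(ratio(window.issue, window.total))  # 0.02
print(is_consistent(window))              # True
```

The success counter is left out on purpose: the status_code label also allows "unset", so success and error counts may not add up to the total.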
Kafka specific metrics
groundcover_client_offset
client last message offset (for producer the last offset produced, for consumer the last requested offset)
groundcover_workload_client_offset
client last message offset (for producer the last offset produced, for consumer the last requested offset), aggregated by workload
groundcover_calc_lagged_messages
current lag in messages
Number
groundcover_workload_calc_lagged_messages
current lag in messages, aggregated by workload
Number
groundcover_calc_lag_seconds
current lag in time
Seconds
groundcover_workload_calc_lag_seconds
current lag in time, aggregated by workload
Seconds
