Metrics & Labels

Kubernetes Infrastructure Metrics & Labels

Node CPU, Memory and Disk

Labels

type clusterId region node_name

Metrics

Name
Description
Unit
Type

groundcover_node_allocatable_cpum_cpu

Allocatable CPU in the current node

mCPU

Gauge

groundcover_node_allocatable_mem_bytes

Allocatable memory in the current node

Bytes

Gauge

groundcover_node_mem_used_percent

Percentage of used memory in the current node

Percentage

Gauge

groundcover_node_used_disk_space

Used disk space in the current node

Bytes

Gauge

groundcover_node_free_disk_space

Free disk space in the current node

Bytes

Gauge

groundcover_node_total_disk_space

Total disk space in the current node

Bytes

Gauge

groundcover_node_used_percent_disk_space

Percentage of used disk space in the current node

Percentage

Gauge

Storage Usage

Labels

type clusterId region name namespace

Metrics

Name
Description
Unit
Type

groundcover_pvc_usage_bytes

Persistent Volume Claim (PVC) usage

Bytes

Gauge

groundcover_pvc_capacity_bytes

Persistent Volume Claim (PVC) capacity

Bytes

Gauge

groundcover_pvc_available_bytes

Available Persistent Volume Claim (PVC) space

Bytes

Gauge

groundcover_pvc_usage_percent

Percentage of used Persistent Volume Claim (PVC) storage

Percentage

Gauge

groundcover_pvc_read_bytes_total

Total bytes read by the workload from the Persistent Volume Claim (PVC)

Bytes

Counter

groundcover_pvc_write_bytes_total

Total bytes written by the workload to the Persistent Volume Claim (PVC)

Bytes

Counter

groundcover_pvc_reads_total

Total read operations performed by the workload from the Persistent Volume Claim (PVC)

Number

Counter

groundcover_pvc_writes_total

Total write operations performed by the workload to the Persistent Volume Claim (PVC)

Number

Counter

groundcover_pvc_read_latency

Latency of read operations from the Persistent Volume Claim (PVC) by the workload

Seconds

Summary

groundcover_pvc_write_latency

Latency of write operations to the Persistent Volume Claim (PVC) by the workload

Seconds

Summary

groundcover_pvc_read_latency_count

Count of read operations sampled for latency on the Persistent Volume Claim (PVC)

Number

Counter

groundcover_pvc_read_latency_sum

Sum of read operation latencies for the Persistent Volume Claim (PVC)

Seconds

Counter

groundcover_pvc_read_latency_summary

Summary of read operations latency for the Persistent Volume Claim (PVC)

Milliseconds

Counter

groundcover_pvc_write_latency_count

Count of write operations sampled for latency on the Persistent Volume Claim (PVC)

Number

Counter

groundcover_pvc_write_latency_sum

Sum of write operation latencies for the Persistent Volume Claim (PVC)

Seconds

Counter

groundcover_pvc_write_latency_summary

Summary of write operations latency for the Persistent Volume Claim (PVC)

Milliseconds

Counter
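
The `_sum` and `_count` series above look like the usual summary pair, so an average latency over a window can be derived from how much each counter grew between two scrapes. The following is a minimal sketch under that assumption; the `Sample` type and the function name are hypothetical helpers, not part of groundcover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One scrape of a latency pair: cumulative sum (seconds) and count."""
    latency_sum: float
    latency_count: float


def average_latency_seconds(earlier: Sample, later: Sample) -> Optional[float]:
    """Mean latency of the operations observed between two scrapes.

    Returns None when no operations were recorded in the window, or when
    a counter reset makes the deltas unusable.
    """
    delta_sum = later.latency_sum - earlier.latency_sum
    delta_count = later.latency_count - earlier.latency_count
    if delta_count <= 0 or delta_sum < 0:
        return None
    return delta_sum / delta_count
```

For example, feeding it two scrapes of `groundcover_pvc_read_latency_sum` and `groundcover_pvc_read_latency_count` would give the mean read latency for the period between them.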

Network Usage

Labels

clusterId workload_name namespace container_name remote_service_name remote_namespace remote_is_external availability_zone region remote_availability_zone remote_region is_cross_az protocol role server_port encryption transport_protocol is_loopback

Notes:

  • is_loopback and remote_is_external are special labels. They indicate that the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g., a managed service outside the cluster (external).

    • In both cases, the remote_service_name and remote_namespace labels are empty.

  • is_cross_az indicates that the traffic was sent or received between two different availability zones, which makes this kind of communication quick to identify.

    • The actual zones are detailed in the availability_zone and remote_availability_zone labels

Metrics

Name
Description
Unit
Type

groundcover_network_rx_bytes_total

Total bytes received by the workload

Bytes

Counter

groundcover_network_tx_bytes_total

Total bytes sent by the workload

Bytes

Counter

groundcover_network_connections_opened_total

Total connections opened by the workload

Number

Counter

groundcover_network_connections_closed_total

Total connections closed by the workload

Number

Counter

groundcover_network_connections_opened_failed_total

Total number of failed network connection attempts by the workload

Number

Counter

groundcover_network_connections_refused_failed_total

Total number of connection attempts refused per workload

Number

Counter

groundcover_network_connections_opened_refused_total

Total number of network connections refused by the workload

Number

Counter

groundcover_network_rx_ops_total

Total number of read operations issued by the workload

Number

Counter

groundcover_network_tx_ops_total

Total number of write operations issued by the workload

Number

Counter
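
The network labels described in the notes above can be combined into a query for cross-AZ traffic. The sketch below only builds a PromQL string. It assumes a PromQL-compatible query endpoint, and it assumes the `is_cross_az` label carries the string value `"true"`, which this page does not state. The function name is hypothetical.

```python
def cross_az_traffic_query(
    metric: str = "groundcover_network_tx_bytes_total",
    window: str = "5m",
    cross_az_value: str = "true",  # assumption: label value is not documented here
) -> str:
    """Build a PromQL expression for per-zone-pair cross-AZ throughput."""
    # Select only series flagged as crossing availability zones.
    selector = f'{metric}{{is_cross_az="{cross_az_value}"}}'
    # rate() turns the byte counter into bytes per second; the sum groups
    # the result by source and destination zone.
    return (
        "sum by (availability_zone, remote_availability_zone) "
        f"(rate({selector}[{window}]))"
    )
```

Swapping in `groundcover_network_rx_bytes_total` gives the receive side of the same breakdown.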

Kubernetes Resources

Labels

type resource condition status clusterId region namespace workload_name deployment unit

Metrics

Name
Description
Unit
Type

groundcover_kube_cronjob_status_active

Number of active CronJob executions

Number

Gauge

groundcover_kube_daemonset_status_current_number_scheduled

Number of Pods currently scheduled by the DaemonSet

Number

Gauge

groundcover_kube_daemonset_status_desired_number_scheduled

Desired number of Pods scheduled by the DaemonSet

Number

Gauge

groundcover_kube_daemonset_status_number_available

Number of available Pods for the DaemonSet

Number

Gauge

groundcover_kube_daemonset_status_number_misscheduled

Number of Pods running on nodes they should not be scheduled on

Number

Gauge

groundcover_kube_daemonset_status_number_ready

Number of ready Pods for the DaemonSet

Number

Gauge

groundcover_kube_daemonset_status_number_unavailable

Number of unavailable Pods for the DaemonSet

Number

Gauge

groundcover_kube_daemonset_status_observed_generation

Most recent generation observed for the DaemonSet

Number

Gauge

groundcover_kube_daemonset_status_updated_number_scheduled

Number of Pods updated and scheduled by the DaemonSet

Number

Gauge

groundcover_kube_deployment_created

Creation timestamp of the Deployment

Seconds

Gauge

groundcover_kube_deployment_metadata_generation

Sequence number representing a specific generation of the Deployment

Number

Gauge

groundcover_kube_deployment_spec_paused

Whether the Deployment is paused

Number

Gauge

groundcover_kube_deployment_spec_replicas

Desired number of replicas for the Deployment

Number

Gauge

groundcover_kube_deployment_spec_strategy_rollingupdate_max_unavailable

Maximum number of unavailable Pods during a rolling update for the Deployment

Number

Gauge

groundcover_kube_deployment_status_condition

Current condition of the Deployment (labeled by type and status)

Number

Gauge

groundcover_kube_deployment_status_observed_generation

Most recent generation observed for the Deployment

Number

Gauge

groundcover_kube_deployment_status_replicas

Number of replicas for the Deployment

Number

Gauge

groundcover_kube_deployment_status_replicas_available

Number of available replicas for the Deployment

Number

Gauge

groundcover_kube_deployment_status_replicas_ready

Number of ready replicas for the Deployment

Number

Gauge

groundcover_kube_deployment_status_replicas_unavailable

Number of unavailable replicas for the Deployment

Number

Gauge

groundcover_kube_deployment_status_replicas_updated

Number of updated replicas for the Deployment

Number

Gauge

groundcover_kube_horizontalpodautoscaler_spec_max_replicas

Maximum number of replicas configured for the HPA

Number

Gauge

groundcover_kube_horizontalpodautoscaler_spec_min_replicas

Minimum number of replicas configured for the HPA

Number

Gauge

groundcover_kube_horizontalpodautoscaler_spec_target_metric

Configured HPA target metric value

Number

Gauge

groundcover_kube_horizontalpodautoscaler_status_condition

Current condition of the Horizontal Pod Autoscaler (labeled by type and status)

Number

Gauge

groundcover_kube_horizontalpodautoscaler_status_current_replicas

Current number of replicas managed by the HPA

Number

Gauge

groundcover_kube_horizontalpodautoscaler_status_desired_replicas

Desired number of replicas as calculated by the HPA

Number

Gauge

groundcover_kube_horizontalpodautoscaler_status_target_metric

Current observed value of the HPA target metric

Number

Gauge

groundcover_kube_job_complete

Whether the Job has completed successfully

Number

Gauge

groundcover_kube_job_failed

Whether the Job has failed

Number

Gauge

groundcover_kube_job_spec_completions

Desired number of successfully finished Pods for the Job

Number

Gauge

groundcover_kube_job_spec_parallelism

Desired number of Pods running in parallel for the Job

Number

Gauge

groundcover_kube_job_status_active

Number of actively running Pods for the Job

Number

Gauge

groundcover_kube_job_status_completion_time

Completion time of the Job as Unix timestamp

Seconds

Gauge

groundcover_kube_job_status_failed

Number of failed Pods for the Job

Number

Gauge

groundcover_kube_job_status_start_time

Start time of the Job as Unix timestamp

Seconds

Gauge

groundcover_kube_job_status_succeeded

Number of succeeded Pods for the Job

Number

Gauge

groundcover_kube_node_created

Creation timestamp of the Node

Seconds

Gauge

groundcover_kube_node_spec_taint

Node taint information (labeled by key, value and effect)

Number

Gauge

groundcover_kube_node_spec_unschedulable

Whether a node can schedule new pods

Number

Gauge

groundcover_kube_node_status_allocatable

The amount of resources allocatable for pods (after reserving some for system daemons)

Number

Gauge

groundcover_kube_node_status_capacity

The total amount of resources available for a node

Number

Gauge

groundcover_kube_node_status_condition

The condition of a cluster node

Number

Gauge

groundcover_kube_persistentvolume_capacity_bytes

Capacity of the PersistentVolume

Bytes

Gauge

groundcover_kube_persistentvolume_status_phase

Current phase of the PersistentVolume

Number

Gauge

groundcover_kube_persistentvolumeclaim_access_mode

Access mode of the PersistentVolumeClaim

Number

Gauge

groundcover_kube_persistentvolumeclaim_status_phase

Current phase of the PersistentVolumeClaim

Number

Gauge

groundcover_kube_pod_container_resource_limits

The resource limits set for a container. It is recommended to use the `kube_pod_resource_limits` metric exposed by kube-scheduler instead, as it is more precise.

Number

Gauge

groundcover_kube_pod_container_resource_requests

The resources requested by a container. It is recommended to use the `kube_pod_resource_requests` metric exposed by kube-scheduler instead, as it is more precise.

Number

Gauge

groundcover_kube_pod_container_status_last_terminated_exitcode

The last termination exit code for the container

Number

Gauge

groundcover_kube_pod_container_status_last_terminated_reason

The last termination reason for the container

Number

Gauge

groundcover_kube_pod_container_status_ready

Describes whether the container's readiness check succeeded

Number

Gauge

groundcover_kube_pod_container_status_restarts_total

The number of container restarts per container

Number

Counter

groundcover_kube_pod_container_status_running

Describes whether the container is currently in running state

Number

Gauge

groundcover_kube_pod_container_status_terminated

Describes whether the container is currently in terminated state

Number

Gauge

groundcover_kube_pod_container_status_terminated_reason

Describes the reason the container is currently in terminated state

Number

Gauge

groundcover_kube_pod_container_status_waiting

Describes whether the container is currently in waiting state

Number

Gauge

groundcover_kube_pod_container_status_waiting_reason

Describes the reason the container is currently in waiting state

Number

Gauge

groundcover_kube_pod_created

Creation timestamp of the Pod

Seconds

Gauge

groundcover_kube_pod_init_container_resource_limits

Resource limits set for an init container (labeled by resource and unit)

Number

Gauge

groundcover_kube_pod_init_container_resource_requests

Requested resources by init container (labeled by resource and unit)

Number

Gauge

groundcover_kube_pod_init_container_resource_requests_memory_bytes

Requested memory by init containers

Bytes

Gauge

groundcover_kube_pod_init_container_status_last_terminated_reason

The last termination reason for the init container

Number

Gauge

groundcover_kube_pod_init_container_status_ready

Describes whether the init container's readiness check succeeded

Number

Gauge

groundcover_kube_pod_init_container_status_restarts_total

The number of restarts for the init container

Number

Counter

groundcover_kube_pod_init_container_status_running

Describes whether the init container is currently in running state

Number

Gauge

groundcover_kube_pod_init_container_status_terminated

Describes whether the init container is currently in terminated state

Number

Gauge

groundcover_kube_pod_init_container_status_terminated_reason

Describes the reason the init container is currently in terminated state

Number

Gauge

groundcover_kube_pod_init_container_status_waiting

Describes whether the init container is currently in waiting state

Number

Gauge

groundcover_kube_pod_init_container_status_waiting_reason

Describes the reason the init container is currently in waiting state

Number

Gauge

groundcover_kube_pod_spec_volumes_persistentvolumeclaims_readonly

Whether the PersistentVolumeClaim is mounted as read-only in the Pod

Number

Gauge

groundcover_kube_pod_status_phase

The pod's current phase

Number

Gauge

groundcover_kube_pod_status_ready

Describes whether the pod is ready to serve requests

Number

Gauge

groundcover_kube_pod_status_scheduled

Describes the status of the scheduling process for the pod

Number

Gauge

groundcover_kube_pod_status_unschedulable

Whether the Pod is unschedulable

Number

Gauge

groundcover_kube_pod_tolerations

Pod tolerations configuration

Number

Gauge

groundcover_kube_replicaset_spec_replicas

Desired number of replicas for the ReplicaSet

Number

Gauge

groundcover_kube_replicaset_status_fully_labeled_replicas

Number of fully labeled replicas for the ReplicaSet

Number

Gauge

groundcover_kube_replicaset_status_observed_generation

Most recent generation observed for the ReplicaSet

Number

Gauge

groundcover_kube_replicaset_status_ready_replicas

Number of ready replicas for the ReplicaSet

Number

Gauge

groundcover_kube_replicaset_status_replicas

Number of replicas for the ReplicaSet

Number

Gauge

groundcover_kube_resourcequota

Resource quota information (labeled by resource and type: hard/used)

Number

Gauge

groundcover_kube_resourcequota_created

Creation timestamp of the ResourceQuota as Unix seconds

Seconds

Gauge

groundcover_kube_statefulset_metadata_generation

Sequence number representing a specific generation of the StatefulSet

Number

Gauge

groundcover_kube_statefulset_replicas

Desired number of replicas for the StatefulSet

Number

Gauge

groundcover_kube_statefulset_status_current_revision

Current revision of the StatefulSet

Number

Gauge

groundcover_kube_statefulset_status_observed_generation

Most recent generation observed for the StatefulSet

Number

Gauge

groundcover_kube_statefulset_status_replicas

Number of replicas for the StatefulSet

Number

Gauge

groundcover_kube_statefulset_status_replicas_available

Number of available replicas for the StatefulSet

Number

Gauge

groundcover_kube_statefulset_status_replicas_current

Number of current replicas for the StatefulSet

Number

Gauge

groundcover_kube_statefulset_status_replicas_ready

Number of ready replicas for the StatefulSet

Number

Gauge

groundcover_kube_statefulset_status_replicas_updated

Number of updated replicas for the StatefulSet

Number

Gauge

groundcover_kube_statefulset_status_update_revision

Update revision of the StatefulSet

Number

Gauge

groundcover_kube_job_duration

Time elapsed between the start and completion time of the Job, or current time if the Job is still running

Seconds

Gauge

groundcover_kube_pod_uptime

Time elapsed since the Pod was created

Seconds

Gauge
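
A common use of the Deployment gauges above is to compare desired and available replicas. The sketch below assumes the latest values of `groundcover_kube_deployment_spec_replicas` and `groundcover_kube_deployment_status_replicas_available` have already been fetched into dictionaries keyed by Deployment name. The function name and the input shape are hypothetical.

```python
from typing import Dict, List


def under_replicated(
    spec_replicas: Dict[str, float],
    available_replicas: Dict[str, float],
) -> List[str]:
    """Return the Deployments with fewer available replicas than desired.

    A Deployment missing from `available_replicas` is treated as having
    zero available replicas.
    """
    return sorted(
        name
        for name, desired in spec_replicas.items()
        if available_replicas.get(name, 0) < desired
    )
```

The same comparison applies to the StatefulSet and DaemonSet gauges, using their desired and available (or ready) counterparts.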

Container Metrics & Labels

Container CPU

Labels

type clusterId region namespace node_name workload_name pod_name container_name container_image

Metrics

Name
Description
Unit
Type

groundcover_container_cpu_usage_rate_millis

CPU usage rate

mCPU

Gauge

groundcover_container_cpu_cfs_periods_total

Total number of elapsed CPU CFS scheduler enforcement periods for the container

Number

Counter

groundcover_container_cpu_delay_seconds

K8s container CPU delay

Seconds

Counter

groundcover_container_cpu_request_usage_percent

CPU usage rate out of request (usage/request)

Percentage

Gauge

groundcover_container_cpu_throttled_percent

Percentage of CPU throttling for the container

Percentage

Gauge

groundcover_container_cpu_throttled_periods

Total number of throttled CPU periods for the container

Number

Counter

groundcover_container_cpu_throttled_rate_millis

Rate of CPU throttling for the container

mCPU

Gauge

groundcover_container_cpu_throttled_seconds_total

Total CPU throttling time for K8s container

Seconds

Counter

groundcover_container_cpu_usage_percent

CPU usage rate (usage/limit)

Percentage

Gauge

groundcover_container_m_cpu_usage_seconds_total

Total CPU usage time in milli-CPUs for the container

mCPU

Counter

groundcover_container_m_cpu_usage_system_seconds_total

Total CPU time spent in system mode for the container

Seconds

Counter

groundcover_container_m_cpu_usage_user_seconds_total

Total CPU time spent in user mode for the container

Seconds

Counter

groundcover_container_cpu_limit_m_cpu

K8s container CPU limit

mCPU

Gauge

groundcover_container_cpu_request_m_cpu

K8s container requested CPU allocation

mCPU

Gauge

groundcover_container_cpu_pressure_full_avg10

Average percentage of time all non-idle tasks were stalled on CPU over 10 seconds

Percentage

Gauge

groundcover_container_cpu_pressure_full_avg300

Average percentage of time all non-idle tasks were stalled on CPU over 300 seconds

Percentage

Gauge

groundcover_container_cpu_pressure_full_avg60

Average percentage of time all non-idle tasks were stalled on CPU over 60 seconds

Percentage

Gauge

groundcover_container_cpu_pressure_full_total

Total time all non-idle tasks were stalled waiting for CPU

Microseconds

Counter

groundcover_container_cpu_pressure_some_avg10

Average percentage of time at least some tasks were stalled on CPU over 10 seconds

Percentage

Gauge

groundcover_container_cpu_pressure_some_avg300

Average percentage of time at least some tasks were stalled on CPU over 300 seconds

Percentage

Gauge

groundcover_container_cpu_pressure_some_avg60

Average percentage of time at least some tasks were stalled on CPU over 60 seconds

Percentage

Gauge

groundcover_container_cpu_pressure_some_total

Total time at least some tasks were stalled waiting for CPU

Microseconds

Counter
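
The pressure metrics above follow the naming of Linux pressure stall information (PSI), where the `total` counter accumulates stall time in microseconds and the `avg10`, `avg60` and `avg300` gauges are running averages. Assuming that mapping holds here, a stall percentage for an arbitrary window can be approximated from two readings of a `total` counter. The function name is hypothetical.

```python
def stall_percent(
    total_us_start: float,
    total_us_end: float,
    interval_seconds: float,
) -> float:
    """Share of a time window spent stalled, as a percentage.

    Inputs are two readings of a cumulative stall counter in microseconds
    (for example groundcover_container_cpu_pressure_some_total) and the
    wall-clock time between them in seconds.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta_us = total_us_end - total_us_start
    if delta_us < 0:
        raise ValueError("counter went backwards (possible reset)")
    return 100.0 * delta_us / (interval_seconds * 1_000_000)
```

This is a plain ratio over the chosen window, so it will not match the exponentially weighted `avg` gauges exactly.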

Container Memory

Labels

type clusterId region namespace node_name workload_name pod_name container_name container_image

Metrics

Name
Description
Unit
Type

groundcover_container_memory_working_set_bytes

Current memory working set

Bytes

Gauge

groundcover_container_mem_working_set_bytes

Working set memory usage for the container

Bytes

Gauge

groundcover_container_memory_cache_usage_bytes

Memory cache usage for the container

Bytes

Gauge

groundcover_container_memory_kernel_usage_bytes

Kernel memory usage for the container

Bytes

Gauge

groundcover_container_memory_limit_bytes

K8s container memory limit

Bytes

Gauge

groundcover_container_memory_major_page_faults

Total number of major page faults for the container

Number

Counter

groundcover_container_memory_oom_events

Total number of out-of-memory (OOM) events for the container

Number

Counter

groundcover_container_memory_page_faults

Total number of page faults for the container

Number

Counter

groundcover_container_memory_request_bytes

K8s container requested memory allocation

Bytes

Gauge

groundcover_container_memory_request_used_percent

Memory usage rate out of request (usage/request)

Percentage

Gauge

groundcover_container_memory_rss_bytes

Current memory resident set size (RSS)

Bytes

Gauge

groundcover_container_memory_swap_usage_bytes

Swap memory usage for the container

Bytes

Gauge

groundcover_container_memory_usage_bytes

Current memory usage for the container

Bytes

Gauge

groundcover_container_memory_usage_peak_bytes

Peak memory usage for the container

Bytes

Gauge

groundcover_container_memory_used_percent

Memory usage rate (usage/limit)

Percentage

Gauge

groundcover_container_memory_pressure_full_avg10

Average percentage of time all non-idle tasks were stalled on memory over 10 seconds

Percentage

Gauge

groundcover_container_memory_pressure_full_avg300

Average percentage of time all non-idle tasks were stalled on memory over 300 seconds

Percentage

Gauge

groundcover_container_memory_pressure_full_avg60

Average percentage of time all non-idle tasks were stalled on memory over 60 seconds

Percentage

Gauge

groundcover_container_memory_pressure_full_total

Total time all non-idle tasks were stalled waiting for memory

Microseconds

Counter

groundcover_container_memory_pressure_some_avg10

Average percentage of time at least some tasks were stalled on memory over 10 seconds

Percentage

Gauge

groundcover_container_memory_pressure_some_avg300

Average percentage of time at least some tasks were stalled on memory over 300 seconds

Percentage

Gauge

groundcover_container_memory_pressure_some_avg60

Average percentage of time at least some tasks were stalled on memory over 60 seconds

Percentage

Gauge

groundcover_container_memory_pressure_some_total

Total time at least some tasks were stalled waiting for memory

Microseconds

Counter
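
The two percentage gauges above are described as ratios (usage/request and usage/limit). The sketch below shows that arithmetic on raw byte values. Which usage series groundcover uses as the numerator, and how it reports containers without a limit, are not stated on this page, so both are assumptions here. The function name is hypothetical.

```python
from typing import Optional


def used_percent(usage_bytes: float, bound_bytes: float) -> Optional[float]:
    """Usage as a percentage of a request or limit.

    Returns None when the bound is missing or zero, which in this sketch
    stands for a container with no request or limit configured.
    """
    if bound_bytes <= 0:
        return None
    return 100.0 * usage_bytes / bound_bytes
```

For instance, 256 MiB of usage against a 1 GiB limit yields 25.0.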

Container I/O

Labels

type clusterId region namespace node_name workload_name pod_name container_name container_image

Metrics

Name
Description
Unit
Type

groundcover_container_io_read_bytes_total

Total bytes read by the container

Bytes

Counter

groundcover_container_io_read_ops_total

Total number of read operations by the container

Number

Counter

groundcover_container_io_write_bytes_total

Total bytes written by the container

Bytes

Counter

groundcover_container_io_write_ops_total

Total number of write operations by the container

Number

Counter

groundcover_container_disk_delay_seconds

K8s container disk I/O delay

Seconds

Counter

groundcover_container_io_pressure_full_avg10

Average percentage of time all non-idle tasks were stalled on I/O over 10 seconds

Percentage

Gauge

groundcover_container_io_pressure_full_avg300

Average percentage of time all non-idle tasks were stalled on I/O over 300 seconds

Percentage

Gauge

groundcover_container_io_pressure_full_avg60

Average percentage of time all non-idle tasks were stalled on I/O over 60 seconds

Percentage

Gauge

groundcover_container_io_pressure_full_total

Total time all non-idle tasks were stalled waiting for I/O

Microseconds

Counter

groundcover_container_io_pressure_some_avg10

Average percentage of time at least some tasks were stalled on I/O over 10 seconds

Percentage

Gauge

groundcover_container_io_pressure_some_avg300

Average percentage of time at least some tasks were stalled on I/O over 300 seconds

Percentage

Gauge

groundcover_container_io_pressure_some_avg60

Average percentage of time at least some tasks were stalled on I/O over 60 seconds

Percentage

Gauge

groundcover_container_io_pressure_some_total

Total time at least some tasks were stalled waiting for I/O

Microseconds

Counter

Container Network

Labels

type clusterId region namespace node_name workload_name pod_name container_name container_image

Metrics

Name
Description
Unit
Type

groundcover_container_network_rx_bytes_total

Total bytes received by the container

Bytes

Counter

groundcover_container_network_rx_dropped_total

Total number of received packets dropped by the container

Number

Counter

groundcover_container_network_rx_errors_total

Total number of errors encountered while receiving packets

Number

Counter

groundcover_container_network_tx_bytes_total

Total bytes transmitted by the container

Bytes

Counter

groundcover_container_network_tx_dropped_total

Total number of transmitted packets dropped by the container

Number

Counter

groundcover_container_network_tx_errors_total

Total number of errors encountered while transmitting packets

Number

Counter

Container Status

Labels

type clusterId region namespace node_name workload_name pod_name container_name container_image

Metrics

Name
Description
Unit
Type

groundcover_container_uptime_seconds

Uptime of the container

Seconds

Gauge

groundcover_container_crash_count

Total count of container crashes

Number

Counter

Host Resources Metrics & Labels

Host CPU

Labels

clusterId env region host_name cloud_provider env_type

Metrics

Name
Description
Unit
Type

groundcover_host_uptime_seconds

Uptime of the current host

Seconds

Gauge

groundcover_host_cpu_capacity_m_cpu

CPU capacity in the current host

mCPU

Gauge

groundcover_host_cpu_usage_m_cpu

CPU usage in the current host

mCPU

Gauge

groundcover_host_cpu_usage_percent

Percentage of used CPU in the current host

Percentage

Gauge

groundcover_host_cpu_num_cores

Number of CPU cores on the host

Number

Gauge

groundcover_host_cpu_user_spent_seconds_total

Total time spent in user mode

Seconds

Counter

groundcover_host_cpu_user_spent_percent

Percentage of CPU time spent in user mode

Percentage

Gauge

groundcover_host_cpu_system_spent_seconds_total

Total time spent in system mode

Seconds

Counter

groundcover_host_cpu_system_spent_percent

Percentage of CPU time spent in system mode

Percentage

Gauge

groundcover_host_cpu_idle_spent_seconds_total

Total time spent idle

Seconds

Counter

groundcover_host_cpu_idle_spent_percent

Percentage of CPU time spent idle

Percentage

Gauge

groundcover_host_cpu_iowait_spent_seconds_total

Total time spent waiting for I/O to complete

Seconds

Counter

groundcover_host_cpu_iowait_spent_percent

Percentage of CPU time spent waiting for I/O

Percentage

Gauge

groundcover_host_cpu_nice_spent_seconds_total

Total time spent on niced processes

Seconds

Counter

groundcover_host_cpu_steal_spent_seconds_total

Total time spent in involuntary wait (stolen by hypervisor)

Seconds

Counter

groundcover_host_cpu_stolen_spent_percent

Percentage of CPU time stolen by the hypervisor

Percentage

Gauge

groundcover_host_cpu_irq_spent_seconds_total

Total time spent handling hardware interrupts

Seconds

Counter

groundcover_host_cpu_softirq_spent_seconds_total

Total time spent handling software interrupts

Seconds

Counter

groundcover_host_cpu_interrupt_spent_percent

Percentage of CPU time spent handling interrupts

Percentage

Gauge

groundcover_host_cpu_guest_spent_seconds_total

Total time spent running guest processes

Seconds

Counter

groundcover_host_cpu_guest_spent_percent

Percentage of CPU time spent running guest processes

Percentage

Gauge

groundcover_host_cpu_guest_nice_spent_seconds_total

Total time spent running niced guest processes

Seconds

Counter

groundcover_host_cpu_context_switches_total

Total number of context switches in the current host

Number

Counter

groundcover_host_cpu_load_avg1

CPU load average over 1 minute

Number

Gauge

groundcover_host_cpu_load_avg5

CPU load average over 5 minutes

Number

Gauge

groundcover_host_cpu_load_avg15

CPU load average over 15 minutes

Number

Gauge

groundcover_host_cpu_load_norm1

Normalized CPU load over 1 minute

Number

Gauge

groundcover_host_cpu_load_norm5

Normalized CPU load over 5 minutes

Number

Gauge

groundcover_host_cpu_load_norm15

Normalized CPU load over 15 minutes

Number

Gauge
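
This page does not define how the normalized load gauges are calculated. A common convention is to divide the load average by the number of CPU cores, so that a value near 1.0 means the host is roughly saturated. The sketch below assumes that convention. The function name is hypothetical.

```python
def normalized_load(load_avg: float, num_cores: int) -> float:
    """Load average divided by core count (assumed normalization)."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    return load_avg / num_cores
```

With `groundcover_host_cpu_load_avg1` at 4.0 on an 8-core host (`groundcover_host_cpu_num_cores`), this gives 0.5.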

Host Memory

Labels

clusterId env region host_name cloud_provider env_type

Metrics

Name
Description
Unit
Type

groundcover_host_mem_capacity_bytes

Memory capacity in the current host

Bytes

Gauge

groundcover_host_mem_used_bytes

Memory used in the current host

Bytes

Gauge

groundcover_host_mem_used_percent

Percentage of used memory in the current host

Percentage

Gauge

groundcover_host_mem_free_bytes

Free memory in the current host

Bytes

Gauge

groundcover_host_mem_available_bytes

Available memory in the current host

Bytes

Gauge

groundcover_host_mem_cached_bytes

Cached memory in the current host

Bytes

Gauge

groundcover_host_mem_buffers_bytes

Buffer memory in the current host

Bytes

Gauge

groundcover_host_mem_shared_bytes

Shared memory in the current host

Bytes

Gauge

groundcover_host_mem_slab_bytes

Slab memory in the current host

Bytes

Gauge

groundcover_host_mem_sreclaimable_bytes

Reclaimable slab memory in the current host

Bytes

Gauge

groundcover_host_mem_page_tables_bytes

Page tables memory in the current host

Bytes

Gauge

groundcover_host_mem_commit_limit_bytes

Memory commit limit in the current host

Bytes

Gauge

groundcover_host_mem_committed_as_bytes

Committed address space memory in the current host

Bytes

Gauge

groundcover_host_mem_swap_cached_bytes

Cached swap memory in the current host

Bytes

Gauge

groundcover_host_mem_swap_total_bytes

Total swap memory in the current host

Bytes

Gauge

groundcover_host_mem_swap_free_bytes

Free swap memory in the current host

Bytes

Gauge

groundcover_host_mem_swap_used_bytes

Used swap memory in the current host

Bytes

Gauge

groundcover_host_mem_swap_in_bytes_total

Total bytes swapped in on the current host

Bytes

Counter

groundcover_host_mem_swap_out_bytes_total

Total bytes swapped out on the current host

Bytes

Counter

groundcover_host_mem_swap_free_percent

Percentage of free swap memory in the current host

Percentage

Gauge

groundcover_host_mem_usable_percent

Percentage of usable (available) memory in the current host

Percentage

Gauge

Host Disk

Labels

clusterId env region host_name cloud_provider env_type Optional: device_name

Metrics

Name
Description
Unit
Type

groundcover_host_disk_space_used_bytes

Used disk space in the current host

Bytes

Gauge

groundcover_host_disk_space_free_bytes

Free disk space in the current host

Bytes

Gauge

groundcover_host_disk_space_total_bytes

Total disk space in the current host

Bytes

Gauge

groundcover_host_disk_space_used_percent

Percentage of used disk space in the current host

Percentage

Gauge

groundcover_host_disk_read_time_ms_total

Total time spent reading from disk per device in the current host

Milliseconds

Counter

groundcover_host_disk_write_time_ms_total

Total time spent writing to disk per device in the current host

Milliseconds

Counter

groundcover_host_disk_read_count_total

Total number of disk reads per device in the current host

Number

Counter

groundcover_host_disk_write_count_total

Total number of disk writes per device in the current host

Number

Counter

groundcover_host_disk_merged_read_count_total

Total number of merged disk reads per device in the current host

Number

Counter

groundcover_host_disk_merged_write_count_total

Total number of merged disk writes per device in the current host

Number

Counter

Host I/O

Labels

clusterId env region host_name cloud_provider env_type Optional: device_name

Metrics

Name
Description
Unit
Type

groundcover_host_io_read_kb_per_sec

Disk read throughput per device in the current host

Kilobytes per second

Gauge

groundcover_host_io_write_kb_per_sec

Disk write throughput per device in the current host

Kilobytes per second

Gauge

groundcover_host_io_read_await_ms

Average time for read requests to be served per device in the current host

Milliseconds

Gauge

groundcover_host_io_write_await_ms

Average time for write requests to be served per device in the current host

Milliseconds

Gauge

groundcover_host_io_await_ms

Average time for I/O requests to be served per device in the current host

Milliseconds

Gauge

groundcover_host_io_avg_request_size

Average I/O request size per device in the current host

Kilobytes

Gauge

groundcover_host_io_service_time_ms

Average service time for I/O requests per device in the current host

Milliseconds

Gauge

groundcover_host_io_avg_queue_size_kb

Average I/O queue size per device in the current host

Kilobytes

Gauge

groundcover_host_io_utilization_percent

Percentage of time the device was busy serving I/O requests in the current host

Percentage

Gauge

groundcover_host_io_block_in_total

Total number of blocks read in from block devices in the current host

Number

Counter

groundcover_host_io_block_out_total

Total number of blocks written out to block devices in the current host

Number

Counter

Host Filesystem

Labels

clusterId env region host_name cloud_provider env_type device_name file_system mountpoint

Metrics

Name
Description
Unit
Type

groundcover_host_fs_used_bytes

Used filesystem space in the current host

Bytes

Gauge

groundcover_host_fs_free_bytes

Free filesystem space in the current host

Bytes

Gauge

groundcover_host_fs_total_bytes

Total filesystem space in the current host

Bytes

Gauge

groundcover_host_fs_used_percent

Percentage of used filesystem space in the current host

Percentage

Gauge

groundcover_host_fs_inodes_total

Total inodes in the filesystem

Number

Gauge

groundcover_host_fs_inodes_used

Used inodes in the filesystem

Number

Gauge

groundcover_host_fs_inodes_free

Free inodes in the filesystem

Number

Gauge

groundcover_host_fs_inodes_used_percent

Percentage of used inodes in the filesystem

Percentage

Gauge

Host File Handles

Labels

clusterId env region host_name cloud_provider env_type

Metrics

Name
Description
Unit
Type

groundcover_host_fs_file_handles_allocated

Total number of file handles allocated in the current host

Number

Gauge

groundcover_host_fs_file_handles_allocated_unused

Number of allocated but unused file handles in the current host

Number

Gauge

groundcover_host_fs_file_handles_in_use

Number of file handles currently in use in the current host

Number

Gauge

groundcover_host_fs_file_handles_used_percent

Percentage of file handles in use in the current host

Percentage

Gauge

groundcover_host_fs_file_handles_max

Maximum number of file handles available in the current host

Number

Gauge

Host Network

Labels

clusterId env region host_name cloud_provider env_type device

Metrics

Name
Description
Unit
Type

groundcover_host_net_receive_bytes_total

Total bytes received on network interface

Bytes

Counter

groundcover_host_net_transmit_bytes_total

Total bytes transmitted on network interface

Bytes

Counter

groundcover_host_net_receive_packets_total

Total packets received on network interface

Number

Counter

groundcover_host_net_transmit_packets_total

Total packets transmitted on network interface

Number

Counter

groundcover_host_net_receive_dropped_total

Total number of received packets dropped on network interface

Number

Counter

groundcover_host_net_receive_errors_total

Total number of receive errors on network interface

Number

Counter

groundcover_host_net_transmit_dropped_total

Total number of transmitted packets dropped on network interface

Number

Counter

groundcover_host_net_transmit_errors_total

Total number of transmit errors on network interface

Number

Counter
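
Counters such as the ones above only increase (apart from resets), so they are usually read as a per-second rate over a time window. The sketch below is a simplified stand-in for that calculation, not the query engine's actual algorithm: it sums the increases between consecutive samples and treats any decrease as a counter reset. The function name and sample values are hypothetical.

```python
from typing import List, Optional, Tuple


def counter_rate(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Average per-second increase over (timestamp_seconds, value) samples.

    Samples must be sorted by timestamp. A value lower than its predecessor
    is treated as a counter reset, so the new value counts as the increase.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical samples of groundcover_host_net_receive_bytes_total,
# including one reset between t=30 and t=60.
samples = [(0, 1000), (30, 4000), (60, 500), (90, 3500)]
print(counter_rate(samples))  # bytes per second over the 90-second window
```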

Application Metrics & Labels

Label name
Description
Relevant types

clusterId

Name identifier of the K8s cluster

All

region

Cloud provider region name

All

namespace

K8s namespace

All

workload_name

K8s workload (or service) name

All

pod_name

K8s pod name

All

container_name

K8s container name

All

container_image

K8s container image name

All

remote_namespace

Remote K8s namespace (other side of the communication)

All

remote_service_name

Remote K8s service name (other side of the communication)

All

remote_container_name

Remote K8s container name (other side of the communication)

All

type

The protocol in use (HTTP, gRPC, Kafka, DNS etc.)

All

sub_type

The subtype of the protocol (GET, POST, etc.)

All

role

Role in the communication (client or server)

All

clustered_resource_name

The clustered name of the resource, depends on the protocol

All

status_code

"ok", "error" or "unset"

All

server

The server workload/name

All

client

The client workload/name

All

server_namesapce

The server namespace

All

client_namespace

The client namespace

All

server_is_external

Indicate whether the server is external

All

client_is_external

Indicate whether the client is external

All

is_encrypted

Indicate whether the communication is encrypted

All

is_cross_az

Indicate whether the communication is cross availability zone

All

clustered_path

HTTP / gRPC aggregated resource path (e.g. /metrics/*)

http, grpc

method

HTTP / gRPC method (e.g. GET)

http, grpc

response_status_code

Return status code of an HTTP / gRPC request (e.g. 200 in HTTP)

http, grpc

dialect

SQL dialect (MySQL or PostgreSQL)

mysql, postgresql

response_status

Return status code of a SQL query (e.g. 42P01 for undefined table)

mysql, postgresql

client_type

Kafka client type (Fetcher / Producer)

kafka

topic

Kafka topic name

kafka

partition

Kafka partition identifier

kafka

error_code

Kafka return status code

kafka

query_type

Type of DNS query (e.g. AAAA)

dns

response_return_code

Return status code of a DNS resolution request (e.g. Name Error)

dns

exit_code

K8s container termination exit code

container_state, container_crash

state

K8s container current state (Running, Waiting or Terminated)

container_state

state_reason

K8s container state transition reason (e.g. CrashLoopBackOff or OOMKilled)

container_state

crash_reason

K8s container crash reason (e.g. Error, OOMKilled)

container_crash

pvc_name

K8s PVC name

storage

Summary-based metrics have an additional quantile label representing the percentile. Available values: ["0.5", "0.95", "0.99"].
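
To illustrate how the quantile label is used, the sketch below builds a metric selector for one percentile of a Summary metric. It assumes the metrics are queried with PromQL-style label matchers, which this page does not state explicitly; the helper name and the namespace value are hypothetical.

```python
ALLOWED_QUANTILES = {"0.5", "0.95", "0.99"}  # values listed in the text above


def summary_selector(metric: str, quantile: str, **labels: str) -> str:
    """Build a PromQL-style selector for one quantile of a Summary metric."""
    if quantile not in ALLOWED_QUANTILES:
        raise ValueError(f"unsupported quantile: {quantile!r}")
    matchers = dict(labels, quantile=quantile)
    # Label values are inserted as-is; values containing quotes or
    # backslashes would need escaping, which this sketch does not do.
    body = ",".join(f'{key}="{value}"' for key, value in sorted(matchers.items()))
    return f"{metric}{{{body}}}"


print(summary_selector("groundcover_workload_latency_seconds", "0.95",
                       namespace="prod"))
# groundcover_workload_latency_seconds{namespace="prod",quantile="0.95"}
```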

Golden Signals (Errors & Issues)

The lists below describe the error and issue counters. Every issue flagged by the platform is an error, but not every error is flagged as an issue.

Resource metrics

Name
Description
Unit
Type

groundcover_resource_total_counter

Total number of resource requests

Number

Counter

groundcover_resource_error_counter

Total number of requests with error status codes

Number

Counter

groundcover_resource_issue_counter

Total number of requests that were flagged as issues

Number

Counter

groundcover_resource_success_counter

Total number of resource requests with OK status codes

Number

Counter

groundcover_resource_latency_seconds

Resource latency

Seconds

Summary
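
Because the error and issue counters share a denominator with the total counter, they can be turned into ratios over a time window. The following is a minimal sketch that assumes the three inputs are increases of the corresponding counters over the same window and label set; the function name and the numbers are hypothetical.

```python
from typing import Dict, Optional


def error_and_issue_ratios(total_increase: float,
                           error_increase: float,
                           issue_increase: float) -> Optional[Dict[str, float]]:
    """Share of requests that were errors and that were flagged as issues."""
    if total_increase <= 0:
        return None  # no traffic in the window, ratios are undefined
    return {
        "error_ratio": error_increase / total_increase,
        "issue_ratio": issue_increase / total_increase,
    }


# Hypothetical increases of groundcover_resource_total_counter,
# groundcover_resource_error_counter and groundcover_resource_issue_counter.
print(error_and_issue_ratios(200, 10, 4))
```

Since every issue is also an error, the issue ratio should not exceed the error ratio for the same window and labels.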

Workload metrics

Name
Description
Unit
Type

groundcover_workload_total_counter

Total number of requests handled by the workload

Number

Counter

groundcover_workload_error_counter

Total number of requests handled by the workload with error status codes

Number

Counter

groundcover_workload_issue_counter

Total number of requests handled by the workload that were flagged as issues

Number

Counter

groundcover_workload_success_counter

Total number of requests handled by the workload with OK status codes

Number

Counter

groundcover_workload_latency_seconds

Resource latency across all of the workload's APIs

Seconds

Summary

Kafka specific metrics

Name
Description
Unit
Type

groundcover_workload_client_offset

Last message offset of the client (for a producer, the last offset produced; for a consumer, the last offset requested), aggregated by workload

Number

Gauge

groundcover_workload_calc_lagged_messages

Current lag in messages, aggregated by workload

Number

Gauge

groundcover_workload_calc_lag_seconds

Current lag in time, aggregated by workload

Seconds

Gauge
