Infrastructure Monitoring

Get complete visibility into your cloud infrastructure performance at any scale, access all your metrics in one place, and optimize infrastructure efficiency.

Overview

The groundcover platform offers infrastructure monitoring capabilities built for cloud-native environments. It enables you to track the health and efficiency of your infrastructure instantly, with an effortless deployment process.

Troubleshoot efficiently - groundcover acts as a centralized hub for all your infrastructure, application and customer metrics, so you can query, correlate and troubleshoot your cloud environments using real-time data and insights across your entire stack.

Store it all, without breaking a sweat - store any volume of metrics without worrying about cardinality or retention limits. Your subscription cost is unaffected by the granularity of the metrics you store or query.

Collection

groundcover's proprietary eBPF sensor collects comprehensive data across your cloud environments without the burden of performance overhead. The data is sourced from several Kubernetes components, including kube-system workloads, cluster information from the Kubernetes API, and the applications' interactions with the Kubernetes infrastructure. Collecting at the kernel level in this detail yields actionable insights into the health of your Kubernetes clusters, which are indispensable for troubleshooting existing issues and for taking proactive steps to future-proof your cloud environments.

Configuration

You can define the retention period for your metrics in the VictoriaMetrics database. By default, metrics are retained for 7 days, but you can adjust this period to suit your needs.

Enrichment

Beyond collecting data, groundcover adds a layer of data enrichment that correlates Kubernetes metrics with application performance indicators. This correlation is crucial for building a clear picture of the Kubernetes ecosystem: it shows how Kubernetes interacts with your applications and identifies potential points of failure across the interconnected environment. By monitoring Kubernetes as an integral part of the application infrastructure rather than as an isolated platform, groundcover keeps the monitoring strategy aligned with your dynamic and complex cloud operations.

Infrastructure Metrics

Monitoring a cluster involves tracking the resources that are critical to the performance and stability of the entire system. The following metrics are essential for maintaining a healthy Kubernetes cluster:

  • CPU consumption: It's essential to track the CPU resources being utilized against the total capacity to prevent workloads from failing due to insufficient CPU availability.

  • Memory utilization: Keeping an eye on the remaining memory resources ensures that your cluster doesn't encounter disruptions due to memory shortages.

  • Disk space allocation: For Kubernetes clusters running stateful applications or requiring persistent storage for data, such as etcd databases, tracking the available disk space is crucial to avert potential storage deficiencies.

  • Network usage: Visualize traffic rates and the connections being opened and closed at service-to-service granularity, and easily pinpoint cross-availability-zone communication to investigate misconfigurations and surging costs.

Container CPU and Memory

Available Labels

type clusterId region namespace node_name workload_name pod_name container_name container_image

Available Metrics

| Name | Description | Type |
| --- | --- | --- |
| groundcover_container_m_cpu_usage_seconds_total | Cumulative CPU time consumed (mCPU * seconds) | Counter |
| groundcover_container_cpu_request_m_cpu | K8s container CPU request (mCPU) | Gauge |
| groundcover_container_cpu_limit_m_cpu | K8s container CPU limit (mCPU) | Gauge |
| groundcover_container_mem_working_set_bytes | Current memory working set (B) | Gauge |
| groundcover_container_memory_request_bytes | K8s container memory request (B) | Gauge |
| groundcover_container_memory_limit_bytes | K8s container memory limit (B) | Gauge |
| groundcover_container_cpu_delay_seconds | K8s container CPU delay accounting in seconds | Counter |
| groundcover_container_disk_delay_seconds | K8s container disk delay accounting in seconds | Counter |
| groundcover_container_cpu_throttled_seconds_total | K8s container total CPU throttling in seconds | Counter |
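Because groundcover_container_m_cpu_usage_seconds_total is a counter expressed in mCPU * seconds, its per-second rate of increase is the average CPU usage in mCPU, which can then be compared with groundcover_container_cpu_limit_m_cpu. The following is a minimal sketch of that arithmetic, not groundcover's own implementation: the Sample type, the helper names and the sample values are all hypothetical, and it assumes the counter did not reset inside the window.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One scraped value of a metric at a point in time (hypothetical type)."""
    timestamp: float  # seconds
    value: float


def counter_rate(first: Sample, last: Sample) -> float:
    """Average per-second increase of a counter between two samples.

    Assumes no counter reset happened between the two samples.
    """
    elapsed = last.timestamp - first.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (last.value - first.value) / elapsed


def cpu_limit_utilization(first: Sample, last: Sample, limit_m_cpu: float) -> float:
    """Fraction of the CPU limit used on average over the window."""
    if limit_m_cpu <= 0:
        raise ValueError("container has no CPU limit set")
    return counter_rate(first, last) / limit_m_cpu


# Illustrative values only: two readings of
# groundcover_container_m_cpu_usage_seconds_total taken 60 seconds apart,
# and a groundcover_container_cpu_limit_m_cpu value of 500 mCPU.
first = Sample(timestamp=0.0, value=12_000.0)
last = Sample(timestamp=60.0, value=27_000.0)

average_m_cpu = counter_rate(first, last)
utilization = cpu_limit_utilization(first, last, limit_m_cpu=500.0)
print(f"average usage: {average_m_cpu:.0f} mCPU")
print(f"share of limit: {utilization:.0%}")
```

With these made-up readings the counter grows by 15,000 mCPU * seconds over 60 seconds, so the container averaged 250 mCPU, half of its 500 mCPU limit. The same division against groundcover_container_cpu_request_m_cpu would show usage relative to the request instead.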

Node CPU, Memory and Disk

Available Labels

type clusterId region node_name

Available Metrics

| Name | Description | Type |
| --- | --- | --- |
| groundcover_node_allocatable_cpum_cpu | Amount of allocatable CPU in the current node (mCPU) | Gauge |
| groundcover_node_allocatable_mem_bytes | Amount of allocatable memory in the current node (B) | Gauge |
| groundcover_node_mem_used_percent | Percent of used memory in the current node (0-100) | Gauge |
| groundcover_node_used_disk_space | Current used disk space in the current node (B) | Gauge |
| groundcover_node_free_disk_space | Amount of free disk space in the current node (B) | Gauge |
| groundcover_node_total_disk_space | Amount of total disk space in the current node (B) | Gauge |
| groundcover_node_used_percent_disk_space | Percent of used disk space in the current node (0-100) | Gauge |

PVC Usage

Available Labels

type clusterId region name namespace

Available Metrics

| Name | Description | Type |
| --- | --- | --- |
| groundcover_pvc_usage_bytes | PVC used bytes (B) | Gauge |
| groundcover_pvc_capacity_bytes | PVC capacity bytes (B) | Gauge |
| groundcover_pvc_available_bytes | PVC available bytes (B) | Gauge |
| groundcover_pvc_usage_percent | Percent of used PVC storage (0-100) | Gauge |
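As a rough illustration of how the PVC gauges relate to each other, the sketch below derives a usage percentage from the used and capacity byte values and flags volumes above a threshold. It assumes the percentage is simply used bytes divided by capacity, which may not match exactly how groundcover_pvc_usage_percent is computed. The function names, the 80% threshold, and the namespace and PVC names are hypothetical.

```python
GIB = 1024 ** 3


def pvc_usage_percent(usage_bytes: float, capacity_bytes: float) -> float:
    """Used share of a PVC on a 0-100 scale (assumed: usage / capacity)."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * usage_bytes / capacity_bytes


def pvcs_over_threshold(pvcs, threshold_percent=80.0):
    """Return (namespace, name, percent) for every PVC above the threshold.

    `pvcs` maps (namespace, name) to a (usage_bytes, capacity_bytes) pair,
    mirroring the namespace and name labels on the PVC metrics.
    """
    flagged = []
    for (namespace, name), (usage, capacity) in sorted(pvcs.items()):
        percent = pvc_usage_percent(usage, capacity)
        if percent > threshold_percent:
            flagged.append((namespace, name, percent))
    return flagged


# Illustrative values only; these PVCs do not come from a real cluster.
pvcs = {
    ("db", "postgres-data"): (45 * GIB, 50 * GIB),
    ("cache", "redis-data"): (2 * GIB, 10 * GIB),
}

flagged = pvcs_over_threshold(pvcs)
for namespace, name, percent in flagged:
    print(f"{namespace}/{name} is {percent:.0f}% full")
```

In this example only the hypothetical db/postgres-data volume is reported, at 90% of its capacity.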

Network Usage

Available Labels

clusterId workload_name namespace container_name remote_service_name remote_namespace remote_is_external availability_zone region remote_availability_zone remote_region is_cross_az protocol role server_port encryption transport_protocol is_loopback

Notes:

  • is_loopback and remote_is_external are special labels that indicate the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g. a managed service outside of the cluster (external).

    • In both cases the remote_service_name and the remote_namespace labels will be empty

  • is_cross_az means the traffic was sent and/or received between two different availability zones. This is a helpful flag to quickly identify this special kind of communication.

    • The actual zones are detailed in the availability_zone and remote_availability_zone labels

Available Metrics

| Name | Description | Type |
| --- | --- | --- |
| groundcover_network_rx_bytes_total | Bytes received by the workload (B) | Counter |
| groundcover_network_tx_bytes_total | Bytes sent by the workload (B) | Counter |
| groundcover_network_connections_opened_total | Connections opened by the workload | Counter |
| groundcover_network_connections_closed_total | Connections closed by the workload | Counter |
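To show how the is_cross_az, availability_zone and remote_availability_zone labels can be combined, here is a minimal sketch that sums transmitted bytes per zone pair for cross-AZ traffic only. It also builds a PromQL query string expressing the same idea. The helper names, workload names, zone names and byte values are hypothetical, and the assumption that is_cross_az carries the string value "true" should be checked against your own data.

```python
from collections import defaultdict


def cross_az_bytes(series):
    """Sum byte increases per (zone, remote zone) pair for cross-AZ series.

    `series` is an iterable of (labels, byte_increase) pairs, where
    `labels` is a dict of the network metric labels.
    """
    totals = defaultdict(float)
    for labels, byte_increase in series:
        # Assumption: the label value is the string "true" for cross-AZ traffic.
        if labels.get("is_cross_az") != "true":
            continue
        pair = (labels["availability_zone"], labels["remote_availability_zone"])
        totals[pair] += byte_increase
    return dict(totals)


def cross_az_query(metric="groundcover_network_tx_bytes_total", window="5m"):
    """Build a PromQL query for the per-zone-pair cross-AZ byte rate."""
    return (
        "sum by (availability_zone, remote_availability_zone) "
        f'(rate({metric}{{is_cross_az="true"}}[{window}]))'
    )


# Illustrative series only: increases of groundcover_network_tx_bytes_total
# over some window for three hypothetical label sets.
series = [
    ({"workload_name": "checkout", "is_cross_az": "true",
      "availability_zone": "us-east-1a",
      "remote_availability_zone": "us-east-1b"}, 4_000_000.0),
    ({"workload_name": "checkout", "is_cross_az": "false",
      "availability_zone": "us-east-1a",
      "remote_availability_zone": "us-east-1a"}, 9_000_000.0),
    ({"workload_name": "cart", "is_cross_az": "true",
      "availability_zone": "us-east-1a",
      "remote_availability_zone": "us-east-1b"}, 1_500_000.0),
]

totals = cross_az_bytes(series)
query = cross_az_query()
print(totals)
print(query)
```

Grouping by the zone pair rather than by workload highlights which zone-to-zone paths carry the most traffic. Adding workload_name to the grouping key would narrow the result down to the services responsible.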
