# Infrastructure Monitoring

## Overview

groundcover provides infrastructure monitoring built for cloud-native environments. It lets you track the health and efficiency of your infrastructure in real time, with minimal deployment effort.

**Troubleshoot efficiently -** groundcover acts as a centralized hub for all your infrastructure, application and customer metrics, so you can query, correlate and troubleshoot your cloud environments using real-time data and insights across your entire stack.

**Store it all, without breaking a sweat -** store any volume of metrics without worrying about cardinality or retention limits. [Your subscription costs](https://www.groundcover.com/pricing) are unaffected by the granularity of the metrics you store or query.

## Collection

groundcover's proprietary eBPF sensor collects comprehensive data across your cloud environments without significant performance overhead. This data is sourced from various Kubernetes components, including kube-system workloads, cluster information via the Kubernetes API, and the applications' interactions with the Kubernetes infrastructure. Collecting at the kernel level provides actionable insights into the health of your Kubernetes clusters, which help you troubleshoot existing issues and take proactive steps to future-proof your cloud environments.

## Configuration

You can [define the retention period](https://docs.groundcover.com/customization/customize-usage/custom-data-retention) for your metrics in the VictoriaMetrics database. By default, metrics are retained for 7 days, but you can adjust this period to suit your needs.

## Enrichment

Beyond collecting data, groundcover enriches it by correlating Kubernetes metrics with application performance indicators. This correlation is crucial for building a clear picture of the Kubernetes ecosystem: it shows how Kubernetes interacts with your applications and identifies potential points of failure across the interconnected environment. By monitoring Kubernetes not as an isolated platform but as an integral part of the application infrastructure, groundcover keeps the monitoring strategy aligned with your dynamic and complex cloud operations.

<figure><img src="https://2771001740-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUHgqKYgCiRKdOpWQdi52%2Fuploads%2Fgit-blob-06598bb3a319d37dd1cbaa6b56ca98a56ec0ba59%2FInfra%20Management%201.jpg?alt=media" alt=""><figcaption></figcaption></figure>

## Infrastructure Metrics

Monitoring a cluster involves tracking the resources that are critical to the performance and stability of the entire system. The following metrics are essential for maintaining a healthy Kubernetes cluster:

* **CPU consumption**: It's essential to track the CPU resources being utilized against the total capacity to prevent workloads from failing due to insufficient CPU availability.
* **Memory utilization**: Keeping an eye on the remaining memory resources ensures that your cluster doesn't encounter disruptions due to memory shortages.
* **Disk space allocation**: For Kubernetes clusters running stateful applications or requiring persistent storage for data, such as etcd databases, tracking the available disk space is crucial to avert potential storage deficiencies.
* **Network usage**: Visualize traffic rates and connections being established and closed at service-to-service granularity, and easily pinpoint cross-availability-zone communication to investigate misconfigurations and surging costs.

### Container CPU and Memory

<mark style="color:green;background-color:yellow;">`Available Labels`</mark>

**`type`**

**`clusterId`** **`region`** **`namespace`** **`node_name`** **`workload_name`**

**`pod_name`** **`container_name`** **`container_image`**

<mark style="color:green;background-color:yellow;">`Available Metrics`</mark>

<table><thead><tr><th width="397.3333333333333">Name</th><th>Description</th><th>Type<select><option value="c476a927c24d4e67b8115feb2e039751" label="Counter" color="blue"></option><option value="655b3932f5bd49b0aacc43978ad78425" label="Gauge" color="blue"></option><option value="c0106f6abc0244adbf70f29447aa3875" label="Summary" color="blue"></option></select></th></tr></thead><tbody><tr><td>groundcover_container_cpu_usage_rate_millis</td><td>CPU usage in mCPU</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_container_cpu_request_m_cpu</td><td>K8s container CPU request (mCPU)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_container_cpu_limit_m_cpu</td><td>K8s container CPU limit (mCPU)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_container_memory_working_set_bytes</td><td>current memory working set (B)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_container_memory_rss_bytes</td><td>current memory RSS (B)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_container_memory_request_bytes</td><td>K8s container memory request (B)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_container_memory_limit_bytes</td><td>K8s container memory limit (B)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_container_cpu_delay_seconds</td><td>K8s container CPU delay accounting in seconds</td><td><span data-option="c476a927c24d4e67b8115feb2e039751">Counter</span></td></tr><tr><td>groundcover_container_disk_delay_seconds</td><td>K8s container disk delay accounting in seconds</td><td><span 
data-option="c476a927c24d4e67b8115feb2e039751">Counter</span></td></tr><tr><td>groundcover_container_cpu_throttled_seconds_total</td><td>K8s container total CPU throttling in seconds</td><td><span data-option="c476a927c24d4e67b8115feb2e039751">Counter</span></td></tr></tbody></table>
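
As a rough illustration of how these metrics and labels combine, the sketch below builds a PromQL-style expression for container CPU usage as a percentage of the configured CPU limit. It assumes the metrics are queried through a PromQL-compatible interface. The helper names `label_selector` and `cpu_usage_vs_limit_query`, as well as the example label values, are hypothetical and not part of groundcover.

```python
def label_selector(labels: dict[str, str]) -> str:
    """Render a PromQL label selector such as {namespace="prod"}.

    Assumption: label values contain no quotes or backslashes, so no
    escaping is performed in this sketch.
    """
    if not labels:
        return ""
    pairs = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return "{" + pairs + "}"


def cpu_usage_vs_limit_query(labels: dict[str, str]) -> str:
    """Build an expression for CPU usage as a percentage of the CPU limit.

    Both metrics are gauges reported in mCPU, so their ratio is unitless.
    """
    selector = label_selector(labels)
    group = "sum by (pod_name, container_name)"
    usage = f"{group} (groundcover_container_cpu_usage_rate_millis{selector})"
    limit = f"{group} (groundcover_container_cpu_limit_m_cpu{selector})"
    return f"100 * {usage} / {limit}"


if __name__ == "__main__":
    # Hypothetical label values, for illustration only.
    print(cpu_usage_vs_limit_query({"namespace": "prod", "workload_name": "checkout"}))
```

Containers without a CPU limit may have no matching limit series, in which case an expression like this would return no result for them.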

### Node CPU, Memory and Disk

<mark style="color:green;background-color:yellow;">`Available Labels`</mark>

**`type`** **`clusterId`** **`region`** **`node_name`**

<mark style="color:green;background-color:yellow;">`Available Metrics`</mark>

<table><thead><tr><th width="397.3333333333333">Name</th><th>Description</th><th>Type<select><option value="c476a927c24d4e67b8115feb2e039751" label="Counter" color="blue"></option><option value="655b3932f5bd49b0aacc43978ad78425" label="Gauge" color="blue"></option><option value="c0106f6abc0244adbf70f29447aa3875" label="Summary" color="blue"></option></select></th></tr></thead><tbody><tr><td>groundcover_node_allocatable_cpum_cpu</td><td>amount of allocatable CPU in the current node (mCPU)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_node_allocatable_mem_bytes</td><td>amount of allocatable memory in the current node (B)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_node_mem_used_percent</td><td>percent of used memory in current node (0-100)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_node_used_disk_space</td><td>current used disk space in current node (B)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_node_free_disk_space</td><td>amount of free disk space in current node (B)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_node_total_disk_space</td><td>amount of total disk space in current node (B)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_node_used_percent_disk_space</td><td>percent of used disk space in current node (0-100)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr></tbody></table>
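
The percentage gauges above lend themselves to simple threshold checks. The following is a minimal sketch, assuming you have already fetched the latest value of `groundcover_node_mem_used_percent` and `groundcover_node_used_percent_disk_space` for each node. The `NodeSample` type, the `nodes_under_pressure` function, and the threshold values are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSample:
    node_name: str
    mem_used_percent: float   # groundcover_node_mem_used_percent (0-100)
    disk_used_percent: float  # groundcover_node_used_percent_disk_space (0-100)


def nodes_under_pressure(
    samples: list[NodeSample],
    mem_threshold: float = 90.0,   # hypothetical threshold
    disk_threshold: float = 85.0,  # hypothetical threshold
) -> dict[str, list[str]]:
    """Map each node at or above a threshold to the resources that are under pressure."""
    flagged: dict[str, list[str]] = {}
    for sample in samples:
        reasons = []
        if sample.mem_used_percent >= mem_threshold:
            reasons.append("memory")
        if sample.disk_used_percent >= disk_threshold:
            reasons.append("disk")
        if reasons:
            flagged[sample.node_name] = reasons
    return flagged


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    samples = [
        NodeSample("node-a", mem_used_percent=93.5, disk_used_percent=40.0),
        NodeSample("node-b", mem_used_percent=55.0, disk_used_percent=61.0),
    ]
    print(nodes_under_pressure(samples))
```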

### PVC Usage

<mark style="color:green;background-color:yellow;">`Available Labels`</mark>

**`type`** **`clusterId`** **`region`** **`name`** **`namespace`**

<mark style="color:green;background-color:yellow;">`Available Metrics`</mark>

<table><thead><tr><th width="397.3333333333333">Name</th><th>Description</th><th>Type<select><option value="c476a927c24d4e67b8115feb2e039751" label="Counter" color="blue"></option><option value="655b3932f5bd49b0aacc43978ad78425" label="Gauge" color="blue"></option><option value="c0106f6abc0244adbf70f29447aa3875" label="Summary" color="blue"></option></select></th></tr></thead><tbody><tr><td>groundcover_pvc_usage_bytes</td><td>PVC used bytes (B)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_pvc_capacity_bytes</td><td>PVC capacity bytes (B)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_pvc_available_bytes</td><td>PVC available bytes (B)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr><tr><td>groundcover_pvc_usage_percent</td><td>percent of used pvc storage (0-100)</td><td><span data-option="655b3932f5bd49b0aacc43978ad78425">Gauge</span></td></tr></tbody></table>
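
One way to act on PVC metrics before a volume fills up is to extrapolate recent growth. The sketch below assumes two readings of `groundcover_pvc_usage_bytes` taken a known interval apart, plus the latest `groundcover_pvc_available_bytes`. The function name `seconds_until_full` and the linear-growth model are assumptions for illustration, and real usage rarely grows perfectly linearly.

```python
from typing import Optional


def seconds_until_full(
    usage_then_bytes: float,
    usage_now_bytes: float,
    available_now_bytes: float,
    interval_seconds: float,
) -> Optional[float]:
    """Estimate the time until a PVC is full, assuming linear growth.

    Returns None when usage is flat or shrinking, since no fill time
    can be extrapolated in that case.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    growth_per_second = (usage_now_bytes - usage_then_bytes) / interval_seconds
    if growth_per_second <= 0:
        return None
    return available_now_bytes / growth_per_second


if __name__ == "__main__":
    gib = 2**30
    # Made-up readings: usage grew by 2 GiB in one hour, with 8 GiB still available.
    eta = seconds_until_full(40 * gib, 42 * gib, 8 * gib, interval_seconds=3600)
    print(f"Estimated hours until full: {eta / 3600:.1f}")
```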

### Network Usage

<mark style="color:green;background-color:yellow;">`Available Labels`</mark>

**`clusterId`** **`workload_name`** **`namespace`** **`container_name`** **`remote_service_name`** **`remote_namespace`** **`remote_is_external`** **`availability_zone`** **`region`** **`remote_availability_zone`** **`remote_region`** **`is_cross_az`** **`protocol`** **`role`** **`server_port`** **`encryption`** **`transport_protocol`** **`is_loopback`**

Notes:

* `is_loopback` and `remote_is_external` are special labels indicating that the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g. a managed service outside of the cluster (external).
  * In both cases the `remote_service_name` and the `remote_namespace` labels will be empty.
* `is_cross_az` means the traffic was sent and/or received between two different availability zones. This flag helps you quickly identify this kind of communication.
  * The actual zones are detailed in the `availability_zone` and `remote_availability_zone` labels.
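
The notes above can be read as a small decision procedure. The following sketch classifies a single series by its labels. It assumes the boolean labels carry the string values `"true"` and `"false"`, which is not confirmed by this page. The function name `classify_traffic`, the category names, and the order in which the flags are checked are hypothetical.

```python
def classify_traffic(labels: dict[str, str]) -> str:
    """Classify a network series as loopback, external, cross-az or same-az.

    Assumption: boolean labels are exposed as the strings "true"/"false".
    A missing label is treated as false.
    """

    def flag(name: str) -> bool:
        return labels.get(name, "").lower() == "true"

    if flag("is_loopback"):
        # The remote side is the recording service itself.
        return "loopback"
    if flag("remote_is_external"):
        # The remote side lives outside the cluster network.
        return "external"
    if flag("is_cross_az"):
        # availability_zone and remote_availability_zone hold the actual zones.
        return "cross-az"
    return "same-az"


if __name__ == "__main__":
    # Made-up label set, for illustration only.
    series = {
        "workload_name": "checkout",
        "remote_service_name": "payments",
        "is_cross_az": "true",
        "availability_zone": "us-east-1a",
        "remote_availability_zone": "us-east-1b",
    }
    print(classify_traffic(series))
```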

<mark style="color:green;background-color:yellow;">`Available Metrics`</mark>

<table><thead><tr><th width="417.3333333333333">Name</th><th>Description</th><th>Type<select><option value="c476a927c24d4e67b8115feb2e039751" label="Counter" color="blue"></option><option value="655b3932f5bd49b0aacc43978ad78425" label="Gauge" color="blue"></option><option value="c0106f6abc0244adbf70f29447aa3875" label="Summary" color="blue"></option></select></th></tr></thead><tbody><tr><td>groundcover_network_rx_bytes_total</td><td>Bytes received by the workload (B)</td><td><span data-option="c476a927c24d4e67b8115feb2e039751">Counter</span></td></tr><tr><td>groundcover_network_tx_bytes_total</td><td>Bytes sent by the workload (B)</td><td><span data-option="c476a927c24d4e67b8115feb2e039751">Counter</span></td></tr><tr><td>groundcover_network_connections_opened_total</td><td>Connections opened by the workload</td><td><span data-option="c476a927c24d4e67b8115feb2e039751">Counter</span></td></tr><tr><td>groundcover_network_connections_closed_total</td><td>Connections closed by the workload</td><td><span data-option="c476a927c24d4e67b8115feb2e039751">Counter</span></td></tr><tr><td>groundcover_network_connections_opened_failed_total</td><td>Failed connection attempts per workload (including refused connections)</td><td><span data-option="c476a927c24d4e67b8115feb2e039751">Counter</span></td></tr><tr><td>groundcover_network_connections_opened_refused_total</td><td>Refused connection attempts per workload</td><td><span data-option="c476a927c24d4e67b8115feb2e039751">Counter</span></td></tr></tbody></table>
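
All of the network metrics are counters, so their raw values are only meaningful as a change over time. Below is a minimal sketch of turning two consecutive counter readings into a per-second rate, treating a decrease as a counter reset in the same spirit as the Prometheus `rate()` function. The function name `counter_rate` and the sample numbers are hypothetical.

```python
def counter_rate(prev_value: float, curr_value: float, interval_seconds: float) -> float:
    """Per-second rate between two readings of a monotonically increasing counter.

    A counter only decreases when it has been reset (for example after a
    restart), so a lower current value is taken as the full increase
    since the reset.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if curr_value < prev_value:
        increase = curr_value  # counter restarted from zero
    else:
        increase = curr_value - prev_value
    return increase / interval_seconds


if __name__ == "__main__":
    # Made-up readings of groundcover_network_tx_bytes_total taken 60 seconds apart.
    print(counter_rate(1_000, 4_000, 60))
```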
