Infrastructure Monitoring
Get complete visibility into your cloud infrastructure performance at any scale, easily access all your metrics in one place and optimize infrastructure efficiency.
The groundcover platform offers infrastructure monitoring capabilities that were built for cloud-native environments. It enables you to track the health and efficiency of your infrastructure instantly, with an effortless deployment process.
Troubleshoot efficiently - groundcover acts as a centralized hub for all your infrastructure, application and customer metrics, so you can query, correlate and troubleshoot your cloud environments using real-time data and insights across your entire stack.
Store it all, without breaking a sweat - store any volume of metrics without worrying about cardinality or retention limits. Your subscription costs are unaffected by the granularity of the metrics you store or query.
groundcover's proprietary eBPF sensor collects comprehensive data across your cloud environments without the burden of performance overhead. This data is sourced from various Kubernetes components, including kube-system workloads, cluster information from the Kubernetes API, and the applications' interactions with the Kubernetes infrastructure. Collecting data in this detail at the kernel level provides actionable insights into the health of your Kubernetes clusters, which are indispensable for troubleshooting existing issues and for taking proactive steps to future-proof your cloud environments.
You can also define the retention period for your metrics in the VictoriaMetrics database. By default, metrics are retained for 7 days, but you can adjust this period to suit your needs.
Beyond collecting data, groundcover enriches it by correlating Kubernetes metrics with application performance indicators. This correlation is crucial for building a clear picture of the Kubernetes ecosystem: it shows how Kubernetes interacts with your applications and helps identify potential points of failure across the interconnected environment. By monitoring Kubernetes not as an isolated platform but as an integral part of the application infrastructure, groundcover keeps the monitoring strategy aligned with your dynamic and complex cloud operations.
Monitoring a cluster involves tracking the resources that are critical to the performance and stability of the entire system. The following metrics are essential for maintaining a healthy Kubernetes cluster:
CPU consumption: It's essential to track the CPU resources being utilized against the total capacity to prevent workloads from failing due to insufficient CPU availability.
Memory utilization: Keeping an eye on the remaining memory resources ensures that your cluster doesn't encounter disruptions due to memory shortages.
Disk space allocation: For Kubernetes clusters running stateful applications or requiring persistent storage for data, such as etcd databases, tracking the available disk space is crucial to avert potential storage deficiencies.
Network usage: Visualize traffic rates and connections being established and closed on a service-to-service level of granularity, and easily pinpoint cross availability zone communication to investigate misconfigurations and surging costs.
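The comparison of usage against capacity described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not part of groundcover: the `ContainerSample` type, its field names, and the 80% threshold are hypothetical, and the values are assumed to have been read from the container metrics named in the comments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContainerSample:
    """One container's readings. Hypothetical type, not a groundcover API."""
    name: str
    cpu_usage_millis: float          # e.g. groundcover_container_cpu_usage_rate_millis
    cpu_limit_millis: float          # e.g. groundcover_container_cpu_limit_m_cpu
    memory_working_set_bytes: float  # e.g. groundcover_container_memory_working_set_bytes
    memory_limit_bytes: float        # e.g. groundcover_container_memory_limit_bytes


def utilization_percent(used: float, capacity: float) -> Optional[float]:
    """Return used/capacity as a percentage, or None when no capacity is set."""
    if capacity <= 0:
        return None
    return 100.0 * used / capacity


def containers_near_limit(samples: List[ContainerSample],
                          threshold_percent: float = 80.0) -> List[str]:
    """Return names of containers whose CPU or memory use is at or above the threshold."""
    flagged = []
    for s in samples:
        cpu = utilization_percent(s.cpu_usage_millis, s.cpu_limit_millis)
        mem = utilization_percent(s.memory_working_set_bytes, s.memory_limit_bytes)
        # Skip ratios that are undefined because the container has no limit.
        if any(v is not None and v >= threshold_percent for v in (cpu, mem)):
            flagged.append(s.name)
    return flagged
```

A container without a CPU or memory limit yields `None` for that ratio rather than a division error, so it is only flagged when one of its defined ratios crosses the threshold.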
Container metrics
Available Labels
type
clusterId
region
namespace
node_name
workload_name
pod_name
container_name
container_image
Available Metrics

Name | Description |
---|---|
groundcover_container_cpu_usage_rate_millis | CPU usage (mCPU) |
groundcover_container_cpu_request_m_cpu | K8s container CPU request (mCPU) |
groundcover_container_cpu_limit_m_cpu | K8s container CPU limit (mCPU) |
groundcover_container_memory_working_set_bytes | Current memory working set (B) |
groundcover_container_memory_rss_bytes | Current memory RSS (B) |
groundcover_container_memory_request_bytes | K8s container memory request (B) |
groundcover_container_memory_limit_bytes | K8s container memory limit (B) |
groundcover_container_cpu_delay_seconds | K8s container CPU delay accounting (seconds) |
groundcover_container_disk_delay_seconds | K8s container disk delay accounting (seconds) |
groundcover_container_cpu_throttled_seconds_total | K8s container total CPU throttling (seconds) |

Node metrics
Available Labels
type
clusterId
region
node_name
Available Metrics

Name | Description |
---|---|
groundcover_node_allocatable_cpum_cpu | Allocatable CPU in the current node (mCPU) |
groundcover_node_allocatable_mem_bytes | Allocatable memory in the current node (B) |
groundcover_node_mem_used_percent | Percent of used memory in the current node (0-100) |
groundcover_node_used_disk_space | Used disk space in the current node (B) |
groundcover_node_free_disk_space | Free disk space in the current node (B) |
groundcover_node_total_disk_space | Total disk space in the current node (B) |
groundcover_node_used_percent_disk_space | Percent of used disk space in the current node (0-100) |

PVC metrics
Available Labels
type
clusterId
region
name
namespace
Available Metrics

Name | Description |
---|---|
groundcover_pvc_usage_bytes | PVC used bytes (B) |
groundcover_pvc_capacity_bytes | PVC capacity bytes (B) |
groundcover_pvc_available_bytes | PVC available bytes (B) |
groundcover_pvc_usage_percent | Percent of used PVC storage (0-100) |

Network metrics
Available Labels
clusterId
workload_name
namespace
container_name
remote_service_name
remote_namespace
remote_is_external
availability_zone
region
remote_availability_zone
remote_region
is_cross_az
protocol
role
server_port
encryption
transport_protocol
is_loopback
Notes:
is_loopback and remote_is_external are special labels that indicate the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g. a managed service outside of the cluster (external). In both cases, the remote_service_name and remote_namespace labels will be empty.
is_cross_az means the traffic was sent and/or received between two different availability zones. This flag helps you quickly identify this kind of communication. The actual zones are detailed in the availability_zone and remote_availability_zone labels.
Available Metrics

Name | Description |
---|---|
groundcover_network_rx_bytes_total | Bytes received by the workload (B) |
groundcover_network_tx_bytes_total | Bytes sent by the workload (B) |
groundcover_network_connections_opened_total | Connections opened by the workload |
groundcover_network_connections_closed_total | Connections closed by the workload |
groundcover_network_connections_opened_failed_total | Connection attempts failed per workload (including refused connections) |
groundcover_network_connections_opened_refused_total | Connection attempts refused per workload |
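
As an illustration of how the network labels and counters above might be combined, the following Python sketch builds a PromQL expression for cross availability zone traffic sent from one namespace. It rests on assumptions not confirmed by this page: that the metrics are queried with PromQL (which VictoriaMetrics supports), and that is_cross_az carries the string value "true". The helper names `promql_selector` and `cross_az_tx_rate` are hypothetical.

```python
def promql_selector(metric: str, labels: dict) -> str:
    """Render metric{key="value",...} with label matchers in sorted key order."""
    if not labels:
        return metric
    parts = []
    for key in sorted(labels):
        # Escape backslashes and double quotes so the value stays a valid string literal.
        value = str(labels[key]).replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{value}"')
    return f"{metric}{{{','.join(parts)}}}"


def cross_az_tx_rate(namespace: str, window: str = "5m") -> str:
    """Build a query for bytes/second sent across availability zones, per workload.

    Assumption: is_cross_az is exposed with the string value "true".
    """
    selector = promql_selector(
        "groundcover_network_tx_bytes_total",
        {"namespace": namespace, "is_cross_az": "true"},
    )
    return (
        "sum by (workload_name, availability_zone, remote_availability_zone) "
        f"(rate({selector}[{window}]))"
    )


if __name__ == "__main__":
    print(cross_az_tx_rate("prod"))
```

The function only produces a query string. How that string is sent to the metrics store depends on your deployment and is left out of the sketch.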