Stream, store, and query your logs at any scale, for a fixed cost.
Our Log Management solution is built for high scale and fast query performance so you can analyze logs quickly and effectively from all your cloud environments.
Gain context - Each log is enriched with actionable context and correlated with relevant metrics and traces in a single view, so you can find what you're looking for and troubleshoot faster.
Centralize to maximize - The groundcover platform can act as a limitless, centralized log management hub. Your subscription costs are completely unaffected by the amount of logs you choose to store or query. It's entirely up to you to decide.
groundcover ensures a seamless log collection experience with our eBPF sensor, which automatically collects and aggregates all logs in all formats - including JSON, plain text, NGINX logs, and more. All this without any configuration needed.
This sensor is deployed as a DaemonSet, running a single pod on each node within your Kubernetes cluster. This configuration enables the groundcover platform to automatically collect logs from all of your pods, across all namespaces in your cluster. This means that once you've installed groundcover, no further action is needed on your part for log collection. The logs collected by each sensor instance are then channeled to the OTel Collector.
Acting as the central processing hub, the OTel Collector is a vendor-agnostic tool that receives logs from the various sensor pods. It processes, enriches, and forwards the data into groundcover's ClickHouse database, where all log data from your cluster is stored.
Log Attributes enable advanced filtering capabilities and are currently supported for the following formats:
JSON
Common Log Format (CLF) - like those from NGINX and Kong
logfmt
groundcover automatically detects the format of these logs, extracting key:value pairs from the original log records as Attributes.
Each attribute can be added to your filters and search queries.
Example: filtering logs in a supported format on a request path field of "/status" looks as follows: @request.path:"/status".
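For instance, a hypothetical JSON log record like the one below would have its nested request.path field extracted automatically as an attribute, so it can be matched by the @request.path:"/status" filter above:
{ "level": "info", "request": { "path": "/status", "method": "GET" }, "message": "health check served" }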
groundcover offers the flexibility to craft tailored collection filtering rules: you can choose to set up filters and collect only the logs that are essential for your analysis, avoiding unnecessary data noise. For guidance on configuring your filters, explore our dedicated section.
You also have the option to define the retention period for your logs in the ClickHouse database. By default, logs are retained for 3 days. To adjust this period to your preferences, visit our dedicated section for instructions.
Once logs are collected and ingested, they are available within the groundcover platform in the Log Explorer, which is designed for quick searches and seamless exploration of your logs data. Using the Log Explorer you can troubleshoot and explore your logs with advanced search capabilities and filters, all within a clear and fast interface.
The Log Explorer integrates dynamic filters and a versatile search functionality that enables you to quickly and easily identify the right data. You can filter out logs by selecting one or multiple criteria, including log-level, workload, namespace and more, and can limit your search to a specific time range.
groundcover natively supports setting up log pipelines using remap transforms. This allows for full flexibility in the processing and manipulation of collected logs - parsing additional patterns by regex, renaming attributes, and much more.
groundcover's unique pricing model is the first to decouple data volumes from the cost of owning and operating the solution. For example, one of our subscription tiers costs $30 per node/host per month.
Overall, the cost of owning and operating groundcover is based on two factors:
The number of nodes (hosts) you are running in the environment you are monitoring
The costs of hosting groundcover's backend in your environment
Check out our TCO calculator to simulate your total cost of ownership for groundcover.
groundcover is a full-stack, cloud-native observability platform, developed to break all industry paradigms - from making instrumentation a thing of the past, to decoupling cost from data volumes.
The platform consolidates all your traces, metrics, logs, and Kubernetes events into a single pane of glass, allowing you to identify issues faster than ever before and conduct granular investigations for quick remediation and long-term prevention.
Our pricing is not impacted by the volume of data generated by the environments you monitor, so you can dare to start monitoring environments that had been blind spots until now - such as your Dev and Staging clusters. This, in turn, gives you visibility into all your environments, making it much more likely to identify issues in the early stages of development, rather than in your live product.
groundcover introduces game-changing concepts to observability:
Definitely. As you deploy groundcover, each cluster is automatically assigned the unique name it holds inside your cloud environment. You can browse and select all your clusters in one place with our UI experience.
groundcover has been tested and validated on the most common K8s distributions. See full list in the Requirements section.
groundcover supports the most common protocols in most K8s production environments out-of-the-box. See full list here.
groundcover's kernel-level eBPF sensor automatically collects your logs, application metrics (such as latency, throughput, error rate and much more), infrastructure metrics (such as deployment updates, container crashes etc.), traces, and Kubernetes events. You can control which data is left out of the automatic collection using data obfuscation.
groundcover stores all the data it collects inside your environment, using the state-of-the-art storage services of ClickHouse and Victoria Metrics, with the option to offload data to object storage such as S3 for long-term retention. See our Architecture section for more details.
groundcover stores the data it collects in-cluster, inside your environment; it never leaves the cluster to be stored anywhere else.
Our SaaS UI experience stores only information related to the account, user access and general K8s metadata used for governance (like the number of nodes per cluster, the name given to the cluster etc.).
All the information served to the UI experience is encrypted all the way to the in-cluster data sources. groundcover has no access to your collected data, which is accessible only to an authenticated user from your organization. groundcover does collect telemetry information (opt-out is of course possible) which includes metrics about the performance of the deployment (e.g. resource consumption metrics) and logs reported from the groundcover components running in the cluster.
All telemetry information is anonymized, and contains no data related to your environment.
Regardless, groundcover is SOC2 and ISO 27001 compliant and follows best practices.
If you used your business email to create your groundcover account, you can invite your team to your workspace by clicking on the purple "Invite" button on the upper menu. This will open a pop-up where you can enter the emails of the people you want to invite. You also have an option to copy and share your private link.
Note: The Admin of the account (i.e. the person that created it) can also invite users outside of your email domain. Non-admin users can only invite users that share the same email domain. If you used a private email, you can only share the link to your workspace by clicking the "Share" button on the top bar.
Read more about invites in our quick start guide.
groundcover's CLI tool is currently open source, alongside more projects like Murre and Caretta. We're working on releasing more parts of our solution as open source very soon. Stay tuned on our GitHub page!
groundcover's sensor uses eBPF, which means it can only be deployed on a Kubernetes cluster running on a Linux system.
Installing using the CLI command is currently only supported on Linux and Mac.
You can install using the Helm command from any operating system.
Once installed, accessing the groundcover platform is possible from any web browser, on any operating system.

eBPF (extended Berkeley Packet Filter) is a groundbreaking technology that has significantly impacted the Linux kernel, offering a new way to safely and efficiently extend its capabilities.
By powering our sensor with eBPF, groundcover unlocks unprecedented granularity on your cloud environment, while also practically eliminating the need for human involvement in the installation and deployment process. Our unique sensor collects data directly from the Linux kernel with near-zero impact on CPU and memory.
Advantages of our eBPF sensor:
Zero instrumentation: groundcover's eBPF sensor gathers granular observability data without the need for integrating an SDK or changing your applications' code in any way. This enables all your logs, metrics, traces, and other observability data to flow automatically into the platform. In minutes, you gain full visibility into application and infrastructure health, performance, resource usage, and more.
Minimal resource footprint: groundcover's sensor is installed on a dedicated node in each monitored cluster, operating separately from the applications it is monitoring. Without interfering with the applications' primary functions, the groundcover platform operates with near-zero impact on your resources, maintaining the applications' performance and avoiding unexpected overhead on the infrastructure.
A new level of insight granularity: With direct access to the Linux kernel, our eBPF sensor enables the collection of data straight from the source. This guarantees that the data is clean, unaltered, and precise. It also offers unique insights into your application and infrastructure, such as the ability to view the full payloads of traces or analyze network performance over time.
The one-of-a-kind architecture on which groundcover was built eliminates any requirement to stream your logs, metrics, traces, and other monitoring data outside of your environment and into a third party's cloud. By leveraging integrations with best-of-breed technologies, including ClickHouse and VictoriaMetrics, all your observability data is stored locally, with the option of being fully managed by groundcover.
Advantages of our BYOC architecture:
By separating the data plane from the control plane, you get the advantages of a SaaS solution, without its security and privacy challenges.
With multiple deployment models available, you also get to choose the level of security and privacy your organization needs, up to the highest standards (FedRAMP-level).
Automated deployment, maintenance & resource optimization with our inCloud Managed deployment option.
This concept is unique to groundcover, and takes a while to grasp. Read about our BYOC architecture more in detail in this dedicated section.
Enabled by our unique BYOC architecture, groundcover's vision is to revolutionize the industry by offering a pricing model that is unheard of anywhere else. Our fully transparent pricing model is based only on the number of nodes being monitored, and the costs of hosting the groundcover backend in your environment. Volume of logs, metrics, traces, and all other observability data don’t affect your cost. This results in savings of 60-90% compared to SaaS platforms.
In addition, all our subscription tiers never limit your access to features and capabilities.
Advantages of our nodes-based pricing model:
Cost is predictable and transparent, becoming an enabler of growth and expansion.
The ability to deploy groundcover in data-intensive environments enables the monitoring of Dev and Staging clusters, which promotes early identification of issues.
No cardinality or retention limits
Read our latest customer stories to learn how organizations of varying sizes dramatically reduce their observability costs by migrating to groundcover:
groundcover applies a stream processing approach to collect and control the continuous flow of data to gain immediate insights, detect anomalies, and respond to changing conditions. Unlike batch processing, where data is collected over a period and then analyzed, stream processing analyzes the data as it flows through the system.
Our platform uses a distributed stream processing engine that enables it to ingest huge amounts of data (such as logs, traces and Kubernetes events) in real time. It also processes all that data and instantly generates complex insights (such as metrics and context) based on it.
As a result, the volume of raw data stored dramatically decreases which, in turn, further reduces the overall cost of observability.
Designed for high scalability and rapid query performance, enabling quick and efficient log analysis from all your environments. Each log is enriched with actionable context and correlated with relevant metrics and traces, providing a comprehensive view for fast troubleshooting.
The groundcover platform provides cloud-native infrastructure monitoring, enabling automatic collection and real-time monitoring of infrastructure health and efficiency.
Gain end-to-end observability into your applications' performance, identify and resolve issues instantly, all with zero code changes.
Real User Monitoring (RUM) extends groundcover’s observability platform to the client side, providing visibility into actual user interactions and front-end performance. It tracks key aspects of your web application as experienced by real users, then correlates them with backend metrics, logs, and traces for a full-stack view of your system.
Gain end-to-end observability into your applications' performance, identify and resolve issues instantly - all with zero code changes.
The groundcover platform collects data all across your stack using the power of eBPF instrumentation. Our proprietary eBPF sensor is installed in seconds and provides 100% coverage into application metrics and traces with zero code changes or configurations.
Resolve faster - By seamlessly correlating traces with application metrics, logs, and infrastructure events, groundcover’s APM enables you to detect and resolve root issues faster.
Improve user experience - Optimize your application performance and resource utilization faster than ever before, avoid downtimes and make poor end-user experience a thing of the past.
Our revolutionary eBPF sensor is deployed as a DaemonSet in your Kubernetes cluster. This approach allows us to inspect every packet that each service is sending or receiving, achieving 100% coverage. No sampling rates or relying on statistical luck - all requests and responses are observed.
This approach would not be feasible without a resource-efficient eBPF-powered sensor. eBPF not only extends the ability to pinpoint issues - it does so with much less overhead than any other method. eBPF can be used to analyze traffic originating from every programming language and SDK - even for encrypted connections!
After being collected by our eBPF code, the traffic is then classified according to its protocol - which is identified directly from the underlying traffic, or the library from which it originated. Connections are reconstructed, and we can generate transactions - HTTP requests and responses, SQL queries and responses etc.
To provide as much context as possible, each transaction is enriched with extensive metadata. Some examples include the pods that took part in the transaction (both client and server), the nodes on which these pods are scheduled, and the state of the container at the time of the request.
It is important to emphasize the impressive granularity level with which this process takes place - every single transaction observed is fully enriched. This allows us to perform more advanced aggregations.
After being enriched with as much context as possible, the transactions are grouped together into meaningful aggregations. These can be defined by the workloads involved, the protocols detected, and the resources that were accessed in the operations. These aggregations mostly come into play when displaying application metrics.
After collecting the data, contextualizing it, and putting it together in meaningful aggregations, we can now create application metrics and traces that provide meaningful insights into the services' behaviors.
Learn how groundcover's application metrics work:
Learn how groundcover's application traces work:
Nobl9 expands monitoring to cover production e2e, including testing and staging environments
Replacing Datadog with groundcover cut Nobl9’s observability costs in half while improving log coverage, providing deeper granularity on traces with eBPF, and enabling operational growth and scalability.
Tracr eliminates blind spots with native-K8s observability and eBPF tracing
Tracr migrates from a fragmented observability stack to groundcover, gaining deep Kubernetes visibility, automated eBPF tracing, and a cost-effective monitoring solution. This transition streamlined troubleshooting, expanded observability across teams, and enhanced the reliability of their blockchain infrastructure.




The following architectures are fully supported for all groundcover workloads:
x86
ARM
These are patterns we've seen in the wild. Agents use groundcover to debug, monitor, and close the loop.
Cursor generates tests, tags each with a test_id, logs them, and then uses groundcover to instantly fetch related log lines.
Got a monitor firing? Drop the alert into Cursor. The agent runs a quick RCA, queries groundcover, and even suggests a patch based on recent logs and traces.
Support rep gets an error ID → uses MCP to query groundcover → jumps straight to the root cause by exploring traces and logs around the error.
An agent picks up a ticket, writes tests, ships code to staging, monitors it with groundcover, checks logs and traces, and verifies the fix end to end. Yes, really. Full loop. Almost no hands.
groundcover enables access to an embedded Grafana, within the groundcover platform's interface. This enables you to easily import and continue using your existing Grafana dashboards and alerts.
The following guides will help you setup and import your visualizations from Grafana:
Build alerts & dashboards with Grafana Terraform provider
Using groundcover as Prometheus/Clickhouse database in a Self-hosted Grafana
The Drilldown view helps you to quickly identify and highlight the most informative attributes - those that stand out and help you pinpoint anomalies or bottlenecks.
In this mode, groundcover showcases the top attributes found in your traces or logs data. Each attribute displays up to four values with the highest occurrence across the selected traces.
You can click any value to add or exclude it as a filter and continue drilling down interactively.
We use statistical scoring based on:
Entropy: how diverse the values of an attribute are.
Presence ratio: how often the attribute appears across the selected traces.
Attributes that are both common and have high entropy are prioritized.
To ensure a seamless experience with groundcover, it's important to confirm that your environment meets the necessary requirements. Please review the detailed requirements for Kubernetes, our eBPF sensor, and the necessary hardware and resources to guarantee optimal performance.
groundcover supports a wide range of Kubernetes versions and distributions, including popular platforms like EKS, AKS, and GKE.
Our state-of-the-art eBPF sensor leverages advanced kernel features to deliver comprehensive monitoring with minimal overhead, requiring specific Linux kernel versions, permissions, and CO-RE support.
groundcover fully supports both x86 and ARM processors, ensuring compatibility across diverse environments.
groundcover operates ClickHouse to support many of its core features. This requires allocating suitable resources to the deployment, which groundcover handles according to your data usage.
Welcome to the API examples section. Here, you'll find practical demonstrations of how to interact with our API endpoints using cURL commands. Each example is designed to help you quickly understand how to use the API.
cURL-based examples: Every example shows the exact cURL command you can copy and run directly in your terminal.
Endpoint-specific demonstrations: We walk through different API endpoints one by one, highlighting the required parameters and common use cases.
Request & Response clarity: Each section contains both the request (what you send) and the response (what you get back) to illustrate expected behavior.
Before running any of the examples, make sure you have:
Get complete visibility into your cloud infrastructure performance at any scale, easily access all your metrics in one place and optimize infrastructure efficiency.
The groundcover platform offers infrastructure monitoring capabilities that were built for cloud-native environments. It enables you to track the health and efficiency of your infrastructure instantly, with an effortless deployment process.
Troubleshoot efficiently - acting as a centralized hub for all your infrastructure, application and customer metrics allows you to query, correlate and troubleshoot your cloud environments using real-time data and insights on your entire stack.
Store it all, without breaking a sweat - store any metrics volume without worrying about cardinality or retention limits.
View, filter, and manage all monitors in one place, and quickly identify issues or create new monitors.
The Monitor List is the central hub for managing and monitoring all active and configured Monitors. It provides a clear, filterable table view of your Monitors, with their current status and key details, such as creation date, severity, and live issues. Use this page to review your Monitors' performance, identify issues, and take appropriate action.
Supercharge your AI agents with o11y superpowers using the groundcover MCP server. Bring logs, traces, metrics, events, K8s resources, and more directly into your agent’s context — and troubleshoot side by side with your AI assistant to solve issues in minutes.
Status: Work in progress. We keep adding tools and polishing the experience. Got an idea or question? Ping us on Slack!
MCP (Model Context Protocol) is an open standard that enables AI tools to access external data sources like APIs, observability platforms, and documentation directly within their working context. MCP servers expose these resources in a consistent way, so the AI agent can query and use them as needed to streamline workflows.
Automated migration from legacy vendors. Bring over your monitors, dashboards, data sources, and all the data you need - with automatic mapping and zero downtime.
groundcover is the first observability platform to ship a one-click migration tool from legacy vendors. The migration flow automatically discovers, translates, and installs your observability setup into groundcover.
Goal: Move your entire observability stack with zero manual work. We don't just migrate assets - we bring the data and handle all the mapping for you.
Quickly understand your data with groundcover
groundcover insights give you a clear snapshot of notable events in your data. Currently, the platform supports Error Anomalies, with more insight types on the way.
Error Anomalies instantly highlight workloads, containers, or environments experiencing unusual spikes in Error or Critical logs, as well as Traces marked with an error status. These anomalies are detected using statistical algorithms, continuously refined through user feedback for accuracy.
Each insight surfaces trends based on the entity’s error signals (e.g., workload, container, etc.):
groundcover enables access to an embedded Grafana within the groundcover platform's interface. This enables you to easily import and continue using your existing Grafana dashboards and alerts.
Explore and select pre-built Monitors from the catalog to quickly set up observability for your environment. Customize and deploy Monitors in just a few clicks.
Enhance your ability to easily monitor a group of clusters
Labeling clusters in your cloud-native environments is very helpful for efficient resource management and observability. By assigning an environment label in groundcover, you can categorize and identify clusters based on any specific criteria that you find helpful. For example, you can choose to label your clusters by environment type (development, staging, production, etc.), or by region (EU, US, etc.).
To add an environment label to your cluster, edit your cluster's existing values.yaml and add the following line:
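For example, using a placeholder value:
env: "my-env-name"
Replace "my-env-name" with any label that is meaningful to you, such as an environment type or region.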
Once defined and added, these labels will be available for you to select in the cluster and environment drop down menu ("Cluster Picker").
A service account is a non-human identity for API access, governed by RBAC and supporting multiple API keys.
By connecting your agent to groundcover’s MCP server, you enable it to:
Query live logs, traces, and metrics for a workload, container, or issue.
Run root cause analysis (RCA) on issues, right inside your IDE or chat interface.
Auto-debug code with observability context built in.
Monitor deployed code and validate fixes without switching tools.
See examples in our Getting-started Prompts and Real-world Use Cases.
Setup is quick and agent-friendly. We support both OAuth (recommended) and API key flows.
Head to Configure groundcover's MCP Server for setup instructions and client-specific guides.
Includes alert conditions, thresholds, and evaluation windows.
Complete dashboard migration with preserved layouts:
All widget types groundcover supports
Query translations
Time ranges and filters
Visual settings and arrangements
We detect what you're using in Datadog and help you set it up in groundcover. One-click migration for integrations is coming soon.
We don't just copy configurations - we ensure the data flows:
Automatic metric mapping: Datadog metric names translated to groundcover equivalents
Label translation: Tags become labels with intelligent mapping
Query conversion: Datadog query syntax converted to groundcover
Data validation: We verify all referenced metrics and data sources exist
Full migration support available now.
Additional vendors coming soon.
Three steps. No scripts. No downtime.
Fetch & discover: Provide API keys. groundcover pulls your monitors, dashboards, and data sources.
Automatic setup: We install data sources, map metrics, and prepare everything.
Migrate assets: Review and migrate monitors and dashboards with one click.
API keys are not stored.
Settings → Migrations (Admin role required)
The migration flow is structured to support additional asset types:
Data source configurations (available now)
Log pipelines (coming soon)
Advanced metric mappings (coming soon)
A service account has a name and email, but it cannot be used to log into the UI or via SSO. Instead, it functions purely for API access. Each account must have at least one RBAC policy assigned, which defines its permission level (Admin, Editor, Viewer) and data scope. Multiple policies can be attached to broaden access; effective permissions are the union of all policies.
Only Admins can create, update, or delete service accounts. This can be done via the UI (Settings → Access → Service Accounts) or the API. During creation, Admins define the name, email, and initial policies. You can edit a service account, changing its email address and assigned policies, but you can't rename it.
A service account can have multiple API keys. This makes it easy to rotate credentials or issue distinct keys for different use cases. All keys are tied to the same account and carry its permissions. Any action taken using a key is logged as performed by the associated service account.
Email via Zapier - Route alerts to email using Zapier
Slack App with Bot Tokens - Route alerts to different Slack channels with a single Webhook
Each example includes:
Prerequisites and setup requirements
Step-by-step configuration instructions
Complete workflow YAML configurations
Integration-specific considerations and best practices
These examples demonstrate advanced webhook usage patterns and can serve as templates for other webhook integrations.
On Logs, anomalies are based on logs filtered by level:error or level:critical, and grouped by:
workload
container
namespace
environment
cluster
On Traces, anomalies are based on traces filtered by status:error, and grouped by a more granular set of dimensions:
protocol_type
return_code
role (client/server)
workload
container
namespace
environment
cluster

You can select as many monitors as you wish, and add them all in one click. Select a complete pack or multiple Monitors from different packs, then click "Create Monitor". All Monitors will be automatically created. You can always edit them later.
You can also create a single Monitor from the Catalog. When hovering over a Monitor, a "Wizard" button will appear. Clicking on it will direct you to the Monitor Creation Wizard where you can review and edit before creation.

env: "my-env-name"The following configurations are deprecated but may still be in use in older setups.
The legacy datasources API key can be obtained by running: groundcover auth get-datasources-api-key
You can query the legacy Prometheus API directly using curl:
# Query current metrics
curl -H "Authorization: Bearer <YOUR_API_KEY>" \
"https://app.groundcover.com/api/prometheus/api/v1/query?query=up"# Query current logs (legacy endpoint)
curl https://ds.groundcover.com/ -H "X-ClickHouse-Key: DS-API-KEY-VALUE" --data "SELECT count() FROM groundcover.logs LIMIT 1 FORMAT JSON" groundcover's proprietary eBPF sensor leverages all its innovative powers to collect comprehensive data across your cloud environments without the burden of performance overhead. This data is sourced from various Kubernetes components, including kube-system workloads, cluster information via the Kubernetes API, and the applications' interactions with the Kubernetes infrastructure. This level of detailed collection at the kernel level enables the ability to provide actionable insights into the health of your Kubernetes clusters, which are indispensable for troubleshooting existing issues and taking proactive steps to future-proof your cloud environments.
You also have the option to define the retention period for your metrics in the VictoriaMetrics database. By default, metrics are retained for 7 days, but you can adjust this period to your preferences.
Beyond collecting data, groundcover's methodology involves a strategic layer of data enrichment that seeks to correlate Kubernetes metrics with application performance indicators. This correlation is crucial for creating a transparent image of the Kubernetes ecosystem. It enables a deep understanding of how Kubernetes interacts with applications, identifying potential points of failure across the interconnected environment. By monitoring Kubernetes not as an isolated platform but as an integral part of the application infrastructure, groundcover ensures that the monitoring strategy aligns with your dynamic and complex cloud operations.
Monitoring a cluster involves tracking resources that are critical to the performance and stability of the entire system. Monitoring these essential metrics is crucial for maintaining a healthy Kubernetes cluster:
CPU consumption: It's essential to track the CPU resources being utilized against the total capacity to prevent workloads from failing due to insufficient CPU availability.
Memory utilization: Keeping an eye on the remaining memory resources ensures that your cluster doesn't encounter disruptions due to memory shortages.
Disk space allocation: For Kubernetes clusters running stateful applications or requiring persistent storage for data, such as etcd databases, tracking the available disk space is crucial to avert potential storage deficiencies.
Network usage: Visualize traffic rates and connections being established and closed on a service-to-service level of granularity, and easily pinpoint cross availability zone communication to investigate misconfigurations and surging costs.
Available Labels
type, clusterId, region, namespace, node_name, workload_name, pod_name, container_name, container_image
Available Metrics
groundcover_container_cpu_usage_rate_millis - CPU usage in mCPU
groundcover_container_cpu_request_m_cpu - K8s container CPU request (mCPU)
groundcover_container_cpu_limit_m_cpu - K8s container CPU limit (mCPU)
groundcover_container_memory_working_set_bytes - current memory working set (B)
Available Labels
type, clusterId, region, node_name
Available Metrics
groundcover_node_allocatable_cpum_cpu - amount of allocatable CPU in the current node (mCPU)
groundcover_node_allocatable_mem_bytes - amount of allocatable memory in the current node (B)
groundcover_node_mem_used_percent - percent of used memory in the current node (0-100)
groundcover_node_used_disk_space - current used disk space in the current node (B)
Available Labels
type, clusterId, region, name, namespace
Available Metrics
groundcover_pvc_usage_bytes - PVC used bytes (B)
groundcover_pvc_capacity_bytes - PVC capacity bytes (B)
groundcover_pvc_available_bytes - PVC available bytes (B)
groundcover_pvc_usage_percent - percent of used PVC storage (0-100)
Available Labels
clusterId, workload_name, namespace, container_name, remote_service_name, remote_namespace, remote_is_external, availability_zone, region, remote_availability_zone, remote_region, is_cross_az, protocol, role, server_port, encryption, transport_protocol, is_loopback
Notes:
is_loopback and remote_is_external are special labels that indicate the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g. a managed service outside of the cluster (external).
In both cases, the remote_service_name and remote_namespace labels will be empty.
is_cross_az means the traffic was sent and/or received between two different availability zones. This is a helpful flag to quickly identify this special kind of communication.
The actual zones are detailed in the availability_zone and remote_availability_zone labels
Available Metrics
groundcover_network_rx_bytes_total - Bytes received by the workload (B)
groundcover_network_tx_bytes_total - Bytes sent by the workload (B)
groundcover_network_connections_opened_total - Connections opened by the workload
groundcover_network_connections_closed_total - Connections closed by the workload
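As a rough illustration (a sketch, not an official query), the following PromQL surfaces the rate of bytes sent across availability zones, broken down by workload and remote service, using the metric and labels listed above:
sum by (workload_name, remote_service_name) (rate(groundcover_network_tx_bytes_total{is_cross_az="true"}[5m]))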
Displays the following columns:
Name: Title of the monitor.
Creation Date: When the monitor was created.
Live Issues: Number of live issues currently firing.
Status: Whether the Monitor is "Firing" (alerts active) or "Normal" (no alerts).
Tip: Click on a Monitor name to view its detailed configuration and performance metrics.
You can create a new Monitor by clicking on Create Monitor, then choosing between the different options: Monitor Wizard, Monitor Catalog, or Import. For further guidance, check out our guide.
Use filters to narrow down monitors by:
Severity: S1, S2, S3, or custom severity levels.
Status: Alerting or Normal.
Silenced: Exclude silenced monitors.
Tip: Toggle multiple filters to refine your view.
Quickly locate monitors by typing a name, status, category, or other keywords.
Located at the top-right corner, use these to focus on monitors for specific clusters or environments.
A highly impactful advantage of leveraging eBPF in our proprietary sensor is that it enables visibility into the full payloads of both requests and responses - including headers! This allows you to understand issues very quickly and provides valuable context.
groundcover enables you to very easily build custom dashboards to visualize your data, using our intuitive Query Builder as a guide or using your own queries.
Define custom alerts using our native Monitors, which you can configure using groundcover data and custom metrics. You can also choose from our Monitors Catalog, which contains multiple pre-built Monitors that cover a few of the most common use cases and needs.
Invites let you share your workspaces with your colleagues in just a couple of clicks. You can find the "Invite Members" option at the bottom of the left navigation bar. Type in the email addresses of the teammates you want to invite, set their user permissions (Admin, Editor, Read Only), then click "Send Invites".

1️⃣ Go to the Dashboards tab in the groundcover app, and click New and then New Dashboard.
2️⃣ Create your first panel by clicking Add a new panel.
3️⃣ In the New panel view, go to the Query tab.
4️⃣ Choose your data source by pressing -- Grafana -- on the data source selector. You will see the metrics collected from each of your clusters as a Prometheus data source called Prometheus@<cluster-name>
5️⃣ Create your first Query in the PromQL query interface.
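For example, a simple first query could chart per-workload container CPU usage (a sketch; the metric and label names are taken from the infrastructure metrics reference and can be adjusted to your needs):
sum by (workload_name) (groundcover_container_cpu_usage_rate_millis)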

Delete an existing workflow using workflow id
DELETE /api/workflows/{id}
This endpoint requires API key authentication.
Status Code: 200 OK
Once a workflow is deleted, it cannot be recovered
The deletion is immediate and permanent
All associated workflow executions and history are also removed
The API returns HTTP 200 status code for both successful deletions and "not found" cases
For instructions on how to generate a Grafana Service Account Token and use it in the Grafana Terraform provider, see: Grafana Service Account Token.
Create a directory for the terraform assets
Create a main.tf file within the directory that contains the terraform provider configuration mentioned in step 2
Create the following dashboards.tf file. This example declares a new Golden Signals folder and, within it, a Workload Golden Signals dashboard (see the sketch after these steps)
Add the workloadgoldensignals.json file to the directory as well
Run terraform init to initialize terraform context
Run terraform plan. You should see a long output describing the assets that are going to be created; the last line should state:
Plan: 2 to add, 0 to change, 0 to destroy.
Run terraform apply
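As a reference, a minimal dashboards.tf could look like the sketch below, using the grafana_folder and grafana_dashboard resources from the official Grafana provider (resource names here are illustrative, not taken from the original guide):
resource "grafana_folder" "golden_signals" {
  title = "Golden Signals"
}

resource "grafana_dashboard" "workload_golden_signals" {
  # Place the dashboard inside the folder declared above,
  # loading its definition from the JSON file added to the directory
  folder      = grafana_folder.golden_signals.id
  config_json = file("${path.module}/workloadgoldensignals.json")
}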
Here is a short video to demonstrate the process
You can read more about what you can achieve with the Grafana Terraform provider in the official docs.
Manage and create silences to suppress Monitor notifications during maintenance or specific periods, reducing noise and focusing on critical issues.
The Silences page lists all Silences you and your team created for your Monitors. In this section, you can also create and manage your Silence rules, to suppress notifications and Issues noise, for a specified period of time. Silences are a great way to reduce alert fatigue, which can lead to missing important issues, and help focus on the most critical issues during specific operational scenarios such as scheduled maintenances.
Follow these simple steps to create a new Silence.
Specify the timeframe for the silence rule. Note that the starting point doesn't have to be now, and can also be any time in the future.
Below the From / Until boxes, you'll see a Silence summary, showing its approximate length (rounded down to full days) and starting date.
Define the criteria for Monitors or Issues to be silenced.
Click Add Matcher to specify match conditions (e.g., cluster, namespace, span_name).
Combine multiple matchers for more granular control.
Example: Silence all Monitors in the "demo" namespace.
Preview the issues currently affected by the Silence rule, based on any defined Matchers. This list contains only actively firing Issues.
Tip: Use this preview to see the list of impacted issues and adjust your Matchers before finalizing the Silence.
Add notes or context for the Silence rule. These comments help you and other users understand the purpose of the rule.
Learn how to build custom dashboards using groundcover
groundcover’s dashboards are designed to personalize your data visualization and maximize the value of your existing data. Dashboards are perfect for creating investigation flows for critical monitors, displaying the data you care about in a way that suits you and your team, and crafting insights from the data on groundcover.
Easily create a new Dashboard using our guide.
Multi-Mode Query Bar: The Query Bar is central to dashboards and supports multiple modes fully integrated with native pages and Monitors. Currently, the modes include Metrics, Infra Metrics, Logs, and Traces. Learn more in the dedicated section.
Variables: Built-in variables allow you to filter data quickly based on a predefined list crafted by groundcover.
Widget Types: Two widget types are currently supported:
Chart Widget: Displays data visually.
Textual Widget: Adds context to your dashboards.
Display Types: Five display types are supported for data visualization:
Time Series, Table, Stat, Top List, and Pie. Read more in the dedicated section.
We strongly advise reading the intro guide to working with remap transforms in order to fully understand the functionality of writing pipeline steps.
The following example will filter out all HTTP traces that include the /health URI.
Note that the filter transform works by setting up an allow condition - meaning, events which fail the condition will be dropped.
The filter below implements this logic:
If the event isn't an HTTP event, allow it
If the event is an HTTP event, and the resource name doesn't contain "/health", allow it
If the event is an HTTP event AND it has "/health" in the resource name, drop it
The following example will obfuscate response payloads from a specific server. This can be useful when you want to completely redact responses that contain sensitive data, such as secrets managed by an external server.
In this example we redact HTTP payloads for any request containing the host: frontend header.
Monitors offers the ability to define custom alerts, which you can configure using groundcover data and custom metrics.
A Monitor defines a set of rules and conditions that track the state of your system. When a monitor's conditions are met, it triggers an issue that is displayed on the Issues page and can be used for alerting using your integrations and workflows.
Easily create a new Monitor by using our guide.
The legacy Issues page is being deprecated in favor of a fully customizable, monitor-based experience that gives you more control over what constitutes an issue in your environment.
While the new page introduces powerful capabilities, no core functionality is being removed; the key change is that the old auto-created issue rules will no longer be automatically generated. Instead, you'll define your own monitors, or choose from a rich catalog of prebuilt ones.
All the existing issues from the legacy page can easily be added as monitors via the Monitors Catalog's "Starter Pack". See below for more info.
The new Issues page is built on top of the Monitors engine, enabling deeper customization and automation:
Get started with groundcover
This is the first step to start with groundcover for all types of plans 🚀
The first thing you need to do to start using groundcover is to sign up using your email address (no credit card required for the free tier account). Signing up is only possible using a computer and is not possible from a mobile phone or tablet. It is highly recommended that you use your corporate email address, as this will make it easier to use other features such as inviting your colleagues to your workspace. However, signing up with Gmail, Outlook or any other public domain is also possible.
An API key in groundcover provides secure, programmatic access to the API on behalf of a service account. It inherits that account’s permissions and should be stored safely.
Each API key is tied to a specific service account. It inherits the permissions defined by that account’s RBAC policies. Optionally, the key can be limited to a subset of those policies for more granular access control. An API key can never exceed the permissions of its parent service account.
Once your MCP server is connected, you can dive right in.
Here are a few prompts to try. They work out of the box with agents like Cursor, Claude, or VS Code:
💡 Starting your request with “Use groundcover” is a helpful nudge - it pushes the agent toward MCP tools and context.
MCP supports complex, multi-step flows, but starting simple is the best way to ramp up.
View and analyze monitor issues with detailed timelines, metadata, and context to quickly identify and resolve problems in your environment.
The Issues page provides a detailed view of active and resolved issues triggered by Monitors. This page helps users investigate, analyze, and resolve problems in their environment by visualizing issue trends and providing in-depth context through an issue drawer.
Clicking on an issue in the Issues List opens the Issue drawer, which provides an in-depth view of the Monitor and its triggered issue. Where possible, you can also navigate to related entities such as the workload, node, or pod.
groundcover provides a robust user interface that allows you to view and analyze all your observability data from inside the platform. However, there may be cases in which you need to query the data from outside our platform using API communication.
Our proprietary eBPF sensor automatically captures granular observability data, which is stored via our integrations with two best-of-breed technologies: VictoriaMetrics for metrics storage, and ClickHouse for storage of logs, traces, and Kubernetes events.
Read more about our architecture.
Run the following command in your CLI, and select tenant:
groundcover auth get-datasources-api-key
We strongly advise reading the intro guide to working with remap transforms in order to fully understand the functionality of writing pipeline steps.
groundcover has various authentication key types for remotely interacting with our platform, whether to ingest observability data or to automate actions via our APIs:
- An API key in groundcover provides secure, programmatic access to the API on behalf of a service account. It inherits that account's permissions and should be stored safely. This is also the key you need when working with groundcover's Terraform provider. See:
groundcover's .
groundcover's Terraform provider.
Multiple ways to connect your infrastructure and applications to groundcover
Connect your Kubernetes clusters using groundcover's eBPF-based sensor for automatic instrumentation and deep observability.
Connect Kubernetes clusters - Deploy groundcover's sensor to monitor containerized workloads, infrastructure, and applications with zero code changes
Monitor individual Linux servers, virtual machines, or cloud instances outside of Kubernetes.
Connect Linux hosts - Install groundcover on standalone Linux hosts such as EC2 instances, bare metal servers, or VMs
Gain visibility into frontend performance and user experience with client-side monitoring.
Connect RUM - Monitor real user interactions, page loads, and frontend performance in web applications
Integrate with existing observability tools and send data from your current monitoring stack.
Ship from OpenTelemetry - Forward traces, metrics, and logs from existing OpenTelemetry collectors
Ship from Datadog Agent - Send data from Datadog agents while maintaining your existing setup
Login and Create a Workspace - Set up your groundcover account and workspace
Review Requirements - Ensure your environment meets the necessary prerequisites
Choose your installation method - Select the option that matches your infrastructure setup
Follow the 5 quick steps - Get oriented with groundcover's interface and features
If you're unsure which installation method is right for you, or if you have specific requirements, check our FAQ or reach out to our support team.
Ingestion Keys - Ingestion Keys let sensors, integrations and browsers send observability data to your groundcover backend. These keys are the counterpart of API Keys, which are optimized for reading data or automating dashboards and monitors.
Datasources (ds) API Key - A key used to connect to groundcover as a datasource, querying ClickHouse and VictoriaMetrics directly.
Grafana Service Account Token - Used to remotely create Grafana Alerts & Dashboards via Terraform.

Displays metadata about the issue, including:
Monitor Name: Name of the Monitor that triggered the issue, including a link to it.
Description: Explains what the Monitor tracks and why it was triggered.
Severity: Shows the assigned severity level (e.g., S3).
Labels: Lists contextual labels like cluster, namespace, and workload.
Creation Time: Shows when the issue started firing.
Displays the Kubernetes events related to the selected issue within the timeframe selected in the Time Picker dropdown (upper right of the issue drawer).
When creating a Monitor using a traces query, the Traces tab will display the matching traces generated within the timeframe selected in the Time Picker dropdown (upper right of the issue drawer). Click on "View in Traces" to navigate to the Traces section with all relevant filters automatically applied.
When creating a monitor using a log query, the Logs tab will display the matching logs generated within the timeframe selected in the Time Picker dropdown (upper right of the issue drawer). Click on "View in Logs" to navigate to the Logs section with all relevant filters automatically applied.
A focused visualization of the interactions between workloads related to the selected issue.

This provides a guided approach to create a workflow. When creating a notification workflow, you will be asked to give your workflow a name, add filters, and select the specific integration to use.
Filter Rules By Labels - Add key-value attributes to ensure your workflow executes under specific conditions only - for example, env = prod only.
Delivery Destinations - Select one or more integrations to be used for notifications with this workflow.
Scope - When The Workflow Will Run - This setting allows you to limit workflow execution only to monitors that explicitly choose to route their triggers to this workflow, as opposed to "Handle all issues", which catches triggers from any monitor.
Once you create a workflow using this option, you can later edit the workflow to apply any configuration or logic by using the editor option (see next).
Clicking the button will open up a text editor where you can add your workflow definition in YAML format by applying any valid configuration, logic, and functionality.
Note: Make sure to create your integration prior to creating the workflow as it requires using an existing integration.
Upon successful workflow creation it will be active immediately, and a new workflow record will appear in the underlying table.
For each existing workflow, we can see the following fields:
Name: Your defined workflow name
Description: If you've added a description of the workflow
Creator: Workflow creator email
Creation Date: Workflow creation date
Last Execution Time: Timestamp of last workflow execution (depends on workflow trigger type)
Last Execution Status: Last execution status (failure or success)
From the right side of each workflow record in the display, you can access the menu (three dots) and click "Edit Workflow". This will open the editor so you can modify the YAML to conform to the available functionality. See examples below.

groundcover_container_memory_rss_bytes - current memory RSS (B)
groundcover_container_memory_request_bytes - K8s container memory request (B)
groundcover_container_memory_limit_bytes - K8s container memory limit (B)
groundcover_container_cpu_delay_seconds - K8s container CPU delay accounting in seconds
groundcover_container_disk_delay_seconds - K8s container disk delay accounting in seconds
groundcover_container_cpu_throttled_seconds_total - K8s container total CPU throttling in seconds
groundcover_node_free_disk_space - amount of free disk space in the current node (B)
groundcover_node_total_disk_space - amount of total disk space in the current node (B)
groundcover_node_used_percent_disk_space - percent of used disk space in the current node (0-100)
groundcover_network_connections_opened_failed_total - Connection attempts failed per workload (including refused connections)
groundcover_network_connections_opened_refused_total - Connection attempts refused per workload











Authorization (header): Bearer <YOUR_API_KEY> - Your groundcover API key
Accept (header): */* - Accept any response format
id (path parameter, string): The UUID of the workflow to delete
Example for querying ClickHouse database using POST HTTP Request:
X-ClickHouse-Key (header): API Key you retrieved from the groundcover CLI. Replace ${API_KEY} with your actual API key, or set API_KEY as env parameter.
SELECT count() FROM traces WHERE start_timestamp > now() - interval '15 minutes' (data): The SQL query to execute. This query counts the number of traces where the start_timestamp is within the last 15 minutes.
Learn more about the ClickHouse query language here.
Example for querying the VictoriaMetrics database using the query_range API:
apikey (header): API Key you retrieved from the groundcover CLI. Replace ${API_KEY} with your actual API key, or set API_KEY as env parameter.
query (data): The PromQL query to execute. In this case, it calculates the sum of the rate of groundcover_resource_total_counter with the type set to http.
start (data): The start timestamp for the query range in Unix time (seconds since epoch). Example: 1715760000.
end (data): The end timestamp for the query range in Unix time (seconds since epoch). Example: 1715763600.
Learn more about the PromQL syntax here.
Learn more about VictoriaMetrics HTTP API here.
Service Account Tokens are only accessible once, so make sure you keep them somewhere safe. Running the command again will generate a new service account token.
Only groundcover tenant admins can generate Service Account Tokens
Make sure you have Terraform installed
Use the official Grafana Terraform provider with the following attributes:
Continue to create Alerts and Dashboards in Grafana, see: Build alerts & dashboards with Grafana Terraform provider.
You can read more about what you can achieve with the Grafana Terraform provider in the official docs
The following example demonstrates transformation of a log in a specific format to an event, while applying additional filtering and extraction logic.
In this example, we want to create events for when a user consistently fails to login to a system. We base it on logs with this specific format:
This pipeline will create events with the type multiple_login_failures each time a user fails to log in for the 5th time or more. It will store the username in .string_attributes and the attempt number in .float_attributes.
login failed for user <username>, attempt number <number>

Example: This trigger ensures that only monitors fired with telemetry data from the Prod environment will actually execute the workflow. Note that the "env" attribute needs to be provided as a context label from the monitor:
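A minimal sketch of such a trigger, assuming the Keep-style workflow YAML used throughout these examples (the exact filter structure may differ):
triggers:
  - type: alert
    filters:
      - key: env
        value: prod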
Note: Workflows are "pull based" which means they will try to match monitors even when these monitors did not explicitly add a specific workflow. Therefore, the filters need to accurately define the condition to be used for a monitor.
Consts is a section where you can declare predefined attributes based on data provided with the monitor context. A set of functions is available to transform existing data and format it for propagation to third-party integrations. Consts simplify access to data that is needed in the actions section.
Example: The following example shows how to map a predefined set of severity values to the monitor severity as defined in groundcover - here, any potential severity in groundcover is translated into one of P1-P5 values.
The function keep.dictget gets a value from a map (dictionary) using a specific key. In case the key is not found - P3 will be the default severity:
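A rough sketch of what such a mapping could look like in the consts section (the S1-S5 keys and the exact invocation syntax are assumptions for illustration; P3 is the default when the key is not found):
consts:
  # Map groundcover severities to P1-P5 priorities, defaulting to P3
  priority: keep.dictget({"S1": "P1", "S2": "P2", "S3": "P3", "S4": "P4", "S5": "P5"}, "{{ alert._gc_severity }}", "P3")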
Actions specify what happens when a workflow is triggered. Actions typically interface with external systems (like sending a Slack message). A workflow can define an array of actions; they can be executed conditionally and include the integration in their config part, as well as a payload block which typically depends on the exact integration used for the notification.
Actions include:
Provider part (provider:) - Configures the integration to be used
Payload part (with:) - Contains the data to submit to the integration based on its actual structure
Example: In this example you can see a typical Slack notification. Note that the actual integration is referenced through the 'providers' context attribute. The integration name is the exact string used to create the integration (in this case "groundcover-alerts").
status
Current status of the alert
firing - Active alert indicating an ongoing issue.
resolved - The issue has been resolved, and the alert is no longer active.
suppressed - Alert is suppressed.
pending - No Data or insufficient data to determine the alert state.
lastReceived
Timestamp when the alert was last received
This alert timestamp
firingStartTime
Start time of the firing alert
First timestamp of the current firing state.
source
Sources generating the alert
grafana
fingerprint
Unique fingerprint of the alert; this is a hash of the labels
02f5568d4c4b5b7f
alertname
Name of the monitor
Workload Pods Crashed Monitor
_gc_severity
The defined severity of the alert
S3, error
trigger
Trigger condition of the workflow
alert / manual / interval
values
A map containing two values that can be used:
The numeric value that triggered the alert (threshold_input_query) and the actual threshold that was defined for the alert (threshold_1)
"values": { "threshold_1": 0, "threshold_input_query": 99.507}
When crafting your workflows, you can reference any of the fields above using templating in any workflow field. Encapsulate your fields in double opening and closing curly brackets.
You can access label values via alert.labels.*
labels
Map of key:value pairs derived from the monitor definition.
{ "workload": "frontend",
"namespace": "prod" }
curl -L \
--request DELETE \
--url 'https://api.groundcover.com/api/workflows/{id}' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Accept: */*'
{
"message": "OK"
}
vector:
tracesPipeline:
extraSteps:
- name: filterHealthAPIs
transform:
type: filter
condition: |-
string!(.protocol_type) != "http" || !contains(string!(.resource_name), "/health")
vector:
tracesPipeline:
extraSteps:
- name: obfuscateSecretServerResponses
transform:
type: remap
source: |-
if exists(.server) && string!(.server) == "my-secret-server" {
.response_body = "<REDACTED>"
.response_body_truncated = true
}
vector:
tracesPipeline:
extraSteps:
- name: obfuscateBasedOnHeader
transform:
type: remap
source: |-
if exists(.http_request_headers.host) && string!(.http_request_headers.host) == "frontend" {
.request_body = "<REDACTED>"
.response_body = "<REDACTED>"
}
curl 'https://ds.groundcover.com/' \
--header "X-ClickHouse-Key: ${API_KEY}" \
--data "SELECT count() from traces where start_timestamp > now() - interval '15 minutes' "curl 'https://ds.groundcover.com/datasources/prometheus/api/v1/query_range' \
--get \
--header "apikey: ${API_KEY}" \
--data 'query=sum(rate(groundcover_resource_total_counter{type="http"}))' \
--data 'start=1715760000' \
--data 'end=1715763600'
groundcover version
groundcover auth generate-service-account-token
terraform {
required_providers {
grafana = {
source = "grafana/grafana"
}
}
}
provider "grafana" {
url = "https://app.groundcover.com/grafana"
auth = "{service account token}"
}
vector:
eventsPipelines:
multiple_login_failures:
inputs:
- logs_from_logs
- json_logs
extraSteps:
- name: multiple_login_failures_filter
transform:
type: filter
condition: |
.container_name == "loginservice" && contains(string!(.content), "login failed for user")
- name: multiple_login_failures_extract
transform:
type: "remap"
source: |
regex_result = parse_regex!(string!(.content), r'login failed for user (?P<username>.*) attempt number (?P<attempt_number>[0-9.]+)')
if to_int!(regex_result.attempt_number) < 5 {
abort
}
.float_attributes = object!(.float_attributes)
.float_attributes.attempt_number = to_int!(regex_result.attempt_number)
.string_attributes = object!(.string_attributes)
.string_attributes.username = regex_result.username
triggers:
- type: alert
filters:
- key: env
value: prod
consts:
severities: '{"S1": "P1","S2": "P2","S3": "P3","S4": "P4","critical": "P1","error": "P2","warning": "P3","info": "P4"}'
severity: keep.dictget({{ consts.severities }}, {{ alert.annotations._gc_severity }}, "P3")
actions:
- name: slack-action-firing
provider:
config: '{{ providers.groundcover-alerts }}'
type: slack
with:
attachments:
- color: '{{ consts.red_color }}'
footer: '{{ consts.footer_url }}'
text: '{{ consts.slack_message }}'
title: 'Firing: {{ alert.alertname }}'
type: plain_text
message: ' '
message: "Pod Crashed - Pod: {{ alert.labels.pod_name }} Namespace: {{ alert.labels.namespace }}"
Run terraform destroy to revert the changes
Supported architectures: AMD64 + ARM64
For the following providers, we will fetch the machine metadata from the provider's API.
AWS
✅
GCP
✅
Azure
✅
Linode
✅
Infrastructure Host metrics: CPU/Memory/Disk usage
Logs
Natively from docker containers running on the machine
JournalD (requires configuration)
Static log files on the machine ()
Traces
Natively from docker containers running on the machine
APM metrics and insights from the traces
Installation currently requires running a script on the machine.
The script will pull the latest sensor version and install it as a service: groundcover-sensor (requires privileges)
Where:
{ingestion_Key} - A dedicated ingestion key, you can generate or find existing ones from Settings -> Access -> Ingestion Keys
Ingestion Key needs to be of Type Sensor
{inCloud_Site} - Your backend ingress address (Your inCloud public ingestion endpoint)
{selected_Env} - The Environment that will group those machines on the cluster drop down in the top right corner (We recommend setting a separate one for non k8s deployments)
Check service status: systemctl status groundcover-sensor
View sensor logs: journalctl -u groundcover-sensor
The sensor supports overriding its default configuration by writing to the file is located in:
/etc/opt/groundcover/overrides.yaml.
After writing it you should restart the sensor service using:
systemctl restart groundcover-sensor
Example - override Docker max log line size:
Define what qualifies as an issue
Use filters in monitor definitions to include or exclude workloads, namespaces, HTTP status codes, clusters, and more tailor it to your context.
Silence issues with precision
Silence issues based on any label, such as status_code, cluster, or workload, to reduce noise and keep focus.
Clean, scoped issue view
Only see issues relevant to your environment, based on your configured monitors and silencing rules, no clutter.
Get alerted on new issues
Trigger alerts through your preferred integrations (Slack, PagerDuty, Webhooks, etc.) when a new issue is detected.
Define custom issues using all your data
Build monitors using metrics, traces, logs, and events, and correlate them to uncover complex problems.
Manage everything as code
Use Terraform to manage monitors and issues at scale, ensuring consistency and auditability.
Issue Source
Auto-created rules
User-defined monitors
Custom Filtering
❌
✅
Silencing by Labels
❌
✅
Alerts via Integrations
❌
All the built-in rules you’re used to are already available in the Monitors Catalog; you can add them all with a single click.
Adding the monitors in the "Starter Pack" will match all the existing issues on the legacy page.
Head to:
Monitors → Create Monitor → Monitor Catalog → Recommended Monitors
When signing in to groundcover for the first time, the platform automatically detects your organization based on the domain you used to sign in. If your organization already has existing workspaces available, the workspace selection screen will be displayed, where you can choose which of the existing workspaces you would like to join, or if you want to create a new workspace.
Available workspaces will be displayed only if either of the following applies:
You have been invited to join existing workspaces and haven't joined them yet
Someone has previously created a workspace that has auto-join enabled for the email domain that you used to sign in (applicable for corporate email domains only)
Click the Join button next to the desired workspace
You will be added as a user to that workspace with the user privileges that were assigned by default or those that were assigned to you specifically when the invite was sent.
You will automatically be redirected to that workspace.
Click the Create a new workspace button
Specify a workspace name
Choose whether to enable auto-join (those settings can be changed later)
Click continue
Workspace owners and admins can allow teammates who log in with the same email domain as them to join the Workspace they created automatically, without admin approval. This capability is called "Auto-join". It is disabled by default, but can be switched on during the workspace setup process, or at any time in the workspace settings.
If you logged in with a public email domain (Gmail, Yahoo, Proton, etc.) and are creating a new Workspace, you will not be able to switch on Auto-join for that Workspace.
Only Admins can create or revoke API keys. To create an API key:
Navigate to the Settings page using the settings button located in the bottom left corner
Select "Access" from the sidebar menu
Click on the "API Keys" tab
Create a new API Key, ensuring you assign it to a service account that is bound to the appropriate RBAC policy
When a key is created, its value is shown once—store it securely in a secret manager or encrypted environment variable. If lost, a new key must be issued.
To use an API key, send it in the Authorization header as bearer token:
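For example, using curl against one of the REST endpoints documented later in this guide (a minimal sketch; replace <YOUR_API_KEY> with your actual key):
curl -L \
--url 'https://api.groundcover.com/api/pipelines/logs/config' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Accept: */*'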
The key authenticates as the service account, and all API permissions are enforced accordingly.
API Key authentication will work using https://api.groundcover.com/ only.
API keys do not expire automatically. Revoking a key immediately disables its access.
API keys are valid only for requests to https://api.groundcover.com. They do not support data ingestion or Grafana integration—those require dedicated tokens.
Ingestion Key
API Key
Primary purpose
Write data (ingest)
Read data / manage resources via REST
Permissions capabilities
Write‑only + optional remote‑config read
Mirrors service‑account RBAC
Visibility after creation
Always revealable
Shown once only
Typical lifetime
Tied to integration lifecycle
Rotated for CI/CD automations
Store securely: Use secrets managers like AWS Secrets Manager or HashiCorp Vault. Never commit keys to source control.
Follow least privilege: Assign the minimal required policies to service accounts and API keys. Avoid defaulting to admin-level access.
Rotate regularly: Periodically generate new keys, update your systems, and revoke old ones to limit exposure.
Revoke stale keys: Remove keys that are no longer in use or suspected to be compromised.
Prompt:
Expected behavior:
The agent should call query_logs and show recent logs for that workload.
Prompt:
Expected behavior:
The agent should call get_k8s_object_yaml and return the YAML or a summary of it.
Prompt:
Expected behavior:
The agent should call get_workloads and return the relevant workloads with their P95 latency.
When something breaks, your agent can help investigate and make sense of it.
Prompt:
Expected behavior:
The agent should use query_monitors_issues, pull issue details, and kick off a relevant investigation using logs, traces, and metadata.
Prompt:
Expected behavior:
The agent should use query_monitors_issues to pull all related issues and start going through them one by one.
groundcover’s MCP can also be your coding sidekick. Instead of digging through tests and logs manually, deploy your changes and let the agent take over.
Prompt:
Expected behavior:
The agent should update the code with log statements, deploy it, and use query_logs to trace and debug.
Prompt:
Expected behavior: The agent should assist with deployment, then check for issues, error logs, and traces via groundcover.
groundcover supports any K8s version from v1.21.
For the installation to complete successfully, permissions to deploy the following objects are required:
StatefulSet
Deployment
DaemonSet (With privileged containers for loading our )
ConfigMap
To learn more about groundcover's architecture and components visit our
groundcover's portal pod sends HTTP requests to the cloud platform app.groundcover.com on port 443.
This unique architecture keeps the data inside the cluster and fetches it on-demand, keeping the data encrypted all the way without the need to open the cluster to incoming traffic via ingresses.
Set up your agent to talk to groundcover’s MCP server. Use OAuth for a quick login, or an API key for service accounts.
The MCP server supports two methods:
OAuth (Recommended for IDEs)
OAuth is the default if your agent supports it.
To get started, add the config below to your MCP client. Once you run it, your browser will open and prompt you to log in with your groundcover credentials.
Please make sure to have Node.js installed (to use npx)
Configuration Example
Cursor
Claude Code
If your agent doesn’t support OAuth, or if you want to connect a service account, this is the way to go.
Service‑account API key – create one or use an existing API Key. Learn more at .
Your local time zone (IANA format, for example America/New_York or Asia/Jerusalem). See how to .
Parameters
AUTH_HEADER – your groundcover API key.
TIMEZONE – your time zone in IANA format.
If you're using a multi-backend setup (OAuth or API Key), just add the following header to the args list:
First, grab your backend ID (it’s basically the name):
Open Data Explorer in groundcover.
Click the Backend picker (top‑right) and copy the backend’s name.
Depending on your client, you can usually set up the MCP server through the UI - or just ask the client to add it for you. Here are quick links for common tools:
groundcover’s eBPF sensor uses state-of-the-art kernel features to provide full coverage at low overhead. In order to do so it requires certain kernel features which are listed below.
Version v5.3 or higher (anything since 2020).
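If you're unsure which kernel a node is running, a quick way to check is to print the kernel release (a minimal sketch; run it directly on the node or from a debug pod):
# Print the running kernel release; it should be v5.3 or higher
uname -r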
Loading eBPF code requires running privileged containers. While this might seem unusual, there's nothing to worry about - eBPF is
Our sensor uses eBPF’s CO:RE feature in order to support the vast variety of Linux kernels and distributions detailed above. This feature requires the kernel to be compiled with BTF information (enabled using the CONFIG_DEBUG_INFO_BTF=y kernel compilation flag). This is the case for most common distributions nowadays.
You can check if your kernel has CO:RE support by manually looking for the BTF file:
If the file exists, congratulations! Your kernel supports CO:RE.
If your system does not fit into any of the above - unfortunately, our eBPF sensor will not be able to run on your environment. However, this does not mean groundcover won’t collect any data. You will still be able to inspect your , see all and use with other data sources.
This guide shows how to route groundcover alerts to email using Zapier. Since groundcover supports webhook-based alerting, and Zapier can receive webhooks and send emails, you can easily set up this workflow without writing code.
A groundcover account with access to the Workflows tab.
A Zapier account (free plan is sufficient).
An email address where you want to receive alerts.
Go to Settings → Integrations.
Click Create Integration.
Choose Webhook as the type.
Enter a name like zapier_email_integration.
Go to .
Click "Create Zap".
Set Trigger:
App: Webhooks by Zapier
Set Action:
App: Email by Zapier
Event: Send Outbound Email
Go to the Workflows section in your groundcover.
Create a Notification Workflow with the integration we created in step 1.
Edit the workflow YAML and use the following structure:
Trigger a test alert in groundcover.
Check Zapier to ensure the webhook was received.
Confirm the email arrives with the right content.
Retrieve a list of Kubernetes namespaces within a specified time range.
This endpoint requires API Key authentication via the Authorization header.
Format: ISO 8601 format with milliseconds: YYYY-MM-DDTHH:mm:ss.sssZ
Timezone: All timestamps must be in UTC (denoted by 'Z' suffix)
The response contains an array of namespaces for the specified time period.
Quickly understand what requires your attention and drive your investigations
The issues page is a useful place to start a troubleshooting or investigation flow from. It gathers together all active issues found in your Kubernetes environment.
HTTP / gRPC Failures
Capturing failed HTTP calls with Response Status Codes of:
5XX — Internal Server Error
429 — Too Many Requests
MySQL / PostgreSQL Failures
Capturing failed SQL statement executions with Response Errors Codes such as:
1146 — No Such Table
Redis Failures
Capturing any reported Error by the Redis serialization protocol (RESP), such as:
ERR unknown command
Container Restarts
Capturing all container restart events across the cluster, with Exit Codes such as:
0 — Completed
137 — OOMKilled
Deployment Failures
Capturing events such as:
MinimumReplicasUnavailable — Deployment does not have minimum availability
Issues are auto-detected and aggregated - representing many identical repeating incidents. Aggregation helps cut through the noise quickly and reach insights such as when a new type of issue started to appear and when it was last seen.
Issues are grouped by:
Type (HTTP, gRPC, Container Restart, etc.)
Status Code / Error Code (e.g. HTTP 500, gRPC 13)
Workload name
The smart aggregation mechanism will also identify query parameters, remove them, and group the stripped queries / API URIs into patterns - for example, /orders?id=123 and /orders?id=456 are grouped under the same pattern. This allows users to easily identify and isolate the root cause of a problem.
Each issue is assigned a velocity graph showing its behavior over time (like when it was first seen) and a live counter of its number of incidents.
By clicking on an issue, users can access the specific traces captured around the relevant issue. Each trace is related to the exact resource that was used (e.g. raw API URI, or SQL query), its latency and Status Code / Error Code.
Further clicking on a selected captured trace allows the user to investigate the root cause of the issue with the entire payload (body and headers) of the request and response, the information around the participating container, the application logs around incident's time and the full context of the metrics around the incident.
Below are the essentials relevant to writing remap transforms in groundcover. Extended information can be found in Vector's documentation.
We support using all types of Vector transforms as pipeline steps.
For testing VRL before deployment we recommend the VRL playground.
When processing Vector events, field names need to be prefixed by . (a single period). For example, the content field in a log, representing the body of the log, is accessible using .content.
Specifically in groundcover, attributes parsed from logs or associated with traces will be stored under the string_attributes for string values, and under float_attributes for numerical values. Accessing attributes is possible by adding additional . as needed. For example, a JSON log that looks like this:
Will be translated into an event with the following attributes:
Each of Vector's built-in functions is either fallible or infallible. Fallible functions can throw an error when called and require error handling, whereas infallible functions will never throw an error.
When writing Vector transforms in VRL it's important to use error handling where needed. Below are the two ways error handling in Vector is possible - see more on .
VRL code without proper error handling will throw an error during compilation, resulting in error logs in the Vector deployment.
Let's take a look at the following code.
The code above can either succeed in parsing the json, or fail in parsing it. The err variable will contain indication of the result status, and we can proceed accordingly.
Let's take a look at this slightly different version of the code above:
This time there's no error handling around, but ! was added after the function call.
This method of error handling is called abort on error - it will fail the transform entirely if the function returns an error, and proceed normally otherwise.
Both methods above are valid VRL for handling errors, and you must choose one or the other when handling fallible functions. However, they carry one big difference in terms of pipelines in groundcover:
Transforms which use option #1 (error handling) will not stop the pipeline in case of error - the following steps will continue to execute normally. This is useful when writing optional enrichment steps that could potentially fail with no issue.
Transforms which use option #2 (abort) will stop the pipeline in case of error - the event will not proceed to the other steps. This is mostly useful for mandatory steps that must succeed for the event to be processed further.
Update the logs pipeline configuration.
POST /api/pipelines/logs/config
This endpoint requires API Key authentication via the Authorization header.
Update logs pipeline configuration with test pattern:
🚨 CRITICAL WARNING: This endpoint COMPLETELY REPLACES the entire pipeline configuration and WILL DELETE ALL EXISTING RULES. Always backup your current configuration first by calling the GET endpoint.
ALWAYS get your current configuration before making changes:
After updating the configuration, verify the patterns were added:
This should return your updated configuration including the new test patterns.
For detailed information about configuring and writing OTTL transformations, see:
Retrieve the current logs pipeline configuration.
GET /api/pipelines/logs/config
This endpoint requires API Key authentication via the Authorization header.
Get current logs pipeline configuration:
For detailed information about configuring and writing OTTL transformations, see:
Use Terraform to create, update, delete, and list groundcover dashboards as code. Managing dashboards with infrastructure‑as‑code (IaC) lets you version changes, review them in pull requests, promote the same definitions across environments, and detect drift between what’s applied and what’s running in your account.
Secure, write‑focused credentials for streaming data into groundcover
To integrate groundcover with MS Teams, follow the steps below. Note that you’ll need at least a Business subscription of MS Teams to be able to create workflows.
Create a webhook workflow for your dedicated Teams channel
Go to Relevant Team -> Specific Channel -> "Workflows", and create a webhook workflow
The webhook workflow is associated with a URL which is used to trigger the MS Teams integration on groundcover - make sure to copy this URL
Set Up the Webhook in groundcover
groundcover’s Real User Monitoring (RUM) SDK allows you to capture front-end performance data, user events, and errors from your web applications.
Start capturing RUM data by installing the SDK in your web app.
This guide will walk you through installing the SDK, initializing it, identifying users, sending custom events, capturing exceptions, and configuring optional settings.
To integrate groundcover with incident.io, follow the steps below. Note that you’ll need a Pro incident.io account to view your incoming alerts.
Generate an Alerts configuration for groundcover Log in to your account. Go to "On-call" -> "Alerts" -> "Configure" and add a new source.
On the "Create Alert Source" screen the answer to the question "Where do your alerts come from?" should be "HTTP". Select this source and give it a unique name. Hit "continue".
groundcover supports sending notifications to Slack using a Slack App with bot tokens instead of static webhooks. This method allows dynamic routing of alerts to any channel by including the channel ID in the payload. In addition to routing, messages can be enriched with formatting, blocks, and mentions — for example including <@user_id> in the payload to directly notify specific team members. This provides a flexible and powerful alternative to fixed incoming webhooks for alerting.
Make sure you created a .
Use the following workflow as an example. You can later enrich your workflow with additional functionality.
Here are a few tips for using the example workflow:
In the consts section, the channels attribute defines the mapping between Slack channels and their IDs. Use a clear, readable label to identify each channel (for example, the channel’s actual name in Slack), and map it to the corresponding channel ID.
Terraform is an infrastructure-as-code (IaC) tool for managing cloud and SaaS resources using declarative configuration. The groundcover Terraform provider enables you to manage observability resources such as policies, service accounts, API keys, and monitors as code—making them consistent, versioned, and automated.
We've partnered with HashiCorp as an official Terraform provider.
Also available is our provider GitHub repository:
Exposing Data Sources for Managed inCloud Setup
Save the view of any groundcover page exactly the way you like it, then jump back in a click.
A Saved View captures your current page layout: filters, columns, toggles, etc., so you and your team can reopen the page with the same context every time. Each groundcover page maintains its own catalogue of views, and every user can pick their personal Favorites.
Saved Views are available on the following pages: Traces, Logs, API Catalog, Events.
Look for the Views selector next to the time‑picker. Click it to open the list, create a new view, or switch between existing ones.
groundcover supports the configuration of logs and traces pipelines, to further process and customize the data being collected, using transforms. This enables full flexibility to manipulate the data as it flows into the platform.
See for more information about how Vector is being used in the groundcover platform's architecture.
mkdir groundcover-tf-example && cd groundcover-tf-example
resource "grafana_folder" "goldensignals" {
title = "Golden Signals"
}
resource "grafana_dashboard" "workloadgoldensignals" {
config_json = file("workloadgoldensignals.json")
folder = grafana_folder.goldensignals.id
}
curl -fsSL https://groundcover.com/install-groundcover-sensor.sh | sudo env API_KEY='{ingestion_Key}' GC_ENV_NAME='{selected_Env}' GC_DOMAIN='{inCloud_Site}' bash -s -- install
curl -fsSL https://groundcover.com/install-groundcover-sensor.sh | sudo bash -s -- uninstall
echo "# Local overrides to sensor configuration
k8sLogs:
scraper:
dockerMaxLogSize: 102400
" | sudo tee /etc/opt/groundcover/overrides.yaml && sudo systemctl restart groundcover-sensorcurl 'https://api.groundcover.com/api/k8s/v3/clusters/list' \
-H 'accept: application/json' \
-H 'authorization: Bearer <YOUR_API_KEY>' \
-H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
--data-raw '{"sources":[]}'Use groundcover to get 5 logs from the workload news-service from the past 15 minutes.Use groundcover to get the spec of the chat-app deployment.Use groundcover to show the top 5 workloads by P95 latency.I got an alert for this critical groundcover issue. Can you investigate it?
https://app.groundcover.com/monitors/issues?...I got multiple alerts in the staging-env namespace. Can you help me look into them using groundcover?Use groundcover to debug this code. For each test, print relevant logs with test_id, and dive into any error logs.Please deploy this service and verify everything works using groundcover.POST /api/k8s/v2/namespaces/list✅
Terraform Support
❌
✅
Issues Based on Traces/Logs/Metrics
Limited
Full support
Revocation effect
Data stops flowing immediately
API calls fail
The groundcover tenant API KEY is required for configuring the data source connection.
You can obtain your API key from the API Keys page in the groundcover console.
For this example we will use the key
API-KEY-VALUE
Configure the Grafana Prometheus data source by following these steps while logged in as a Grafana Admin.
Connections > Data Sources > + Add new data source
Pick Prometheus
Name: groundcover-prometheus
Prometheus server URL: https://app.groundcover.com/api/prometheus
Custom HTTP Headers > Add Header
Header: authorization
Value: Bearer API-KEY-VALUE
Performance
Prometheus type: Prometheus
Prometheus version: > 2.50.x
Click "Save & test"
The following configurations are deprecated but may still be in use in older setups.
The legacy datasources API key can be obtained by running: groundcover auth get-datasources-api-key
ClickHouse datasource integration is deprecated and no longer supported for new installations.
Configure the Grafana ClickHouse data source by following these steps while logged in as a Grafana Admin.
Connections > Data Sources > + Add new data source
Pick ClickHouse
Name: groundcover-clickhouse
Server
Server address: ds.groundcover.com
Server port: 443
Protocol: HTTP
Secure Connection: ON
HTTP Headers
Forward Grafana HTTP Headers: ON
Credentials
Username: Leave empty
Password: API-KEY-VALUE
Additional Properties
Default database: groundcover
Click "Save & test"
1064 — Syntax Error







OpenShift
supported
Rancher
supported
Self-managed
supported
minikube
supported
kind
supported
Rancher Desktop
supported
k0s
supported
k3s
supported
k3d
supported
microk8s
supported
AWS Fargate
not supported
Docker-desktop
not supported
Secret
PVC
EKS
supported
AKS
supported
GKE
supported
OKE
supported
macOS
sudo systemsetup -gettimezone
Linux
timedatectl | grep "Time zone"
Windows PowerShell
Get-TimeZone

Amazon Linux
All off the shelf AMIs
Google COS
All off the shelf AMIs
Azure Linux
All off the shelf AMIs
Talos
1.7.3+
Debian
11+
RedHat Enterprise Linux
8.2+
Ubuntu
20.10+
CentOS
7.3+
Fedora
31+
BottlerocketOS
1.10+
Paste your Zapier Catch Hook URL (you’ll get this in Step 2 below).
Save the integration.
Event: Catch Hook
Copy the Webhook URL (e.g. https://hooks.zapier.com/hooks/catch/...) – you'll use this in groundcover.
To: your email address
Subject:
Body:

Authorization
Yes
Bearer token with your API key
X-Backend-Id
Yes
Your backend identifier
Content-Type
Yes
Must be application/json
Accept
Yes
Must be application/json
sources
Array
No
Filter by data sources (empty array for all sources)
start
String
Yes
Start timestamp in ISO 8601 format (UTC)
end
String
Yes
End timestamp in ISO 8601 format (UTC)
namespaces
Array
Array of namespace names or namespace objects
This endpoint requires API Key authentication via the Authorization header.
name
string
Yes
Exact name of the ingestion key to delete
The endpoint returns an empty response body with HTTP 200 status code when the key is deleted successfully.
🚨 PERMANENT DELETION: This operation permanently deletes the ingestion key and cannot be undone.
⚠️ Immediate Impact: Any services using this key will:
Receive 403 PERMISSION_DENIED errors
Stop sending data to groundcover immediately
Lose access to remote configuration (for sensor keys)
To verify the key was deleted, use the List Ingestion Keys endpoint:
This should return an empty array [] if the key was successfully deleted.
Identify the key to delete using the List endpoint
Update integrations to use a different key first
Test integrations work with the new key
Delete the old key using this endpoint
Verify deletion using the List endpoint
Always have a replacement key ready before deleting production keys
Test your rollover plan in staging environments first
Update all services using the key before deletion
Use descriptive names to avoid accidental deletion of wrong keys
Consider key rotation instead of permanent deletion for security incidents
For comprehensive information about ingestion keys management, see:
Head out to the integrations section: Settings -> Integrations, to create a new Webhook
Start by giving your Webhook integration a name. This name will be used below in the provider block sample .
Set the Webhook URL to the url you copied from field (2)
Keep the HTTP method as POST
Create a Workflow
Go to Monitors --> Workflows --> Create Workflow, and paste the YAML configuration provided below.
Configure the provider Blocks (There are two of them)
In the provider block, replace {{ providers.your-teams-integration-name }} with your actual Webhook integration name (the one you created in step 3)
For example, if you named your integration test-ms-teams, the config reference would be: {{ providers.test-ms-teams }}
Sample code for your groundcover workflow:
apiKey
A dedicated Ingestion Key of type RUM (Settings -> Access -> Ingestion Keys)
dsn
Your public groundcover endpoint in the format of https://example.platform.grcv.io, where example.platform.grcv.io is your ingress.site installation value.
cluster
Identifier for your cluster; helps filter RUM data by specific cluster.
environment
Environment label (e.g., production, staging) used for filtering data.
appId
Custom application identifier set by you; useful for filtering and segmenting data on a single application level later.
You can customize SDK behavior (event sampling, data masking, enabled events). The following properties are customizable:
You can pass the values by calling the init function:
Or via the updateConfig function:
Link RUM data to specific users:
Instrument key user interactions:
Manually track caught errors:
port-forward groundcover's VictoriaMetrics service object
Run the vmbackup utility, in this example we'll set the destination to an AWS S3 bucket, but more providers are supported
Scale down VictoriaMetrics statefulSet (VictoriaMetrics must be offline during restorations)
Get the VictoriaMetrics PVC name
Create the following Kubernetes Job manifest vm-restore.yaml
Deploy the job and wait for completion
Once completed, scale up groundcover's VictoriaMetrics instance
groundcover uses Vector as an aggregator and transformer deployed into each monitored environment. It is an open-source, highly performant service, capable of supporting many manipulations on the data flowing into groundcover's backend.
Pipelines are configured using Vector transforms, where each transform defines one step in the pipeline. There are many types of transforms, and all of them can be natively used within the groundcover deployment to achieve full flexibility.
groundcover's deployment supports adding a list of transforms for logs and traces independently. These steps will be automatically appended to the default pipeline, eliminating the need to understand the inner workings of groundcover's setup. Instead, you only need to configure the steps you wish to execute, and after redeploying groundcover you will see them take effect immediately.
Each step requires two attributes:
name: must be unique across all pipelines
transform: the transform itself, passed as-is to Vector.
The following is a template for a logs pipeline with two remap stages:
The following is a template for a traces pipeline with one filter stage:
Logs to Events pipelines allow creating custom events from incoming logs. Unlike the logs and traces pipelines, they do not affect the original logs, and are meant to create parallel, distinguished events for future analytics.
The following is a template for a custom event pipeline with a filter stage and an extraction step.
{
"mcpServers": {
"groundcover": {
"command": "npx",
"args": [
"-y",
"[email protected]",
"https://mcp.groundcover.com/api/mcp",
"54278",
"--header",
"X-Timezone:<IANA_TIMEZONE>",
"--header",
"X-Tenant-Uuid:<TENANT_UUID>"
]
}
}
}
claude mcp add groundcover npx -- -y [email protected] https://mcp.groundcover.com/api/mcp 54278 --header X-Timezone:<IANA_TIMEZONE> --header X-Tenant-UUID:<TENANT_UUID>
{
"mcpServers": {
"groundcover": {
"command": "npx",
"args": [
"-y",
"mcp-remote",
"https://mcp.groundcover.com/api/mcp",
"--header", "Authorization:${AUTH_HEADER}",
"--header", "X-Timezone:${TIMEZONE}"
],
"env": {
"AUTH_HEADER": "Bearer <your_token>",
"TIMEZONE": "<your_timezone>"
}
}
}
}"--header", "X-Backend-Id:<BACKEND_ID>"$ ls -la /sys/kernel/btf/vmlinux
- r--r--r--. 1 root root 3541561 Jun 2 18:16 /sys/kernel/btf/vmlinux🚨 New groundcover Alert 🚨🔔 Alert Title: {{alert_name}}
💼 Severity: {{severity}}
🔗 Links:
- 🧹 Issue: {{issue_url}}
- 📈 Monitor: {{monitor_url}}
- 🔕 Silence: {{silence_url}}
workflow:
id: emails
description: Sends alerts to Zapier webhook (for email)
triggers:
- type: alert
name: emails
consts:
severity: keep.dictget( {{ alert.annotations }}, "_gc_severity", "info")
title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
issue_url: https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}
monitor_url: https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}
silence_url: https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join({{ consts.redacted_labels }}, "&", "matcher_"), " ", "+")
redacted_labels: keep.dict_pop({{ alert.labels }}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "_gc_issue_header")
actions:
- name: <<THE_NAME_OF_YOUR_WORFLOW>>
provider:
config: "{{ providers.<<THE_NAME_OF_YOUR_INTEGRATION>> }}"
type: webhook
with:
body:
alert_name: "{{ consts.title }}"
severity: "{{ consts.severity }}"
issue_url: "{{ consts.issue_url }}"
monitor_url: "{{ consts.monitor_url }}"
silence_url: "{{ consts.silence_url }}"curl 'https://api.groundcover.com/api/k8s/v2/namespaces/list' \
-H 'accept: application/json' \
-H 'authorization: Bearer <YOUR_API_KEY>' \
-H 'content-type: application/json' \
-H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
--data-raw '{"sources":[],"start":"2025-01-24T06:00:00.000Z","end":"2025-01-24T08:00:00.000Z"}'{
"namespaces": [
"groundcover",
"monitoring",
"kube-system",
"default"
]
}
# Get current time and subtract 24 hours for start time
start_time=$(date -u -v-24H '+%Y-%m-%dT%H:%M:%S.000Z')
end_time=$(date -u '+%Y-%m-%dT%H:%M:%S.000Z')
curl 'https://api.groundcover.com/api/k8s/v2/namespaces/list' \
-H 'accept: application/json' \
-H 'authorization: Bearer <YOUR_API_KEY>' \
-H 'content-type: application/json' \
-H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
--data-raw "{\"sources\":[],\"start\":\"$start_time\",\"end\":\"$end_time\"}"{"name":"my-log","count":5} .string_attributes.name --> "my-log"
.float_attributes.count --> 5.parsed, .err = parse_json("{\"Hello\": \"World!\"}")
if err == null {
// do something with .parsed
}parsed = parse_json!("{\"Hello\": \"World!\"}")Authorization: Bearer <YOUR_API_KEY>
Content-Type: application/json
curl -L \
--request POST \
--url 'https://api.groundcover.com/api/pipelines/logs/config' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
"ottlRules": [
{
"ruleName": "test_log_pattern",
"conditions": [
"workload == \"test-app\" or container_name == \"test-container\""
],
"statements": [
"set(cache, ExtractGrokPatterns(body, \"^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}\"))",
"merge_maps(attributes, cache, \"insert\")",
"set(attributes[\"parsed\"], true)"
],
"statementsErrorMode": "skip",
"conditionLogicOperator": "or"
},
{
"ruleName": "json_parsing_test",
"conditions": [
"format == \"JSON\""
],
"statements": [
"set(parsed_json, ParseJSON(body))",
"merge_maps(attributes, parsed_json, \"insert\")"
],
"statementsErrorMode": "skip",
"conditionLogicOperator": "and"
}
]
}'
{
"uuid": "59804867-6211-48ed-b34a-1fc33827aca6",
"created_by": "itamar",
"created_timestamp": "2025-08-31T13:33:27.364525Z",
"value": "ottlRules:\n - ruleName: test_log_pattern\n conditions:\n - workload == \"test-app\" or container_name == \"test-container\"\n statements:\n - set(cache, ExtractGrokPatterns(body, \"^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}\"))\n - merge_maps(attributes, cache, \"insert\")\n - set(attributes[\"parsed\"], true)\n statementsErrorMode: skip\n conditionLogicOperator: or"
}
curl -L \
--url 'https://api.groundcover.com/api/pipelines/logs/config' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Accept: */*' > pipeline-backup.json
curl -L \
--url 'https://api.groundcover.com/api/pipelines/logs/config' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Accept: */*'
Authorization: Bearer <YOUR_API_KEY>
Accept: */*
curl -L \
--url 'https://api.groundcover.com/api/pipelines/logs/config' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Accept: */*'
{
"ottlRules": [
{
"ruleName": "nginx_access_logs",
"conditions": [
"workload == \"nginx\" or container_name == \"nginx\""
],
"statements": [
"set(cache, ExtractGrokPatterns(body, \"^%{IPORHOST:remote_ip} - %{DATA:remote_user} \\[%{HTTPDATE:timestamp}\\] \\\"%{WORD:method} %{DATA:path} HTTP/%{NUMBER:http_version}\\\" %{INT:status} %{INT:body_bytes}\"))",
"merge_maps(attributes, cache, \"insert\")"
],
"statementsErrorMode": "skip",
"conditionLogicOperator": "or"
},
{
"ruleName": "json_log_parsing",
"conditions": [
"format == \"JSON\""
],
"statements": [
"set(parsed_json, ParseJSON(body))",
"merge_maps(attributes, parsed_json, \"insert\")"
],
"statementsErrorMode": "skip",
"conditionLogicOperator": "and"
},
{
"ruleName": "error_log_enrichment",
"conditions": [
"level == \"error\" or level == \"ERROR\""
],
"statements": [
"set(attributes[\"severity\"], \"high\")",
"set(attributes[\"needs_attention\"], true)"
],
"statementsErrorMode": "skip",
"conditionLogicOperator": "or"
}
]
}
Authorization: Bearer <YOUR_API_KEY>
Content-Type: application/json
{
"name": "string"
}
curl -L \
--request DELETE \
--url 'https://api.groundcover.com/api/rbac/ingestion-keys/delete' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
"name": "old-test-key"
}'
curl -L \
--request POST \
--url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
"name": "old-test-key"
}'
workflow:
id: teams-webhook
description: Sends an API to MS Teams alerts endpoint
name: ms-teams-alerts-workflow
triggers:
- type: alert
filters:
- key: annotations.ms-teams-alerts-workflow
value: enabled
consts:
silence_link: 'https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join(keep.dict_pop({{ alert.labels }}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder"), "&", "matcher_"), " ", "+")'
monitor_link: 'https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}'
title_link: 'https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}'
description: keep.dictget( {{ alert.annotations }}, "_gc_description", '')
redacted_labels: keep.join(keep.dict_pop({{alert.labels}}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "_gc_issue_header"), "-\n")
title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
actions:
- if: '{{ alert.status }} == "firing"'
name: teams-webhook-firing
provider:
config: ' {{ providers.your-teams-integration-name }} '
type: webhook
with:
body:
type: message
attachments:
- contentType: application/vnd.microsoft.card.adaptive
content:
$schema: http://adaptivecards.io/schemas/adaptive-card.json
type: AdaptiveCard
version: "1.2"
body:
- type: TextBlock
text: "\U0001F6A8 Firing: {{ consts.title }}"
weight: bolder
size: large
- type: TextBlock
text: "[Investigate Issue]({{consts.title_link}})"
wrap: true
- type: TextBlock
text: "{{ consts.description }}"
wrap: true
- type: TextBlock
text: "[Silence]({{consts.silence_link}})"
wrap: true
- type: TextBlock
text: "[See monitor]({{consts.monitor_link}})"
wrap: true
- type: TextBlock
text: "{{ consts.redacted_labels }}"
wrap: true
- if: '{{ alert.status }} != "firing"'
name: teams-webhook-resolved
provider:
config: ' {{ providers.your-teams-integration-name }} '
type: webhook
with:
body:
type: message
attachments:
- contentType: application/vnd.microsoft.card.adaptive
content:
$schema: http://adaptivecards.io/schemas/adaptive-card.json
type: AdaptiveCard
version: "1.2"
body:
- type: TextBlock
text: "\U0001F7E2 Resolved: {{ consts.title }}"
weight: bolder
size: large
- type: TextBlock
text: "[Investigate Issue]({{consts.title_link}})"
wrap: true
- type: TextBlock
text: "{{ consts.description }}"
wrap: true
- type: TextBlock
text: "[Silence]({{consts.silence_link}})"
wrap: true
- type: TextBlock
text: "[See monitor]({{consts.monitor_link}})"
wrap: true
- type: TextBlock
text: "{{ consts.redacted_labels }}"
wrap: true
npm install @groundcover/browser
# or
yarn add @groundcover/browser
import groundcover from '@groundcover/browser';
groundcover.init({
apiKey: 'your-ingestion-key',
cluster: 'your-cluster',
environment: 'production',
dsn: 'your-dsn',
appId: 'your-app-id',
});
export interface SDKOptions {
batchSize: number;
batchTimeout: number;
eventSampleRate: number;
sessionSampleRate: number;
environment: string;
debug: boolean;
tracePropagationUrls: string[];
beforeSend: (event: Event) => boolean;
enabledEvents: Array<"dom" | "network" | "exceptions" | "logs" | "pageload" | "navigation" | "performance">;
excludedUrls: [];
}
groundcover.init({
apiKey: 'your-ingestion-key',
cluster: 'your-cluster',
environment: 'production',
dsn: 'your-dsn',
appId: 'your-app-id',
options: {
batchSize: 50,
sessionSampleRate: 0.5, // 50% sessions sampled
eventsSampleRate: 0.5,
},
});
groundcover.updateConfig({
batchSize: 20,
});
groundcover.identifyUser({
id: 'user-id',
email: '[email protected]',
});
groundcover.sendCustomEvent({
event: 'PurchaseCompleted',
attributes: { orderId: 1234, amount: 99.99 },
});
try {
performAction();
} catch (error) {
groundcover.captureException(error);
}
kubectl get svc -n groundcover | grep "victoria-metrics"
# Identify the victoria-metrics service object name
kubectl port-forward svc/{victoria-metrics-service-object-name} \
-n groundcover 8428:8428
./vmbackup -credsFilePath={aws credentials path} \
-storageDataPath=</path/to/victoria-metrics-data> \
-snapshot.createURL=http://localhost:8428/snapshot/create \
-dst=s3://<bucket>/<path/to/backup>
kubectl scale sts {release name}-victoria-metrics --replicas=0
kubectl get pvc -n groundcover | grep victoria-metrics
apiVersion: v1
kind: ServiceAccount
metadata:
name: vm-restore
annotations:
eks.amazonaws.com/role-arn: XXXXX # role with permissions to write to the bucket
---
apiVersion: batch/v1
kind: Job
metadata:
name: vm-restore
spec:
ttlSecondsAfterFinished: 600
template:
spec:
serviceAccountName: vm-restore
restartPolicy: OnFailure
volumes:
- name: vmstorage-volume
persistentVolumeClaim:
claimName: "{VICTORIA METRICS PVC NAME}"
containers:
- name: vm-restore
image: victoriametrics/vmrestore
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /storage
name: vmstorage-volume
command:
- /bin/sh
- -c
- /vmrestore-prod -src=s3://<bucket>/<path/to/backup> -storageDataPath=/storage
kubectl apply -f vm-restore.yaml -n groundcover
kubectl scale sts {release name}-victoria-metrics --replicas=1
vector:
logsPipeline:
extraSteps:
- name: stepA
transform:
type: remap
source: |-
...
- name: stepB
transform:
type: remap
source: |-
...
vector:
tracesPipeline:
extraSteps:
- name: stepA
transform:
type: filter
condition: |-
...
vector:
eventsPipelines:
my_event_name:
inputs:
- logs_from_logs
- json_logs
extraSteps:
- name: filter_step
transform:
type: filter
condition: |-
...
- name: extraction_step
transform:
type: remap
source: |-
...
A groundcover account with permissions to create/edit Dashboards
A Terraform environment (groundcover provider >v1.1.1)
The groundcover Terraform provider configured with your API credentials
See also: groundcover Terraform provider reference for provider configuration and authentication details.
To create a dashboard using Terraform, you first need to create the dashboard manually in the UI and then export it in Terraform format.
See Creating dashboards to learn more.
You can export a Dashboard as a Terraform resource:
Open the Dashboard.
Click Actions → Export.
Download or copy the Terraform tab’s content and paste it into your .tf file (see placeholder above).
After saving this file as main.tf along with the provider details, type:
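A minimal sketch of the commands, assuming the groundcover provider is already configured as described in the provider reference:
terraform init   # installs the groundcover provider
terraform apply  # creates the dashboard defined in main.tf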
Dashboards added via Terraform are marked as Provisioned in the UI so you can quickly distinguish IaC‑managed Dashboards from manually created ones, both from the Dashboard List and inside the Dashboard itself.
Provisioned Dashboards are read‑only by default to protect the source of truth in your Terraform code.
To make a quick change, click Unlock dashboard. This allows editing directly in the UI; all changes are automatically saved as always.
Important: Any changes can be overwritten the next time your provisioner runs terraform apply.
Safer alternative: Duplicate the Dashboard and edit the copy, then migrate those changes back into code.
Changing the resource and reapplying Terraform will update the Dashboard in groundcover.
Deleting the resource from your code (and applying) will delete it from groundcover.
See more examples on our Github repo.
Already have a Dashboard in groundcover? Bring it under Terraform management without recreating it:
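A hedged sketch of the import command; the resource address (here groundcover_dashboard.example) and the dashboard ID placeholder are illustrative, so check the provider documentation for the exact ID format:
# Hypothetical example: adopt an existing dashboard into your Terraform state
terraform import groundcover_dashboard.example <DASHBOARD_ID>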
After importing, run terraform plan to view the state and align your config with what exists.
Creating dashboards – how to build widgets and layouts in the UI
groundcover Terraform provider Github repo – resource schema, arguments, and examples
Send Real‑User‑Monitoring events using JS snippet embedded in web pages
Third Party
Integrate 3rd-party data sources that push data (e.g. OpenTelemetry, AWS Firehose, FluentBit, etc.)
*Only the Sensor has limited read capability in order to support pulling remote configuration such as OTTL parsing rules applied from the UI. RUM and Third Party have write-only configurations.
It is recommended to create a dedicated Ingestion Key for every data source, so that keys can be managed and rotated appropriately, minimizing exposure and risk, and allowing groundcover to identify the data source of all ingested data.
Open Settings → Access → Ingestion Keys and click Create key.
Give the key a clear, descriptive Name (for example k8s-prod‑eu‑central‑1).
Select the Type that matches your integration.
Click Click & Copy Key.
Unlike API Keys, Ingestion Keys stay visible on the page. Treat every reveal as sensitive and follow the same secret‑handling practices.
Store the Key securely, and continue to integrate your data source.
The Ingestion Keys table lets you:
Reveal the key at any time.
See who created the key and when.
Sort by Type or Creator to locate specific credentials quickly.
Click ⋮ → Revoke next to the key. Revocation permanently deletes the key, unlike API Keys, where revocation only disables the key:
The key will disappear from the list.
Any service using it will receive 403 / PERMISSION_DENIED and will not be able to continue to send data or pull latest configurations.
This operation cannot be undone — create a new key and update your deployments if you need access again.
Ingestion Key
API Key
Primary purpose
Write data (ingest)
Read data / manage resources via REST
Permissions capabilities
Write‑only + optional remote‑config read
Mirrors service‑account RBAC
Visibility after creation
Always revealable
Shown once only
Typical lifetime
Tied to integration lifecycle
Rotated for CI/CD automations
One key per integration – simplifies rotation and blast radius.
Store securely – AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault, Kubernetes Secrets.
Rotate regularly – create a new key, roll it out, then revoke the old one.
Monitor for 403 errors – a spike usually means a revoked or expired key.
Sensor*
Install the eBPF sensor on Kubernetes or Hosts/VMs
RUM
Set Up the Webhook in groundcover
Head out to the integrations section: Settings -> Integrations, to create a new Webhook
Start by giving your Webhook integration a name. This name will be used below in the provider block sample .
Set the Webhook URL to the url you copied from field (1)
Keep the HTTP method as POST
Under headers add Authorization, and paste the "Bearer <token>" copied from field (2).
Create a Workflow
Go to Monitors --> Workflows --> Create Workflow, and paste the YAML configuration provided below.
Note: The body section is a dictionary of keys that will be sent as a JSON payload to the incident.io platform
Configure the provider Block
In the provider block, replace {{ providers.your-incident-io-integration-name }} with your actual Webhook integration name (the one you created in step 4)
For example, if you named your integration test-incidentio, the config reference would be: {{ providers.test-incidentio }}
Required Parameters for Creating an alert When triggering an alert, the following keys are required:
title - Alert title that can be pulled from groundcover as seen in the example
status - One of "firing" or "resolved" that can also be pulled from groundcover as the example shows.
You can include additional parameters for richer context (optional):
description
deduplication_key - unique attribute used to group identical alerts, groundcover provides this through the fingerprint attribute
metadata - Any additional metadata that you've configured within your monitor in groundcover. Note that these keys should reflect your monitor definition in groundcover
Example code for your groundcover workflow:
To locate a channel ID, open the channel in Slack, click the channel name at the top, and scroll to the About section. The channel ID is shown at the bottom of this section.
The channel name should be included in the monitor’s Metadata Labels, or you can fall back to a default. See the channel_id attribute in the workflow example.
Finally, replace the integration name in {{ providers.slack-routing-webhook }} with the actual name of the Webhook integration you created.
groundcover_policy – Defines RBAC policies (roles and optional data scope filters) Role-Based Access Control (RBAC)
groundcover_serviceaccount – Creates service accounts and attaches policies. Service Accounts
groundcover_apikey – Creates API keys for service accounts. API Keys
groundcover_monitor – Defines alerting rules and monitors.
groundcover_logspipeline - Defines Logs Pipeline configurations
groundcover_ingestionkey - Creates Ingestion keys.
groundcover_dashboard - Define Dashboards.
Run terraform init to install the provider.
api_key (String, Required, Sensitive): Your groundcover API key. It is strongly recommended to configure this using the TF_VAR_groundcover_api_key environment variable rather than hardcoding it (see the shell example after this list).
base_url (String, Optional): The base URL for the groundcover API. Defaults to https://api.groundcover.com if not specified.
backend_id (String, Required): Your Backend ID can be found in the API Keys screen in the groundcover UI (Under Settings -> Access):
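For example, you might export the key in your shell before running Terraform (a sketch; it assumes your configuration declares a matching groundcover_api_key variable that is passed to the provider's api_key argument):
# Keep the API key out of your .tf files
export TF_VAR_groundcover_api_key="<YOUR_API_KEY>"
terraform plan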
For full examples of all existing resources, see: https://github.com/groundcover-com/terraform-provider-groundcover/tree/main/examples/resources
Configure the page until it looks exactly right—filters, columns, panels, etc.
Click ➕Save View.
Give the view a clear Name.
Hit Save. The view is now listed and available to everyone in the project.
Scope – Saved Views are per‑page. A Logs view appears only in Logs; a Traces view only in Traces.
Filters & facets
All query filters plus facet open/closed state
Columns
Chosen columns, and their order, sort, width
Filter Panel & Added Facets
Filter panel open/closed
Facets added/removed
Logs
logs / patterns
textWrap
Insight show / hide
Traces
traces / span
table / drilldown
textWrap
API Catalog
protocol
Kafka role: Fetcher / Producer
Events
textWrap
The Update View button appears only when you are the creator of the view. Click it to overwrite the view with your latest changes.
Underneath every View you can see which user created it.
Edit / Rename
Creator
Delete
Creator
Star / Unstar
Any user for themselves
Searching the Views looks up both View names and their creators.
The default sorting pins favorite views at the top, with the rest of the views below. Each group of views is sorted from A→Z.
In addition, 3 filtering options are available:
All Views - The entire workspace's views for a specific page
My Favorites – The favorite views of the user for a specific page
Created By Me - The views created by the user
Real User Monitoring (RUM) extends groundcover’s observability platform to the client side, providing visibility into actual user interactions and front-end performance. It tracks key aspects of your web application as experienced by real users, then correlates them with backend metrics, logs, and traces for a full-stack view of your system.
Understand user experience - capture every interaction, page load, and performance metric from the end-user perspective to pinpoint front-end issues in real time.
Resolve issues faster - seamlessly tie front-end events to backend traces and logs in one platform, enabling end-to-end troubleshooting of user journeys.
Privacy first - groundcover’s Bring Your Own Cloud (BYOC) model ensures all RUM data stays in your own cloud environment. Sensitive user data never leaves your infrastructure, ensuring privacy and compliance without sacrificing insight.
groundcover RUM collects a wide range of data from users’ browsers through a lightweight JavaScript SDK. Once integrated into your web application, the SDK automatically gathers and sends the following telemetry from each user session to the groundcover platform:
Network requests: Every HTTP request initiated by the browser (such as API calls) is captured as a trace. Each client-side request can be linked with its corresponding server-side trace, giving you a complete picture of the request from the user’s click to the backend response.
Front-end logs: Client-side log messages (e.g., console.log outputs, warnings, and errors) are collected and forwarded to groundcover’s log management. This ensures that browser logs are stored alongside your application’s server logs for unified analysis.
Exceptions: Uncaught JavaScript exceptions and errors are automatically captured with full stack traces and contextual data (browser type, URL, etc.). These front-end errors become part of groundcover monitors, letting you quickly identify and debug issues in the user’s environment.
Performance metrics (Core Web Vitals): Key performance indicators such as page load time, along with Core Web Vitals like Largest Contentful Paint, Interaction to Next Paint, and Cumulative Layout Shift, are measured for each page view. groundcover RUM records these metrics to help you track real-world performance and detect slowdowns affecting users.
User interactions: RUM tracks user interactions such as clicks, keydown, and navigation events. By recording which elements users interact with and when, groundcover helps you reconstruct user flows and understand the sequence of actions leading up to any issue or performance problem.
Custom events: You can instrument your application to send custom events via the RUM SDK. This allows you to capture domain-specific actions or business events (for example, a checkout completion or a specific UI gesture) with associated metadata, providing deeper insight into user behavior beyond automatic captures.
All collected data is streamed securely to your groundcover deployment. Because groundcover runs in your environment, RUM data (including potentially sensitive details from user sessions) is stored in the observability backend within your own cloud. From there, it is aggregated and indexed just like other telemetry, ready to be searched and analyzed in the groundcover UI.
One of the core advantages of groundcover RUM is its native integration with backend observability data. Every front-end trace, log, or event captured via RUM is contextualized alongside server-side data:
Trace correlation: Client-side traces (from browser network requests) are automatically correlated with server-side traces captured by groundcover’s eBPF-based instrumentation. This means when a user triggers an API call, you can see the complete distributed trace that spans the browser and the backend services, all in one view.
Unified logging: Front-end log entries and error reports are ingested into the same backend as your server-side logs. In the groundcover Log Explorer, you can filter and search across logs from both client and server, using common fields (like timestamp, session ID, or trace ID) to connect events.
End-to-end troubleshooting: With full-stack data in one platform, you can pivot easily between a user’s session replay, the front-end events, and the backend metrics/traces involved. This end-to-end context significantly reduces the time to isolate whether an issue originated in the frontend (browser/UI) or the backend (services/infrastructure), helping teams pinpoint problems faster across the entire stack.
By bridging the gap between the user’s browser and your cloud infrastructure, groundcover’s RUM capability ensures that no part of the user journey is invisible to your monitoring. This holistic view is critical for optimizing user experience and rapidly resolving issues that span multiple layers of your application.
Once RUM data is collected, it becomes available in the groundcover platform via the Sessions Explorer — a dedicated view for inspecting and troubleshooting user sessions. The Sessions Explorer allows you to navigate through user journeys and understand how your users experience your application.
Clicking on any session opens the Session View, where you can inspect a full timeline of the user’s experience. This view shows every key event captured during the session - including clicks, navigations, network requests, logs, custom events, and errors.
Each event is displayed in sequence with full context like timestamps, URLs, and stack traces. The Session View helps you understand exactly what the user did and what the system reported at each step, making it easier to trace issues and user flows.
groundcover will work out-of-the-box on all protocols, encryption libraries and runtimes below - generating traces and metrics with zero code changes.
We're growing our coverage all the time. Can't find what you're looking for? Let us know over Slack.
groundcover seamlessly supports APM for encrypted communication - as long as it's listed below.
Encryption is unsupported for binaries which have been compiled without debug symbols ("stripped"). Known cases:
Crossplane
Get a list of all configured monitors in the system with their identifiers, titles, and types.
POST /api/monitors/list
This endpoint requires API Key authentication via the Authorization header.
The request body supports filtering by sources:
Parameters
Field Descriptions
Monitor Types
Get all monitors:
Get up and running in minutes in Kubernetes
Before installing groundcover in Kubernetes, please make sure your cluster meets the requirements.
After ensuring your cluster meets the requirements, complete the login and workspace setup, then choose your preferred installation method:
Sensor deployment requires installation values similar to the ones below, stored in a values.yaml file.
{inCloud_Site} is your unique backend identifier, which is needed for the sensors to send data to your backend. This value will be sent to you by the groundcover team after inCloud Managed is set up.
Use groundcover CLI to automate the installation process. The main advantages of using this installation method are:
Auto-detection of cluster incompatibility issues
Tolerations setup automation
Tuning of resources according to cluster size
Supports passing helm overrides
Read more .
The CLI will automatically use existing ingestion keys or provision a new one if none exist
Deploying groundcover using the CLI
To upgrade groundcover to the latest version, simply re-run the groundcover deploy command with your desired overrides (such as -f values.yaml). The CLI will automatically detect and apply the latest available version during the deployment process.
For more details about ingestion keys, refer to our .
Add the recently created sensor key to the values.yaml file provided by the groundcover team
Initial installation:
Upgrade groundcover:
For CI/CD deployments using ArgoCD, refer to our .
Check out our
Navigate to the Dashboard List and click on the Create New Dashboard button.
Provide an indicative name for your dashboard and, optionally, a description.
Create a new widget
Choose a Widget Type
Select a Widget Mode
Build your query
Optional:
Add variables
Apply variable(s) to the widget
Widgets can be added by clicking on the Create New Widget button.
Widgets are the main building blocks of dashboards. groundcover supports the following widget types:
Chart Widget: Visualize your data through various display types.
Textual Widget: Add context to your dashboard, such as headers or instructions for issue investigations.
Metrics: Work with all your available metrics for advanced use cases and custom metrics.
Infra Metrics: Use expert-built, predefined queries for common infrastructure scenarios. Ideal for quick starts.
Logs: Query and visualize log data.
Traces: Query and visualize trace data similar to logs.
Once the Widget Mode is selected, build your query for the visualization.
Variables dynamically filter your entire dashboard or specific widgets with just one click. They consist of a key-value pair that you define once and reuse across multiple widgets.
Our predefined variables cover most use cases, but if you’re missing an important one, let us know. Advanced variables are also on our roadmap.
Click on Add Variable.
Select the variable key and values from the predefined list.
Optionally, rename the variable or use the default name, then click Create.
Once created, select the values to apply to this variable.
Variables can be referenced in the Filter Bar of the Widget Creation Modal using their name.
Create a variable (for example, select Clusters from the predefined list, and name it 'clusters')
While creating or editing a Chart Widget, add a reference to the variable using a dollar sign in the filter bar (for example, $clusters).
The data will automatically filter by the variable's key with the selected values. If all values are selected, the filter will be followed by an asterisk (for example, cluster:*)
Creates a new workflow for alert handling and notifications. Workflows define how alerts are processed and routed to various integrations like Slack, PagerDuty, webhooks, etc.
POST /api/workflows/create
This endpoint requires API key authentication.
The request body should contain raw YAML defining the workflow configuration. The YAML structure should include:
id: Unique identifier for the workflow
description: Human-readable description
triggers: Array of trigger conditions
To route alerts to a specific integration (Slack, PagerDuty, webhook, etc.), use the config field in the provider section to reference your configured integration by name.
config: '{{ providers.integration-name }}' - References a specific integration you've configured in groundcover
type - Specifies the integration type (slack, webhook, pagerduty, opsgenie)
Replace integration-name with your actual integration name.
The integration name must match the name of an integration you've previously configured in your groundcover workspace.
For workflow examples and advanced configurations, see the .
Log Patterns help you cut through log noise by grouping similar logs based on structure. Instead of digging through thousands of raw lines, you get a clean, high-level view of what’s actually going on.
Log Patterns in groundcover help you make sense of massive log volumes by grouping logs with similar structure. Instead of showing every log line, the platform automatically extracts the static skeleton and replaces dynamic values like timestamps, user IDs, or error codes with smart tokens.
This lets you:
Cut through the noise
Spot recurring behaviors
Investigate anomalies faster
groundcover automatically detects the variable parts of each log line and replaces them with placeholders to surface the repeating structure.
Log Patterns are collected directly on the sensor.
Raw log:
Patterned:
Go to the Logs section.
Switch from Records to Patterns using the toggle at the top.
Patterns are grouped and sorted by frequency. You’ll see:
Log level (Error, Info, etc.)
Count and percentage of total logs
Pattern’s trend over time
Workload origin
The structured pattern itself
You can hover over any tag in a pattern to preview the distribution of values for that specific token. This feature provides a breakdown of sample values and their approximate frequency, based on sampled log data.
This is especially useful when investigating common IPs, error codes, user identifiers, or other dynamic fields, helping you understand which values dominate or stand out without drilling into individual logs.
For example, hovering over an <IP4> token will show a tooltip listing the most common IP addresses and their respective counts and percentages.
Click a pattern: Filters the Logs view to only show matching entries.
Use filters: Narrow things down by workload, level, format, or custom fields.
Suppress patterns: Hide noisy templates like health checks to stay focused on what matters.
Export patterns: Use the three-dot menu to copy the pattern for further analysis or alert creation.
Role-Based Access Control (RBAC) in groundcover gives you a flexible way to manage who can access certain features and data in the platform. By defining both default roles and policies, you ensure each team member only sees and does what their level of access permits. This approach strengthens security and simplifies onboarding, allowing administrators to confidently grant or limit access.
Policies are the foundational elements of groundcover’s RBAC. Each policy defines:
A permission level – which actions the user can perform (Admin, Editor, or Viewer-like capabilities).
A data scope – which clusters, environments, or namespaces the user can see.
By assigning one or more policies to a user, you can precisely control both what they can do and where they can do it.
groundcover provides three default policies to simplify common use cases:
Default Admin Policy
Permission: Admin
Data Scope: Full (no restrictions)
Behavior: Unlimited access to groundcover features and configurations.
These default policies allow you to quickly onboard new users with typical Admin/Editor/Viewer capabilities. However, you can also create custom policies with narrower data scopes, if needed.
A policy’s data scope can be defined in two modes: Simple or Advanced.
Simple Mode
Uses AND logic across the specified conditions.
Applies the same scope to all entity types (e.g., logs, traces, events, workloads).
Example: “Cluster = Dev AND Namespace = QA”
When creating or editing a policy, you select permission (Admin, Editor, or Viewer) and a data scope mode (Simple or Advanced).
A user can be associated with multiple policies. When that occurs:
Permission Merging
The user’s final permission level is the highest among all assigned policies.
Example: If one policy grants Editor and another grants Viewer, the user is effectively an Editor overall.
Data Scope Merging
A user may be assigned a policy granting the Editor role with a data scope relevant to specific clusters, and simultaneously be assigned another policy granting the Viewer role with a different data scope. The user's effective access is determined by the highest role across all assigned policies and by the union (OR) of scopes.
In summary:
Policies define both permission (Admin, Editor, or Viewer) and data scope (clusters, environments, namespaces).
Default Policies (Admin, Editor, Viewer) provide no data restrictions, suitable for quick onboarding.
Custom Policies allow more granular restrictions, specifying exactly which entities a user can see or modify.
This flexible system gives you robust control over observability data in groundcover, ensuring each user has precisely the access they need.
The Metric Summary page shows all metrics in your system with their cardinality, type, unit, and labels. It helps you spot high-cardinality metrics that might slow things down and understand what labels are available for building queries.
This page displays every metric groundcover collects, along with how many unique label combinations each one has. You can use it to:
Quickly search for metrics by name
resource "groundcover_dashboard" "llm_observability" {
name = "LLM Observability"
description = "Dashboard to monitor OpenAI and Anthropic usage"
preset = "{\"widgets\":[{\"id\":\"B\",\"type\":\"widget\",\"name\":\"Total LLM Calls\",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai span_type:anthropic | stats by(span_type) count() count_all_result | sort by (count_all_result desc) | limit 5\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\"}},{\"id\":\"D\",\"type\":\"widget\",\"name\":\"LLM Calls Rate\",\"queries\":[{\"id\":\"A\",\"expr\":\"sum(rate(groundcover_resource_total_counter{type=~\\\"openai|anthropic\\\",status_code=\\\"ok\\\"})) by (gen_ai_request_model)\",\"dataType\":\"metrics\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"time-series\",\"selectedChartType\":\"stackedBar\"}},{\"id\":\"E\",\"type\":\"widget\",\"name\":\"Average LLM Response Time\",\"queries\":[{\"id\":\"A\",\"expr\":\"avg(groundcover_resource_latency_seconds{type=~\\\"openai|anthropic\\\"}) by (type)\",\"dataType\":\"metrics\",\"step\":\"disabled\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\",\"step\":\"disabled\",\"selectedUnit\":\"Seconds\"}},{\"id\":\"A\",\"type\":\"widget\",\"name\":\"Total LLM Tokens Used\",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai span_type:anthropic | stats by(span_type) sum(gen_ai.response.usage.total_tokens) sum_result | sort by (sum_result desc) | limit 5\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\",\"step\":\"disabled\"}},{\"id\":\"C\",\"type\":\"widget\",\"name\":\"AVG Input Tokens Per LLM Call \",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai OR span_type:anthropic | stats by(span_type) avg(gen_ai.response.usage.input_tokens) avg_result | sort by (avg_result desc) | limit 5\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\"}},{\"id\":\"F\",\"type\":\"widget\",\"name\":\"AVG Output Tokens Per LLM Call \",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai OR span_type:anthropic | stats by(span_type) avg(gen_ai.response.usage.output_tokens) avg_result | sort by (avg_result desc) | limit 5\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\",\"step\":\"disabled\"}},{\"id\":\"G\",\"type\":\"widget\",\"name\":\"Top Used Models\",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai OR span_type:anthropic | stats by(gen_ai.request.model) count() count_all_result | sort by (count_all_result desc) | limit 100\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"bar\",\"step\":\"disabled\"}},{\"id\":\"H\",\"type\":\"widget\",\"name\":\"Total LLM Errors \",\"queries\":[{\"id\":\"A\",\"expr\":\"(span_type:openai OR span_type:anthropic) status:error | stats by(span_type) count() count_all_result | sort by (count_all_result desc) | limit 1\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\"}},{\"id\":\"I\",\"type\":\"widget\",\"name\":\"AVG TTFT Over Time by Model\",\"queries\":[{\"id\":\"A\",\"expr\":\"avg(groundcover_workload_latency_seconds{gen_ai_system=~\\\"openai|anthropic\\\",quantile=\\\"0.50\\\"}) by (gen_ai_request_model)\",\"dataType\":\"metrics\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"time-series\",\"selectedChartType\":\"line\",\"selectedUnit\":\"Seconds\"}},{\"id\":\"J\",\"type\":\"widget\",\"name\":\"Avg Output Tokens Per Second by Model\",\"queries\":[{\"id\":\"A\",\"expr\":\"avg(groundcover_gen_ai_response_usage_output_tokens{}) by 
(gen_ai_request_model)\",\"dataType\":\"metrics\",\"editorMode\":\"builder\"},{\"id\":\"B\",\"expr\":\"avg(groundcover_workload_latency_seconds{quantile=\\\"0.50\\\"}) by (gen_ai_request_model)\",\"dataType\":\"metrics\",\"editorMode\":\"builder\"},{\"id\":\"formula-A\",\"expr\":\"A / B\",\"dataType\":\"metrics-formula\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"time-series\",\"selectedUnit\":\"Number\"}}],\"layout\":[{\"id\":\"B\",\"x\":0,\"y\":0,\"w\":4,\"h\":6,\"minH\":4},{\"id\":\"D\",\"x\":0,\"y\":30,\"w\":24,\"h\":6,\"minH\":4},{\"id\":\"E\",\"x\":8,\"y\":0,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"A\",\"x\":16,\"y\":0,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"C\",\"x\":0,\"y\":12,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"F\",\"x\":8,\"y\":24,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"G\",\"x\":16,\"y\":24,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"H\",\"x\":4,\"y\":0,\"w\":4,\"h\":6,\"minH\":4},{\"id\":\"I\",\"x\":0,\"y\":18,\"w\":24,\"h\":6,\"minH\":4},{\"id\":\"J\",\"x\":0,\"y\":3,\"w\":24,\"h\":6,\"minH\":4}],\"duration\":\"Last 15 minutes\",\"variables\":{},\"spec\":{\"layoutType\":\"ordered\"},\"schemaVersion\":4}"
}
terraform plan
terraform apply
# Syntax
terraform import groundcover_dashboard.<local_name> <dashboard_id>
# Example
terraform import groundcover_dashboard.service_overview dsh_1234567890
helm upgrade --install groundcover groundcover/groundcover \
--set global.groundcover_token=<INGESTION_KEY>,clusterId={cluster-name}
exporters:
otlphttp/groundcover:
endpoint: https://{GROUNDCOVER_MANAGED_OPENTELEMETRY_ENDPOINT}
headers:
apikey: {INGESTION_KEY}
pipelines:
traces:
exporters:
- otlphttp/groundcover
workflow:
id: webhook
description: Sends an API to incident.io alerts endpoint
triggers:
- type: alert
consts:
description: keep.dictget( {{ alert.annotations }}, "_gc_description", '')
issue: https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}
monitor: https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}
redacted_labels: keep.dict_pop({{alert.labels}}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "_gc_issue_header")
silence: https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join({{ consts.redacted_labels }}, "&", "matcher_"), " ", "+")
title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
name: incident-io-alerts-workflow
actions:
- name: webhook
provider:
config: ' {{ providers.your-incident-io-integration-name }} '
type: webhook
with:
body:
title: '{{ alert.alertname }}'
description: '{{ alert.description }}'
deduplication_key: '{{ alert.fingerprint }}'
status: '{{ alert.status }}'
# To use metadata attributes that refer to alert.labels, the attributes
# must be used in the group by section of the monitor - the example below
# assumes that cluster, namespace and workload were used for group by
metadata:
cluster: '{{ alert.labels.cluster }}'
namespace: '{{ alert.labels.namespace }}'
service: '{{ alert.labels.workload }}'
severity: '{{ alert.annotations._gc_severity }}'
workflow:
id: slack-channel-routing-workflow
description: workflow for all channels with dynamic routing
triggers:
- type: alert
filters:
- key: annotations.slack-channel-routing-workflow
value: enabled
name: slack-channel-routing-workflow
consts:
channels: '{"devops":"C0111111111", "alerts":"C0222222222", "incidents":"C0333333333"}'
channel_id: keep.dictget( '{{ consts.channels }}', '{{ alert.labels.channel_id }}', 'C09G9AFHLTB')
env: keep.dictget({{ alert.labels }}, 'env', 'no-env')
upper_env: "keep.uppercase({{consts.env}})"
severity: keep.dictget({{ alert.annotations }}, '_gc_severity', 'unknown-severity')
summary: keep.dictget({{ alert.labels }}, 'summary', 'no-summary')
slack_message: "<https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join(keep.dict_pop({{ alert.labels }}, \"_gc_monitor_id\", \"_gc_monitor_name\", \"_gc_severity\", \"backend_id\", \"grafana_folder\", \"_gc_issue_header\"), \"&\", \"matcher_\"), \" \", \"+\")|Silence> :no_bell: | \n<https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}|Investigate> :mag: | \n<https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}|See Monitor> :chart_with_upwards_trend:\n\n*Labels:* \n- keep.join(keep.dict_pop({{alert.labels}}, \"_gc_monitor_id\", \"_gc_monitor_name\", \"_gc_severity\", \"backend_id\", \"grafana_folder\", \"_gc_issue_header\"), \"\\n- \")\n"
title_link: "https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}"
red_color: "#FF0000"
green_color: "#008000"
footer_url: "groundcover.com"
footer_icon: "https://app.groundcover.com/favicon.ico"
actions:
- if: "{{ alert.status }} == 'firing'"
name: webhook-alert
provider:
type: webhook
config: "{{ providers.slack-routing-webhook }}"
with:
body:
channel: "{{ consts.channel_id }}"
attachments:
- color: "{{ consts.red_color }}"
footer: "{{ consts.footer_url }}"
footer_icon: "{{ consts.footer_icon }}"
text: "{{ consts.slack_message }}"
title: "\U0001F6A8 Firing: {{ alert.alertname }} [{{ consts.upper_env}}]"
title_link: "{{ consts.title_link }}"
type: plain_text
- if: "{{ alert.status }} != 'firing'"
name: webhook-alert-resolved
provider:
type: webhook
config: "{{ providers.slack-routing-webhook }}"
with:
body:
channel: "{{ consts.channel_id }}"
text: "\u2705 [RESOLVED][{{ consts.upper_env}}] {{ consts.severity }} {{ alert.alertname }}"
attachments:
- color: "{{ consts.green_color }}"
text: "*Summary:* {{ consts.summary }}"
fields:
- title: "Environment"
value: "{{ consts.upper_env}}"
short: true
footer: "{{ consts.footer_url }}"
footer_icon: "{{ consts.footer_icon }}"
terraform {
required_providers {
groundcover = {
source = "registry.terraform.io/groundcover-com/groundcover"
version = ">= 0.0.0" # Replace with actual version constraint
}
}
}
provider "groundcover" {
api_key = "YOUR_API_KEY" # Required
base_url = "https://api.groundcover.com" # Optional, change if using onprem/airgap deployment
backend_id = "groundcover" # Your Backend ID can be found in the groundcover UI under Settings->Access->API Keys
}
resource "groundcover_policy" "read_only" {
name = "Read-Only Policy"
description = "Grants read-only access"
claim_role = "ci-readonly-role"
roles = {
read = "read"
}
}
resource "groundcover_serviceaccount" "ci_account" {
name = "ci-pipeline-account"
description = "Service account for CI"
policy_uuids = [groundcover_policy.read_only.id]
}
resource "groundcover_apikey" "ci_key" {
name = "CI Key"
description = "Key for CI pipeline"
service_account_id = groundcover_serviceaccount.ci_account.id
}
Revocation effect
Data stops flowing immediately
API calls fail
Save the widget
Pie – Select a data source and aggregation method. Supported modes: Logs, Traces.
Time Series – Choose a Y-axis unit from the predefined list. Select a visualization type: Stacked Bar or Line Chart. Supported modes: Metrics, Infra Metrics, Logs, Traces.
Table – Define columns based on data fields or metrics. Choose a Y-axis unit from the predefined list. Supported modes: Metrics, Infra Metrics, Logs, Traces.
Stat – Select a Y-axis unit from the predefined list. Supported modes: Metrics, Infra Metrics, Logs, Traces.
Top List – Choose a ranking metric and sort order. Supported modes: Logs, Traces.
Authorization – Required – Bearer token with your API key
Content-Type – Required – Must be application/json
Accept – Required – Must be application/json
sources – array – Optional – Source filters (empty array returns all monitors)
monitors – array – Array of monitor objects
uuid – string – Unique identifier for the monitor
title – string – Monitor name/description
type – string – Monitor type (see monitor types below)
"metrics" – Metrics-based monitoring
"traces" – Distributed tracing monitoring
"logs" – Log-based monitoring
"events" – Event-based monitoring
"infra" – Infrastructure monitoring
"" (empty string) – General/unspecified monitoring
actions: Array of actions to perform when triggered
name: Display name for the workflow
consts (optional): Constants and helper variables
Authorization – Bearer <YOUR_API_KEY> – Your groundcover API key
Content-Type – text/plain – The request body should be raw YAML
<V> – Version – v0.32.0
<TM> – Time measure – 5.5ms
<TS> – Timestamp – 2025-03-31T17:00:00Z
<N> – Number – 404, 123
<IP4> – IPv4 Address – 192.168.0.1
<*> – Wildcard (text, path, etc.) – /api/v1/users/42
This workflow sends a formatted Slack message using Block Kit:
This workflow creates a PagerDuty incident:
This workflow creates an Opsgenie alert:
Alias is used to group identical events together in Opsgenie (alias key in the payload)
Severities must be mapped to Opsgenie valid severities (priority key in the payload)
Tags are a list of string values (tags key in the payload)
This workflow creates a Jira ticket using webhook integration:
This workflow performs multiple actions for the same alert:
Execute only on the Prod environment. The "env" attribute needs to be part of the monitor context attributes (either by using it in the group by section or by explicitly adding it as a context label):
This example shows how to combine multiple filters. In this case it will match events from the prod environment and also monitors that explicitly routed the workflow with the name "actual-name-of-workflow":
In this case we will use a regular expression to filter on events coming from the groundcover OR monitoring namespaces. Note that any regular expression can be used:
The consts section is the best location to create pre-defined attributes and apply different transformations on the monitor's metadata for formatting the notification messaging.
Severities in your notification destination may not match groundcover's predefined severities. By using a dictionary, you can map any groundcover severity value to another and extract it using the actual monitor severity. Use the "keep.dictget" function to extract a value from a dictionary and apply a default in case the value is missing.
When accessing a context label via alert.labels, if the label is not transferred within the monitor, the workflow might crash. The best practice is to pre-define labels by declaring them in the consts section with a default value, using "keep.dictget" so the value is gracefully pulled from the labels object.
Note: Label names that are dotted, like "cloud.region" in this example, cannot be referenced in the monitor itself and can only be retrieved using this technique of pulling the value with "keep.dictget" from the alert.labels object.
keep.dict_pop({{alert.labels}}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "_gc_issue_header") - "Clean" a key-value dictionary from some irrelevant values (keys). In this case, the groundcover labels dictionary has some internal keys that you might not want to include in your notification content.
keep.join(["a", "b", "c"], ",") - Joins a list of elements into a string using a given delimiter. In this case the output is "a,b,c".
Use "if" condition to apply logic on different actions.
Create a separate block for a firing monitor (a resolved monitor can use different logic to change formatting of the notification):
"If" statements can include and/or logic for multiple conditions:
Use the function keep.is_business_hours combined with an "if" statement to trigger an action within specific hours only.
In this example the action block will execute on Sundays (6) between 20-23 (8pm to 11pm) or on Mondays (0) between 0-1am:
{
"sources": []
}
{
"monitors": [
{
"uuid": "string",
"title": "string",
"type": "string"
}
]
}
curl -L \
--request POST \
--url 'https://api.groundcover.com/api/monitors/list' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data-raw '{"sources":[]}'
{
"monitors": [
{
"uuid": "xxxx-xxxx-xxxx-xxxx-xxxx",
"title": "PVC usage above threshold (90%)",
"type": "metrics"
},
{
"uuid": "xxxx-xxxx-xxxx-xxxx-xxxx",
"title": "HTTP API Errors Monitor",
"type": "traces"
},
{
"uuid": "xxxxx-xxxx-xxxx-xxxx-xxxx",
"title": "Error Logs Monitor",
"type": "logs"
},
{
"uuid": "xxxxx-xxxx-xxxx-xxxx-xxxx",
"title": "Node CPU Usage Average is Above 85%",
"type": "infra"
},
{
"uuid": "xxxx-xxxx-xxxx-xxxx-xxxx",
"title": "Rolling Update Triggered",
"type": "events"
},
{
"uuid": "xxxx-xxxx-xxxx-xxxx-xxxx",
"title": "Deployment Partially Not Ready - 5m",
"type": "events"
}
]
}
global:
backend:
enabled: false
ingress:
site: {inCloud_Site}
clusterId: "your-cluster-name" # CLI will automatically detect cluster name
sh -c "$(curl -fsSL https://groundcover.com/install.sh)"
groundcover deploy -f values.yaml
sh -c "$(curl -fsSL https://groundcover.com/install.sh)"
groundcover auth get-ingestion-key sensor
global:
groundcover_token: {sensor_key}
backend:
enabled: false
ingress:
site: {inCloud_site}
clusterId: "your-cluster-name"
# Add groundcover Helm repository and fetch latest chart
helm repo add groundcover https://helm.groundcover.com && helm repo update groundcover
helm upgrade \
groundcover \
groundcover/groundcover \
-i \
--create-namespace \
-n groundcover \
-f values.yaml
helm repo update groundcover && helm upgrade \
groundcover \
groundcover/groundcover \
-n groundcover \
-f values.yaml
groundcover delete
helm uninstall groundcover -n groundcover
# delete the namespace in order to remove the PVCs as well
kubectl delete ns groundcover
curl -L \
--request POST \
--url 'https://api.groundcover.com/api/workflows/create' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: text/plain' \
--data 'id: example-workflow
description: Example workflow for API documentation
triggers:
- type: alert
filters:
- key: annotations.example-workflow
value: enabled
name: example-workflow
consts:
severity: keep.dictget({{ alert.annotations }}, "_gc_severity", "info")
title: keep.dictget({{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
actions:
- name: webhook-action
provider:
type: webhook
config: "{{ providers.webhook-provider }}"
with:
body:
alert_name: "{{ consts.title }}"
severity: "{{ consts.severity }}"'
{
"workflow_id": "xxxxx-xxxx-xxxxx-xxxx-xxxxx",
"status": "created",
"revision": 1
}
id: workflow-id
description: Workflow description
triggers:
- type: alert
filters:
- key: filter-key
value: filter-value
name: workflow-name
consts:
variable_name: value
actions:
- name: action-name
provider:
type: provider-type
config: provider-config
with:
action-specific-parameters
actions:
- if: '{{ alert.status }} == "firing"'
name: slack-action-firing
provider:
config: '{{ providers.integration-name }}'
type: slack
with:
attachments:
- color: '#FF0000'
footer: 'groundcover.com'
footer_icon: 'https://app.groundcover.com/favicon.ico'
text: 'Alert details here'
title: 'Firing: {{ alert.alertname }}'
ts: keep.utcnowtimestamp()
type: plain_text
message: ' '
192.168.0.1 - - [30/Mar/2025:12:00:01 +0000] "GET /api/v1/users/123 HTTP/1.1" 200
<IP4> - - [<TS>] "<*> HTTP/<N>.<N>" <N>
workflow:
id: slack-notification
description: Send Slack notification for alerts
triggers:
- type: alert
actions:
- name: slack-notification
provider:
type: slack
config: '{{ providers.slack_webhook }}'
with:
message: "Alert: {{ alert.alertname }} - Status: {{ alert.status }}"
workflow:
id: slack-rich-notification
description: Send formatted Slack notification
triggers:
- type: alert
actions:
- name: slack-rich-notification
provider:
type: slack
config: '{{ providers.slack_webhook }}'
with:
blocks:
- type: header
text:
type: plain_text
text: ':rotating_light: {{ alert.alertname }} :rotating_light:'
emoji: true
- type: divider
- type: section
fields:
- type: mrkdwn
text: |-
*Cluster:*
{{ alert.labels.cluster}}
- type: mrkdwn
text: |-
*Namespace:*
{{ alert.labels.namespace}}
- type: mrkdwn
text: |-
*Status:*
{{ alert.status}}
workflow:
id: pagerduty-incident-workflow
description: Create PagerDuty incident for alerts
name: pagerduty-incident-workflow
triggers:
- type: alert
filters:
- key: annotations.pagerduty-incident-workflow
value: enabled
consts:
severities: '{"S1": "critical","S2": "error","S3": "warning","S4": "info","critical": "critical","error": "error","warning": "warning","info": "info"}'
severity: keep.dictget( '{{ consts.severities }}', '{{ alert.annotations._gc_severity }}', 'info')
description: keep.dictget( {{ alert.annotations }}, "_gc_description", "")
title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", '{{ alert.alertname }}')
redacted_labels: keep.dict_pop({{ alert.labels }}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder")
env: keep.dictget( {{ alert.labels }}, "env", "- no env -")
namespace: keep.dictget( {{ alert.labels }}, "namespace", "- no namespace -")
workload: keep.dictget( {{ alert.labels }}, "workload", "- no workload -")
pod: keep.dictget( {{ alert.labels }}, "podName", "- no pod -")
issue: https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}
monitor: https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}
silence: https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join({{ consts.redacted_labels }}, "&", "matcher_"), " ", "+")
actions:
- name: pagerduty-alert
provider:
config: '{{ providers.pagerduty-integration-name }}'
type: pagerduty
with:
title: '{{ consts.title }}'
severity: '{{ consts.severity }}'
dedup_key: '{{alert.fingerprint}}'
custom_details:
01_environment: '{{ consts.env }}'
02_namespace: '{{ consts.namespace }}'
03_service_name: '{{ consts.workload }}'
04_pod: '{{ consts.pod }}'
05_labels: '{{ consts.redacted_labels }}'
06_monitor: '{{ consts.monitor }}'
07_issue: '{{ consts.issue }}'
08_silence: '{{ consts.silence }}'
workflow:
id: Opsgenie Example
description: "Opsgenie workflow"
triggers:
- type: alert
filters:
- key: annotations.Opsgenie Example
value: enabled
consts:
description: keep.dictget( {{ alert.annotations }}, "_gc_description", "")
redacted_labels: keep.dict_pop({{ alert.labels }}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "CampaignName")
severities: '{"S1": "P1","S2": "P2","S3": "P3","S4": "P4","critical": "P1","error": "P2","warning": "P3","info": "P4"}'
severity: keep.dictget({{ consts.severities }}, {{ alert.annotations._gc_severity }}, "P3")
title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", '{{ alert.alertname }}')
region: keep.dictget( {{ alert.labels }}, "cloud.region", "")
TenantID: keep.dictget( {{ alert.labels }}, "tenantID", "")
name: Opsgenie Example
actions:
- if: '{{ alert.status }} == "firing"'
name: opesgenie-alert
provider:
config: "{{ providers.Opsgenie }}"
type: opsgenie
with:
alias: '{{ alert.fingerprint }}'
description: '{{ consts.description }}'
details: '{{ consts.redacted_labels }}'
message: '{{ consts.title }}'
priority: '{{ consts.severity }}'
source: groundcover
tags:
- '{{ alert.alertname }}'
- '{{ consts.TenantID }}'
- '{{ consts.region }}'
workflow:
id: jira-ticket-creation
description: Create Jira ticket for alerts
triggers:
- type: alert
consts:
description: keep.dictget({{ alert.annotations }}, "_gc_description", '')
title: keep.dictget({{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
actions:
- name: jira-ticket
provider:
type: webhook
config: '{{ providers.jira_webhook }}'
with:
body:
fields:
description: '{{ consts.description }}'
issuetype:
id: 10001
project:
id: 10000
summary: '{{ consts.title }}'
workflow:
id: multi-action-workflow
description: Perform multiple actions for critical alerts
triggers:
- type: alert
filters:
- key: severity
value: critical
actions:
- name: slack-notification
provider:
type: slack
config: '{{ providers.slack_webhook }}'
with:
message: "Critical alert: {{ alert.alertname }}"
- name: pagerduty-incident
provider:
type: pagerduty
config: '{{ providers.pagerduty_prod }}'
with:
title: "Critical: {{ alert.alertname }}"
- name: jira-ticket
provider:
type: webhook
config: '{{ providers.jira_webhook }}'
with:
body:
fields:
summary: "Critical Alert: {{ alert.alertname }}"
description: "Critical alert triggered in {{ alert.labels.namespace }}"
issuetype:
id: 10001
project:
id: 10000
workflow:
id: specific-monitor-workflow
description: Workflow triggered only by Workload Pods Crashed Monitor
triggers:
- type: alert
filters:
- key: alertname
value: Workload Pods Crashed Monitor
workflow:
id: prod-only-workflow
description: Workflow triggered only by production environment alerts
triggers:
- type: alert
filters:
- key: env
value: prod
workflow:
id: multi-filter-workflow
description: Workflow triggered by critical alerts in production
triggers:
- type: alert
filters:
- key: env
value: prod
- key: annotations.actual-name-of-workflow
value: enabled
workflow:
id: regex-filter-workflow
description: Workflow triggered by alerts from groundcover or monitoring namespaces
triggers:
- type: alert
filters:
- key: namespace
value: r"(groundcover|monitoring)"
workflow:
id: severity-mapping-example
description: Example of mapping severities using consts
triggers:
- type: alert
consts:
severities: '{"S1": "P1","S2": "P2","S3": "P3","S4": "P4","critical": "P1","error": "P2","warning": "P3","info": "P4"}'
severity: keep.dictget({{ consts.severities }}, {{ alert.annotations._gc_severity }}, "P3")
workflow:
id: labels-best-practice-example
description: Example of safely accessing monitor labels
triggers:
- type: alert
consts:
region: keep.dictget({{ alert.labels }}, "cloud.region", "")
workflow:
id: conditional-actions-example
description: Example of conditional actions based on alert status
triggers:
- type: alert
actions:
- if: '{{ alert.status }} == "firing"'
name: slack-action-firing
provider:
config: '{{ providers.groundcover-alerts-dev }}'
type: slack
with:
attachments:
- color: '{{ consts.red_color }}'
footer: '{{ consts.footer_url }}'
footer_icon: '{{ consts.footer_icon }}'
text: '{{ consts.slack_message }}'
title: 'Firing: {{ alert.alertname }}'
title_link: '{{ consts.title_link }}'
ts: keep.utcnowtimestamp()
type: plain_text
message: ' '
workflow:
id: multi-condition-actions-example
description: Example of multiple conditions in actions
triggers:
- type: alert
actions:
- if: '{{ alert.status }} == "firing" and {{ alert.labels.namespace }} == "namespace1"'
name: slack-action-firing
provider:
config: '{{ providers.groundcover-alerts-dev }}'
type: slack
with:
attachments:
- color: '{{ consts.red_color }}'
footer: '{{ consts.footer_url }}'
footer_icon: '{{ consts.footer_icon }}'
text: '{{ consts.slack_message }}'
title: 'Firing: {{ alert.alertname }}'
title_link: '{{ consts.title_link }}'
ts: keep.utcnowtimestamp()
type: plain_text
message: ' '
workflow:
id: time-based-notification-example
description: Example of time-based conditional actions
triggers:
- type: alert
actions:
- if: '({{ alert.status }} == "firing" and (keep.is_business_hours(timezone="America/New_York", business_days=[6], start_hour=20, end_hour=23) or keep.is_business_hours(timezone="America/New_York", business_days=[0], start_hour=0, end_hour=1)))'
name: time-based-notification
provider:
type: slack
config: '{{ providers.slack_webhook }}'
with:
message: "Time-sensitive alert: {{ alert.alertname }}"
Redis – supported
DNS – supported
Kafka – supported
MongoDB – supported – v3.6+
AMQP – supported – AMQP 0-9-1
GraphQL – supported
AWS S3 – supported
AWS SQS – supported
HTTP – supported
gRPC – supported
MySQL – supported
PostgreSQL – supported
crypto/tls (golang) – supported
OpenSSL (c, c++, Python) – supported
NodeJS – supported
JavaSSL – supported – Java 11+ is supported. Requires
Default Editor Policy
Permission: Editor
Data Scope: Full (no restrictions)
Behavior: Full creation/editing capabilities on observability data, but no user or system management.
Default Viewer Policy
Permission: Viewer
Data Scope: Full (no restrictions)
Behavior: Read-only access to all data in groundcover.
Advanced Mode
Lets you define different scopes for each data entity (logs, traces, events, workloads, etc.).
Each scope can use OR logic among conditions, allowing more fine-grained control.
Example:
Logs: “Cluster = Dev OR Prod”
Traces: “Namespace = abc123”
Events: “Environment = Staging OR Prod”
Data scopes merge via OR logic, broadening the user's overall data access.
Example: Policy A => "Cluster = A," Policy B => "Environment = B," so final scope is "Cluster A OR Environment B."
This applies to all data types including logs, traces, events, workloads, and metrics.
Identify metrics with high cardinality that could impact performance
See what labels are available for each metric
Find the right metrics and labels when building dashboards and monitors
Use the search field at the top to filter metrics by name. The search is case-insensitive and matches partial names.
Example searches:
groundcover_workload finds all workload-related metrics
latency finds all latency metrics
cpu finds all CPU metrics
This chart shows total cardinality across all your metrics over the selected time range. You can track trends over time and spot sudden spikes that might indicate a problem.
The table shows details for each metric:
Metric Name – The full metric name, like groundcover_workload_latency_seconds
Type – Counter, Gauge, Summary, or Histogram
Unit – What the metric is measured in (Seconds, Bytes, Number, etc.)
Cardinality – Number of unique label combinations in the past 24 hours. Higher numbers mean more unique time series
Percentage – What percentage of total cardinality this metric represents
Description – What the metric measures
Click any row to open a detailed drawer on the right. The drawer shows:
The metric name at the top
Navigation arrows to move between metrics
A cardinality chart specific to this metric
Two tabs with more information
Basic information about the metric:
Description – What the metric measures
Unit – Unit of measurement
Cardinality – Current unique label combination count
Type – Counter, Gauge, Summary, or Histogram
All labels available for this metric:
Label – The label name, like namespace or pod_name
Cardinality – How many unique values this label has
Values Preview – Up to 3 example values
Use this information when building queries or identifying which labels contribute most to cardinality.
Go to Explore > Metric Summary
Click the Cardinality column header to sort
Metrics at the top have the most unique label combinations
Click on one to see which labels drive the cardinality in the Labels tab
Type a keyword in the search bar (http, database, cpu, etc.)
Browse the filtered results
Click a metric to see its description and labels
Note the metric name and labels for your dashboards or monitors
Click any metric in the table
Check the Details tab for basic information
Switch to the Labels tab to see all available dimensions
Use the Values Preview to see what values each label can have
This endpoint requires API Key authentication via the Authorization header.
Required and optional fields for creating an ingestion key:
name – string – Required – Unique name for the ingestion key (must be lowercase with hyphens)
type – string – Required – Key type ("sensor", "thirdParty", "rum")
tags – array – Optional
"sensor" – Keys for groundcover sensors and agents – true
"thirdParty" – Keys for third-party integrations (OpenTelemetry, etc.) – false
"rum" – Keys for Real User Monitoring data ingestion – false
To verify the key was created successfully, use the List Ingestion Keys endpoint:
Names must be lowercase with hyphens as separators
No capital letters, spaces, or special characters (except hyphens)
Examples of valid names: production-k8s-sensor, otel-staging-api, rum-frontend
Examples of invalid names: Production-K8s, OTEL_API, rum frontend
For comprehensive information about ingestion keys, including usage and management, see:
This endpoint requires API Key authentication via the Authorization header.
Optional filters for ingestion keys:
name – string – Optional – Filter by exact key name
type – string – Optional – Filter by key type ("sensor", "thirdParty", "rum")
remoteConfig – boolean – Optional
Get only sensor keys:
id – string – Unique identifier for the ingestion key (UUID)
name – string – Human-readable name for the key
createdBy – string – Email of the user who created the key
creationDate – string – ISO 8601 timestamp of key creation
For comprehensive information about ingestion keys, including creation, usage, and best practices, see:
Datadog application key scopes:
dashboards_read - List and retrieve dashboards
monitors_read - View monitors
metrics_read - Query timeseries data
integrations_read - View AWS, GCP, Azure integrations
Navigate to Settings → Migrations.
Click Start on the Datadog card
Enter a project name (e.g., "Production Migration", "US5 Migration")
Click Create
Tip: Use descriptive names. You can run multiple migration projects for different environments or teams.
Provide your Datadog credentials:
The domain of your Datadog console. The options are:
US1 - app.datadoghq.com
US3 - us3.datadoghq.com
US5 - us5.datadoghq.com
EU1 - app.datadoghq.eu
AP1 - ap1.datadoghq.com
You can find your Datadog site by looking at your console's URL.
A regular Datadog API key. Find this under Organization Settings → API Keys.
Create one under Organization Settings → Application Keys with the required scopes listed above.
Click Fetch Assets. This typically takes around 10 seconds, depending on the number of assets.
Once fetched, you see:
Progress overview: Total assets discovered and their support status
Asset cards: Monitors, Dashboards, Data Sources, etc.
Support breakdown: How many assets are fully supported, partial, or unsupported
The overview shows everything we found in your Datadog account and what we'll bring over.
Before migrating monitors and dashboards, we set up your data sources.
groundcover automatically discovers:
AWS integrations: CloudWatch metrics, account configurations
GCP integrations: Cloud Monitoring metrics, project setups
Azure integrations: Azure Monitor metrics, subscription details
Tip: Migrate all data sources first. This prevents missing data issues when monitors go live.
Once data sources are ready, migrate your monitors.
✓ Supported: Fully compatible. Migrate as-is.
⚠ Partial: Migrates with warnings. Review before installing.
✗ Unsupported: Requires manual attention.
For monitors with warnings, click View Warnings:
See what adjustments were made
Understand query translations
Get recommendations for post-migration verification
Warnings don't block migration — they inform you of changes so you can verify behavior.
Single monitor:
Preview the monitor
Click Migrate
Monitor installs immediately
Bulk migrate:
Select multiple monitors using checkboxes
Click Migrate Selected
All install in parallel
Migrated monitors appear instantly in Monitors → Monitor List.
Dashboards preserve:
Layout and widget positions
Query logic and filters
Time ranges and visualization settings
Colors and formatting
Check out the dashboard preview to confirm the migration worked and that all your assets came through successfully.
Click Migrate to install. Dashboards appear under Dashboards immediately.
Tip: Migrate critical dashboards first. Verify queries return expected data before bulk migrating.
This endpoint requires API Key authentication via the Authorization header.
Authorization – Required – Bearer token with your API key
Content-Type – Required – Must be application/json
Accept – Required – Must be application/json
uuid – string – Required – The unique identifier of the monitor to retrieve
Field Descriptions
title – string – Monitor name/title
display.header – string – Alert header template with variable substitution
display.resourceHeaderLabels – array – Labels shown in resource headers
display.contextHeaderLabels – array – Labels shown in context headers
Get monitor configuration by UUID:
Traces are a powerful observability pillar, providing granular insights into microservice interactions. Traditionally, they were hard to implement, requiring coordination of multiple teams and constant code changes, making this critical aspect very challenging to maintain.
groundcover’s eBPF sensor breaks this tradeoff, empowering developers to gain full visibility into their applications, effortlessly and without any code changes.
The platform supports two kinds of traces:
These traces are automatically generated for every service in your stack. They are available out-of-the-box and within seconds of installation. These traces always include critical information such as:
All services that took part in the interaction (both client and server)
Accessed resource
Full payloads, including:
All headers
These can be ingested into the platform, allowing you to leverage existing instrumentation and create a single pane of glass for all of your traces.
Traces are stored in groundcover's ClickHouse deployment, ensuring top-notch performance at any scale.
groundcover further disrupts the customary tracing experience by reinventing the concept of sampling. The approach differs between the two types of traces:
These are generated by using 100% of the data, always processing every request being made, at any scale. However, the groundcover platform utilizes smart sampling to store only a fraction of the traces while still generating an accurate picture. In general, sampling is performed according to these rules:
Requests with unusually high or low latencies, measured per resource
Requests which returned an error response (e.g., a 500 status code for HTTP)
"Normal" requests which form the baseline for each resource
Lastly, is utilized to make the sampling decisions on the node itself, without having to send or save any redundant traces.
Various mechanisms control the sampling performed over 3rd party traces. Read more here:
When integrating 3rd-party traces, it is often wise to configure some sampling mechanism according to the specific use case.
Each trace is enriched with additional information to give as much context as possible for the service which generated the trace. This includes:
Container information - image, environment variables, pod name
Logs generated by the service around the time of the trace
of the resource around the time of the trace
Kubernetes events relevant to the service
One of the advantages of ingesting 3rd-party traces is the ability to leverage their distributed tracing feature. groundcover natively displays the full trace for ingested traces in the Traces page.
Trace Attributes enable advanced filtering and search capabilities. groundcover supports attributes across all trace types. This encompasses a diverse range of protocols such as HTTP, MongoDB, PostgreSQL, and others, as well as varied sources including eBPF or manual instrumentations (for example, OpenTelemetry).
groundcover enriches your original traces and generates meaningful metadata as key-value pairs. This metadata includes critical information, such as protocol type, http.path, db.statement, and similar attributes, aligning with OTel conventions. Furthermore, groundcover seamlessly incorporates this metadata from spans received through supported manual instrumentations. For an in-depth understanding of attributes in OTel, please refer to (external link to the OpenTelemetry website).
Each attribute can be effortlessly integrated into your filters and search queries. You can add them directly from the trace side-panel with a simple click or input them manually into the search bar.
Example: To filter all HTTP traces that contain the path "/products", the query would be formatted as: @http.path:"/products". For a comprehensive guide on the query syntax, see the Syntax table below.
Trace Tags enable advanced filtering and search capabilities. groundcover supports tags across all trace types. This encompasses a diverse range of protocols such as HTTP, MongoDB, PostgreSQL, and others, as well as varied sources including eBPF or manual instrumentations (for example, OpenTelemetry).
Tags are powerful metadata components, structured as key-value pairs. They offer insightful information about the resource generating the span, such as container.image.name, host.name, and more.
Tags include metadata enriched by our sensor, plus additional metadata provided by manual instrumentations (such as OpenTelemetry traces). Utilizing these tags enhances the understanding and context of your traces, allowing for more comprehensive analysis and easier filtering by the relevant information.
Each tag can be effortlessly integrated into your filters and search queries. You can add them directly from the trace side-panel with a simple click or input them manually into the search bar.
Example: To filter all traces from mysql containers, the query would be formatted as: container.image.name:mysql. For a comprehensive guide on the query syntax, see the Syntax table below.
The Trace Explorer integrates dynamic filters and a versatile search functionality, to enhance your trace data analysis. You can filter out traces using specific criteria, including trace-status, workload, namespace and more, as well as limit your search to a specific time range.
groundcover natively supports setting up pipelines using . This allows full flexibility in processing and manipulating the traces being collected - parsing additional patterns with regex, renaming attributes, and more.
groundcover allows full control over the retention of your traces. to learn more.
Tracing can be customized in several ways:
Sometimes there are use cases that involve complex queries and conditions for triggering a monitor. This might go beyond the built-in query logic that is provided within the groundcover logs page query language.
An example of such a use case is comparing current logs to the same logs in a past period. This is not typically available in log search, but it can definitely be something to alert on: if the number of errors for a group of logs changes dramatically compared to the previous week, that could be an event worth alerting on and investigating.
For such use cases you can harness the powerful ClickHouse SQL language to create an SQL based monitor within groundcover.
Log and Trace telemetry data is stored within a ClickHouse database.
You can directly query this data using SQL statements and create powerful monitors.
To create and test your SQL queries use the page within the groundcover app.
Select the ClickHouse@groundcover datasource with the SQL Editor option to start crafting your SQL queries.
Start with show tables; to see all the available tables to use for your queries: logs and traces would be popular choices (table names are case sensitive).
While testing your queries, always use LIMIT to restrict your results to a small set of data.
To apply the Grafana timeframe on your queries make sure to add the following conditions:
Logs: WHERE $__timeFilter(timestamp)
Traces: WHERE $__timeFilter(start_timestamp)
Note: When querying logs with SQL, it's crucial to use efficient filters to prevent timeouts and enhance performance. Implementing primary filters like cluster, workload, namespace, and env will significantly speed up queries; always include them when writing your queries. A query sketch combining these filters with the Grafana time filter follows below.
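As a minimal sketch, assuming the logs table and the cluster, namespace, workload, level and timestamp columns used in the examples further below, an efficient query could look like this:
-- Primary filters (cluster, namespace) plus the Grafana time filter keep the scan small
SELECT workload, count(*) AS error_count
FROM logs
WHERE $__timeFilter(timestamp)
  AND cluster = 'cluster-prod'
  AND namespace = 'production'
  AND level = 'error'
GROUP BY workload
LIMIT 100;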
Traces and Logs have rich context that is normally stored in dedicated columns in json format. Accessing the context for filtering and retrieving values is a popular need when querying the data.
To get to the relevant context item, either in the attributes or tags you can use the following syntax:
WHERE string_attributes['host_name'] = 'my.host'
WHERE string_tags['cloud.name'] = 'aws'
WHERE float_attributes['headline_count'] = 4
WHERE float_tags['container.size'] = 22.4
To use the float context, ensure that the relevant attributes or tags are indeed numeric. To do that, check the relevant log in JSON format and verify that the referenced field is not wrapped in quotes (for example, headline_count in the screenshot below).
In order to use an SQL query to create a monitor, you must make sure the query returns no more than a single numeric field - this is the monitored field on which the threshold is placed.
The query can also contain any number of "group by" fields that are passed to the monitor as context labels.
Here is an example of an SQL query that can be used for a monitor.
In this query the threshold field is the ratio between a value measured over the last 10 minutes and the same value measured over the last week.
tenantID and env are the group by labels that are passed to the monitor as context labels.
Here is another query example (check the percentage of errors in a set of logs):
A single numeric value is calculated and grouped by cluster, namespace and workload
Applying an SQL query can only be done in YAML mode. You can use the following YAML template to add your query:
Give your monitor a name and a description
Paste your SQL query in the expression field
Set the threshold value and the relevant operator - in this example this is "lower than" 0.5 (< 0.5)
Set your workflow name in the annotations section
Use the following template for your monitor.
In this example we are creating a list of Linux hosts that were sending logs in the last 24 hours and then checking if there were any logs collected from those hosts in the last 5 minutes.
This monitor can be used, for example, to catch when a host is down.
Synthetics allow you to proactively monitor the health, availability, and performance of your endpoints. By simulating requests from your infrastructure, you can verify that your critical services are reachable and returning correct data, even when no real user traffic is active.
groundcover Synthetics execute checks from your installed groundcover backend, and work on inCloud deployments only.
Source: Checks run from within your backend; when you have multiple groundcover backends, you can select the specific backend to use. Region selection for running tests from specific locations will be supported in the future.
Supported Protocols: Currently, Synthetics supports HTTP/HTTPS tests. Support for additional protocols, including gRPC, ICMP (Ping), DNS, and dedicated SSL monitors, is coming soon.
Alerting: Creating a Synthetic test automatically creates a corresponding Monitor (see: ). Using monitors, you can get alerted on failed synthetic tests; see: . The monitor is not editable.
Trace Integration: We generate traces for all synthetic tests, which appear as first-class citizens in the groundcover platform. You can query these traces using source:synthetics on the Traces page.
Navigate to Monitors > Synthetics and click + Create Synthetic Test.
Define the endpoint and parameters for the test.
Synthetic Test Name: A descriptive name for the test.
Target: Select the method (GET, POST, etc.) and URL. Include the HTTP scheme as well, for example: https://api.groundcover.com/api/backends/list
Tip: Use Import from cURL to paste a command and auto-fill these fields.
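For example, a command along these lines - reusing the Target example above, with a purely illustrative header - could be pasted into Import from cURL:
curl --request GET \
  --url 'https://api.groundcover.com/api/backends/list' \
  --header 'Accept: application/json'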
Assertions are the rules that determine if a test passed or failed. You can add multiple assertions to a single test. If any assertion fails, the entire check is marked as failed.
Available Assertion Fields
The "Field" determines which part of the response groundcover inspects.
Assertion Operators
The "Operator" defines the logic applied to the Field.
Add custom labels; these labels will exist on traces generated by checks. You can use these labels to filter traces.
When you create a Synthetic Test, groundcover eliminates the need to manually configure separate alert rules. A Monitor is automatically generated and permanently bound to your test. See: .
Managed Logic: The monitor's threshold and conditions are derived directly from your Synthetic Test's assertions. If the test fails (e.g., status code != 200), the Monitor automatically enters a "Failing" state.
Lifecycle: This monitor handles the lifecycle of the alert, transitioning between Pending, Firing (when the test fails), and Resolved (when the test passes).
Note: To prevent configuration drift, these auto-generated monitors are read-only. You cannot edit their query logic directly; you simply edit the Synthetic Test itself.
Alerts in groundcover leverage a fully integrated Grafana interface. To learn how you can create alerts using Grafana Terraform, follow this guide.
Log in to groundcover and navigate to the Alerts section by clicking on it in the left navigation menu.
Once in the Alerts section, click on Alerting in the inner menu on the left.
If you can't see the inner menu, click on the 3 bars next to "Home" in the upper left corner.
Click on Alert Rules
Type a name for your alert. It's recommended to use a name that will make it easy for you to understand its function later.
Select the data source:
ClickHouse: For alerts based on your traces, logs, and Kubernetes events.
Prometheus: For alerts based on metrics (includes APM metrics, infrastructure metrics, and custom metrics from your environment). See the example query sketched after the note below.
Note: You can click on "Run queries" to see the results of this query.
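For illustration only, a Prometheus-based alert could start from a query like the following sketch, which uses the groundcover_workload_error_counter APM metric documented later in this guide (the 5-minute rate window is an arbitrary choice):
# Error request rate per workload over the last 5 minutes
sum(rate(groundcover_workload_error_counter[5m])) by (workload_name, namespace)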
In the Reduce section, open the "Function" dropdown menu and choose the type of value you want to use.
Min - the lowest value
Max - the highest value
Mean - the average of the values
Click on "+ New folder" and type a name for the folder in which this rule will be stored. You can choose any name, but it's recommended to use a name that will make it easy for you to find the relevant evaluation groups, should you want to use them again in future alerts.
Click on "+ New evaluation group" and type a name for this evaluation group. The same recommendation applies here too.
In the Evaluation interval textbox, type how often the rule should be evaluated to see if it matches the conditions set in Step 3. Then, click "Create". Note: For the Evaluation interval, use the format (number)(unit), where units are:
s = seconds
If you already have a contact point set up, simply select it from the dropdown menu at the bottom of the "Configure labels and notifications" section. If not, click on the blue "View or create contact points" link, which will open a new tab.
Click on the blue "Add contact point" button
This will get you to the Contact points screen. Then:
Type a name for the contact point
From the dropdown menu, choose which system you want to use to push the alert to.
The information required to push the alert will change based on the system you select. Follow the on-screen instructions (for example, if email is selected, you'll need to enter the email address(es) for that contact).
Click "Save contact point"
You can now close this tab to go back to the alert rule screen.
Next to the link you clicked to create this new contact point, you'll find a dropdown menu, where you can select the contact point you just created.
Under "Add annotations", you have two free text boxes that give you the option to add any information that can be useful to you and/or the recipient(s) of this alert, such as a summary that reminds you of the alert's functionality or purpose, or next step instructions when this alert fires.
Once all of it is ready, you can click the blue "Save rule and exit" button on the upper right of the screen, which will bring you back to the Alert rules screen. You will now be able to see your alert, as well as its status - normal (green), pending (yellow), or firing (red), as well as the Evaluation interval (blue).
Log in to your groundcover account and navigate to the dashboard that you want to create an alert from.
Locate the Grafana panel that you want to create an alert from, click on the panel's header, and select Edit.
Click on the alert tab as seen in the image below. Select the Manage alerts option from the dropdown menu.
Click on the New Alert Rule button.
An alert is derived from three parts that will be configured in the screen that you are navigated to:
Expression - the query that defines the alert input itself
Reduction - the value that should be leveraged from the aforementioned expression
Threshold
Select folder - if needed, you can navigate to the dashboards tab in the left nav and create a new folder
Select an evaluation group, or type text in order to create a new group as shown below
Click "Save and Exit" on the top right-hand side of the screen to create the alert
Ensure your notification is configured so that alerts are sent to end users. See the "Configuring Slack Contact Point" section below if needed.
Note: Make sure to test the alert to ensure that it is working as expected. You can do this by triggering the conditions that you defined and verifying that the alert is sent to the specified notification channels.
Authorization: Bearer <YOUR_API_KEY>
Content-Type: application/json

{
"name": "string",
"type": "sensor|thirdParty|rum"
}

curl -L \
--request POST \
--url 'https://api.groundcover.com/api/rbac/ingestion-keys/create' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
"name": "production-k8s-sensor",
"type": "sensor"
}'

curl -L \
--request POST \
--url 'https://api.groundcover.com/api/rbac/ingestion-keys/create' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
"name": "otel-collector-prod",
"type": "thirdParty",
"tags": ["otel", "production"]
}'

curl -L \
--request POST \
--url 'https://api.groundcover.com/api/rbac/ingestion-keys/create' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
"name": "frontend-rum-monitoring",
"type": "rum",
"tags": ["rum", "frontend", "web"]
}'

{
"id": "string",
"name": "string",
"createdBy": "string",
"creationDate": "string",
"key": "string",
"type": "string",
"tags": ["string"]
}

{
"id": "12345678-1234-1234-1234-123456789abc",
"name": "production-k8s-sensor",
"createdBy": "[email protected]",
"creationDate": "2025-08-31T14:09:15Z",
"key": "gcik_AEBAAAE4_XXXXXXXXX_XXXXXXXXX_XXXXXXXX",
"type": "sensor",
"remoteConfig": true,
"tags": []
}

curl -L \
--request POST \
--url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
"name": "production-k8s-sensor"
}'

Authorization: Bearer <YOUR_API_KEY>
Content-Type: application/json

{
"name": "string",
"type": "sensor|thirdParty|rum",
"remoteConfig": boolean
}

curl -L \
--request POST \
--url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data '{}'

curl -L \
--request POST \
--url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
"type": "sensor"
}'

curl -L \
--request POST \
--url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
"name": "my-sensor-key",
"remoteConfig": true
}'

[
{
"id": "12345678-1234-1234-1234-123456789abc",
"name": "production-sensor-key",
"createdBy": "[email protected]",
"creationDate": "2025-08-31T11:48:18Z",
"key": "gcik_AEBAAAD4_XXXXXXXXX_XXXXXXXXX_XXXXXXXX",
"type": "sensor",
"remoteConfig": true,
"tags": []
},
{
"id": "87654321-4321-4321-4321-987654321def",
"name": "my-sensor-key",
"createdBy": "[email protected]",
"creationDate": "2025-08-31T11:48:18Z",
"key": "gcik_AEBAAAC7_XXXXXXXXX_XXXXXXXXX_XXXXXXXX",
"type": "sensor",
"remoteConfig": true,
"tags": []
},
{
"id": "abcdefab-cdef-abcd-efab-cdefabcdefab",
"name": "third-party-integration",
"createdBy": "[email protected]",
"creationDate": "2025-08-31T11:48:18Z",
"key": "gcik_AEBAAAHP_XXXXXXXXX_XXXXXXXXX_XXXXXXXX",
"type": "thirdParty",
"remoteConfig": false,
"tags": []
}
]

curl -L \
--url 'https://api.groundcover.com/api/monitors/xxxx-xxxx-xxx-xxxx-xxxx' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Accept: application/json'

title: 'PVC usage above threshold (90%)'
display:
header: PV usage above 90% threshold - {{ alert.labels.cluster }}, {{ alert.labels.name }}
contextHeaderLabels:
- cluster
- namespace
- env
severity: S2
measurementType: state
model:
queries:
- dataType: metrics
name: threshold_input_query
pipeline:
function:
name: last_over_time
pipelines:
- function:
name: avg_by
pipelines:
- metric: groundcover_pvc_usage_percent
args:
- cluster
- env
- name
- namespace
args:
- 1m
conditions:
- key: name
origin: root
type: string
filters:
- op: not_match
value: object-storage-cache-groundcover-incloud-clickhouse-shard.*
thresholds:
- name: threshold_1
inputName: threshold_input_query
operator: gt
values:
- 90
noDataState: OK
evaluationInterval:
interval: 1m0s
  pendingFor: 1m0s

Labels
Preview of available labels (click a row to see all labels)
Array of tags to associate with the key
Filter by remote configuration status
key
string
The actual ingestion key (starts with gcik_)
type
string
Key type ("sensor", "thirdParty", "rum")
remoteConfig
boolean
Whether remote configuration is enabled
tags
array
Array of tags associated with the key
display.description
string
Monitor description
severity
string
Alert severity level (e.g., "S1", "S2", "S3")
measurementType
string
Type of measurement ("state", "event")
model.queries
array
Query configurations for data retrieval
model.thresholds
array
Threshold configurations for alerting
executionErrorState
string
State when execution fails ("OK", "ALERTING")
noDataState
string
State when no data is available ("OK", "ALERTING")
evaluationInterval.interval
string
How often to evaluate the monitor
evaluationInterval.pendingFor
string
How long to wait before alerting
isPaused
boolean
Whether the monitor is currently paused





All query parameters
All bodies - for both the request and response
CPU and Memory utilization of the service and the node it is scheduled on



Follow redirects: Determines whether the test follows 3xx responses. When disabled, the test returns the 3xx response itself as the result set for assertions.
Allow insecure: Disables SSL/TLS certificate verification. Use this only for internal testing or self-signed certificates. Not recommended for production endpoints as it exposes you to Man-in-the-Middle attacks.
HTTP Version: Select the protocol version: one of HTTP/1.0, HTTP/1.1, or HTTP/2.0. The default is HTTP/1.1.
Timing
Interval: Frequency of the check (e.g., every 60s).
Timeout: Max duration to wait before marking the test as failed. Timeout must be less than interval.
Payload: Select the body type if your request requires data (e.g., POST/PUT).
Options: None, JSON, Text, Raw.
Headers & Auth:
Authentication (Bearer tokens, API keys) will be released soon.
Headers: You can add custom headers by passing keys and values.
Verifies the end of a string.
"success"
matches regex
Validates against a Regular Expression.
jsonBody matches regex user_id: \d+
exists
Checks that a field or header is present, regardless of value.
set-cookie exists in response headers
does not exist
Checks that a field is absent.
jsonBody (error_message) does not exist
is one of
Checks against a list of acceptable values.
statusCode is one of 200, 201, 202
Field
Description
statusCode
Checks the HTTP response code (e.g., 200, 404, 500).
responseHeader
Checks for the presence or value of a specific response header (e.g., Content-Type).
jsonBody
Inspects specific keys or values within a JSON response payload.
body
Checks the raw text body of the response.
responseTime
Checks the response time of the request
Operator
Function
Example Use Case
is equal to
Exact match. Case-sensitive.
statusCode is equal to 200
is not equal to
Ensures a value does not appear.
statusCode is not equal to 500
contains
Checks if a substring exists within the target.
body contains "error"
starts with
Verifies the beginning of a string.
"status"
ends with
Set the check interval and the pending time
Save the monitor


SELECT * FROM logs LIMIT 10;

with engineStatusLastWeek as (
select string_attributes['tenantID'] tenantID, string_attributes['env'] env, max(float_attributes['engineStatus.numCylinders']) cylinders
from logs
where timestamp >= now() - interval 7 days
and workload = 'engine-processing'
and string_attributes['tenantID'] != ''
group by tenantID, env
),
engineStatusNow as (
select string_attributes['tenantID'] tenantID, string_attributes['env'] env, min(float_attributes['engineStatus.numCylinders']) cylinders
from logs
where timestamp >= now() - interval 10 minutes
and workload = 'engine-processing'
and string_attributes['tenantID'] != ''
group by tenantID, env
)
select n.tenantID, n.env, n.cylinders/lw.cylinders AS threshold
from engineStatusNow n
left join engineStatusLastWeek lw using (tenantID)
where n.cylinders/lw.cylinders <= 0.5

SELECT cluster, namespace, workload,
round( 100.0 * countIf(level = 'error') /
nullIf(count(), 0), 2 ) AS error_ratio_pct
FROM "groundcover"."logs"
WHERE timestamp >= now() - interval '10 minute' AND
namespace IN ('refurbished', 'interface') GROUP BY cluster, namespace, workload

title: "[SQL] Monitor name"
display:
header: Monitor description
severity: S2
measurementType: event
model:
queries:
- name: threshold_input_query
expression: "[YOUR SQL QUERY GOES HERE]"
datasourceType: clickhouse
queryType: instant
thresholds:
- name: threshold_1
inputName: threshold_input_query
operator: lt
values:
- 0.5
annotations:
[Workflow Name]: enabled
executionErrorState: OK
noDataState: OK
evaluationInterval:
interval: 3m
pendingFor: 2m
isPaused: false
title: Host not sending logs more than 5 minutes
display:
header: Host "{{host}}" is not sending logs for more than 5 minutes
severity: S2
measurementType: event
model:
queries:
- name: threshold_input_query
expression: "
WITH
(
SELECT groupArray(DISTINCT host)
FROM logs
WHERE timestamp >= now() - INTERVAL 24 HOUR
AND env_type = 'host'
) AS all_hosts
SELECT
host,
coalesce(log_count, 0) AS log_count
FROM
(
SELECT arrayJoin(all_hosts) AS host
) AS h
LEFT JOIN
(
SELECT host, count(*) AS log_count
FROM logs
WHERE timestamp >= now() - INTERVAL 5 MINUTE
AND env_type = 'host'
GROUP BY host
) AS l
USING (host)
ORDER BY host
"
datasourceType: clickhouse
queryType: instant
thresholds:
- name: threshold_1
inputName: threshold_input_query
operator: lt
values:
- 10
annotations:
{Put Your Workflow Name Here}: enabled
executionErrorState: Error
noDataState: NoData
evaluationInterval:
interval: 5m
pendingFor: 0s
isPaused: false

Then click on the blue "+ New alert rule" button in the upper right.
Click on "Select metric"
Note: Make sure you are in "Builder" view (see screenshot) to see this option.
Click on "Metrics explorer"
Start typing the name of the metric you want this alert to be based on. Note that the Metrics explorer will start displaying matches as you type, so you can find your metric even if you don't remember its exact name. You can also check out our list of Metrics & Labels.
Once you see your metric in the list, click on "Select" in that row.
Sum - the sum of all values
Count - the number of values in the result
Last - the last value
In the Threshold section, type a value and choose whether you want the alert to fire when the query result is above or below that value. You can also select a range of values.
m = minutes
h = hours
d = days
w = weeks
In the Pending period box, type how long the conditions should be matched before the alert fires.
Verify the expression value and enter reduction and threshold values in line with your alerting expectations










This endpoint requires API Key authentication via the Authorization header.
Authorization
Yes
Bearer token with your API key
Content-Type
Yes
Must be application/json
Accept
Yes
Must be application/json
This endpoint does not require a request body for the POST method.
Field Descriptions
workflows
array
Array of workflow objects
id
string
Unique workflow identifier (UUID)
name
string
Workflow name
description
string
Workflow description
Get all workflows:
To further focus your results, you can also restrict the results to specific time windows using the time picker on the upper right of the screen.
The Query Builder is the default search option wherever search is available. It supports advanced autocomplete of keys and values, as well as a discovery mode that works across the values in your data to teach users the data model.
The following syntaxes are available for you to use in Query Builder:
key:value
Search attributes:
Both groundcover built-ins and custom attributes.
Use * for wildcard search.
Note: Multiple filters for the same key act as 'OR' conditions, whereas multiple filters for different keys act as 'AND' conditions.
namespace:prod-us
namespace:prod-*
Logs Traces K8s Events API Catalog Issues
term
Free text: Search for single-word terms. Tip: Expand your search results by using wildcards.
Exception
DivisionBy*
Logs
"term"
Phrase Search (case-insensitive):
Enclose terms within double quotes to find results containing the exact phrase.
Note: Using double quotes does not work with * wildcards.
"search term"
Filters are very easy to add and remove, using the filters menu on the left bar. You can combine filters with the Query Builder, and filters applied using the left menu will also be added to the Query Builder in text format.
Select / deselect a single filter - click on the checkbox on the left of the filter. (You can also deselect a filter by clicking the 'x' next to the text format of the filter on the search bar).
Deselect all but one filter (within a filter category, such as 'Level' or 'Format') - hover over the filter you want to leave on, then click on "ONLY".
You can switch between filters you want to leave on by hovering on another filter and clicking "ONLY" again.
To turn all other filters in that filter category back on, hover over the filter again and click "ALL".
Clear all filters within a filters category - click on the funnel icon next to the category name.
Clear all filters currently applied - click on the funnel icon next to the number of results.
Advanced Query is currently available only in the Logs section.
The following syntaxes are available for you to use in Advanced Query:
key:value
Filters: Use golden filters to narrow down your search. Note: Multiple filters for the same key act as 'OR' conditions, whereas multiple filters for different keys act as 'AND' conditions.
level:error
Logs
@key:value
Attributes: Search within the content of attributes. Note: Multiple filters for the same key act as 'OR' conditions, whereas multiple filters for different keys act as 'AND' conditions.
@transaction.id:123
Logs
term
Free text (exact match): Search for single-word terms. Tip: Expand your search results by using wildcards.
term
Find all logs with level 'error' or 'warning', in 'json' or 'logfmt' format, where the status code is 500 or 503, the request path contains '/api/v1/', and exclude logs where the user agent is 'vmagent' or 'curl':
Find logs where the bytes transferred are greater than 10000, the request method is POST, the host is not '10.1.11.65', and the namespace is 'production' or 'staging':
Find logs from pods starting with 'backend-' in 'cluster-prod', where the level is 'error', the status code is not 200 or 204, and the request protocol is 'HTTP/2.0':
Find logs where the 'user_agent' field is empty or does not exist, the request path starts with '/admin', and the status code is greater than 400:
Find logs in 'json' format from hosts starting with 'ip-10-1-', where the level is 'unknown', the container name contains 'redis', excluding logs with bytes transferred equal to 0:
Find logs where the time is '18/Sep/2024:07:25:46 +0000', the request method is GET, the status code is less than 200 or greater than 299, and the host is '10.1.11.65':
Find logs where the level is 'info', the format is 'clf', the namespace is 'production', the pod name contains 'web', and exclude logs where the user agent is 'vmagent':
Find logs where the container name does not exist, the cluster is 'cluster-prod', the request path starts with '/internal', and the request protocol is 'HTTP/1.1':
Find logs where the bytes transferred are greater than 5000, the request method is PUT or DELETE, the status code is 403 or 404, and the host is not '10.1.11.65':
Find logs where the format is 'unknown', the level is not 'error', the user agent is 'curl', and the pod name starts with 'test-':
By default, the search bar will be displayed in Query Builder mode. Use the button on the right of the search bar to switch back and forth between the Query Builder and Advanced Query.
The groundcover platform generates 100% of its metrics from the actual data. There are no sample rates or complex interpolations to make up for partial coverage. Our measurements represent the real, complete flow of data in your environment.
Our eBPF-based sensor allows us to construct the majority of the metrics on the very node where the raw transactions are recorded. This means the raw data is turned into numbers the moment it becomes possible - removing the need to store or send it elsewhere.
Metrics are stored in groundcover's victoria-metrics deployment, ensuring top-notch performance on every scale.
LLM Observability is the practice of monitoring, analyzing, and troubleshooting interactions with Large Language Models (LLMs) across distributed systems. It focuses on capturing data regarding prompt content, response quality, performance latency, and token costs.
groundcover provides a unified view of your GenAI traffic by combining two powerful data collection methods: zero-instrumentation eBPF tracing and native OpenTelemetry ingestion.
Learn how to create and configure monitors using the Wizard, Monitor Catalog, or Import options. The following guide will help you set up queries, thresholds, and alert routing for effective monitoring.
You can create monitors using our web application by following this guide, use our API (see: ), or use our Terraform provider (see: ).
In the Monitors section (left navigation bar), navigate to the Issues page or the Monitor List page to create a new Monitor. Click on the “Create Monitor” button at the top right and select one of the following options from the dropdown:
curl -L \
--request POST \
--url 'https://api.groundcover.com/api/workflows/list' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Accept: */*'

{
"workflows": [
{
"id": "12345678-1234-1234-1234-123456789abc",
"name": "ms-teams-alerts-workflow",
"description": "Sends an API to MS Teams alerts endpoint",
"created_by": "[email protected]",
"creation_time": "2025-07-02T09:42:13.334103Z",
"triggers": [
{
"type": "alert"
}
],
"interval": 0,
"last_execution_time": null,
"last_execution_status": null,
"providers": [
{
"type": "webhook",
"id": "provider123456789abcdef",
"name": "teams-integration",
"installed": true
},
{
"type": "webhook",
"id": null,
"name": "backup-teams-integration",
"installed": false
}
],
"workflow_raw_id": "teams-webhook",
"workflow_raw": "id: teams-webhook\ndescription: Sends an API to MS Teams alerts endpoint\ntriggers:\n- type: alert\n filters:\n - key: annotations.ms-teams-alerts-workflow\n value: enabled\nname: ms-teams-alerts-workflow\n...",
"revision": 11,
"last_updated": "2025-07-03T08:57:09.881806Z",
"invalid": false,
"last_execution_started": null
},
{
"id": "87654321-4321-4321-4321-987654321def",
"name": "webhook-alerts-workflow",
"description": "Workflow for sending alerts to custom webhook",
"created_by": "[email protected]",
"creation_time": "2025-06-19T12:49:37.630392Z",
"triggers": [
{
"type": "alert"
}
],
"interval": 0,
"last_execution_time": null,
"last_execution_status": null,
"providers": [
{
"type": "webhook",
"id": "webhook987654321fedcba",
"name": "custom-webhook",
"installed": true
}
],
"workflow_raw_id": "webhook-alerts",
"workflow_raw": "id: webhook-alerts\ndescription: Workflow for sending alerts to custom webhook\n...",
"revision": 2,
"last_updated": "2025-06-19T12:51:24.643393Z",
"invalid": false,
"last_execution_started": null
}
]
}

level:(error or warning) format:(json or logfmt) status_code:(500 or 503) @request.path:~"/api/v1/" NOT user_agent:(vmagent or curl)
bytes:>10000 @request.method:POST NOT host:10.1.11.65 namespace:(production or staging)
pod:~backend- cluster:cluster-prod level:error NOT status_code:(200 or 204) @request.protocol:"HTTP/2.0"
user_agent:"" @request.path:~"/admin" status_code:>400
format:json host:~"ip-10-1-" level:unknown container:~redis NOT bytes:0
@time:"18/Sep/2024:07:25:46 +0000" @request.method:GET (status_code:<200 status_code:>299) host:10.1.11.65
level:info format:clf namespace:production pod:~web NOT user_agent:vmagent
container:"" cluster:cluster-prod @request.path:~"/internal" @request.protocol:"HTTP/1.1"
bytes:>5000 @request.method:(PUT or DELETE) status_code:(403 or 404) NOT host:10.1.11.65
format:unknown NOT level:error user_agent:curl pod:~test-

created_by
string
Email of the workflow creator
creation_time
string
Workflow creation timestamp (ISO 8601)
triggers
array
Array of trigger configurations
triggers[].type
string
Trigger type (e.g., "alert")
interval
number
Execution interval (typically 0 for alert-triggered workflows)
last_execution_time
string/null
Last execution timestamp
last_execution_status
string/null
Last execution status ("success", "error", etc.)
providers
array
Array of integration provider configurations
providers[].type
string
Provider type (see provider types below)
providers[].id
string/null
Provider configuration ID
providers[].name
string
Provider display name
providers[].installed
boolean
Whether provider is installed and configured
workflow_raw_id
string
Raw workflow identifier
workflow_raw
string
Complete YAML workflow definition
revision
number
Workflow version number
last_updated
string
Last update timestamp (ISO 8601)
invalid
boolean
Whether workflow configuration is invalid
last_execution_started
string/null
When last execution started
Logs
-key:value
Exclude: Specify terms or filters to omit from your search; applies to each distinct search.
-key:value
-term
-"search term"
Logs Traces K8s Events API Catalog Issues
*:value
Search all attributes:
Search any attribute for a value, you can use double quotes for exact match and wildcards.
*:error
*:"POST /api/search"
*:erro*
Logs Traces Issues
Logs
" "
Phrase Search (case-insensitive): Enclose terms within double quotes to find results containing the exact phrase.
"search term"
Logs
~
Wildcard: Search for partial matches. Note: Wildcards must be added before the search term or value, and will always be treated as a partial match search.
key:~val
@key:~val
~term
~"search phrase"
Logs
NOT
!
Exclude: Specify terms or filters to omit from your search; applies to each distinct search.
!key:value
NOT @key:value
NOT term
!"search term"
Logs
key:""
Identify cases where key does not exist or is empty
pid:""
Logs
key:=#
key:>#
key:<#
Search for key:value pairs where the value is equal to, greater than, or smaller than a specified number.
threadPriority:>5
Logs
key:(val1 or val2)
Search for key:value pairs using a list of values.
level:(error or info)
Logs
query1 or query2
Use the OR operator to display matches on either query
level:error or format:json
Logs
query1 and query2
Use the AND operator to display matches on both queries
level:error and format:json
Logs
"Search term prefix"*
Exact phrase prefix search
"Error 1064 (42"*
Logs



In the world of excessive data, it's important to have a rule of thumb for knowing where to start looking. For application metrics, we rely on our golden signals.
The following metrics are generated for each resource being aggregated:
Requests per second (RPS)
Errors rate
Latencies (p50 and p95)
The golden signals are then displayed in two important ways: Workload and Resource aggregations.
Resource aggregations are highly granular metrics, providing insights into individual APIs.
Workload aggregations are designed to show an overview of each service, enabling a higher level inspection. These are constructed using all of the resources recorded for each service.
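As an illustrative sketch only (not an official recipe), the two aggregation levels can be compared with PromQL over the APM counters listed later on this page; the clustered_path grouping assumes that label is exported on the resource counters:
# Workload-level RPS: an overview of each service
sum(rate(groundcover_workload_total_counter[5m])) by (workload_name, namespace)
# Resource-level RPS: per-API granularity within a workload
sum(rate(groundcover_resource_total_counter[5m])) by (workload_name, clustered_path)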
groundcover allows full control over the retention of your metrics. Learn more here.
Below you will find the full list of our APM metrics, as well as the labels we export for each. These labels are designed with high granularity in mind for maximal insight depth. All of the metrics listed are available out of the box after installing groundcover, without any further setup.
clusterId
Name identifier of the K8s cluster
region
Cloud provider region name
namespace
K8s namespace
workload_name
K8s workload (or service) name
groundcover uses a set of internal labels which are not relevant in most use-cases. Find them interesting? Let us know over Slack!
issue_id entity_id resource_id query_id aggregation_id parent_entity_id perspective_entity_id perspective_entity_is_external perspective_entity_issue_id perspective_entity_name perspective_entity_namespace perspective_entity_resource_id
groundcover_resource_total_counter
total amount of resource requests
groundcover_resource_error_counter
total amount of requests with error status codes
groundcover_resource_issue_counter
total amount of requests which were flagged as issues
groundcover_resource_success_counter
total amount of resource requests with OK status codes
groundcover_workload_total_counter
total amount of requests handled by the workload
groundcover_workload_error_counter
total amount of requests handled by the workload with error status codes
groundcover_workload_issue_counter
total amount of requests handled by the workload which were flagged as issues
groundcover_workload_success_counter
total amount of requests handled by the workload with OK status codes
groundcover_pvc_read_bytes_total
total amount of bytes read by the workload from the PVC
groundcover_pvc_write_bytes_total
total amount of bytes written by the workload to the PVC
groundcover_pvc_reads_total
total amount of read operations done by the workload from the PVC
groundcover_pvc_writes_total
total amount of write operations done by the workload to the PVC
groundcover_client_offset
client last message offset (for producer the last offset produced, for consumer the last requested offset)
groundcover_workload_client_offset
client last message offset (for producer the last offset produced, for consumer the last requested offset), aggregated by workload
groundcover_calc_lagged_messages
current lag in messages
groundcover_workload_calc_lagged_messages
current lag in messages, aggregated by workload
groundcover automatically detects and traces LLM API calls without requiring SDKs, wrappers, or code modification.
The sensor captures traffic at the kernel level, extracting key data points and transforming requests into structured spans and metrics. This allows for instant visibility into third-party providers without altering application code. This method captures:
Payloads: Full prompt and response bodies (supports redaction).
Usage: Token counts (input, output, total).
Metadata: Model versions, temperature, and parameters.
Performance: Latency and completion time.
Status: Error messages and finish reasons.
In addition to auto-detection, groundcover supports the ingestion of traces generated by manual OpenTelemetry instrumentation.
If your applications are already instrumented using OpenTelemetry SDKs (e.g., using the OpenTelemetry Python or JavaScript instrumentation for OpenAI/LangChain), groundcover will seamlessly ingest, process, and visualize these spans alongside your other telemetry data.
When groundcover captures traffic via eBPF, it automatically transforms the data into structured spans that adhere to the OpenTelemetry GenAI Semantic Conventions.
This standardization allows LLM traces to correlate with existing application telemetry. Below are the attributes captured for each eBPF-generated LLM span:
gen_ai.system
The Generative AI provider
openai
gen_ai.request.model
The model name requested by the client
gpt-4
gen_ai.response.model
The name of the model that generated the response
gpt-4-0613
gen_ai.response.usage.input_tokens
Tokens consumed by the input (prompt)
groundcover automatically generates rate, errors, duration and usage metrics from the LLM traces. These metrics adhere to OpenTelemetry GenAI conventions and are enriched with Kubernetes context (cluster, namespace, workload, etc).
groundcover_workload_gen_ai_response_usage_input_tokens
Input token count, aggregated by K8s workload
groundcover_workload_gen_ai_response_usage_output_tokens
Output token count, aggregated by K8s workload
groundcover_workload_gen_ai_response_usage_total_tokens
Total token usage, aggregated by K8s workload
groundcover_gen_ai_response_usage_input_tokens
Global input token count (cluster-wide)
groundcover_gen_ai_response_usage_output_tokens
Global output token count (cluster-wide)
groundcover_gen_ai_response_usage_total_tokens
Global total token usage (cluster-wide)
Available Labels:
Metrics can be filtered by: workload, namespace, cluster, gen_ai_request_model, gen_ai_system, client, server, and status_code.
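For example, here is a sketch of a PromQL query that tracks token consumption per workload and model over the last hour (assuming the metrics above behave as monotonically increasing counters):
# Total tokens consumed in the last hour, per workload and model
sum(increase(groundcover_workload_gen_ai_response_usage_total_tokens[1h])) by (workload, gen_ai_request_model)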
LLM payloads often contain sensitive data (PII, secrets). By default, groundcover collects full payloads to aid in debugging. You can configure the agent to obfuscate specific fields within the prompts or responses using the httphandler configuration in your values.yaml.
See Sensitive data obfuscation for full details on obfuscation in groundcover.
This configuration will obfuscate request prompts, while keeping metadata like model, tokens, etc
This configuration will obfuscate response data, while keeping metadata like model, tokens, etc
groundcover currently supports the following providers via auto-detection:
OpenAI (Chat Completion API)
Anthropic (Chat Completion API)
AWS Bedrock APIs
The Monitor Wizard is a guided, user-friendly approach to creating and configuring monitors tailored to your observability needs. By breaking down the process into simple steps, it ensures consistency and accuracy.
Set up the basic information for the monitor.
Add a title for the monitor. The title will appear in notifications and in the Monitor List page.
Give the Monitor a clear, short name that describes its function at a high level.
Examples:
“Workload High API Error Rate”
“Workload Pods High Memory”
Description (Optional):
Add a description for your monitor. The description will appear when viewing the monitor details; you can also use it in your alerts.
Select the data source, build the query and define thresholds for the monitor.
Data Source (Required):
Select the type of data (Metrics, Infra Metrics, Logs, Traces, or Events).
Query Functionality:
Choose how to process the data (e.g., average, count).
Add aggregation clauses if applicable; you MUST use aggregations if you want to add labels to your issues.
Examples: cluster, workload, container_name
Time Window (Required):
Specify the period over which data is aggregated.
Example: “Over the last 5 minutes.”
Threshold Conditions (Required):
Define when the monitor triggers. You can use:
Greater Than - Trigger when the value exceeds X.
Lower Than - Trigger when the value falls below X.
Visualization Type (Optional):
Preview data using Stacked Bar or Line Chart for better clarity while building the monitor.
Customize how the Monitor’s Issues will appear. This section also includes a live preview of the way it will appear in the Issues page.
Issue Header (required):
Define a name for issues that this Monitor will raise. It's useful to use labels that can include information from the query.
For example, adding {{ alert.labels.statusCode }} to the header will inject the status code to the name of the issue - this becomes especially useful when one Monitor raises multiple issues and you want to quickly understand their content without having to open each one.
Examples:
“HTTP API Error {{ alert.labels.status_code }}” -> HTTP API Error 500
“Workload {{ alert.labels.workload }} Pod Restart” -> Workload frontend Pod Restart
“{{ alert.labels.customer }} APIs High Latency” -> org.com APIs High Latency
Severity (required):
Use severity to categorize alerts by importance.
Select a severity level (S1-S4).
Context Labels (optional):
If you want to use labels here you MUST add them to query aggregation.
These Labels will be displayed and filterable in the Monitors > Issues page.
We recommend using up to 5 Labels for the best experience.
Organize and categorize monitors; you can use these to route issues using advanced workflows.
Labels (optional):
Add key-value pairs for metadata.
Define how often the monitor evaluates its conditions.
Evaluation Interval (Required):
Specify how often the monitor evaluates the query
Example: “Evaluate every 1 minute.”
Pending Period (Required):
This ensures that transient conditions do not trigger alerts, reducing false positives. For example, setting this to 10 minutes ensures the condition must persist for at least 10 minutes before firing.
If the query conditions were not met during the evaluation duration, the issue's pending period will reset to normal.
Example: “Wait for 10 minutes before alerting.”
Set up how issues from this monitor will be routed.
Select Workflow (Optional):
Route alerts to selected existing workflows only; other workflows will not process them. Use this to send alerts for a critical application to a destination such as Slack or PagerDuty.
No Routing (Optional):
This means that any workflow (without filters) will process the issue.
Whenever possible, use our carefully crafted monitors from the Monitor Catalog. This will save you time, ensure the Monitors are built effectively, and help you align your alerting strategy with best practices. If you can't find one that perfectly matches your needs, use them as your starting point and edit their properties to customize them to your needs.
Give the Monitor a clear, short name that describes its function at a high level.
“Workload High API Error Rate”
“Workload Pods High Memory”
Choose a clear name for the Issue header, offering a bit more detail and a more specific description than the monitor name. A Header is a specific property of an issue, so you can add templated dynamic values here. For example, you can use dynamic label values in the header name.
“HTTP API Error {{ alert.labels.status_code }}”,
“Workload {{ alert.labels.workload }} Pod Restart”
“{{ alert.labels.customer }} APIs High Latency”.
We recommend using up to 3 ResourceHeaderLabels. These labels should give your team context on what the subject of the issue is.
span_name, pod_name
We recommend using up to 3 ContextHeaderLabels. These labels should give your team context on where the issue happened.
cluster, namespace, workload
This is an advanced feature, please use it with caution.
In the "Import Bulk Monitors" you can add multiple monitors using an array of Monitors that follows the Monitor YAML structure.
Example of importing multiple monitors
Click on "Create Monitors" to create them.
Authorization
Bearer <YOUR_API_KEY>
Your groundcover API key
Content-Type
application/json
Request body format
X-Backend-Id
<YOUR_BACKEND_ID>
Your backend identifier
start
string
Yes
Start time in ISO 8601 UTC format
end
string
Yes
End time in ISO 8601 UTC format
sources
array
No
nodes
array
Array of node objects
nodes[].uid
string
Unique identifier for the node
nodes[].name
string
Node name
nodes[].cluster
string
Cluster name
eq
Equals
ne
Not equals
gt
Greater than
lt
Less than
contains
Contains substring
Authorization
Yes
Bearer token with your API key
X-Backend-Id
Yes
Your backend identifier
Content-Type
Yes
Must be application/json
Accept
Yes
Must be application/json
sources
Array
No
Filter by data sources (empty array for all sources)
The response contains an array of clusters with detailed resource usage and metadata.
clusters
Array
Array of cluster objects
totalCount
Integer
Total number of clusters
name
String
Cluster name
env
String
Environment (e.g., "prod", "ga", "beta", "alpha", "latest")
creationTimestamp
String
When the cluster was created (ISO 8601)
cloudProvider
String
Cloud provider (e.g., "AWS", "GCP", "Azure")
CPU Metrics
cpuUsage
Integer
Current CPU usage in millicores
cpuLimit
Integer
CPU limits set on resources in millicores
cpuAllocatable
Integer
Total allocatable CPU in millicores
cpuRequest
Integer
Total CPU requests in millicores
Memory Metrics
memoryUsage
Integer
Current memory usage in bytes
memoryLimit
Integer
Memory limits set on resources in bytes
memoryAllocatable
Integer
Total allocatable memory in bytes
memoryRequest
Integer
Total memory requests in bytes
Pod Information
pods
Object
Pod counts by status (e.g., {"Running": 157, "Succeeded": 4})
Get a list of Kubernetes deployments with status information, replica counts, and operational conditions for a specified time range.
POST /api/k8s/v2/deployments/list
This endpoint requires API Key authentication via the Authorization header.
The request body requires a time range and supports filtering by fields:
Parameters
Field Descriptions
Common Condition Types
Common Condition Reasons
Get deployments for a specific time range:
Get deployments from specific namespaces:
Use ISO 8601 UTC format for timestamps
Typical time ranges: 1-24 hours for operational monitoring
Maximum recommended range: 7 days
Format: YYYY-MM-DDTHH:MM:SS.sssZ
Retrieve a list of Kubernetes workloads with their performance metrics, resource usage, and metadata.
This endpoint requires API Key authentication via the Authorization header.
The response contains a paginated list of workloads with their metrics and metadata.
To retrieve all workloads, use pagination by incrementing the skip parameter:
To fetch all results programmatically (a script sketch follows the example calculation below):
Start with skip=0 and limit=100 (or your preferred page size)
Check the total field in the response
Continue making requests, incrementing skip by your limit value
Example calculation:
If total is 6314 and limit is 100
You need ⌈6314/100⌉ = 64 requests
Last request: skip=6300, limit=100 (returns 14 items)
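A minimal sketch of such a loop, under a few assumptions: jq is available locally, the list is returned in a workloads field (an assumption; total, skip, and limit are the fields described above), and the time-range body fields mirror the other k8s endpoints in this reference:
#!/usr/bin/env bash
# Sketch: page through the workloads list until all items are fetched.
API_KEY="<YOUR_API_KEY>"
BACKEND_ID="<YOUR_BACKEND_ID>"
LIMIT=100
SKIP=0
while true; do
  RESPONSE=$(curl -s -L \
    --request POST \
    --url 'https://api.groundcover.com/api/k8s/v3/workloads/list' \
    --header "Authorization: Bearer ${API_KEY}" \
    --header "X-Backend-Id: ${BACKEND_ID}" \
    --header 'Content-Type: application/json' \
    --data "{\"start\": \"2025-01-27T12:00:00.000Z\", \"end\": \"2025-01-27T14:00:00.000Z\", \"skip\": ${SKIP}, \"limit\": ${LIMIT}}")
  echo "${RESPONSE}" | jq -c '.workloads[]?'   # process the current page (field name assumed)
  TOTAL=$(echo "${RESPONSE}" | jq '.total // 0')  # total count reported by the API
  SKIP=$((SKIP + LIMIT))
  if [ "${SKIP}" -ge "${TOTAL}" ]; then break; fi
done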
httphandler:
obfuscationConfig:
keyValueConfig:
enabled: true
mode: "ObfuscateSpecificValues"
specificKeys:
- "messages"
- "inputText"
- "prompt"httphandler:
obfuscationConfig:
keyValueConfig:
enabled: true
mode: "ObfuscateSpecificValues"
specificKeys:
- "choices"
- "output"
- "content"
- "outputs"
- "results"
- "generation"monitors:
- title: K8s Cluster High Memory Requests Monitor
display:
header: K8s Cluster High Memory Requests
description: Alerts when a K8s Cluster's total Container Memory Requests exceeds 90% of the Allocatable Memory of all the Nodes for 5 minutes
contextHeaderLabels:
- env
- cluster
severity: S1
measurementType: state
model:
queries:
- name: threshold_input_query
expression: avg_over_time( (((sum(groundcover_node_rt_mem_requests_bytes{}) by (cluster, env)) / (sum(groundcover_node_rt_allocatable_mem_bytes{}) by (cluster, env))) * 100)[5m] )
queryType: instant
datasourceType: prometheus
thresholds:
- name: threshold_1
inputName: threshold_input_query
operator: gt
values:
- 90
noDataState: OK
evaluationInterval:
interval: 1m
pendingFor: 0s
- title: K8s PVC Pending For 5 Minutes Monitor
display:
header: K8s PVC Pending Over 5 Minutes
description: This monitor triggers an alert when a PVC remains in a Pending state for more than 5 minutes.
contextHeaderLabels:
- cluster
- namespace
- persistentvolumeclaim
severity: S2
measurementType: state
model:
queries:
- name: threshold_input_query
expression: last_over_time(max(groundcover_kube_persistentvolumeclaim_status_phase{phase="Pending"}) by (cluster, namespace, persistentvolumeclaim)[1m])
queryType: instant
datasourceType: prometheus
thresholds:
- name: threshold_1
inputName: threshold_input_query
operator: gt
values:
- 0
executionErrorState: OK
noDataState: OK
evaluationInterval:
interval: 1m
    pendingFor: 5m

{
"key": "cluster",
"type": "string",
"origin": "root",
"filters": [
{
"op": "eq",
"value": "cluster-name"
}
]
}

curl -L \
--request POST \
--url 'https://api.groundcover.com/api/k8s/v2/nodes/info-with-resources' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--header 'X-Backend-Id: <YOUR_BACKEND_ID>' \
--data '{
"start": "2025-01-27T12:00:00.000Z",
"end": "2025-01-27T14:00:00.000Z",
"sources": [
{
"key": "cluster",
"type": "string",
"origin": "root",
"filters": [
{
"op": "eq",
"value": "my-cluster"
}
]
}
],
"limit": 100,
"nameFilter": ""
}'

{
"nodes": [
{
"uid": "node-uid",
"name": "node-name",
"cluster": "cluster-name",
"env": "environment-name",
"creationTimestamp": "2025-01-01T10:00:00Z",
"labels": {
"kubernetes.io/arch": "amd64",
"kubernetes.io/os": "linux",
"node.kubernetes.io/instance-type": "t3.medium"
},
"addresses": [
{
"type": "InternalIP",
"address": "10.0.1.100"
},
{
"type": "ExternalIP",
"address": "203.0.113.100"
}
],
"nodeInfo": {
"kubeletVersion": "v1.24.0",
"kubeProxyVersion": "v1.24.0",
"operatingSystem": "linux",
"architecture": "amd64",
"containerRuntimeVersion": "containerd://1.6.0",
"kernelVersion": "5.4.0-91-generic",
"osImage": "Ubuntu 20.04.3 LTS"
},
"capacity": {
"cpu": "2",
"memory": "8Gi",
"pods": "110"
},
"allocatable": {
"cpu": "1940m",
"memory": "7Gi",
"pods": "110"
},
"usage": {
"cpu": "500m",
"memory": "3Gi"
},
"ready": true,
"conditions": [
{
"type": "Ready",
"status": "True",
"lastTransitionTime": "2025-01-01T10:05:00Z",
"reason": "KubeletReady",
"message": "kubelet is posting ready status"
}
]
}
]
}

{
"start": "2025-01-27T12:00:00.000Z",
"end": "2025-01-27T14:00:00.000Z",
"limit": 100
}

{
"start": "2025-01-27T12:00:00.000Z",
"end": "2025-01-27T14:00:00.000Z",
"sources": [
{
"key": "cluster",
"type": "string",
"origin": "root",
"filters": [{"op": "eq", "value": "production-cluster"}]
}
],
"limit": 100
}

POST /api/k8s/v3/clusters/list

curl 'https://api.groundcover.com/api/k8s/v3/clusters/list' \
-H 'accept: application/json' \
-H 'authorization: Bearer <YOUR_API_KEY>' \
-H 'content-type: application/json' \
-H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
--data-raw '{"sources":[]}'{
"clusters": [
{
"name": "production-cluster",
"env": "prod",
"cpuUsage": 126640,
"cpuLimit": 289800,
"cpuAllocatable": 302820,
"cpuRequest": 187975,
"cpuUsageAllocatablePercent": 41.82,
"cpuRequestAllocatablePercent": 62.07,
"cpuUsageRequestPercent": 67.37,
"cpuUsageLimitPercent": 43.70,
"cpuLimitAllocatablePercent": 95.70,
"memoryUsage": 242994409472,
"memoryLimit": 604262891520,
"memoryAllocatable": 1227361431552,
"memoryRequest": 495549677568,
"memoryUsageAllocatablePercent": 19.80,
"memoryRequestAllocatablePercent": 40.38,
"memoryUsageRequestPercent": 49.04,
"memoryUsageLimitPercent": 40.21,
"memoryLimitAllocatablePercent": 49.23,
"nodesCount": 6,
"pods": {
"Running": 109,
"Succeeded": 3
},
"issueCount": 1,
"creationTimestamp": "2021-11-01T14:37:31Z",
"cloudProvider": "AWS",
"kubernetesVersion": "v1.30.14-eks-931bdca"
}
],
"totalCount": 116
}

POST /api/k8s/v3/workloads/list

100
gen_ai.response.usage.output_tokens
Tokens generated in the response
100
gen_ai.response.usage.total_tokens
Total token usage for the interaction
200
gen_ai.response.finish_reason
Reason the model stopped generating
stop ; length
gen_ai.response.choice_count
Target number of candidate completions
3
gen_ai.response.system_fingerprint
Fingerprint to track backend environment changes
fp_44709d6fcb
gen_ai.response.tools_used
Number of tools used in API call
2
gen_ai.request.temperature
The temperature setting
0.0
gen_ai.request.max_tokens
Maximum tokens allowed for the request
100
gen_ai.request.top_p
The top_p sampling setting
1.0
gen_ai.request.stream
Boolean indicating if streaming was enabled
false
gen_ai.response.message_id
Unique ID of the message created by the server
gen_ai.error.code
The error code for the response
gen_ai.error.message
A human-readable description of the error
gen_ai.error.type
Describes a class of error the operation ended with
timeout; java.net.UnknownHostException; server_certificate_invalid; 500
gen_ai.operation.name
The name of the operation being performed
chat; generate_content; text_completion
gen_ai.request.message_count
Count of messages in API response
1
gen_ai.request.system_prompt
Boolean flag whether system prompt was used in request prompts
true
gen_ai.request.tools_used
Boolean flag whether any tools were used in requests
true
Source filters (e.g., cluster filters)
limit
integer
No
Maximum number of nodes to return (default: 100)
nodes[].env
string
Environment name
nodes[].creationTimestamp
string
Node creation time in ISO 8601 format
nodes[].labels
object
Node labels key-value pairs
nodes[].addresses
array
Node IP addresses (internal/external)
nodes[].nodeInfo
object
Node system information
nodes[].capacity
object
Total node resource capacity
nodes[].allocatable
object
Allocatable resources (capacity minus system reserved)
nodes[].usage
object
Current resource usage
nodes[].ready
boolean
Node readiness status
nodes[].conditions
array
Node condition details
kubernetesVersion
String
Kubernetes version
nodesCount
Integer
Number of nodes in the cluster
issueCount
Integer
Number of issues detected
cpuUsageAllocatablePercent
Float
CPU usage as percentage of allocatable
cpuRequestAllocatablePercent
Float
CPU requests as percentage of allocatable
cpuUsageRequestPercent
Float
CPU usage as percentage of requests
cpuUsageLimitPercent
Float
CPU usage as percentage of limits
cpuLimitAllocatablePercent
Float
CPU limits as percentage of allocatable
memoryUsageAllocatablePercent
Float
Memory usage as percentage of allocatable
memoryRequestAllocatablePercent
Float
Memory requests as percentage of allocatable
memoryUsageRequestPercent
Float
Memory usage as percentage of requests
memoryUsageLimitPercent
Float
Memory usage as percentage of limits
memoryLimitAllocatablePercent
Float
Memory limits as percentage of allocatable
pod_name
K8s pod name
container_name
K8s container name
container_image
K8s container image name
remote_namespace
Remote K8s namespace (other side of the communication)
remote_service_name
Remote K8s service name (other side of the communication)
remote_container_name
Remote K8s container name (other side of the communication)
type
The protocol in use (HTTP, gRPC, Kafka, DNS etc.)
role
Role in the communication (client or server)
clustered_path
HTTP / gRPC aggregated resource path (e.g. /metrics/*)
http, grpc
method
HTTP / gRPC method (e.g GET)
http, grpc
response_status_code
Return status code of an HTTP / gRPC request (e.g. 200 in HTTP)
http, grpc
dialect
SQL dialect (MySQL or PostgreSQL)
mysql, postgresql
response_status
Return status code of a SQL query (e.g 42P01 for undefined table)
mysql, postgresql
client_type
Kafka client type (Fetcher / Producer)
kafka
topic
Kafka topic name
kafka
partition
Kafka partition identifier
kafka
error_code
Kafka return status code
kafka
query_type
type of DNS query (e.g. AAAA)
dns
response_return_code
Return status code of a DNS resolution request (e.g. Name Error)
dns
method_name, method_class_name
Method code for the operation
amqp
response_method_name, response_method_class_name
Method code for the operation's response
amqp
exit_code
K8s container termination exit code
container_state, container_crash
state
K8s container current state (Running, Waiting or Terminated)
container_state
state_reason
K8s container state transition reason (e.g. CrashLoopBackOff or OOMKilled)
container_state
crash_reason
K8s container crash reason (e.g. Error, OOMKilled)
container_crash
pvc_name
K8s PVC name
storage
groundcover_resource_latency_seconds
resource latency [sec]
groundcover_workload_latency_seconds
resource latency across all of the workload APIs [sec]
groundcover_pvc_read_latency
latency of read operation by the workload from the PVC, in microseconds
groundcover_pvc_write_latency
latency of write operation by the workload to the PVC, in microseconds
groundcover_calc_lag_seconds
current lag in time [sec]
groundcover_workload_calc_lag_seconds
current lag in time, aggregated by workload [sec]


Within Range - Trigger when the value is between X and Y.
Outside Range - Trigger when the value is not between X and Y.
Example: “Trigger if disk space usage is greater than 10%.”




sources
array
No
Source filters
creationTime
string
Deployment creation timestamp in ISO 8601 format
cluster
string
Kubernetes cluster name
env
string
Environment name (e.g., "prod", "staging")
available
integer
Number of available replicas
desired
integer
Number of desired replicas
ready
integer
Number of ready replicas
conditions
array
Array of deployment condition objects
conditions[].type
string
Condition type (e.g., "Available", "Progressing")
conditions[].status
string
Condition status ("True", "False", "Unknown")
conditions[].lastProbeTime
string/null
Last time the condition was probed
conditions[].lastHeartbeatTime
string/null
Last time the condition was updated
conditions[].lastTransitionTime
string
Last time the condition transitioned
conditions[].reason
string
Machine-readable reason for the condition
conditions[].message
string
Human-readable message explaining the condition
warnings
array
Array of warning messages (usually empty)
id
string
Unique identifier for the deployment
resourceVersion
integer
Kubernetes resource version
Authorization
Yes
Bearer token with your API key
X-Backend-Id
Yes
Your backend identifier
Content-Type
Yes
Must be application/json
Accept
Yes
Must be application/json
start
string
Yes
Start time in ISO 8601 UTC format (e.g., "2025-08-24T07:21:36.944Z")
end
string
Yes
End time in ISO 8601 UTC format (e.g., "2025-08-24T08:51:36.944Z")
namespaces
array
No
deployments
array
Array of deployment objects
name
string
Deployment name
namespace
string
Kubernetes namespace
workloadName
string
Associated workload name
Available
Deployment has minimum availability
Progressing
Deployment is making progress towards desired state
MinimumReplicasAvailable
Deployment has minimum number of replicas available
NewReplicaSetAvailable
New ReplicaSet has successfully progressed
Array of namespace names to filter by (e.g., ["groundcover", "default"])
Integer
No
0
Number of workloads to skip for pagination
order
String
No
"desc"
Sort order: "asc" or "desc"
sortBy
String
No
"rps"
Field to sort by (e.g., "rps", "cpuUsage", "memoryUsage")
sources
Array
No
[]
Filter by data sources
namespace
String
Kubernetes namespace
workload
String
Workload name
kind
String
Kubernetes resource kind (e.g., "ReplicaSet", "StatefulSet", "DaemonSet")
resourceVersion
Integer
Kubernetes resource version
ready
Boolean
Whether the workload is ready
podsCount
Integer
Number of pods in the workload
p50
Float
50th percentile response time in seconds
p95
Float
95th percentile response time in seconds
p99
Float
99th percentile response time in seconds
rps
Float
Requests per second
errorRate
Float
Error rate as a decimal (e.g., 0.004 = 0.4%)
cpuLimit
Integer
CPU limit in millicores (0 = no limit)
cpuUsage
Float
Current CPU usage in millicores
memoryLimit
Integer
Memory limit in bytes (0 = no limit)
memoryUsage
Integer
Current memory usage in bytes
issueCount
Integer
Number of issues detected
Stop when skip >= total
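For reference, the pagination loop can be scripted. The following is a minimal shell sketch (it assumes jq is installed for JSON parsing; adapt the request body and the per-batch handling to your needs):
#!/usr/bin/env bash
# Sketch: page through /api/k8s/v3/workloads/list in batches of 100, stopping when skip >= total
API_KEY="<YOUR_API_KEY>"
BACKEND_ID="<YOUR_BACKEND_ID>"
LIMIT=100
SKIP=0
TOTAL=1 # placeholder so the loop runs at least once
while [ "$SKIP" -lt "$TOTAL" ]; do
  RESPONSE=$(curl -s 'https://api.groundcover.com/api/k8s/v3/workloads/list' \
    -H 'accept: application/json' \
    -H "authorization: Bearer ${API_KEY}" \
    -H 'content-type: application/json' \
    -H "X-Backend-Id: ${BACKEND_ID}" \
    --data-raw "{\"conditions\":[],\"limit\":${LIMIT},\"order\":\"desc\",\"skip\":${SKIP},\"sortBy\":\"rps\",\"sources\":[]}")
  TOTAL=$(echo "$RESPONSE" | jq '.total')          # total number of workloads available
  echo "$RESPONSE" | jq -r '.workloads[].workload' # process the current batch
  SKIP=$((SKIP + LIMIT))
done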
Authorization
Yes
Bearer token with your API key
X-Backend-Id
Yes
Your backend identifier
Content-Type
Yes
Must be application/json
Accept
Yes
Must be application/json
conditions
Array
No
[]
Filter conditions for workloads
limit
Integer
No
100
Maximum number of workloads to return (1-1000)
total
Integer
Total number of workloads available
workloads
Array
Array of workload objects
uid
String
Unique identifier for the workload
envType
String
Environment type (e.g., "k8s")
env
String
Environment name (e.g., "prod", "ga", "alpha")
cluster
String
Kubernetes cluster name
skip
{
"start": "2025-08-24T07:21:36.944Z",
"end": "2025-08-24T08:51:36.944Z",
"namespaces": ["groundcover"],
"sources": []
}
{
"deployments": [
{
"name": "string",
"namespace": "string",
"workloadName": "string",
"creationTime": "2023-08-30T18:27:01Z",
"cluster": "string",
"env": "string",
"available": 1,
"desired": 1,
"ready": 1,
"conditions": [
{
"type": "string",
"status": "string",
"lastProbeTime": null,
"lastHeartbeatTime": null,
"lastTransitionTime": "string",
"reason": "string",
"message": "string"
}
],
"warnings": [],
"id": "string",
"resourceVersion": 0
}
]
}
curl 'https://api.groundcover.com/api/k8s/v2/deployments/list' \
-H 'accept: application/json' \
-H 'authorization: Bearer <YOUR_API_KEY>' \
-H 'content-type: application/json' \
-H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
--data-raw '{"start":"2025-08-24T07:21:36.944Z","end":"2025-08-24T08:51:36.944Z","namespaces":[],"sources":[]}'curl 'https://api.groundcover.com/api/k8s/v2/deployments/list' \
-H 'accept: application/json' \
-H 'authorization: Bearer <YOUR_API_KEY>' \
-H 'content-type: application/json' \
-H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
--data-raw '{"start":"2025-08-24T07:21:36.944Z","end":"2025-08-24T08:51:36.944Z","namespaces":["groundcover","monitoring"],"sources":[]}'{
"deployments": [
{
"name": "db-manager",
"namespace": "groundcover",
"workloadName": "db-manager",
"creationTime": "2023-08-30T18:27:01Z",
"cluster": "karma-cluster",
"env": "prod",
"available": 1,
"desired": 1,
"ready": 1,
"conditions": [
{
"type": "Available",
"status": "True",
"lastProbeTime": null,
"lastHeartbeatTime": null,
"lastTransitionTime": "2025-08-22T06:18:27Z",
"reason": "MinimumReplicasAvailable",
"message": "Deployment has minimum availability."
},
{
"type": "Progressing",
"status": "True",
"lastProbeTime": null,
"lastHeartbeatTime": null,
"lastTransitionTime": "2023-08-30T18:27:01Z",
"reason": "NewReplicaSetAvailable",
"message": "ReplicaSet \"db-manager-867bc8f5b8\" has successfully progressed."
}
],
"warnings": [],
"id": "f3b1f4a5-f38a-4c63-a7c0-9333fcbf1906",
"resourceVersion": 747039184
}
]
}
curl 'https://api.groundcover.com/api/k8s/v3/workloads/list' \
-H 'accept: application/json' \
-H 'authorization: Bearer <YOUR_API_KEY>' \
-H 'content-type: application/json' \
-H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
--data-raw '{"conditions":[],"limit":100,"order":"desc","skip":0,"sortBy":"rps","sources":[]}'{
"total": 6314,
"workloads": [
{
"uid": "824b00bf-db68-47b5-8a53-9abd98bf7c0a",
"envType": "k8s",
"env": "ga",
"cluster": "akamai-lk41ok",
"namespace": "groundcover-incloud",
"workload": "groundcover-incloud-vector",
"kind": "ReplicaSet",
"resourceVersion": 651723275,
"ready": true,
"podsCount": 5,
"p50": 0.0005824280087836087,
"p95": 0.005730729550123215,
"p99": 0.0327172689139843,
"rps": 5526.0027359781125,
"errorRate": 0,
"cpuLimit": 0,
"cpuUsage": 50510.15252730218,
"memoryLimit": 214748364800,
"memoryUsage": 46527352832,
"issueCount": 0
}
]
}
# First batch (0-99)
curl 'https://api.groundcover.com/api/k8s/v3/workloads/list' \
-H 'accept: application/json' \
-H 'authorization: Bearer <YOUR_API_KEY>' \
-H 'content-type: application/json' \
-H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
--data-raw '{"conditions":[],"limit":100,"order":"desc","skip":0,"sortBy":"rps","sources":[]}'
# Second batch (100-199)
curl 'https://api.groundcover.com/api/k8s/v3/workloads/list' \
-H 'accept: application/json' \
-H 'authorization: Bearer <YOUR_API_KEY>' \
-H 'content-type: application/json' \
-H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
--data-raw '{"conditions":[],"limit":100,"order":"desc","skip":100,"sortBy":"rps","sources":[]}'
# Continue incrementing skip by 100 until you reach the total count
Execute PromQL queries against groundcover metrics data. Two endpoints are available: instant queries for point-in-time values and range queries for time-series data over specific periods.
GET /api/prometheus/api/v1/query
Execute an instant PromQL query to get metric values at a single point in time.
POST /api/metrics/query-range
Execute a PromQL query over a time range to get time-series data.
Both endpoints require API Key authentication via the Authorization header.
GET /api/prometheus/api/v1/query
Headers
Query Parameters
Understanding the Time Parameter
The time parameter specifies exactly one timestamp at which to evaluate your PromQL expression. This is NOT a time range:
With time: "What was the disk usage at 2025-10-21T09:21:44.398Z?"
Without time: "What is the disk usage right now?"
Important: This is different from range queries which return time-series data over a period.
Instant vs Range Queries - Key Differences
Example Comparison:
Instant: time=2025-10-21T09:00:00Z → Returns disk usage at exactly 9:00 AM
Range: start=2025-10-21T08:00:00Z&end=2025-10-21T09:00:00Z → Returns hourly disk usage trend
Practical Example:
The endpoint returns a Prometheus-compatible response format:
Response Fields
POST /api/metrics/query-range
Headers
Request Body
Request Parameters
The range query returns a custom format optimized for time-series data:
Response Fields
Each data point in the velocity array contains:
Timestamp: Unix timestamp as integer
Value: Metric value as string
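For example, with the range-query response saved to a file (hypothetically named response.json here), a jq one-liner can flatten those pairs into readable lines (this assumes jq is available):
# Print each data point as "timestamp value" from a saved query-range response
jq -r '.velocities[].velocity[] | "\(.[0]) \(.[1])"' response.json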
Current Values (No Time Parameter)
Get current average disk space usage (evaluated at request time):
Historical Point-in-Time Values
Get disk usage at a specific moment in the past:
Note: This returns the disk usage value at exactly 2025-10-21T09:21:44.398Z, not a range from that time until now.
24-Hour Disk Usage Trend
Get disk usage over the last 24 hours with 30-minute resolution:
High-Resolution CPU Monitoring
Monitor CPU usage over 1 hour with 1-minute resolution:
Use appropriate time ranges: Avoid querying excessively large time ranges
Choose optimal step sizes: Balance resolution with performance
Filter early: Use label filters to reduce data volume
Aggregate wisely: Use grouping to reduce cardinality
Use RFC3339 format: Always use ISO 8601 timestamps
Account for timezones: Timestamps are in UTC
Align step boundaries: Choose steps that align with data collection intervals
Handle clock skew: Allow for small time differences in distributed systems
Concurrent queries: Limit concurrent requests to avoid overwhelming the API
Query complexity: Complex queries may take longer and consume more resources
Data retention: Historical data availability depends on your retention policy
Response
Single value with timestamp
Array of timestamp-value pairs
data.result[].value
array
[timestamp, value] tuple with Unix timestamp and string value
stats.seriesFetched
string
Number of time series processed
stats.executionTimeMsec
number
Query execution time in milliseconds
step
string
Yes
Query resolution step (e.g., "30s", "1m", "5m", "1h")
query
string
Yes
PromQL query string (URL encoded)
time
string
No
Single point in time to evaluate the query (RFC3339 format). Default: current time
Purpose
Get value at one specific moment
Get time-series data over a period
Time Parameter
Single timestamp (time)
Start and end timestamps (start, end)
Result
One data point
Multiple data points over time
Use Case
"What is the current CPU usage?"
status
string
Query execution status ("success" or "error")
data.resultType
string
Type of result data ("vector", "matrix", "scalar", "string")
data.result
array
Array of metric results
data.result[].metric
object
Metric labels as key-value pairs
promql
string
Yes
PromQL query expression
start
string
Yes
Range start time in RFC3339 format
end
string
Yes
velocities
array
Array of time-series data objects
velocities[].velocity
array
Array of [timestamp, value] data points
velocities[].metric
object
Metric labels as key-value pairs
promql
string
Echo of the executed PromQL query
"Show me CPU usage over the last hour"
Range end time in RFC3339 format
Authorization: Bearer <YOUR_API_KEY>
Accept: application/json
# Get current value (no time parameter)
# Returns: {"value":[1761040224,"18.45"]} - timestamp is "right now"
curl '...query=avg(groundcover_node_rt_disk_space_used_percent{})'
# Get historical value (with time parameter)
# Returns: {"value":[1761038504,"18.44"]} - timestamp is exactly what you specified
curl '...query=avg(groundcover_node_rt_disk_space_used_percent{})&time=2025-10-21T09:21:44.398Z'
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {},
"value": [1761038504.398, "18.442597642017"]
}
]
},
"stats": {
"seriesFetched": "12",
"executionTimeMsec": 0
}
}
Authorization: Bearer <YOUR_API_KEY>
Accept: application/json
Content-Type: application/json
{
"promql": "string",
"start": "string",
"end": "string",
"step": "string"
}
{
"velocities": [
{
"velocity": [
[1760950800, "21.534558665381155"],
[1760952600, "21.532404350483848"],
[1760954400, "21.57135294176692"]
],
"metric": {}
}
],
"promql": "avg(groundcover_node_rt_disk_space_used_percent{})"
}
curl 'https://app.groundcover.com/api/prometheus/api/v1/query?query=avg%28groundcover_node_rt_disk_space_used_percent%7B%7D%29' \
-H 'accept: application/json' \
-H 'authorization: Bearer <YOUR_API_KEY>'
curl 'https://app.groundcover.com/api/prometheus/api/v1/query?query=avg%28groundcover_node_rt_disk_space_used_percent%7B%7D%29&time=2025-10-21T09%3A21%3A44.398Z' \
-H 'accept: application/json' \
-H 'authorization: Bearer <YOUR_API_KEY>'
curl 'https://app.groundcover.com/api/metrics/query-range' \
-H 'accept: application/json' \
-H 'authorization: Bearer <YOUR_API_KEY>' \
-H 'content-type: application/json' \
--data-raw '{
"promql": "avg(groundcover_node_rt_disk_space_used_percent{})",
"start": "2025-10-20T09:19:22.475Z",
"end": "2025-10-21T09:19:22.475Z",
"step": "1800s"
}'
curl 'https://app.groundcover.com/api/metrics/query-range' \
-H 'accept: application/json' \
-H 'authorization: Bearer <YOUR_API_KEY>' \
-H 'content-type: application/json' \
--data-raw '{
"promql": "avg(groundcover_cpu_usage_percent) by (cluster)",
"start": "2025-10-21T08:00:00.000Z",
"end": "2025-10-21T09:00:00.000Z",
"step": "1m"
}'
While we strongly suggest building Monitors using our Wizard or Catalog, groundcover also supports building and editing your Monitors using YAML. If you choose to do so, the following will provide you with the necessary definitions.
In this section, you'll find a breakdown of the key fields used to define and configure Monitors within the groundcover platform. Each field plays a critical role in how a Monitor behaves, what data it tracks, and how it responds to specific conditions. Understanding these fields will help you set up effective Monitors to track performance, detect issues, and provide timely alerts.
Below is a detailed explanation of each field, along with examples to illustrate their usage, ensuring your team can manage and respond to incidents efficiently.
["cluster", "namespace", "pod_name"]
Labels
A set of pre-defined labels applied to Issues related to the selected Monitor. Labels can be static, or dynamic using the Monitor's query results.
team: sre_team
ExecutionErrorState
Defines the actions that take place when a Monitor encounters query execution errors.
Valid options are Alerting, OK and Error.
When Alerting is set, query execution errors will result in a firing issue.
When Error is set, query execution errors will result in an error state.
NoDataState
This defines what happens when queries in the Monitor return empty datasets.
Valid options are: NoData , Alerting, OK
When NoData is set, the monitor instance's state will be No Data.
When Alerting is set, empty query results will result in a firing issue.
Interval
Defines how frequently the Monitor evaluates the conditions. Common intervals could be 1m, 5m, etc.
PendingFor
Defines the period of consecutive intervals during which the threshold condition must be met before the alert is triggered.
Trigger
Defines the condition under which the Monitor fires. This is the threshold definition for the Monitor, consisting of an op (operator) and a value.
op: gt, value: 5
Model
Describes the queries, thresholds and data processing of the Monitor. It can have the following fields:
Queries: A list of one or more queries to run. Each query can be SQL over ClickHouse, PromQL over VictoriaMetrics, or a SqlPipeline, and has a name used to reference it in the Monitor.
Thresholds: The thresholds of your Monitor. Each threshold has a name, an inputName referencing the query that feeds it, and an operator that is one of gt, lt, within_range, or outside_range.
measurementType
Describes how issues of this Monitor are presented. Some Monitors count events while others track a state, and they are displayed differently in our dashboards.
state - Will present issues in line chart.
event - Will present issues in bar chart, counting events.
Title
A string that defines the human-readable name of the Monitor. The title is what you will see in the list of all existing Monitors in the Monitors section.
Description
Additional information about the Monitor.
Severity
When triggered, this will show the severity level of the Monitor's issue. You can set any severity you want here.
s1 for Critical
s2 for High
s3 for Medium
s4 for Low
Header
This is the header of the generated issues from the Monitor.
A short string describing the condition that is being monitored. You can also use this as a pattern using labels from your query.
“HTTP API Error {{ alert.labels.return_code}}”
ResourceHeaderLabels
A list of labels that help you identify the resources that are related to the Monitor. This appears as a secondary header in all Issues tables across the platform.
["span_name", "kind"] for monitors on protocol issues.
ContextHeaderLabels
A list of contextual labels that help you identify the location of the issue. This appears as a subset of the Issue’s labels, and is displayed on all Issues tables across the platform.
This endpoint requires API Key authentication via the Authorization header.
The request body supports filtering, pagination, sorting, and time range parameters:
conditions
array
No
Array of filter conditions for monitors (empty array returns all)
limit
integer
No
Maximum number of monitors to return (default: 200)
skip
integer
No
"lastFiringStart"
Last time monitor started firing alerts
"title"
Monitor title alphabetically
"severity"
Monitor severity level
"createdAt"
Monitor creation date
"updatedAt"
Last modification date
"state"
Current monitor state
The conditions array accepts filter objects for targeted monitor queries:
The endpoint returns a JSON object containing an array of detailed monitor configurations:
Top-Level Fields
hasMonitors
boolean
Whether any monitors exist in the system
monitors
array
Array of monitor configuration objects
Monitor Object Fields
uuid
string
Unique monitor identifier
title
string
Monitor display name
description
string
Monitor description
severity
string
Alert severity level ("S1", "S2", "S3", "S4")
Get all monitors with default pagination:
Get monitors within a specific time window:
Get the second page of results:
Get monitors sorted by newest first:
title: MySQL Query Errors Monitor
display:
header: MySQL Error {{ alert.labels.statusCode }}
description: This monitor detects MySQL Query errors.
resourceHeaderLabels:
- span_name
- role
contextHeaderLabels:
- cluster
- namespace
- workload
severity: S3
measurementType: event
model:
queries:
- name: threshold_input_query
dataType: traces
sqlPipeline:
selectors:
- key: _time
origin: root
type: string
processors:
- op: toStartOfInterval
args:
- 1 minutes
alias: bucket_timestamp
- key: statusCode
origin: root
type: string
alias: statusCode
- key: span_name
origin: root
type: string
alias: span_name
- key: cluster
origin: root
type: string
alias: cluster
- key: namespace
origin: root
type: string
alias: namespace
- key: role
origin: root
type: string
alias: role
- key: workload
origin: root
type: string
alias: workload
- key: "*"
origin: root
type: string
processors:
- op: count
alias: logs_total
groupBy:
- key: _time
origin: root
type: string
processors:
- op: toStartOfInterval
args:
- 1 minutes
- key: statusCode
origin: root
type: string
alias: statusCode
- key: span_name
origin: root
type: string
alias: span_name
- key: cluster
origin: root
type: string
alias: cluster
- key: namespace
origin: root
type: string
alias: namespace
- key: role
origin: root
type: string
alias: role
- key: workload
origin: root
type: string
alias: workload
orderBy:
- selector:
key: bucket_timestamp
origin: root
type: string
direction: ASC
limit:
filters:
operator: and
conditions:
- filters:
- op: match
value: mysql
key: eventType
origin: root
type: string
- filters:
- op: match
value: error
key: status
origin: root
type: string
- filters:
- op: match
value: eBPF
key: source
origin: root
type: string
instantRollup: 1 minutes
thresholds:
- name: threshold_1
inputName: threshold_input_query
operator: gt
values:
- 0
executionErrorState: OK
noDataState: OK
evaluationInterval:
interval: 1m
pendingFor: 0s
labels:
team: infra
title: gRPC API Errors Monitor
display:
header: gRPC API Error {{ alert.labels.statusCode }}
description: This monitor detects gRPC API errors by identifying responses with a non-zero status code.
resourceHeaderLabels:
- span_name
- role
contextHeaderLabels:
- cluster
- namespace
- workload
severity: S3
measurementType: event
model:
queries:
- name: threshold_input_query
dataType: traces
sqlPipeline:
selectors:
- key: _time
origin: root
type: string
processors:
- op: toStartOfInterval
args:
- 1 minutes
alias: bucket_timestamp
- key: statusCode
origin: root
type: string
alias: statusCode
- key: span_name
origin: root
type: string
alias: span_name
- key: cluster
origin: root
type: string
alias: cluster
- key: namespace
origin: root
type: string
alias: namespace
- key: role
origin: root
type: string
alias: role
- key: workload
origin: root
type: string
alias: workload
- key: "*"
origin: root
type: string
processors:
- op: count
alias: logs_total
groupBy:
- key: _time
origin: root
type: string
processors:
- op: toStartOfInterval
args:
- 1 minutes
- key: statusCode
origin: root
type: string
alias: statusCode
- key: span_name
origin: root
type: string
alias: span_name
- key: cluster
origin: root
type: string
alias: cluster
- key: namespace
origin: root
type: string
alias: namespace
- key: role
origin: root
type: string
alias: role
- key: workload
origin: root
type: string
alias: workload
orderBy:
- selector:
key: bucket_timestamp
origin: root
type: string
direction: ASC
limit:
filters:
operator: and
conditions:
- filters:
- op: match
value: grpc
key: eventType
origin: root
type: string
- filters:
- op: ne
value: "0"
key: statusCode
origin: root
type: string
- filters:
- op: match
value: error
key: status
origin: root
type: string
- filters:
- op: match
value: eBPF
key: source
origin: root
type: string
instantRollup: 1 minutes
thresholds:
- name: threshold_1
inputName: threshold_input_query
operator: gt
values:
- 0
executionErrorState: OK
noDataState: OK
evaluationInterval:
interval: 1m
pendingFor: 0s
title: High Error Log Rate Monitor
severity: S4
display:
header: High Log Error Rate
description: This monitor will trigger an alert when we have a rate of error logs.
resourceHeaderLabels:
- workload
contextHeaderLabels:
- cluster
- namespace
evaluationInterval:
interval: 1m
pendingFor: 0s
model:
queries:
- name: threshold_input_query
dataType: logs
sqlPipeline:
selectors:
- key: _time
origin: root
type: string
processors:
- op: toStartOfInterval
args:
- 1 minutes
alias: bucket_timestamp
- key: workload
origin: root
type: string
alias: workload
- key: namespace
origin: root
type: string
alias: namespace
- key: cluster
origin: root
type: string
alias: cluster
- key: "*"
origin: root
type: string
processors:
- op: count
alias: logs_total
groupBy:
- key: _time
origin: root
type: string
processors:
- op: toStartOfInterval
args:
- 1 minutes
- key: workload
origin: root
type: string
alias: workload
- key: namespace
origin: root
type: string
alias: namespace
- key: cluster
origin: root
type: string
alias: cluster
orderBy:
- selector:
key: bucket_timestamp
origin: root
type: string
direction: ASC
limit:
filters:
conditions:
- filters:
- op: match
value: error
key: level
origin: root
type: string
operator: and
instantRollup: 1 minutes
thresholds:
- name: threshold_1
inputName: threshold_input_query
operator: gt
values:
- 150
noDataState: OK
measurementType: event
Authorization: Bearer <YOUR_API_KEY>
Content-Type: application/json
Accept: text/event-stream
{
"conditions": [],
"limit": 200,
"skip": 0,
"maxInstances": 10,
"order": "desc",
"sortBy": "lastFiringStart",
"start": "2025-10-12T08:19:18.582Z",
"end": "2025-10-12T09:19:18.582Z"
}
{
"conditions": [
{
"field": "severity",
"operator": "equals",
"value": "S1"
},
{
"field": "state",
"operator": "in",
"values": ["Alerting", "Normal"]
}
]
}
{
"hasMonitors": true,
"monitors": [
{
"uuid": "string",
"title": "string",
"description": "string",
"severity": "string",
"measurementType": "string",
"state": "string",
"alertingCount": 0,
"model": {
"queries": [],
"thresholds": []
},
"interval": {
"interval": "string",
"for": "string"
},
"executionErrorState": "string",
"noDataState": "string",
"isPaused": false,
"createdBy": 0,
"createdByEmail": "string",
"createdAt": "string",
"updatedAt": "string",
"lastStateStart": "string",
"lastFiringStart": "string",
"firstFiringStart": "string",
"lastResolved": "string",
"minEvaluationDurationSeconds": 0.0,
"avgEvaluationDurationSeconds": 0.0,
"maxEvaluationDurationSeconds": 0.0,
"lastEvaluationError": "string",
"lastEvaluationTimestamp": "string",
"silenced": false,
"fullySilenced": false,
"silence_uuids": []
}
]
}
curl -L \
--request POST \
--url 'https://api.groundcover.com/api/monitors/summary/query' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--header 'Accept: text/event-stream' \
--data-raw '{
"conditions": [],
"limit": 200,
"skip": 0,
"maxInstances": 10,
"order": "desc",
"sortBy": "lastFiringStart"
}'
curl -L \
--request POST \
--url 'https://api.groundcover.com/api/monitors/summary/query' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--header 'Accept: text/event-stream' \
--data-raw '{
"conditions": [],
"limit": 100,
"skip": 0,
"maxInstances": 10,
"order": "desc",
"sortBy": "lastFiringStart",
"start": "2025-10-12T08:00:00.000Z",
"end": "2025-10-12T10:00:00.000Z"
}'
curl -L \
--request POST \
--url 'https://api.groundcover.com/api/monitors/summary/query' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--header 'Accept: text/event-stream' \
--data-raw '{
"conditions": [],
"limit": 50,
"skip": 50,
"maxInstances": 10,
"order": "desc",
"sortBy": "title"
}'
curl -L \
--request POST \
--url 'https://api.groundcover.com/api/monitors/summary/query' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--header 'Accept: text/event-stream' \
--data-raw '{
"conditions": [],
"limit": 100,
"skip": 0,
"maxInstances": 10,
"order": "desc",
"sortBy": "createdAt"
}'
{
"hasMonitors": true,
"monitors": [
{
"uuid": "12345678-1234-1234-1234-123456789abc",
"title": "Example_Latency_Monitor",
"description": "",
"template": "Example_Latency_Monitor",
"severity": "S2",
"measurementType": "event",
"header": "Example_Latency_Monitor",
"resourceLabels": ["workload"],
"contextLabels": ["namespace", "cluster"],
"category": "",
"interval": {
"interval": "5m0s",
"for": "1m0s"
},
"model": {
"queries": [
{
"dataType": "traces",
"name": "threshold_input_query",
"sqlPipeline": {
"selectors": [
{
"key": "workload",
"origin": "root",
"type": "string",
"alias": "workload"
},
{
"key": "namespace",
"origin": "root",
"type": "string",
"alias": "namespace"
},
{
"key": "cluster",
"origin": "root",
"type": "string",
"alias": "cluster"
}
]
},
"instantRollup": "5 minutes"
}
],
"reducers": null,
"thresholds": [
{
"name": "threshold_1",
"inputName": "threshold_input_query",
"operator": "gt",
"values": [502]
}
],
"query": "SELECT workload, namespace, cluster, count(*) AS logs_total FROM traces WHERE (start_timestamp < toStartOfInterval(NOW(), INTERVAL '5 MINUTE') AND start_timestamp >= (toStartOfInterval(NOW(), INTERVAL '5 MINUTE') - INTERVAL '5 minutes')) GROUP BY workload, namespace, cluster",
"type": "traces"
},
"reducer": "",
"trigger": {
"op": "gt",
"value": 502
},
"labelsMapping": {
"owner": "example-user"
},
"executionErrorState": "",
"noDataState": "OK",
"isPaused": false,
"createdBy": 12345,
"createdByEmail": "[email protected]",
"createdAt": "2025-03-14T20:42:36.949847Z",
"updatedAt": "2025-09-21T12:17:00.130801Z",
"relativeTimerange": {},
"silences": [],
"monitorId": "12345678-1234-1234-1234-123456789abc",
"state": "Alerting",
"lastStateStart": "0001-01-01T00:00:00Z",
"lastFiringStart": null,
"firstFiringStart": null,
"lastResolved": null,
"alertingCount": 11,
"silenced": false,
"fullySilenced": false,
"silence_uuids": [],
"minEvaluationDurationSeconds": 7.107210216,
"avgEvaluationDurationSeconds": 7.1096896183047775,
"maxEvaluationDurationSeconds": 7.119120884,
"lastEvaluationError": "",
"lastEvaluationTimestamp": "2025-10-12T09:15:50Z"
}
]
}
When OK is set, query execution errors will do neither of the above. This is the default setting.
Pending
Alerting
When OK is set, the monitor instance's state will be Normal. This is the default setting.
Number of monitors to skip for pagination (default: 0)
maxInstances
integer
No
Maximum instances per monitor result (default: 10)
order
string
No
Sort order: "asc" or "desc" (default: "desc")
sortBy
string
No
Field to sort by (see sorting options below)
start
string
No
Start time for filtering (ISO 8601 format)
end
string
No
End time for filtering (ISO 8601 format)
measurementType
string
Monitor type ("state", "event")
state
string
Current monitor state ("Normal", "Alerting", "Paused")
alertingCount
integer
Number of active alerts
model
object
Monitor configuration with queries and thresholds
interval
object
Evaluation timing configuration
executionErrorState
string
State when execution fails
noDataState
string
State when no data is available
isPaused
boolean
Whether monitor is currently paused
createdBy
integer
Creator user ID
createdByEmail
string
Creator email address
createdAt
string
Creation timestamp (ISO 8601)
updatedAt
string
Last update timestamp (ISO 8601)
lastStateStart
string
When current state began
lastFiringStart
string
When monitor last started alerting
firstFiringStart
string
When monitor first started alerting
lastResolved
string
When monitor was last resolved
minEvaluationDurationSeconds
float
Fastest query execution time
avgEvaluationDurationSeconds
float
Average query execution time
maxEvaluationDurationSeconds
float
Slowest query execution time
lastEvaluationError
string
Last execution error message
lastEvaluationTimestamp
string
Last evaluation timestamp
silenced
boolean
Whether monitor is silenced
fullySilenced
boolean
Whether monitor is completely silenced
silence_uuids
array
Array of silence rule identifiers
type clusterId region node_name
type clusterId region name namespace
clusterId workload_name namespace container_name remote_service_name remote_namespace remote_is_external availability_zone region remote_availability_zone remote_region
Notes:
is_loopback and remote_is_external are special labels that indicate the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g. a managed service outside of the cluster (external).
In both cases the remote_service_name and the remote_namespace labels will be empty.
type resource condition status clusterId region namespace workload_name deployment unit
type clusterId region namespace node_name workload_name pod_name container_name container_image
type clusterId region namespace node_name workload_name pod_name container_name container_image
type clusterId region namespace node_name workload_name pod_name container_name container_image
type clusterId region namespace node_name workload_name pod_name container_name container_image
type clusterId region namespace node_name workload_name pod_name container_name container_image
clusterId env region host_name cloud_provider env_type
clusterId env region host_name cloud_provider env_type
clusterId env region host_name cloud_provider env_type Optional: device_name
clusterId env region host_name cloud_provider env_type Optional: device_name
clusterId env region host_name cloud_provider env_type device_name file_system mountpoint
clusterId env region host_name cloud_provider env_type
clusterId env region host_name cloud_provider env_type device
We also use a set of internal labels which are not relevant in most use-cases. Find them interesting?
issue_id entity_id resource_id query_id aggregation_id parent_entity_id
In the lists below, we describe error and issue counters. Every issue flagged by the platform is an error, but not every error is flagged as an issue.
groundcover_node_used_disk_space
Current used disk space in the current node
Bytes
groundcover_node_free_disk_space
Free disk space in the current node
Bytes
groundcover_node_total_disk_space
Total disk space in the current node
Bytes
groundcover_node_used_percent_disk_space
Percentage of used disk space in the current node
Percentage
groundcover_pvc_usage_percent
Percentage of used Persistent Volume Claim (PVC) storage
Percentage
groundcover_pvc_read_bytes_total
Total bytes read by the workload from the Persistent Volume Claim (PVC)
Bytes
groundcover_pvc_write_bytes_total
Total bytes written by the workload to the Persistent Volume Claim (PVC)
Bytes
groundcover_pvc_reads_total
Total read operations performed by the workload from the Persistent Volume Claim (PVC)
Number
groundcover_pvc_writes_total
Total write operations performed by the workload to the Persistent Volume Claim (PVC)
Number
groundcover_pvc_read_latency
Latency of read operations from the Persistent Volume Claim (PVC) by the workload
Seconds
groundcover_pvc_write_latency
Latency of write operations to the Persistent Volume Claim (PVC) by the workload
Seconds
groundcover_pvc_read_latency_count
Count of read operations latency for the Persistent Volume Claim (PVC)
Number
groundcover_pvc_read_latency_sum
Sum of read operation latencies for the Persistent Volume Claim (PVC)
Seconds
groundcover_pvc_read_latency_summary
Summary of read operations latency for the Persistent Volume Claim (PVC)
Milliseconds
groundcover_pvc_write_latency_count
Count of write operations sampled for latency on the Persistent Volume Claim (PVC)
Number
groundcover_pvc_write_latency_sum
Sum of write operation latencies for the Persistent Volume Claim (PVC)
Seconds
groundcover_pvc_write_latency_summary
Summary of write operations latency for the Persistent Volume Claim (PVC)
Milliseconds
protocol role server_port encryption transport_protocol is_loopback is_cross_az
is_cross_az means the traffic was sent and/or received between two different availability zones. This is a helpful flag to quickly identify this special kind of communication. The actual zones are detailed in the availability_zone and remote_availability_zone labels.
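For example, you could surface cross-AZ traffic by filtering on this flag through the instant-query endpoint described earlier. The following is a sketch only; the metric (groundcover_network_rx_ops_total) and grouping labels are taken from the lists on this page, but verify that this combination matches your data:
# Sketch: sum cross-AZ read operations, grouped by workload and remote availability zone
curl -G 'https://app.groundcover.com/api/prometheus/api/v1/query' \
  -H 'accept: application/json' \
  -H 'authorization: Bearer <YOUR_API_KEY>' \
  --data-urlencode 'query=sum(groundcover_network_rx_ops_total{is_cross_az="true"}) by (workload_name, remote_availability_zone)'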
groundcover_network_connections_closed_total
Total connections closed by the workload
Number
groundcover_network_connections_opened_failed_total
Total number of failed network connection attempts by the workload
Number
groundcover_network_connections_refused_failed_total
Connections attempts refused per workload
Number
groundcover_network_connections_opened_refused_total
Total number of network connections refused by the workload
Number
groundcover_network_rx_ops_total
Total number of read operations issued by the workload
Number
groundcover_network_tx_ops_total
Total number of write operations issued by the workload
Number
groundcover_kube_daemonset_status_number_available
Number of available Pods for the DaemonSet
Number
groundcover_kube_daemonset_status_number_misscheduled
Number of Pods running on nodes they should not be scheduled on
Number
groundcover_kube_daemonset_status_number_ready
Number of ready Pods for the DaemonSet
Number
groundcover_kube_daemonset_status_number_unavailable
Number of unavailable Pods for the DaemonSet
Number
groundcover_kube_daemonset_status_observed_generation
Most recent generation observed for the DaemonSet
Number
groundcover_kube_daemonset_status_updated_number_scheduled
Number of Pods updated and scheduled by the DaemonSet
Number
groundcover_kube_deployment_created
Creation timestamp of the Deployment
Seconds
groundcover_kube_deployment_metadata_generation
Sequence number representing a specific generation of the Deployment
Number
groundcover_kube_deployment_spec_paused
Whether the Deployment is paused
Number
groundcover_kube_deployment_spec_replicas
Desired number of replicas for the Deployment
Number
groundcover_kube_deployment_spec_strategy_rollingupdate_max_unavailable
Maximum number of unavailable Pods during a rolling update for the Deployment
Number
groundcover_kube_deployment_status_condition
Current condition of the Deployment (labeled by type and status)
Number
groundcover_kube_deployment_status_observed_generation
Most recent generation observed for the Deployment
Number
groundcover_kube_deployment_status_replicas
Number of replicas for the Deployment
Number
groundcover_kube_deployment_status_replicas_available
Number of available replicas for the Deployment
Number
groundcover_kube_deployment_status_replicas_ready
Number of ready replicas for the Deployment
Number
groundcover_kube_deployment_status_replicas_unavailable
Number of unavailable replicas for the Deployment
Number
groundcover_kube_deployment_status_replicas_updated
Number of updated replicas for the Deployment
Number
groundcover_kube_horizontalpodautoscaler_spec_max_replicas
Maximum number of replicas configured for the HPA
Number
groundcover_kube_horizontalpodautoscaler_spec_min_replicas
Minimum number of replicas configured for the HPA
Number
groundcover_kube_horizontalpodautoscaler_spec_target_metric
Configured HPA target metric value
Number
groundcover_kube_horizontalpodautoscaler_status_condition
Current condition of the Horizontal Pod Autoscaler (labeled by type and status)
Number
groundcover_kube_horizontalpodautoscaler_status_current_replicas
Current number of replicas managed by the HPA
Number
groundcover_kube_horizontalpodautoscaler_status_desired_replicas
Desired number of replicas as calculated by the HPA
Number
groundcover_kube_horizontalpodautoscaler_status_target_metric
Current observed value of the HPA target metric
Number
groundcover_kube_job_complete
Whether the Job has completed successfully
Number
groundcover_kube_job_failed
Whether the Job has failed
Number
groundcover_kube_job_spec_completions
Desired number of successfully finished Pods for the Job
Number
groundcover_kube_job_spec_parallelism
Desired number of Pods running in parallel for the Job
Number
groundcover_kube_job_status_active
Number of actively running Pods for the Job
Number
groundcover_kube_job_status_completion_time
Completion time of the Job as Unix timestamp
Seconds
groundcover_kube_job_status_failed
Number of failed Pods for the Job
Number
groundcover_kube_job_status_start_time
Start time of the Job as Unix timestamp
Seconds
groundcover_kube_job_status_succeeded
Number of succeeded Pods for the Job
Number
groundcover_kube_node_created
Creation timestamp of the Node
Seconds
groundcover_kube_node_spec_taint
Node taint information (labeled by key, value and effect)
Number
groundcover_kube_node_spec_unschedulable
Whether a node can schedule new pods
Number
groundcover_kube_node_status_allocatable
The amount of resources allocatable for pods (after reserving some for system daemons)
Number
groundcover_kube_node_status_capacity
The total amount of resources available for a node
Number
groundcover_kube_node_status_condition
The condition of a cluster node
Number
groundcover_kube_persistentvolume_capacity_bytes
Capacity of the PersistentVolume
Bytes
groundcover_kube_persistentvolume_status_phase
Current phase of the PersistentVolume
Number
groundcover_kube_persistentvolumeclaim_access_mode
Access mode of the PersistentVolumeClaim
Number
groundcover_kube_persistentvolumeclaim_status_phase
Current phase of the PersistentVolumeClaim
Number
groundcover_kube_pod_container_resource_limits
The resource limit set for a container. It is recommended to use the `kube_pod_resource_limits` metric exposed by kube-scheduler instead, as it is more precise.
Number
groundcover_kube_pod_container_resource_requests
The resource amount requested by a container. It is recommended to use the `kube_pod_resource_requests` metric exposed by kube-scheduler instead, as it is more precise.
Number
groundcover_kube_pod_container_status_last_terminated_exitcode
The last termination exit code for the container
Number
groundcover_kube_pod_container_status_last_terminated_reason
The last termination reason for the container
Number
groundcover_kube_pod_container_status_ready
Describes whether the container's readiness check succeeded
Number
groundcover_kube_pod_container_status_restarts_total
The number of container restarts per container
Number
groundcover_kube_pod_container_status_running
Describes whether the container is currently in running state
Number
groundcover_kube_pod_container_status_terminated
Describes whether the container is currently in terminated state
Number
groundcover_kube_pod_container_status_terminated_reason
Describes the reason the container is currently in terminated state
Number
groundcover_kube_pod_container_status_waiting
Describes whether the container is currently in waiting state
Number
groundcover_kube_pod_container_status_waiting_reason
Describes the reason the container is currently in waiting state
Number
groundcover_kube_pod_created
Creation timestamp of the Pod
Seconds
groundcover_kube_pod_init_container_resource_limits
The requested resource limit of an init container
Bytes
groundcover_kube_pod_init_container_resource_requests
Resources requested by an init container (labeled by resource and unit)
Number
groundcover_kube_pod_init_container_resource_requests_memory_bytes
Requested memory by init containers
Bytes
groundcover_kube_pod_init_container_status_last_terminated_reason
The last termination reason for the init container
Number
groundcover_kube_pod_init_container_status_ready
Describes whether the init container's readiness check succeeded
Number
groundcover_kube_pod_init_container_status_restarts_total
The number of restarts for the init container
Number
groundcover_kube_pod_init_container_status_running
Describes whether the init container is currently in running state
Number
groundcover_kube_pod_init_container_status_terminated
Describes whether the init container is currently in terminated state
Number
groundcover_kube_pod_init_container_status_terminated_reason
Describes the reason the init container is currently in terminated state
Number
groundcover_kube_pod_init_container_status_waiting
Describes whether the init container is currently in waiting state
Number
groundcover_kube_pod_init_container_status_waiting_reason
Describes the reason the init container is currently in waiting state
Number
groundcover_kube_pod_spec_volumes_persistentvolumeclaims_readonly
Whether the PersistentVolumeClaim is mounted as read-only in the Pod
Number
groundcover_kube_pod_status_phase
The pod's current phase
Number
groundcover_kube_pod_status_ready
Describes whether the pod is ready to serve requests
Number
groundcover_kube_pod_status_scheduled
Describes the status of the scheduling process for the pod
Number
groundcover_kube_pod_status_unschedulable
Whether the Pod is unschedulable
Number
groundcover_kube_pod_tolerations
Pod tolerations configuration
Number
groundcover_kube_replicaset_spec_replicas
Desired number of replicas for the ReplicaSet
Number
groundcover_kube_replicaset_status_fully_labeled_replicas
Number of fully labeled replicas for the ReplicaSet
Number
groundcover_kube_replicaset_status_observed_generation
Most recent generation observed for the ReplicaSet
Number
groundcover_kube_replicaset_status_ready_replicas
Number of ready replicas for the ReplicaSet
Number
groundcover_kube_replicaset_status_replicas
Number of replicas for the ReplicaSet
Number
groundcover_kube_resourcequota
Resource quota information (labeled by resource and type: hard/used)
Number
groundcover_kube_resourcequota_created
Creation timestamp of the ResourceQuota as Unix seconds
Seconds
groundcover_kube_statefulset_metadata_generation
Sequence number representing a specific generation of the StatefulSet
Number
groundcover_kube_statefulset_replicas
Desired number of replicas for the StatefulSet
Number
groundcover_kube_statefulset_status_current_revision
Current revision of the StatefulSet
Number
groundcover_kube_statefulset_status_observed_generation
Most recent generation observed for the StatefulSet
Number
groundcover_kube_statefulset_status_replicas
Number of replicas for the StatefulSet
Number
groundcover_kube_statefulset_status_replicas_available
Number of available replicas for the StatefulSet
Number
groundcover_kube_statefulset_status_replicas_current
Number of current replicas for the StatefulSet
Number
groundcover_kube_statefulset_status_replicas_ready
Number of ready replicas for the StatefulSet
Number
groundcover_kube_statefulset_status_replicas_updated
Number of updated replicas for the StatefulSet
Number
groundcover_kube_statefulset_status_update_revision
Update revision of the StatefulSet
Number
groundcover_kube_job_duration
Time elapsed between the start and completion time of the Job, or current time if the Job is still running
Seconds
groundcover_kube_pod_uptime
Time elapsed since the Pod was created
Seconds
groundcover_container_cpu_request_usage_percent
CPU usage rate out of request (usage/request)
Percentage
groundcover_container_cpu_throttled_percent
Percentage of CPU throttling for the container
Percentage
groundcover_container_cpu_throttled_periods
Total number of throttled CPU periods for the container
Number
groundcover_container_cpu_throttled_rate_millis
Rate of CPU throttling for the container
mCPU
groundcover_container_cpu_throttled_seconds_total
Total CPU throttling time for K8s container
Seconds
groundcover_container_cpu_usage_percent
CPU usage rate (usage/limit)
Percentage
groundcover_container_m_cpu_usage_seconds_total
Total CPU usage time in milli-CPUs for the container
mCPU
groundcover_container_m_cpu_usage_system_seconds_total
Total CPU time spent in system mode for the container
Seconds
groundcover_container_m_cpu_usage_user_seconds_total
Total CPU time spent in user mode for the container
Seconds
groundcover_container_cpu_limit_m_cpu
K8s container CPU limit
mCPU
groundcover_container_cpu_request_m_cpu
K8s container requested CPU allocation
mCPU
groundcover_container_cpu_pressure_full_avg10
Average percentage of time all non-idle tasks were stalled on CPU over 10 seconds
Percentage
groundcover_container_cpu_pressure_full_avg300
Average percentage of time all non-idle tasks were stalled on CPU over 300 seconds
Percentage
groundcover_container_cpu_pressure_full_avg60
Average percentage of time all non-idle tasks were stalled on CPU over 60 seconds
Percentage
groundcover_container_cpu_pressure_full_total
Total time all non-idle tasks were stalled waiting for CPU
Microseconds
groundcover_container_cpu_pressure_some_avg10
Average percentage of time at least some tasks were stalled on CPU over 10 seconds
Percentage
groundcover_container_cpu_pressure_some_avg300
Average percentage of time at least some tasks were stalled on CPU over 300 seconds
Percentage
groundcover_container_cpu_pressure_some_avg60
Average percentage of time at least some tasks were stalled on CPU over 60 seconds
Percentage
groundcover_container_cpu_pressure_some_total
Total time at least some tasks were stalled waiting for CPU
Microseconds
groundcover_container_memory_kernel_usage_bytes
Kernel memory usage for the container
Bytes
groundcover_container_memory_limit_bytes
K8s container memory limit
Bytes
groundcover_container_memory_major_page_faults
Total number of major page faults for the container
Number
groundcover_container_memory_oom_events
Total number of out-of-memory (OOM) events for the container
Number
groundcover_container_memory_page_faults
Total number of page faults for the container
Number
groundcover_container_memory_request_bytes
K8s container requested memory allocation
Bytes
groundcover_container_memory_request_used_percent
Memory usage rate out of request (usage/request)
Percentage
groundcover_container_memory_rss_bytes
Current memory resident set size (RSS)
Bytes
groundcover_container_memory_swap_usage_bytes
Swap memory usage for the container
Bytes
groundcover_container_memory_usage_bytes
Current memory usage for the container
Bytes
groundcover_container_memory_usage_peak_bytes
Peak memory usage for the container
Bytes
groundcover_container_memory_used_percent
Memory usage rate (usage/limit)
Percentage
groundcover_container_memory_pressure_full_avg10
Average percentage of time all non-idle tasks were stalled on memory over 10 seconds
Percentage
groundcover_container_memory_pressure_full_avg300
Average percentage of time all non-idle tasks were stalled on memory over 300 seconds
Percentage
groundcover_container_memory_pressure_full_avg60
Average percentage of time all non-idle tasks were stalled on memory over 60 seconds
Percentage
groundcover_container_memory_pressure_full_total
Total time all non-idle tasks were stalled waiting for memory
Microseconds
groundcover_container_memory_pressure_some_avg10
Average percentage of time at least some tasks were stalled on memory over 10 seconds
Percentage
groundcover_container_memory_pressure_some_avg300
Average percentage of time at least some tasks were stalled on memory over 300 seconds
Percentage
groundcover_container_memory_pressure_some_avg60
Average percentage of time at least some tasks were stalled on memory over 60 seconds
Percentage
groundcover_container_memory_pressure_some_total
Total time at least some tasks were stalled waiting for memory
Microseconds
groundcover_container_io_write_ops_total
Total number of write operations by the container
Number
groundcover_container_disk_delay_seconds
K8s container disk I/O delay
Seconds
groundcover_container_io_pressure_full_avg10
Average percentage of time all non-idle tasks were stalled on I/O over 10 seconds
Percentage
groundcover_container_io_pressure_full_avg300
Average percentage of time all non-idle tasks were stalled on I/O over 300 seconds
Percentage
groundcover_container_io_pressure_full_avg60
Average percentage of time all non-idle tasks were stalled on I/O over 60 seconds
Percentage
groundcover_container_io_pressure_full_total
Total time all non-idle tasks were stalled waiting for I/O
Microseconds
groundcover_container_io_pressure_some_avg10
Average percentage of time at least some tasks were stalled on I/O over 10 seconds
Percentage
groundcover_container_io_pressure_some_avg300
Average percentage of time at least some tasks were stalled on I/O over 300 seconds
Percentage
groundcover_container_io_pressure_some_avg60
Average percentage of time at least some tasks were stalled on I/O over 60 seconds
Percentage
groundcover_container_io_pressure_some_total
Total time at least some tasks were stalled waiting for I/O
Microseconds
groundcover_container_network_tx_bytes_total
Total bytes transmitted by the container
Bytes
groundcover_container_network_tx_dropped_total
Total number of transmitted packets dropped by the container
Number
groundcover_container_network_tx_errors_total
Total number of errors encountered while transmitting packets
Number
groundcover_host_cpu_usage_percent
Percentage of used CPU in the current host
Percentage
groundcover_host_cpu_num_cores
Number of CPU cores on the host
Number
groundcover_host_cpu_user_spent_seconds_total
Total time spent in user mode
Seconds
groundcover_host_cpu_user_spent_percent
Percentage of CPU time spent in user mode
Percentage
groundcover_host_cpu_system_spent_seconds_total
Total time spent in system mode
Seconds
groundcover_host_cpu_system_spent_percent
Percentage of CPU time spent in system mode
Percentage
groundcover_host_cpu_idle_spent_seconds_total
Total time spent idle
Seconds
groundcover_host_cpu_idle_spent_percent
Percentage of CPU time spent idle
Percentage
groundcover_host_cpu_iowait_spent_seconds_total
Total time spent waiting for I/O to complete
Seconds
groundcover_host_cpu_iowait_spent_percent
Percentage of CPU time spent waiting for I/O
Percentage
groundcover_host_cpu_nice_spent_seconds_total
Total time spent on niced processes
Seconds
groundcover_host_cpu_steal_spent_seconds_total
Total time spent in involuntary wait (stolen by hypervisor)
Seconds
groundcover_host_cpu_stolen_spent_percent
Percentage of CPU time stolen by the hypervisor
Percentage
groundcover_host_cpu_irq_spent_seconds_total
Total time spent handling hardware interrupts
Seconds
groundcover_host_cpu_softirq_spent_seconds_total
Total time spent handling software interrupts
Seconds
groundcover_host_cpu_interrupt_spent_percent
Percentage of CPU time spent handling interrupts
Percentage
groundcover_host_cpu_guest_spent_seconds_total
Total time spent running guest processes
Seconds
groundcover_host_cpu_guest_spent_percent
Percentage of CPU time spent running guest processes
Percentage
groundcover_host_cpu_guest_nice_spent_seconds_total
Total time spent running niced guest processes
Seconds
groundcover_host_cpu_context_switches_total
Total number of context switches in the current host
Number
groundcover_host_cpu_load_avg1
CPU load average over 1 minute
Number
groundcover_host_cpu_load_avg5
CPU load average over 5 minutes
Number
groundcover_host_cpu_load_avg15
CPU load average over 15 minutes
Number
groundcover_host_cpu_load_norm1
Normalized CPU load over 1 minute
Number
groundcover_host_cpu_load_norm5
Normalized CPU load over 5 minutes
Number
groundcover_host_cpu_load_norm15
Normalized CPU load over 15 minutes
Number
groundcover_host_mem_free_bytes
Free memory in the current host
Bytes
groundcover_host_mem_available_bytes
Available memory in the current host
Bytes
groundcover_host_mem_cached_bytes
Cached memory in the current host
Bytes
groundcover_host_mem_buffers_bytes
Buffer memory in the current host
Bytes
groundcover_host_mem_shared_bytes
Shared memory in the current host
Bytes
groundcover_host_mem_slab_bytes
Slab memory in the current host
Bytes
groundcover_host_mem_sreclaimable_bytes
Reclaimable slab memory in the current host
Bytes
groundcover_host_mem_page_tables_bytes
Page tables memory in the current host
Bytes
groundcover_host_mem_commit_limit_bytes
Memory commit limit in the current host
Bytes
groundcover_host_mem_committed_as_bytes
Committed address space memory in the current host
Bytes
groundcover_host_mem_swap_cached_bytes
Cached swap memory in the current host
Bytes
groundcover_host_mem_swap_total_bytes
Total swap memory in the current host
Bytes
groundcover_host_mem_swap_free_bytes
Free swap memory in the current host
Bytes
groundcover_host_mem_swap_used_bytes
Used swap memory in the current host
Bytes
groundcover_host_mem_swap_in_bytes_total
Swap in bytes in the current host
Bytes
groundcover_host_mem_swap_out_bytes_total
Swap out bytes in the current host
Bytes
groundcover_host_mem_swap_free_percent
Percentage of free swap memory in the current host
Percentage
groundcover_host_mem_usable_percent
Percentage of usable (available) memory in the current host
Percentage
groundcover_host_disk_space_used_percent
Percentage of used disk space in the current host
Percentage
groundcover_host_disk_read_time_ms_total
Total time spent reading from disk per device in the current host
Milliseconds
groundcover_host_disk_write_time_ms_total
Total time spent writing to disk per device in the current host
Milliseconds
groundcover_host_disk_read_count_total
Total number of disk reads per device in the current host
Number
groundcover_host_disk_write_count_total
Total number of disk writes per device in the current host
Number
groundcover_host_disk_merged_read_count_total
Total number of merged disk reads per device in the current host
Number
groundcover_host_disk_merged_write_count_total
Total number of merged disk writes per device in the current host
Number
groundcover_host_io_write_await_ms
Average time for write requests to be served per device in the current host
Milliseconds
groundcover_host_io_await_ms
Average time for I/O requests to be served per device in the current host
Milliseconds
groundcover_host_io_avg_request_size
Average I/O request size per device in the current host
Kilobytes
groundcover_host_io_service_time_ms
Average service time for I/O requests per device in the current host
Milliseconds
groundcover_host_io_avg_queue_size_kb
Average I/O queue size per device in the current host
Kilobytes
groundcover_host_io_utilization_percent
Percentage of time the device was busy serving I/O requests in the current host
Percentage
groundcover_host_io_block_in_total
Total number of blocks received from a block device (blocks in) in the current host
Number
groundcover_host_io_block_out_total
Total number of blocks sent to a block device (blocks out) in the current host
Number
groundcover_host_fs_used_percent
Percentage of used filesystem space in the current host
Percentage
groundcover_host_fs_inodes_total
Total inodes in the filesystem
Number
groundcover_host_fs_inodes_used
Used inodes in the filesystem
Number
groundcover_host_fs_inodes_free
Free inodes in the filesystem
Number
groundcover_host_fs_inodes_used_percent
Percentage of used inodes in the filesystem
Percentage
groundcover_host_fs_file_handles_allocated
Total number of file handles allocated in the current host
Number
groundcover_host_fs_file_handles_allocated_unused
Number of allocated but unused file handles in the current host
Number
groundcover_host_fs_file_handles_in_use
Number of file handles currently in use in the current host
Number
groundcover_host_fs_file_handles_max
Maximum number of file handles available in the current host
Number
groundcover_host_fs_file_handles_used_percent
Percentage of file handles in use in the current host
Percentage
groundcover_host_net_transmit_packets_total
Total packets transmitted on network interface
Number
groundcover_host_net_receive_dropped_total
Total number of received packets dropped on network interface
Number
groundcover_host_net_receive_errors_total
Total number of receive errors on network interface
Number
groundcover_host_net_transmit_dropped_total
Total number of transmitted packets dropped on network interface
Number
groundcover_host_net_transmit_errors_total
Total number of transmit errors on network interface
Number
pod_name
K8s pod name
All
container_name
K8s container name
All
container_image
K8s container image name
All
remote_namespace
Remote K8s namespace (other side of the communication)
All
remote_service_name
Remote K8s service name (other side of the communication)
All
remote_container_name
Remote K8s container name (other side of the communication)
All
type
The protocol in use (HTTP, gRPC, Kafka, DNS etc.)
All
sub_type
The subtype of the protocol (GET, POST, etc.)
All
role
Role in the communication (client or server)
All
clustered_resource_name
The clustered name of the resource; depends on the protocol
All
status_code
"ok", "error" or "unset"
All
server
The server workload/name
All
client
The client workload/name
All
server_namespace
The server namespace
All
client_namespace
The client namespace
All
server_is_external
Indicates whether the server is external
All
client_is_external
Indicates whether the client is external
All
is_encrypted
Indicates whether the communication is encrypted
All
is_cross_az
Indicates whether the communication crosses availability zones
All
clustered_path
HTTP / gRPC aggregated resource path (e.g. /metrics/*)
http, grpc
method
HTTP / gRPC method (e.g. GET)
http, grpc
response_status_code
Return status code of an HTTP / gRPC request (e.g. 200 in HTTP)
http, grpc
dialect
SQL dialect (MySQL or PostgreSQL)
mysql, postgresql
response_status
Return status code of a SQL query (e.g. 42P01 for undefined table)
mysql, postgresql
client_type
Kafka client type (Fetcher / Producer)
kafka
topic
Kafka topic name
kafka
partition
Kafka partition identifier
kafka
error_code
Kafka return status code
kafka
query_type
Type of DNS query (e.g. AAAA)
dns
response_return_code
Return status code of a DNS resolution request (e.g. Name Error)
dns
exit_code
K8s container termination exit code
container_state, container_crash
state
K8s container current state (Running, Waiting or Terminated)
container_state
state_reason
K8s container state transition reason (e.g. CrashLoopBackOff or OOMKilled)
container_state
crash_reason
K8s container crash reason (e.g. Error, OOMKilled)
container_crash
pvc_name
K8s PVC name
storage
perspective_entity_id
perspective_entity_is_external
perspective_entity_issue_id
perspective_entity_name
perspective_entity_namespace
perspective_entity_resource_id
groundcover_resource_success_counter
Total number of resource requests with OK status codes
Number
groundcover_resource_latency_seconds
Resource latency
Seconds
groundcover_workload_success_counter
Total number of requests handled by the workload with OK status codes
Number
groundcover_workload_latency_seconds
Resource latency across all of the workload's APIs
Seconds
groundcover_node_allocatable_cpum_cpu
Allocatable CPU in the current node
mCPU
groundcover_node_allocatable_mem_bytes
Allocatable memory in the current node
Bytes
groundcover_node_mem_used_percent
Percentage of used memory in the current node
Percentage
groundcover_pvc_usage_bytes
Persistent Volume Claim (PVC) usage
Bytes
groundcover_pvc_capacity_bytes
Persistent Volume Claim (PVC) capacity
Bytes
groundcover_pvc_available_bytes
Available Persistent Volume Claim (PVC) space
Bytes
groundcover_network_rx_bytes_total
Total bytes received by the workload
Bytes
groundcover_network_tx_bytes_total
Total bytes sent by the workload
Bytes
groundcover_network_connections_opened_total
Total connections opened by the workload
Number
groundcover_kube_cronjob_status_active
Number of active CronJob executions
Number
groundcover_kube_daemonset_status_current_number_scheduled
Number of Pods currently scheduled by the DaemonSet
Number
groundcover_kube_daemonset_status_desired_number_scheduled
Desired number of Pods scheduled by the DaemonSet
Number
groundcover_container_cpu_usage_rate_millis
CPU usage rate
mCPU
groundcover_container_cpu_cfs_periods_total
Total number of elapsed CPU CFS scheduler enforcement periods for the container
Number
groundcover_container_cpu_delay_seconds
K8s container CPU delay
Seconds
groundcover_container_memory_working_set_bytes
Current memory working set
Bytes
groundcover_container_mem_working_set_bytes
Working set memory usage for the container
Bytes
groundcover_container_memory_cache_usage_bytes
Memory cache usage for the container
Bytes
groundcover_container_io_read_bytes_total
Total bytes read by the container
Bytes
groundcover_container_io_read_ops_total
Total number of read operations by the container
Number
groundcover_container_io_write_bytes_total
Total bytes written by the container
Bytes
groundcover_container_network_rx_bytes_total
Total bytes received by the container
Bytes
groundcover_container_network_rx_dropped_total
Total number of received packets dropped by the container
Number
groundcover_container_network_rx_errors_total
Total number of errors encountered while receiving packets
Number
groundcover_container_uptime_seconds
Uptime of the container
Seconds
groundcover_container_crash_count
Total count of container crashes
Number
groundcover_host_uptime_seconds
Uptime of the current host
Seconds
groundcover_host_cpu_capacity_m_cpu
CPU capacity in the current host
mCPU
groundcover_host_cpu_usage_m_cpu
CPU usage in the current host
mCPU
groundcover_host_mem_capacity_bytes
Memory capacity in the current host
Bytes
groundcover_host_mem_used_bytes
Memory used in the current host
Bytes
groundcover_host_mem_used_percent
Percentage of used memory in the current host
Percentage
groundcover_host_disk_space_used_bytes
Used disk space in the current host
Bytes
groundcover_host_disk_space_free_bytes
Free disk space in the current host
Bytes
groundcover_host_disk_space_total_bytes
Total disk space in the current host
Bytes
groundcover_host_io_read_kb_per_sec
Disk read throughput per device in the current host
Kilobytes per second
groundcover_host_io_write_kb_per_sec
Disk write throughput per device in the current host
Kilobytes per second
groundcover_host_io_read_await_ms
Average time for read requests to be served per device in the current host
Milliseconds
groundcover_host_fs_used_bytes
Used filesystem space in the current host
Bytes
groundcover_host_fs_free_bytes
Free filesystem space in the current host
Bytes
groundcover_host_fs_total_bytes
Total filesystem space in the current host
Bytes
groundcover_host_fs_file_handles_allocated
Total number of file handles allocated in the current host
Number
groundcover_host_fs_file_handles_allocated_unused
Number of allocated but unused file handles in the current host
Number
groundcover_host_fs_file_handles_in_use
Number of file handles currently in use in the current host
Number
groundcover_host_net_receive_bytes_total
Total bytes received on network interface
Bytes
groundcover_host_net_transmit_bytes_total
Total bytes transmitted on network interface
Bytes
groundcover_host_net_receive_packets_total
Total packets received on network interface
Number
clusterId
Name identifier of the K8s cluster
All
region
Cloud provider region name
All
namespace
K8s namespace
All
workload_name
K8s workload (or service) name
All
groundcover_resource_total_counter
Total number of resource requests
Number
groundcover_resource_error_counter
Total number of requests with error status codes
Number
groundcover_resource_issue_counter
Total number of requests flagged as issues
Number
groundcover_workload_total_counter
Total number of requests handled by the workload
Number
groundcover_workload_error_counter
Total number of requests handled by the workload with error status codes
Number
groundcover_workload_issue_counter
Total number of requests handled by the workload that were flagged as issues
Number
groundcover_workload_client_offset
Client's last message offset (for a producer, the last offset produced; for a consumer, the last requested offset), aggregated by workload
Number
groundcover_workload_calc_lagged_messages
Current lag in messages, aggregated by workload
Number
groundcover_workload_calc_lag_seconds
Current lag in time, aggregated by workload
Seconds
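To illustrate how the metric names and labels above combine in practice, here is a minimal sketch of an instant query against a Prometheus-compatible query API. The endpoint URL, token handling, and label values are placeholders of our own, not groundcover defaults; the sketch only assumes the counters listed above behave like standard Prometheus counters and are reachable through such an API.

```python
# Hypothetical sketch: compute a per-workload error rate from
# groundcover_workload_error_counter, filtered by labels documented above
# (clusterId, namespace) and grouped by workload_name.
# PROM_URL is a placeholder endpoint, not a real groundcover address.
import requests

PROM_URL = "https://example.invalid/api/v1/query"  # placeholder endpoint

# rate() turns the raw counter into a per-second rate over a 5-minute window,
# then sum by (workload_name) aggregates it per workload.
QUERY = (
    "sum by (workload_name) ("
    'rate(groundcover_workload_error_counter{clusterId="prod", namespace="payments"}[5m])'
    ")"
)

resp = requests.get(PROM_URL, params={"query": QUERY}, timeout=10)
resp.raise_for_status()

# Standard Prometheus instant-query response shape: data.result is a list of
# {"metric": {...labels...}, "value": [timestamp, value]} entries.
for result in resp.json()["data"]["result"]:
    print(result["metric"].get("workload_name"), result["value"][1])
```

The same pattern applies to any of the `*_total` or `*_counter` metrics in this reference: wrap the counter in `rate()` before aggregating by a label such as `workload_name`, `namespace`, or `clusterId` to turn raw counts into per-second rates.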