Log Management

Stream, store, and query your logs at any scale, for a fixed cost.

Overview

Our Log Management solution is built for high scale and fast query performance so you can analyze logs quickly and effectively from all your cloud environments.

Gain context - Each log is enriched with actionable context and correlated with relevant metrics and traces in a single view, so you can find what you're looking for and troubleshoot faster.

Centralize to maximize - The groundcover platform can act as a limitless, centralized log management hub. Your subscription costs are completely unaffected by the amount of logs you choose to store or query. It's entirely up to you to decide.

Collection

Seamless log collection

groundcover ensures a seamless log collection experience with our proprietary eBPF sensor, which automatically collects and aggregates all logs in all formats - including JSON, plain text, NGINX logs, and more. All this without any configuration needed.

This sensor is deployed as a DaemonSet, running a single pod on each node within your Kubernetes cluster. This configuration enables the groundcover platform to automatically collect logs from all of your pods, across all namespaces in your cluster. This means that once you've installed groundcover, no further action is needed on your part for log collection. The logs collected by each sensor instance are then channeled to the OTel Collector.

OTel Collector: A vendor-agnostic way to receive, process and export telemetry data.

Acting as the central processing hub, the OTel Collector is a vendor-agnostic tool that receives logs from various sensor pods. It processes, enriches, and forwards the data into groundcover's ClickHouse database, where all log data from your cluster is securely stored.

Logs Attributes

Logs Attributes enable advanced filtering capabilities and are currently supported for the following formats:

  • JSON

  • Common Log Format (CLF) - like those from NGINX and Kong

  • logfmt

groundcover automatically detects the format of these logs, extracting key:value pairs from the original log records as Attributes.

Each attribute can be added to your filters and search queries.

Example: filtering a log in a supported format containing a request path field with the value "/status" will look as follows: @request.path:"/status". The full syntax can be found here.

Configuration

groundcover offers the flexibility to craft tailored collection filtering rules: you can choose to set up filters and collect only the logs that are essential for your analysis, avoiding unnecessary data noise. For guidance on configuring your filters, explore our Customize Logs Collection section.

You also have the option to define the retention period for your logs in the ClickHouse database. By default, logs are retained for 3 days. To adjust this period to your preferences, visit our Customize Retention section for instructions.

Log Explorer

Once logs are collected and ingested, they are available within the groundcover platform in the Log Explorer, which is designed for quick searches and seamless exploration of your logs data. Using the Log Explorer you can troubleshoot and explore your logs with advanced search capabilities and filters, all within a clear and fast interface.

Search and filter

The Log Explorer integrates dynamic filters and a versatile search functionality that enables you to quickly and easily identify the right data. You can filter logs by selecting one or multiple criteria, including log level, workload, namespace and more, and can limit your search to a specific time range. Learn more about how to use our search syntaxes.
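
For example, a search combining a log-level filter with a workload filter might look like level:error workload:checkout (the workload name here is a placeholder); any of the available filters can be combined in the same way and narrowed to the time range you care about.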

Log Pipelines

groundcover natively supports setting up log pipelines using Vector transforms. This allows for full flexibility in processing and manipulating the logs being collected - parsing additional patterns by regex, renaming attributes, and more. Learn more about how to configure log pipelines.

FAQ

How much does groundcover cost?

groundcover's unique pricing model is the first to decouple data volumes from the cost of owning and operating the solution. For example, subscribing to our Enterprise plan costs $30 per node/host per month.

Overall, the cost of owning and operating groundcover is based on two factors:

  • The number of nodes (hosts) you are running in the environment you are monitoring

  • The costs of hosting groundcover's backend in your environment

Check out our TCO calculator to simulate your total cost of ownership for groundcover.

Introduction

groundcover is a full stack, cloud-native observability platform, developed to break all industry paradigms - from making instrumentation a thing of the past to decoupling cost from data volumes.

The platform consolidates all your traces, metrics, logs, and Kubernetes events into a single pane of glass, allowing you to identify issues faster than ever before and conduct granular investigations for quick remediation and long-term prevention.

Our pricing is not impacted by the volume of data generated by the environments you monitor, so you can dare to start monitoring environments that had been blind spots until now - such as your Dev and Staging clusters. This, in turn, gives you visibility into all your environments, making it much more likely that issues are identified in the early stages of development rather than in your live product.

groundcover introduces game-changing concepts to observability: the eBPF sensor, the BYOC architecture, and a disruptive pricing model, each described below.

Can I use groundcover across multiple clusters?

Definitely. As you deploy groundcover, each cluster is automatically assigned the unique name it holds inside your cloud environment. You can browse and select all your clusters in one place with our UI.

What K8s flavors are supported?

groundcover has been tested and validated on the most common K8s distributions. See full list in the Requirements section.

What protocols are supported?

groundcover supports the most common protocols in most K8s production environments out-of-the-box. See full list here.

What types of data does groundcover collect?

groundcover's kernel-level eBPF sensor automatically collects your logs, application metrics (such as latency, throughput, error rate and much more), infrastructure metrics (such as deployment updates, container crashes etc.), traces, and Kubernetes events. You can control which data is left out of the automatic collection using data obfuscation.

Where is my data being stored?

groundcover stores all the data it collects inside your environment, using the state-of-the-art storage services of ClickHouse and Victoria Metrics, with the option to offload data to object storage such as S3 for long-term retention. See our Architecture section for more details.

Is my data secure?

groundcover stores the data it collects in-cluster, inside your environment without ever leaving the cluster to be stored anywhere else.

Our SaaS UI experience stores only information related to the account, user access and general K8s metadata used for governance (like the number of nodes per cluster, the name given to the cluster etc.).

All the information served to the UI experience is encrypted all the way to the in-cluster data sources. groundcover has no access to your collected data, which is accessible only to an authenticated user from your organization. groundcover does collect telemetry information (opt-out is of course possible) which includes metrics about the performance of the deployment (e.g. resource consumption metrics) and logs reported from the groundcover components running in the cluster.

All telemetry information is anonymized, and contains no data related to your environment.

Regardless, groundcover is SOC2 and ISO 27001 compliant and follows best practices.

How can I invite my team to my workspace?

If you used your business email to create your groundcover account, you can invite your team to your workspace by clicking on the purple "Invite" button on the upper menu. This will open a pop-up where you can enter the emails of the people you want to invite. You also have an option to copy and share your private link.

Note: The Admin of the account (i.e. the person that created it) can also invite users outside of your email domain. Non-admin users can only invite users that share the same email domain. If you used a private email, you can only share the link to your workspace by clicking the "Share" button on the top bar.

Read more about invites in our quick start guide.

Is groundcover open source?

groundcover's CLI tool is currently open source, alongside other projects like Murre and Caretta. We're working on releasing more parts of our solution as open source very soon. Stay tuned on our GitHub page!

What operating system (OS) do I need to use groundcover?

groundcover’s sensor uses eBPF, which means it can only be deployed on a Kubernetes cluster running on a Linux system.

Installing using the CLI command is currently only supported on Linux and Mac.

You can install using the Helm command from any operating system.

Once installed, accessing the groundcover platform is possible from any web browser, on any operating system.

eBPF sensor

eBPF (extended Berkeley Packet Filter) is a groundbreaking technology that has significantly impacted the Linux kernel, offering a new way to safely and efficiently extend its capabilities.

By powering our sensor with eBPF, groundcover unlocks unprecedented granularity on your cloud environment, while also practically eliminating the need for human involvement in the installation and deployment process. Our unique sensor collects data directly from the Linux kernel with near-zero impact on CPU and memory.

Advantages of our eBPF sensor:

  • Zero instrumentation: groundcover's eBPF sensor gathers granular observability data without the need for integrating an SDK or changing your applications' code in any way. This enables all your logs, metrics, traces, and other observability data to flow automatically into the platform. In minutes, you gain full visibility into application and infrastructure health, performance, resource usage, and more.

  • Minimal resources footprint: groundcover’s sensor is installed on a dedicated node in each monitored cluster, operating separately from the applications it is monitoring. Without interfering with the applications' primary functions, the groundcover platform operates with near-zero impact on your resources, maintaining the applications' performance and avoiding unexpected overhead on the infrastructure.

  • A new level of insight granularity: With direct access to the Linux kernel, our eBPF sensor enables the collection of data straight from the source. This guarantees that the data is clean, unaltered, and precise. It also offers unique insights into your application and infrastructure, such as viewing the full payloads of traces or analyzing network performance over time.

Bring-your-own-cloud (BYOC) architecture

The one-of-a-kind architecture on which groundcover was built eliminates all requirements to stream your logs, metrics, traces, and other monitoring data outside of your environment and into a third party's cloud. By leveraging integrations with best-of-breed technologies, including ClickHouse and Victoria Metrics, all your observability data is stored locally, with the option of being fully managed by groundcover.

Advantages of our BYOC architecture:

  • By separating the data plane from the control plane, you get the advantages of a SaaS solution, without its security and privacy challenges.

  • With multiple deployment models available, you also get to choose the level of security and privacy your organization needs, up to the highest standards (FedRamp-level).

  • Automated deployment, maintenance & resource optimization with our inCloud Managed deployment option.

This concept is unique to groundcover, and takes a while to grasp. Read about our BYOC architecture more in detail in this dedicated section.

Learn about groundcover inCloud Managed (currently available only on a paid plan), which enables you to deploy groundcover's control plane inside your own environment and delegate the entire setup and management of the groundcover platform.

Disruptive pricing model

Enabled by our unique BYOC architecture, groundcover's vision is to revolutionize the industry by offering a pricing model that is unheard of anywhere else. Our fully transparent pricing model is based only on the number of nodes being monitored, and the costs of hosting the groundcover backend in your environment. Volume of logs, metrics, traces, and all other observability data don’t affect your cost. This results in savings of 60-90% compared to SaaS platforms.

In addition, none of our subscription tiers limits your access to features and capabilities.

Advantages of our nodes-based pricing model:

  • Cost is predictable and transparent, becoming an enabler of growth and expansion.

  • The ability to deploy groundcover in data-intensive environments enables the monitoring of Dev and Staging clusters, which promotes early identification of issues.

  • No cardinality or retention limits

Read our latest customer stories to learn how organizations of varying sizes all reduce their observability costs dramatically by migrating to groundcover:

Stream processing

groundcover applies a stream processing approach to collect and control the continuous flow of data to gain immediate insights, detect anomalies, and respond to changing conditions. Unlike batch processing, where data is collected over a period and then analyzed, stream processing analyzes the data as it flows through the system.

Our platform uses a distributed stream processing engine that enables it to ingest huge amounts of data (such as logs, traces and Kubernetes events) in real time. It also processes all that data and instantly generates complex insights (such as metrics and context) based on it.

As a result, the volume of raw data stored dramatically decreases which, in turn, further reduces the overall cost of observability.

Capabilities

Log Management

Designed for high scalability and rapid query performance, enabling quick and efficient log analysis from all your environments. Each log is enriched with actionable context and correlated with relevant metrics and traces, providing a comprehensive view for fast troubleshooting.

Learn more →

Infrastructure Monitoring

The groundcover platform provides cloud-native infrastructure monitoring, enabling automatic collection and real-time monitoring of infrastructure health and efficiency.

Learn more →

Application Performance Monitoring (APM)

Gain end-to-end observability into your applications' performance, and identify and resolve issues instantly, all with zero code changes.

Learn more →

Real User Monitoring (RUM)

Real User Monitoring (RUM) extends groundcover’s observability platform to the client side, providing visibility into actual user interactions and front-end performance. It tracks key aspects of your web application as experienced by real users, then correlates them with backend metrics, logs, and traces for a full-stack view of your system.

Learn more →


Application Performance Monitoring (APM)

Gain end-to-end observability into your applications' performance, and identify and resolve issues instantly - all with zero code changes.

Overview

The groundcover platform collects data all across your stack using the power of eBPF instrumentation. Our proprietary eBPF sensor is installed in seconds and provides 100% coverage of application metrics and traces with zero code changes or configuration.

Resolve faster - By seamlessly correlating traces with application metrics, logs, and infrastructure events, groundcover’s APM enables you to detect and resolve root issues faster.

Improve user experience - Optimize your application performance and resource utilization faster than ever before, avoid downtimes and make poor end-user experience a thing of the past.

Collection

Our revolutionary eBPF sensor, Flora, is deployed as a DaemonSet in your Kubernetes cluster. This approach allows us to inspect every packet that each service is sending or receiving, achieving 100% coverage. No sampling rates or relying on statistical luck - all requests and responses are observed.

This approach would not be feasible without a resource-efficient eBPF-powered sensor. eBPF not only extends the ability to pinpoint issues - it does so with much less overhead than any other method. eBPF can be used to analyze traffic originating from every programming language and SDK - even for encrypted connections!

Click here for a full list of supported technologies

Reconstruction

After being collected by our eBPF code, the traffic is then classified according to its protocol - which is identified directly from the underlying traffic, or the library from which it originated. Connections are reconstructed, and we can generate transactions - HTTP requests and responses, SQL queries and responses etc.

Enrichment

To provide as much context as possible, each transaction is enriched with relevant metadata. Some examples include the pods that took part in the transaction (both client and server), the nodes on which these pods are scheduled, and the state of the container at the time of the request.

It is important to emphasize the impressive granularity level with which this process takes place - every single transaction observed is fully enriched. This allows us to perform more advanced aggregations.

Aggregation

After being enriched with as much context as possible, the transactions are grouped together into meaningful aggregations. These could be defined by the workloads involved, the protocols detected and the resources that were accessed in the operations. These aggregations will mostly come into play when displaying golden signals.

Exporting

After collecting the data, contextualizing it and putting it together in meaningful aggregations - we can now create metrics and traces to provide meaningful insights into the services' behaviors.

Metrics

Learn how groundcover's application metrics work in the Application Metrics section.

Traces

Learn how groundcover's application traces work in the Traces section.


Nobl9 expands monitoring to cover production e2e, including testing and staging environments

Replacing Datadog with groundcover cut Nobl9’s observability costs in half while improving log coverage, providing deeper granularity on traces with eBPF, and enabling operational growth and scalability.

Tracr eliminates blind spots with native-K8s observability and eBPF tracing

Tracr migrates from a fragmented observability stack to groundcover, gaining deep Kubernetes visibility, automated eBPF tracing, and a cost-effective monitoring solution. This transition streamlined troubleshooting, expanded observability across teams, and enhanced the reliability of their blockchain infrastructure.


CPU architectures

The following architectures are fully supported for all groundcover workloads:

  • x86

  • ARM

Real-world Use Cases

These are patterns we've seen in the wild. Agents use groundcover to debug, monitor, and close the loop.

Test → Logs → Fix

Cursor generates tests, tags each with a test_id, logs them, and then uses groundcover to instantly fetch related log lines.

Investigate Issues via Cursor

Got a monitor firing? Drop the alert into Cursor. The agent runs a quick RCA, queries groundcover, and even suggests a patch based on recent logs and traces.

Support Workflow

Support rep gets an error ID → uses MCP to query groundcover → jumps straight to the root cause by exploring traces and logs around the error.

The Autonomous Loop

An agent picks up a ticket, writes tests, ships code to staging, monitors it with groundcover, checks logs and traces, and verifies the fix end to end. Yes, really. Full loop. Almost no hands.

Embedded Grafana Dashboards

groundcover enables access to an embedded Grafana within the groundcover platform's interface. This enables you to easily import and continue using your existing Grafana dashboards and alerts.

The following guides will help you set up and import your visualizations from Grafana:

Create a Grafana dashboard

Build alerts & dashboards with Grafana Terraform provider

Using groundcover as Prometheus/Clickhouse database in a Self-hosted Grafana

Drilldown

The Drilldown view helps you to quickly identify and highlight the most informative attributes - those that stand out and help you pinpoint anomalies or bottlenecks.

Distribution Mode

In this mode, groundcover showcases the top attributes found in your traces or logs data. Each attribute displays up to four values with the highest occurrence across the selected traces.

You can click any value to add or exclude it as a filter and continue drilling down interactively.

How attributes are selected

We use statistical scoring based on:

  • Entropy: how diverse the values of an attribute are.

  • Presence ratio: how often the attribute appears across the selected traces.

Attributes that are both common and have high entropy are prioritized.
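
For intuition (this is an illustration, not groundcover's exact internal scoring), value diversity can be measured with Shannon entropy, H(a) = -Σ p_i log p_i, where p_i is the share of the selected traces carrying the i-th value of attribute a; attributes that appear in most traces and spread their values widely score highest.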

Requirements

To ensure a seamless experience with groundcover, it's important to confirm that your environment meets the necessary requirements. Please review the detailed requirements for Kubernetes, our eBPF sensor, and the necessary hardware and resources to guarantee optimal performance.

Kubernetes requirements

groundcover supports a wide range of Kubernetes versions and distributions, including popular platforms like EKS, AKS, and GKE.

Learn more ->

Kernel requirements for eBPF sensor

Our state-of-the-art eBPF sensor leverages advanced kernel features to deliver comprehensive monitoring with minimal overhead, requiring specific Linux kernel versions, permissions, and CO-RE support.

Hardware and resource requirements

groundcover fully supports both x86 and ARM processors, ensuring compatibility across diverse environments.

ClickHouse resources

groundcover operates ClickHouse to support many of its core features. This requires suitable resources given to the deployment, which groundcover takes care of according to your data usage.

API Examples

Welcome to the API examples section. Here, you’ll find practical demonstrations of how to interact with our API endpoints using cURL commands. Each example is designed to help you quickly understand how to use the API.

Structure of the Examples

  • cURL-based examples: Every example shows the exact cURL command you can copy and run directly in your terminal.

  • Endpoint-specific demonstrations: We walk through different API endpoints one by one, highlighting the required parameters and common use cases.

  • Request & Response clarity: Each section contains both the request (what you send) and the response (what you get back) to illustrate expected behavior.

Prerequisites

Before running any of the examples, make sure you have:

  • An API Key

  • Your Backend ID

Infrastructure Monitoring

Get complete visibility into your cloud infrastructure performance at any scale, easily access all your metrics in one place and optimize infrastructure efficiency.

Overview

The groundcover platform offers infrastructure monitoring capabilities that were built for cloud-native environments. It enables you to track the
health and efficiency of your infrastructure instantly, with an effortless deployment process.

Troubleshoot efficiently - acting as a centralized hub for all your infrastructure, application and customer metrics allows you to query, correlate and troubleshoot your cloud environments using real time data and insight on your entire stack.

Store it all, without a sweat - store any metrics volume without worrying about cardinality or retention limits.

Monitor List page

View, filter, and manage all monitors in one place, and quickly identify issues or create new monitors.

The Monitor List is the central hub for managing and monitoring all active and configured Monitors. It provides a clear, filterable table view of your Monitors, with their current status and key details, such as creation date, severity, and live issues. Use this page to review your Monitors' performance, identify issues, and take appropriate action.

Key Features

5 quick steps to get you started

Once installed, we recommend following these steps to help you quickly get the most out of groundcover's unique observability platform.

1. Get a complete view of your workloads

The "Home page" of the groundcover app is our Workloads page. From here, you can get a service-centric view,

Create a Grafana dashboard

The following guide explains how to build dashboards within the groundcover platform using our fully integrated Grafana interface. To learn how you can create dashboards using Grafana Terraform, follow this guide.

A dashboard is a great tool for visually tracking, analyzing, and displaying key performance metrics, which enable you to monitor the health of your infrastructure and applications.

groundcover MCP

Supercharge your AI agents with o11y superpowers using the groundcover MCP server. Bring logs, traces, metrics, events, K8s resources, and more directly into your agent’s context — and troubleshoot side by side with your AI assistant to solve issues in minutes.

Status: Work in progress. We keep adding tools and polishing the experience. Got an idea or question? Ping us on Slack!

What is MCP?

MCP (Model Context Protocol) is an open standard that enables AI tools to access external data sources like APIs, observability platforms, and documentation directly within their working context. MCP servers expose these resources in a consistent way, so the AI agent can query and use them as needed to streamline workflows.

Migrations

Automated migration from legacy vendors. Bring over your monitors, dashboards, data sources, and all the data you need - with automatic mapping and zero downtime.

Overview

groundcover is the first observability platform to ship a one-click migration tool from legacy vendors. The migration flow automatically discovers, translates, and installs your observability setup into groundcover.

Goal: Move your entire observability stack with zero manual work. We don't just migrate assets - we bring the data and handle all the mapping for you.

Insights

Quickly understand your data with groundcover

groundcover insights give you a clear snapshot of notable events in your data. Currently, the platform supports Error Anomalies, with more insight types on the way.

Error Anomalies

Error Anomalies instantly highlight workloads, containers, or environments experiencing unusual spikes in Error or Critical logs, as well as Traces marked with an error status. These anomalies are detected using statistical algorithms, continuously refined through user feedback for accuracy.

Each insight surfaces trends based on the entity’s error signals (e.g., workload, container, etc.):

Full Webhook Examples

This section contains comprehensive examples of webhook integrations with various third-party services. These examples provide step-by-step instructions for setting up complete workflows with external systems.

Available Examples

  • incident.io - Integrate with incident.io for incident management

Embedded Grafana Alerts

groundcover enables access to an embedded Grafana within the groundcover platform's interface. This enables you to easily import and continue using your existing Grafana dashboards and alerts.

Raw Prometheus and Clickhouse API

Use raw Prometheus API calls

Raw Prometheus API

You can obtain your API key from the API Keys page in the groundcover console.

You can query the Prometheus API directly using curl:
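
A minimal sketch (the host and bearer-token header below mirror the legacy example further down this page and are assumptions; adjust them to the endpoint and authentication your backend uses):

# Instant query: current value of a metric
curl -H "Authorization: Bearer <YOUR_API_KEY>" \
  "https://app.groundcover.com/api/prometheus/api/v1/query?query=up"

# Range query: the same metric over a time window (Unix timestamps), one point per minute
curl -G -H "Authorization: Bearer <YOUR_API_KEY>" \
  "https://app.groundcover.com/api/prometheus/api/v1/query_range" \
  --data-urlencode "query=up" \
  --data-urlencode "start=<START_UNIX_TS>" \
  --data-urlencode "end=<END_UNIX_TS>" \
  --data-urlencode "step=60"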

For complete Prometheus REST API documentation, see the Prometheus HTTP API reference.

Monitor Catalog page

Explore and select pre-built Monitors from the catalog to quickly set up observability for your environment. Customize and deploy Monitors in just a few clicks.

Overview

The Monitor Catalog is a library of pre-built templates to efficiently create new Monitors. Browse and select one or more Monitors to quickly configure your environment with a single click. The Catalog groups Monitors into "Packs", based on different use cases.

Add custom environment labels

Enhance your ability to easily monitor a group of clusters

This capability is available only to Enterprise users. Learn more about our paid plans.

Labeling clusters in your cloud-native environments is very helpful for efficient resource management and observability. By assigning an environment label in groundcover, you can categorize and identify clusters based on any specific criteria that you find helpful. For example, you can choose to label your clusters by environment type (development, staging, production, etc.), or by region (EU, US, etc.).

To add an environment label to your cluster, edit your cluster's existing values.yaml and add the following line:
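
env: "my-env-name"

Replace my-env-name with the label you want this cluster to carry (for example, an environment type such as production or staging).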

Once defined and added, these labels will be available for you to select in the cluster and environment drop down menu ("Cluster Picker").

Service Accounts

A service account is a non-human identity for API access, governed by RBAC and supporting multiple API keys.

Summary

Service accounts in groundcover are non-human identities used for programmatic access to the API. They’re ideal for CI pipelines, automation, and backend services, and are governed by groundcover’s RBAC system.

groundcover’s MCP server brings your live observability data into the picture - making your agents smarter, faster, and more accurate.

How Can groundcover's MCP Help You and Your Agent?

By connecting your agent to groundcover’s MCP server, you enable it to:

  • Query live logs, traces, and metrics for a workload, container, or issue.

  • Run root cause analysis (RCA) on issues, right inside your IDE or chat interface.

  • Auto-debug code with observability context built in.

  • Monitor deployed code and validate fixes without switching tools.

See examples in our Getting-started Prompts and Real-world Use Cases.

Install groundcover's MCP Server

Set up is quick and agent-friendly. We support both OAuth (recommended) and API key flows.

Head to Configure groundcover's MCP Server for setup instructions and client-specific guides.

What we migrate

Monitors

Includes alert conditions, thresholds, and evaluation windows.

Dashboards

Complete dashboard migration with preserved layouts:

  • All widget types groundcover supports

  • Query translations

  • Time ranges and filters

  • Visual settings and arrangements

Data Sources

We detect what you're using in Datadog and help you set it up in groundcover. One-click migration for integrations is coming soon.

Data & Mapping

We don't just copy configurations - we ensure the data flows:

  • Automatic metric mapping: Datadog metric names translated to groundcover equivalents

  • Label translation: Tags become labels with intelligent mapping

  • Query conversion: Datadog query syntax converted to groundcover

  • Data validation: We verify all referenced metrics and data sources exist

Supported providers

Datadog

Full migration support available now.

Migrate from Datadog →

Other providers

Additional vendors coming soon.

How it works

Three steps. No scripts. No downtime.

  1. Fetch & discover: Provide API keys. groundcover pulls your monitors, dashboards, and data sources.

  2. Automatic setup: We install data sources, map metrics, and prepare everything.

  3. Migrate assets: Review and migrate monitors and dashboards with one click.

API keys are not stored.

Access

Settings → Migrations (Admin role required)

What's next

The migration flow is structured to support additional asset types:

  • Data source configurations (available now)

  • Log pipelines (coming soon)

  • Advanced metric mappings (coming soon)

Identity and Permissions

A service account has a name and email, but it cannot be used to log into the UI or via SSO. Instead, it functions purely for API access. Each account must have at least one RBAC policy assigned, which defines its permission level (Admin, Editor, Viewer) and data scope. Multiple policies can be attached to broaden access; effective permissions are the union of all policies.

Creation and Management

Only Admins can create, update, or delete service accounts. This can be done via the UI (Settings → Access → Service Accounts) or the API. During creation, Admins define the name, email, and initial policies. You can edit a service account, changing its email address and assigned policies, but you can't rename it.

API Key Association

A service account can have multiple API keys. This makes it easy to rotate credentials or issue distinct keys for different use cases. All keys are tied to the same account and carry its permissions. Any action taken using a key is logged as performed by the associated service account.

  • MS Teams - Send notifications to Microsoft Teams channels

  • Email via Zapier - Route alerts to email using Zapier

  • Slack App with Bot Tokens - Route alerts to different Slack channels with a single Webhook

Each example includes:

  • Prerequisites and setup requirements

  • Step-by-step configuration instructions

  • Complete workflow YAML configurations

  • Integration-specific considerations and best practices

These examples demonstrate advanced webhook usage patterns and can serve as templates for other webhook integrations.

    On Logs, anomalies are based on logs filtered by level:error or level:critical, and grouped by:
    • workload

    • container

    • namespace

    • environment

    • cluster

    On Traces, anomalies are based on traces filtered by status:error, and grouped by a more granular set of dimensions:

    • protocol_type

    • return_code

    • role (client/server)

    • workload

    • container

    • namespace

    • environment

    • cluster

    Logs Insights
    Key Features

    Batch Monitor Creation

    You can select as many monitors as you wish, and add them all in one click. Select a complete pack or multiple Monitors from different packs, then click "Create Monitor". All Monitors will be automatically created. You can always edit them later.

    Single Monitor Creation

    You can also create a single Monitor from the Catalog. When hovering over a Monitor, a "Wizard" button will appear. Clicking on it will direct you to the Monitor Creation Wizard where you can review and edit before creation.

    Legacy API

    The following configurations are deprecated but may still be in use in older setups.

    The legacy datasources API key can be obtained by running: groundcover auth get-datasources-api-key

    You can query the legacy Prometheus API directly using curl:

    # Query current metrics
    curl -H "Authorization: Bearer <YOUR_API_KEY>" \
      "https://app.groundcover.com/api/prometheus/api/v1/query?query=up"
    # Query current logs (legacy endpoint)
    curl https://ds.groundcover.com/ -H "X-ClickHouse-Key: DS-API-KEY-VALUE" --data "SELECT count() FROM groundcover.logs LIMIT 1 FORMAT JSON" 
Your subscription costs remain unaffected by the granularity of metrics you store or query.

    Collection

    groundcover's proprietary eBPF sensor leverages all its innovative powers to collect comprehensive data across your cloud environments without the burden of performance overhead. This data is sourced from various Kubernetes components, including kube-system workloads, cluster information via the Kubernetes API, and the applications' interactions with the Kubernetes infrastructure. This level of detailed collection at the kernel level enables the ability to provide actionable insights into the health of your Kubernetes clusters, which are indispensable for troubleshooting existing issues and taking proactive steps to future-proof your cloud environments.

    Configuration

You also have the option to define the retention period for your metrics in the VictoriaMetrics database. By default, metrics are retained for 7 days, but you can adjust this period to your preferences.

    Enrichment

    Beyond collecting data, groundcover's methodology involves a strategic layer of data enrichment that seeks to correlate Kubernetes metrics with application performance indicators. This correlation is crucial for creating a transparent image of the Kubernetes ecosystem. It enables a deep understanding of how Kubernetes interacts with applications, identifying potential points of failure across the interconnected environment. By monitoring Kubernetes not as an isolated platform but as an integral part of the application infrastructure, groundcover ensures that the monitoring strategy aligns with your dynamic and complex cloud operations.

    Infrastructure Metrics

    Monitoring a cluster involves tracking resources that are critical to the performance and stability of the entire system. Monitoring these essential metrics is crucial for maintaining a healthy Kubernetes cluster:

    • CPU consumption: It's essential to track the CPU resources being utilized against the total capacity to prevent workloads from failing due to insufficient CPU availability.

    • Memory utilization: Keeping an eye on the remaining memory resources ensures that your cluster doesn't encounter disruptions due to memory shortages.

    • Disk space allocation: For Kubernetes clusters running stateful applications or requiring persistent storage for data, such as etcd databases, tracking the available disk space is crucial to avert potential storage deficiencies.

    • Network usage: Visualize traffic rates and connections being established and closed on a service-to-service level of granularity, and easily pinpoint cross availability zone communication to investigate misconfigurations and surging costs.

    Container CPU and Memory

Available Labels

type, clusterId, region, namespace, node_name, workload_name, pod_name, container_name, container_image

Available Metrics

  • groundcover_container_cpu_usage_rate_millis: CPU usage in mCPU (Gauge)

  • groundcover_container_cpu_request_m_cpu: K8s container CPU request (mCPU) (Gauge)

  • groundcover_container_cpu_limit_m_cpu: K8s container CPU limit (mCPU) (Gauge)

  • groundcover_container_memory_working_set_bytes: current memory working set (B)
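
As a usage sketch, these metrics can be queried through the Prometheus-compatible API described in the Raw Prometheus API section; the host, header, and label values below are assumptions, so adapt them to your own setup:

# Average container CPU usage (mCPU) per workload in a given namespace
curl -G -H "Authorization: Bearer <YOUR_API_KEY>" \
  "https://app.groundcover.com/api/prometheus/api/v1/query" \
  --data-urlencode 'query=avg(groundcover_container_cpu_usage_rate_millis{namespace="my-namespace"}) by (workload_name)'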

    Node CPU, Memory and Disk

Available Labels

type, clusterId, region, node_name

Available Metrics

  • groundcover_node_allocatable_cpum_cpu: amount of allocatable CPU in the current node (mCPU) (Gauge)

  • groundcover_node_allocatable_mem_bytes: amount of allocatable memory in the current node (B) (Gauge)

  • groundcover_node_mem_used_percent: percent of used memory in the current node (0-100) (Gauge)

  • groundcover_node_used_disk_space: current used disk space in the current node (B)

    PVC Usage

Available Labels

type, clusterId, region, name, namespace

Available Metrics

  • groundcover_pvc_usage_bytes: PVC used bytes (B) (Gauge)

  • groundcover_pvc_capacity_bytes: PVC capacity bytes (B) (Gauge)

  • groundcover_pvc_available_bytes: PVC available bytes (B) (Gauge)

  • groundcover_pvc_usage_percent: percent of used PVC storage (0-100)

    Network Usage

Available Labels

clusterId, workload_name, namespace, container_name, remote_service_name, remote_namespace, remote_is_external, availability_zone, region, remote_availability_zone, remote_region, is_cross_az, protocol, role, server_port, encryption, transport_protocol, is_loopback

    Notes:

    • is_loopback and remote_is_external are special labels that indicate the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g managed service outside of the cluster (external).

      • In both cases the remote_service_name and the remote_namespace labels will be empty

    • is_cross_az means the traffic was sent and/or received between two different availability zones. This is a helpful flag to quickly identify this special kind of communication.

      • The actual zones are detailed in the availability_zone and remote_availability_zone labels

Available Metrics

  • groundcover_network_rx_bytes_total: bytes received by the workload (B) (Counter)

  • groundcover_network_tx_bytes_total: bytes sent by the workload (B) (Counter)

  • groundcover_network_connections_opened_total: connections opened by the workload (Counter)

  • groundcover_network_connections_closed_total: connections closed by the workload (Counter)
    Monitors Table
    • Displays the following columns:

      • Name: Title of the monitor.

      • Creation Date: When the monitor was created.

      • Live Issues: Number of live issues currently firing.

      • Status: Is the Monitor "Firing" (alerts active) or "Normal" (no alerts).

    • Tip: Click on a Monitor name to view its detailed configuration and performance metrics.


    Create Monitor

    You can create a new Monitor by clicking on Create Monitor, then choosing between the different options: Monitor Wizard, Monitor Catalog, or Import. For further guidance, check out our guide.

    Filters Panel

    Use filters to narrow down monitors by:

    • Severity: S1, S2, S3, or custom severity levels.

    • Status: Alerting or Normal.

    • Silenced: Exclude silenced monitors.

    Tip: Toggle multiple filters to refine your view.

    Search Bar

    Quickly locate monitors by typing a name, status, category, or other keywords.

    Cluster and Environment Filters

    Located at the top-right corner, use these to focus on monitors for specific clusters or environments.

    2. Check out full payloads of traces

    A highly impactful advantage of leveraging eBPF in our proprietary sensor is that it enables visibility on the full payloads of both request and response - including headers! This allows you to very quickly understand issues and provides context.

    Go to Traces →

    3. Build a native dashboard

groundcover enables you to easily build custom dashboards to visualize your data, using our intuitive Query Builder as a guide or using your own queries.

    Go to Dashboards →

    4. Set up a Monitor

    Define custom alerts using our native Monitors, which you can configure using groundcover data and custom metrics. You can also choose from our Monitors Catalog, which contains multiple pre-built Monitors that cover a few of the most common use cases and needs.

    Go to Monitors →

    5. Invite your team

Invites let you share your workspaces with your colleagues in just a couple of clicks. You can find the "Invite Members" option at the bottom of the left navigation bar. Type in the email addresses of the teammates you want to invite, set their user permissions (Admin, Editor, Read Only), then click "Send Invites".

    Go to Workloads →
    Creating a new dashboard

    1️⃣ Go to the Dashboards tab in the groundcover app, and click New and then New Dashboard.

    2️⃣ Create your first panel by clicking Add a new panel.

    3️⃣ In the New panel view, go to the Query tab.

4️⃣ Choose your data source by clicking the -- Grafana -- data source selector. You will see the metrics collected from each of your clusters as a Prometheus data source called Prometheus@<cluster-name>.

    5️⃣ Create your first Query in the PromQL query interface.
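
For example, a panel tracking node memory pressure could use an expression along the lines of avg(groundcover_node_mem_used_percent) by (node_name) - an illustrative query built from the metrics documented in the Infrastructure Metrics section; adapt it to your own use case.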

    Learn more about Grafana panels and PromQL queries to improve your skills. For any help in creating your custom dashboard don't hesitate to join our Slack support channel.

Tips:

• Learn more about the supported metrics you can use to build dashboards in the Infrastructure Metrics section under Infrastructure Monitoring and the Application Metrics page.

• groundcover has a set of example dashboards in the Dashboards by groundcover folder which can get you started. These dashboards are read-only, but you can see the PromQL query behind each panel by right-clicking the panel and selecting Explore.


    Delete Workflow

    Delete an existing workflow using workflow id

    Endpoint

    DELETE /api/workflows/{id}

    Authentication

    This endpoint requires API key authentication.

    Headers

    Header
    Value
    Description

Path Parameters

• id (string): The unique identifier of the workflow to delete

    Example Request
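
A minimal example request (the host and Authorization header below are assumptions borrowed from the Prometheus API examples elsewhere in these docs; use the authentication header your backend expects):

# Delete the workflow with the given ID (replace <WORKFLOW_ID>)
curl -X DELETE \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  "https://app.groundcover.com/api/workflows/<WORKFLOW_ID>"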

    Response

    Success Response

    Status Code: 200 OK

    Notes

    • Once a workflow is deleted, it cannot be recovered

    • The deletion is immediate and permanent

    • All associated workflow executions and history are also removed

    • The API returns HTTP 200 status code for both successful deletions and "not found" cases

    Build alerts & dashboards with Grafana Terraform provider

    Configure the Grafana Terraform provider

    For instructions on how to generate a Grafana Service Account Token and use it in the Grafana Terraform provider, see: Grafana Service Account Token.

    Dashboard provisioning example

    • Create a directory for the terraform assets

    • Create a main.tf file within the directory that contains the terraform provider configuration mentioned in step 2

    • Create the following dashboards.tf file, this example declares a new Golden Signals folder, and within it a Workload Golden Signals

• Add the workloadgoldensignals.json file to the directory as well

    • Run terraform init to initialize terraform context

• Run terraform plan; you should see a long output describing the assets that are going to be created. The last line should state Plan: 2 to add, 0 to change, 0 to destroy.

    • Run terraform apply

    Here is a short video to demonstrate the process

You can read more about what you can achieve with the Grafana Terraform provider in the official provider documentation.

    Silences page

    Manage and create silences to suppress Monitor notifications during maintenance or specific periods, reducing noise and focusing on critical issues.

    Overview

    The Silences page lists all Silences you and your team created for your Monitors. In this section, you can also create and manage your Silence rules, to suppress notifications and Issues noise, for a specified period of time. Silences are a great way to reduce alert fatigue, which can lead to missing important issues, and help focus on the most critical issues during specific operational scenarios such as scheduled maintenances.

    Create a new Silence

    Follow these simple steps to create a new Silence.

    Section 1: Schedule

    Specify the timeframe for the silence rule. Note that the starting point doesn't have to be now, and can also be any time in the future.

    Below the From / Until boxes, you'll see a Silence summary, showing its approximate length (rounded down to full days) and starting date.

    Section 2: Matchers

    Define the criteria for Monitors or Issues to be silenced.

    1. Click Add Matcher to specify match conditions (e.g., cluster, namespace, span_name).

    2. Combine multiple matchers for more granular control.

    Example: Silence all Monitors in the "demo" namespace.

    Section 3 - Affected Active Issues

    Preview the issues currently affected by the Silence rule, based on any defined Matchers. This list contains only actively firing Issues.

Tip: Use this preview to see the list of impacted issues and adjust your Matchers before creating the Silence.

    Section 4: Comment

    Add notes or context for the Silence rule. These comments help you and other users understand the purpose of the rule.

    Dashboards

    Learn how to build custom dashboards using groundcover

    groundcover’s dashboards are designed to personalize your data visualization and maximize the value of your existing data. Dashboards are perfect for creating investigation flows for critical monitors, displaying the data you care about in a way that suits you and your team, and crafting insights from the data on groundcover.

    Easily create a new Dashboard using our guide.

    Example of a groundcover Dashboard

    Key Features

• Multi-Mode Query Bar: The Query Bar is central to dashboards and supports multiple modes fully integrated with native pages and Monitors. Currently, the modes include Metrics, Infra Metrics, Logs, and Traces. Learn more in the dedicated section.

    • Variables: Built-in variables allow you to filter data quickly based on a predefined list crafted by groundcover.

    • Widget Types: Two widget types are currently supported:

      • Chart Widget: Displays data visually.

      • Textual Widget: Adds context to your dashboards.

    • Display Types: Five display types are supported for data visualization:

  • Time Series, Table, Stat, Top List, and Pie. Read more in the dedicated section.

    Traces Pipeline Examples

    We strongly advise reading the intro guide to working with remap transforms in order to fully understand the functionalities of writing pipelines steps.

    Filtering out specific resources

    The following example will filter out all HTTP traces that include the /health URI. Note that the filter transform works by setting up an allow condition - meaning, events which fail the condition will be dropped.

    The filter below implements this logic:

    1. If the event isn't an HTTP event, allow it

    2. If the event is an HTTP event, and the resource name doesn't contain "/health", allow it

    3. If the event is an HTTP event AND it has "/health" in the resource name, drop it

    We are using the abort error handling below when calling the string function. If the protocol type or resource name aren't valid strings, we drop the event.

    Redact payloads from a specific server

    The following example will obfuscate response payloads from a specific server. This can be useful when you want to completely redact responses that contain sensitive data, such as secrets managed by an external server.

    Redact payloads based on a header value

    In this example we redact HTTP payloads for any request containing the host: frontend header.

    Monitors

    Monitors offers the ability to define custom alerts, which you can configure using groundcover data and custom metrics.

    What is a Monitor

    A Monitor defines a set of rules and conditions that track the state of your system. When a monitor's conditions are met, it triggers an issue that is displayed on the Issues page and can be used for alerting using your integrations and workflows.

    Easily create a new Monitor by using our guide.

    Example of a groundcover Monitor

    Connect Linux hosts

    Linux hosts sensor

Note: The Linux host sensor is currently available exclusively to Enterprise users. Check out our pricing for more information about subscription plans.

    Supported Environments

We currently support running on eBPF-enabled Linux machines (see the requirements section for more details).

Migrating from Issues to Monitors

The legacy Issues page is being deprecated in favor of a fully customizable, Monitor-based experience that gives you more control over what constitutes an issue in your environment.

While the new page introduces powerful capabilities, no core functionality is being removed. The key change is that the old auto-created issue rules will no longer be automatically generated. Instead, you’ll define your own Monitors, or choose from a rich catalog of prebuilt ones.

All the existing issues in the legacy page can be easily added as Monitors via the Monitors Catalog's "Started Pack". See below for more info.

    Why migrate to the new Issues experience?

    The new Issues page is built on top of the Monitors engine, enabling deeper customization and automation:

    Login and Create a Workspace

    Get started with groundcover

    This is the first step to start with groundcover for all types of plans 🚀

    Sign up to groundcover

The first thing you need to do to start using groundcover is sign up using your email address (no credit card required for the free tier account). Signing up is only possible using a computer and will not be possible using a mobile phone or tablet. It is highly recommended you use your corporate email address, as it will make it easier to use other features such as inviting your colleagues to your workspace. However, signing up using Gmail, Outlook or any other public domain is also possible.

    API Keys

    An API key in groundcover provides secure, programmatic access to the API on behalf of a service account. It inherits that account’s permissions and should be stored safely.


    Binding and Permissions

    Each API key is tied to a specific service account. It inherits the permissions defined by that account’s RBAC policies. Optionally, the key can be limited to a subset of those policies for more granular access control. An API key can never exceed the permissions of its parent service account.

    Getting-started Prompts

    Once your MCP server is connected, you can dive right in.

    Here are a few prompts to try. They work out of the box with agents like Cursor, Claude, or VS Code:

    💡 Starting your request with “Use groundcover” is a helpful nudge - it pushes the agent toward MCP tools and context.

    Basic Prompts to Try

    MCP supports complex, multi-step flows, but starting simple is the best way to ramp up.
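
A few illustrative prompts (the workload and service names are placeholders, not real entities):

  • "Use groundcover to show me the error logs from the last hour for the checkout workload."

  • "Use groundcover to find the slowest traces for the payments service today and summarize likely causes."

  • "Use groundcover to check whether my latest deployment introduced new error anomalies."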

    Issues page

    View and analyze monitor issues with detailed timelines, metadata, and context to quickly identify and resolve problems in your environment.

    The Issues page provides a detailed view of active and resolved issues triggered by Monitors. This page helps users investigate, analyze, and resolve problems in their environment by visualizing issue trends and providing in-depth context through an issue drawer.

    Issues drawer

Clicking on an issue in the Issues List opens the Issue drawer, which provides an in-depth view of the Monitor and its triggered issue. You can also navigate, where possible, to related entities like the workload, node, or pod.

    Datasource API Keys

    groundcover provides a robust user interface that allows you to view and analyze all your observability data from inside the platform. However, there may be cases in which you need to query the data from outside our platform using API communication.

    Our proprietary eBPF sensor automatically captures granular observability data, which is stored via our integrations with two best-of-breed technologies. VictoriaMetrics for metrics storage, and ClickHouse for storage of logs, traces, and Kubernetes events.

Read more about our architecture in the Architecture section.

    Generate the API key

Run the following command in your CLI, and select your tenant:

    groundcover auth get-datasources-api-key
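
As a usage sketch, the generated key can then be passed as a header when querying the data sources directly; the endpoint and header below mirror the legacy ClickHouse example elsewhere in these docs and may differ for your backend:

# Count the log rows currently stored in ClickHouse
curl "https://ds.groundcover.com/" \
  -H "X-ClickHouse-Key: <DATASOURCES_API_KEY>" \
  --data "SELECT count() FROM groundcover.logs LIMIT 1 FORMAT JSON"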

    Grafana Service Account Token

    Step 1 - generate Grafana Service Account Token

    • Make sure you have

    Logs to Events Pipeline Examples

We strongly advise reading the intro guide to working with remap transforms in order to fully understand the functionality of writing pipeline steps.

The generated events will currently only be available by querying the ClickHouse database directly. Contact us over Slack for additional information.

    Workflows

    Workflows are YAML-based configurations that are executed whenever a monitor is triggered. They enable you to integrate with third-party systems, apply custom logic to format and transform data, and set different conditions to handle your monitoring alerts intelligently.

    Workflow components

    Triggers

    Create a new Workflow

    Creation

    Creating new workflows is currently supported through the groundcover app in two ways from the Monitors menu:

    Remote Access & APIs

    groundcover has various authentication key types for remotely interacting with our platform, whether to ingest observability data or to automate actions via our APIs:

1. - An API key in groundcover provides secure, programmatic access to the API on behalf of a . It inherits that account’s permissions and should be stored safely. This is also the key you need when working with groundcover’s Terraform provider. See:

      1. groundcover's .

      2. groundcover's Terraform provider: .

    Installation & Updating

    Multiple ways to connect your infrastructure and applications to groundcover

    groundcover is designed to support data ingestion from multiple sources, giving you comprehensive observability across your entire stack. Choose the installation method that best fits your infrastructure and monitoring needs.

    Available Installation Options

    Alert Structure

Description of the alert fields you can use in your workflows

    Structure

    Field Name
    Description
    Example
    Kubernetes Clusters

    Connect your Kubernetes clusters using groundcover's eBPF-based sensor for automatic instrumentation and deep observability.

    • Connect Kubernetes clusters - Deploy groundcover's sensor to monitor containerized workloads, infrastructure, and applications with zero code changes

    Standalone Linux Hosts

    Monitor individual Linux servers, virtual machines, or cloud instances outside of Kubernetes.

    • Connect Linux hosts - Install groundcover on standalone Linux hosts such as EC2 instances, bare metal servers, or VMs

    Real User Monitoring (RUM)

    Gain visibility into frontend performance and user experience with client-side monitoring.

    • Connect RUM - Monitor real user interactions, page loads, and frontend performance in web applications

    External Data Sources

    Integrate with existing observability tools and send data from your current monitoring stack.

    • Ship from OpenTelemetry - Forward traces, metrics, and logs from existing OpenTelemetry collectors

    • Ship from Datadog Agent - Send data from Datadog agents while maintaining your existing setup

    Getting Started

    1. Login and Create a Workspace - Set up your groundcover account and workspace

    2. Review Requirements - Ensure your environment meets the necessary prerequisites

    3. Choose your installation method - Select the option that matches your infrastructure setup

    4. Follow the 5 quick steps - Get oriented with groundcover's interface and features

    Need Help?

    If you're unsure which installation method is right for you, or if you have specific requirements, check our FAQ or reach out to our support team.

Ingestion Keys - Ingestion Keys let sensors, integrations and browsers send observability data to your groundcover backend. These keys are the counterpart of API Keys, which are optimized for reading data or automating dashboards and monitors.

• Datasources (ds) API Key - A key used to connect to groundcover as a datasource, querying ClickHouse and VictoriaMetrics directly.

• Grafana Service Account Token - Used to remotely create Grafana Alerts & Dashboards via Terraform.


    API Keys
    service account
    APIs documentation
    https://github.com/groundcover-com/terraform-provider-groundcover
    Details tab

    Displays metadata about the issue, including:

    • Monitor Name: Name of the Monitor that triggered the issue, including a link to it.

    • Description: Explains what the Monitor tracks and why it was triggered.

    • Severity: Shows the assigned severity level (e.g., S3).

    • Labels: Lists contextual labels like cluster, namespace, and workload.

    • Creation Time: Shows when the issue started firing.

    Events tab

    Displays the Kubernetes events related to the selected issue within the timeframe selected in the Time Picker dropdown (upper right of the issue drawer).

    Traces tab

    When creating a Monitor using a traces query, the Traces tab will display the matching traces generated within the timeframe selected in the Time Picker dropdown (upper right of the issue drawer). Click on "View in Traces" to navigate to the Traces section with all relevant filters automatically applied.

    Logs tab

    When creating a monitor using a log query, the Logs tab will display the matching logs generated within the timeframe selected in the Time Picker dropdown (upper right of the issue drawer). Click on "View in Logs" to navigate to the Logs section with all relevant filters automatically applied.

    Map tab

    A focused visualization of the interactions between workloads related to the selected issue.

    1. "Create Notification Workflow" button - The quick way

    This provides a guided approach to create a workflow. When creating a notification workflow, you will be asked to give your workflow a name, add filters, and select the specific integration to use.

    Filter Rules By Labels - Add key-value attributes to ensure your workflow executes under specific conditions only - for example, env = prod only.

    Delivery Destinations - Select one or more integrations to be used for notifications with this workflow.

    Scope - When The Workflow Will Run - This setting allows you to limit this workflow execution only to monitors that explicitly select to route their triggers to this workflow, as opposed to "Handle all issues" that catches triggers from any monitor.

    Once you create a workflow using this option, you can later edit the workflow to apply any configuration or logic by using the editor option (see next).

    2. "Create Workflow" button

    Clicking the button will open up a text editor where you can add your workflow definition in YAML format by applying any valid configuration, logic, and functionality.

    Note: Make sure to create your integration prior to creating the workflow as it requires using an existing integration.

    View

    Upon successful workflow creation it will be active immediately, and a new workflow record will appear in the underlying table.

    For each existing workflow, we can see the following fields:

    • Name: Your defined workflow name

    • Description: If you've added a description of the workflow

    • Creator: Workflow creator email

    • Creation Date: Workflow creation date

    • Last Execution Time: Timestamp of last workflow execution (depends on workflow trigger type)

    • Last Execution Status: Last execution status (failure or success)

    Editing

    From the right side of each workflow record in the display, you can access the menu (three dots) and click "Edit Workflow". This will open the editor so you can modify the YAML to conform to the available functionality. See examples below.

    Create Workflow
Metric | Description | Type
groundcover_container_memory_rss_bytes | current memory RSS (B) | Gauge
groundcover_container_memory_request_bytes | K8s container memory request (B) | Gauge
groundcover_container_memory_limit_bytes | K8s container memory limit (B) | Gauge
groundcover_container_cpu_delay_seconds | K8s container CPU delay accounting in seconds | Counter
groundcover_container_disk_delay_seconds | K8s container disk delay accounting in seconds | Counter
groundcover_container_cpu_throttled_seconds_total | K8s container total CPU throttling in seconds | Counter
groundcover_node_free_disk_space | amount of free disk space in current node (B) | Gauge
groundcover_node_total_disk_space | amount of total disk space in current node (B) | Gauge
groundcover_node_used_percent_disk_space | percent of used disk space in current node (0-100) | Gauge
groundcover_network_connections_opened_failed_total | Connection attempts failed per workload (including refused connections) | Counter
groundcover_network_connections_opened_refused_total | Connection attempts refused per workload | Counter
    Query Builder section
    Widget Types section.

Headers:

• Authorization: Bearer <YOUR_API_KEY> - Your groundcover API key

• Accept: */* - Accept any response format

Path parameters:

• id (string) - The UUID of the workflow to delete

    Querying ClickHouse

    Example for querying ClickHouse database using POST HTTP Request:

    Command parameters

• X-ClickHouse-Key (header): API Key you retrieved from the groundcover CLI. Replace ${API_KEY} with your actual API key, or set API_KEY as an environment variable.

    • SELECT count() FROM traces WHERE start_timestamp > now() - interval '15 minutes' (data): The SQL query to execute. This query counts the number of traces where the start_timestamp is within the last 15 minutes.

    Learn more about the ClickHouse query language here.

    Querying VictoriaMetrics

    Example for querying the VictoriaMetrics database using the query_range API:

    Command parameters

• apikey (header): API Key you retrieved from the groundcover CLI. Replace ${API_KEY} with your actual API key, or set API_KEY as an environment variable.

• query (data): The PromQL query to execute. In this case, it calculates the sum of the rate of groundcover_resource_total_counter with the type set to http.

    • start (data): The start timestamp for the query range in Unix time (seconds since epoch). Example: 1715760000.

    • end (data): The end timestamp for the query range in Unix time (seconds since epoch). Example: 1715763600.

Learn more about the PromQL syntax here.

    Learn more about VictoriaMetrics HTTP API here.

    here
    Generate service account token

Service Account Tokens are only accessible once, so make sure you keep them somewhere safe. Running the command again will generate a new service account token.

    Only groundcover tenant admins can generate Service Account Tokens

    Step 2 - Use Grafana Terraform provider

    • make sure you have Terraform installed

    • Use the official Grafana Terraform provider with the following attributes

    Continue to create Alerts and Dashboards in Grafana, see: Build alerts & dashboards with Grafana Terraform provider.

    You can read more about what you can achieve with the Grafana Terraform provider in the official docs

    groundcover cli installed
    Detecting a pattern and extracting data

    Attributes parsed from logs or traces can be accessed under the .string_attributes or .float_attributes maps - see here for more information.

    The following example demonstrates transformation of a log in a specific format to an event, while applying additional filtering and extraction logic.

In this example, we want to create events for when a user consistently fails to log in to a system. We base it on logs with this specific format:

This pipeline will create events with the type multiple_login_failures each time a user fails to log in for the 5th time or more. It will store the username in .string_attributes and the attempt number in .float_attributes.

    intro guide to working with remap transforms
    Contact us over Slack
    login failed for user <username>, attempt number <number>
    Triggers apply filtering conditions that determine whether a specific workflow is executed. In groundcover, the trigger type is always set to "alert".

    Example: This trigger ensures that only monitors fired with telemetry data from the Prod environment will actually execute the workflow. Note that the "env" attribute needs to be provided as a context label from the monitor:

    Note: Workflows are "pull based" which means they will try to match monitors even when these monitors did not explicitly add a specific workflow. Therefore, the filters need to accurately define the condition to be used for a monitor.

    Consts

    Consts is a section where you can declare predefined attributes based on data provided with the monitor context. A set of functions is available to transform existing data and format it for propagation to third-party integrations. Consts simplify access to data that is needed in the actions section.

    Example: The following example shows how to map a predefined set of severity values to the monitor severity as defined in groundcover - here, any potential severity in groundcover is translated into one of P1-P5 values.

    The function keep.dictget gets a value from a map (dictionary) using a specific key. In case the key is not found - P3 will be the default severity:

    Actions

Actions specify what happens when a workflow is triggered. Actions typically interface with external systems (like sending a Slack message). They can be defined as an array, executed conditionally, and include the integration in their config part, as well as a payload block that typically depends on the exact integration used for the notification.

    Actions include:

    1. Provider part (provider:) - Configures the integration to be used

    2. Payload part (with:) - Contains the data to submit to the integration based on its actual structure

    Example: In this example you can see a typical Slack notification. Note that the actual integration is referenced through the 'providers' context attribute. The integration name is the exact string used to create the integration (in this case "groundcover-alerts").

    status

    Current status of the alert

    • firing - Active alert indicating an ongoing issue.

    • resolved - The issue has been resolved, and the alert is no longer active.

    • suppressed - Alert is suppressed.

    • pending - No Data or insufficient data to determine the alert state.

    lastReceived

    Timestamp when the alert was last received

    This alert timestamp

    firingStartTime

    Start time of the firing alert

    First timestamp of the current firing state.

    source

    Sources generating the alert

    grafana

    fingerprint

    Unique fingerprint of the alert, this is a hash of the labels

    02f5568d4c4b5b7f

    alertname

    Name of the monitor

    Workload Pods Crashed Monitor

    _gc_severity

    The defined severity of the alert

    S3, error

    trigger

    Trigger condition of the workflow

    alert / manual / interval

    values

    A map containing two values that can be used:

    The numeric value that triggered the alert (threshold_input_query) and the actual threshold that was defined for the alert (threshold_1)

    "values": { "threshold_1": 0, "threshold_input_query": 99.507}

    Usage

When crafting your workflows you can use any of the fields above using templating in any workflow field. Encapsulate your fields using double opening and closing curly brackets.

    Examples

    Using Label Values

You can access label values using alert.labels.*

    labels

    Map of key:values derived from monitor definition.

    { "workload": "frontend",

    "namespace": "prod" }

    curl -L \
      --request DELETE \
      --url 'https://api.groundcover.com/api/workflows/{id}' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Accept: */*'
    {
      "message": "OK"
    }
    vector:
      tracesPipeline:
        extraSteps: 
        - name: filterHealthAPIs
          transform:
            type: filter
            condition: |-
              string!(.protocol_type) != "http" || !contains(string!(.resource_name), "/health")
    vector:
      tracesPipeline:
        extraSteps: 
        - name: obfuscateSecretServerResponses
          transform:
            type: remap
            source: |-
              if exists(.server) && string!(.server) == "my-secret-server" {
                .response_body = "<REDACTED>"
                .response_body_truncated = true
              }
    vector:
      tracesPipeline:
        extraSteps: 
        - name: obfuscateBasedOnHeader
          transform:
            type: remap
            source: |-
              if exists(.http_request_headers.host) && string!(.http_request_headers.host) == "frontend" {
                .request_body = "<REDACTED>"
                .response_body = "<REDACTED>"
              }
    curl 'https://ds.groundcover.com/' \
            --header "X-ClickHouse-Key: ${API_KEY}" \
            --data "SELECT count() from traces where start_timestamp > now() - interval '15 minutes' "
    curl 'https://ds.groundcover.com/datasources/prometheus/api/v1/query_range' \
        --get \
        --header "apikey: ${API_KEY}" \
        --data 'query=sum(rate(groundcover_resource_total_counter{type="http"}))' \
        --data 'start=1715760000' \
        --data 'end=1715763600'
    groundcover version
    groundcover auth generate-service-account-token
    terraform {
      required_providers {
        grafana = {
          source = "grafana/grafana"
        }
      }
    }
    
    provider "grafana" {
      url  = "https://app.groundcover.com/grafana"
      auth = "{service account token}"
    }
    vector:
      eventsPipelines:
        multiple_login_failures:
          inputs:
            - logs_from_logs
            - json_logs
          extraSteps:
            - name: multiple_login_failures_filter
              transform:
                type: filter
                condition: |
                  .container_name == "loginservice" && contains(string!(.content), "login failed for user")
            - name: multiple_login_failures_extract
              transform:
                type: "remap"
                source: |
                  regex_result = parse_regex!(string!(.content), r'login failed for user (?P<username>.*) attempt number (?P<attempt_number>[0-9.]+)')
                  if to_int!(regex_result.attempt_number) < 5 {
                    abort
                  }
                  .float_attributes = object!(.float_attributes)
                  .float_attributes.attempt_number = to_int!(regex_result.attempt_number)
                  .string_attributes = object!(.string_attributes)
                  .string_attributes.username = regex_result.username
    triggers:
      - type: alert
        filters:
        - key: env
          value: prod
    consts:
        severities: '{"S1": "P1","S2": "P2","S3": "P3","S4": "P4","critical": "P1","error": "P2","warning": "P3","info": "P4"}'
        severity: keep.dictget({{ consts.severities }}, {{ alert.annotations._gc_severity }}, "P3")
    actions:
    - name: slack-action-firing
      provider:
        config: '{{ providers.groundcover-alerts }}'
        type: slack
        with:
          attachments:
          - color: '{{ consts.red_color }}'
            footer: '{{ consts.footer_url }}'
            text: '{{ consts.slack_message }}'
            title: 'Firing: {{ alert.alertname }}'
            type: plain_text
          message: ' '
    message: "Pod Crashed - Pod: {{ alert.labels.pod_name }} Namespace: {{ alert.labels.namespace }}"
    dashboard that will be created
• Run terraform apply to execute the changes; you should now see a new folder in your Grafana dashboards screen with the newly created dashboard
  • Run terraform destroy to revert the changes

• workloadgoldensignals.json (30KB)
    official docs
    )

    Supported architectures: AMD64 + ARM64

    For the following providers, we will fetch the machine metadata from the provider's API.

    Provider
    Supported

    AWS

    ✅

    GCP

    ✅

    Azure

    ✅

    Linode

    ✅

    Sensor capabilities

    • Infrastructure Host metrics: CPU/Memory/Disk usage

    • Logs

      • Natively from docker containers running on the machine

      • JournalD (requires configuration)

      • Static log files on the machine ()

    • Traces

      • Natively from docker containers running on the machine

    • APM metrics and insights from the traces

    How to install?

    Installation currently requires running a script on the machine.

    The script will pull the latest sensor version and install it as a service: groundcover-sensor (requires privileges)

    Install/Upgrade existing sensor:

    Where:

• {ingestion_Key} - A dedicated ingestion key; you can generate one or find existing ones under Settings -> Access -> Ingestion Keys

      • Ingestion Key needs to be of Type Sensor

    • {inCloud_Site} - Your backend ingress address (Your inCloud public ingestion endpoint)

    • {selected_Env} - The Environment that will group those machines on the cluster drop down in the top right corner (We recommend setting a separate one for non k8s deployments)

    Check installed sensor status:

    • Check service status: systemctl status groundcover-sensor

    • View sensor logs: journalctl -u groundcover-sensor

    Initial data may take a few minutes to appear in the app after installation

    Remove installed sensor:

    Customize sensor configuration:

The sensor supports overriding its default configuration by writing to the file located at /etc/opt/groundcover/overrides.yaml. After writing it, restart the sensor service using:

    systemctl restart groundcover-sensor

    Example - override Docker max log line size:

    pricing page
    Kernel requirements for eBPF sensor
  • Define what qualifies as an issue

Use filters in monitor definitions to include or exclude workloads, namespaces, HTTP status codes, clusters, and more, tailoring it to your context.

  • Silence issues with precision

    Silence issues based on any label, such as status_code, cluster, or workload, to reduce noise and keep focus.

  • Clean, scoped issue view

    Only see issues relevant to your environment, based on your configured monitors and silencing rules, no clutter.

  • Get alerted on new issues

    Trigger alerts through your preferred integrations (Slack, PagerDuty, Webhooks, etc.) when a new issue is detected.

  • Define custom issues using all your data

    Build monitors using metrics, traces, logs, and events, and correlate them to uncover complex problems.

  • Manage everything as code

    Use Terraform to manage monitors and issues at scale, ensuring consistency and auditability.

  • What’s Changing?

Aspect | Legacy Issues Page | New Issues Page
Issue Source | Auto-created rules | User-defined monitors
Custom Filtering | ❌ | ✅
Silencing by Labels | ❌ | ✅
Alerts via Integrations | ❌ | ✅

    Getting Started

All the built-in rules you’re used to are already available in the Monitors Catalog; you can add them all with a single click.

    Adding the monitors in the "Started Pack" will match all the existing issues in the legacy page.

    Head to:

    Monitors → Create Monitor -> Monitor Catalog → Recommended Monitors

    Only users with Editor/Admin roles can create monitors

    Learn More

    • Create a new Monitor

    • Issues page

    • Silences page

    Getting Started

    Workspace Selection

    When signing in to groundcover for the first time, the platform automatically detects your organization based on the domain you used to sign in. If your organization already has existing workspaces available, the workspace selection screen will be displayed, where you can choose which of the existing workspaces you would like to join, or if you want to create a new workspace.

    Available workspaces will be displayed only if either of the following applies:

    • You have been invited to join existing workspaces and haven't joined them yet

    • Someone has previously created a workspace that has auto-join enabled for the email domain that you used to sign in (applicable for corporate email domains only)

    To join an existing workspace:

    1. Click the Join button next to the desired workspace

    2. You will be added as a user to that workspace with the user privileges that were assigned by default or those that were assigned to you specifically when the invite was sent.

    3. You will automatically be redirected to that workspace.

    To create a new workspace:

    You will only see the option to create a new workspace if you are the first person from your organization to join groundcover.

    1. Click the Create a new workspace button

    2. Specify a workspace name

    3. Choose whether to enable auto-join (those settings can be changed later)

    4. Click continue

    Workspace Auto-joining

Workspace owners and admins can allow teammates that log in with the same email domain as them to join the Workspace they created automatically, without admin approval. This capability is called "Auto-join". It is disabled by default, but can be switched on during the workspace setup process, or any time in the workspace settings.

    If you logged in with a public email domain (Gmail, Yahoo, Proton, etc.) and are creating a new Workspace, you will not be able to switch on Auto-join for that Workspace.

    sign up
    Creation and Storage

    Only Admins can create or revoke API keys. To create an API key:

    1. Navigate to the Settings page using the settings button located in the bottom left corner

    2. Select "Access" from the sidebar menu

    3. Click on the "API Keys" tab

    4. Create a new API Key, ensuring you assign it to a service account that is bound to the appropriate RBAC policy

    When a key is created, its value is shown once—store it securely in a secret manager or encrypted environment variable. If lost, a new key must be issued.

    Authentication and Usage

To use an API key, send it in the Authorization header as a bearer token:
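For example, a minimal curl sketch of an authenticated request (the clusters list endpoint is used here purely for illustration; adjust the path and body to the API you are calling):

# Authenticate with the API key passed as a bearer token
# In multi-backend setups you may also need to pass an X-Backend-Id header
curl 'https://api.groundcover.com/api/k8s/v3/clusters/list' \
  -H 'accept: application/json' \
  -H 'authorization: Bearer <YOUR_API_KEY>' \
  --data-raw '{"sources":[]}'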

    The key authenticates as the service account, and all API permissions are enforced accordingly.

    API Key authentication will work using https://api.groundcover.com/ only.

    Validity and Revocation

    API keys do not expire automatically. Revoking a key immediately disables its access.

    Scope of Use

    API keys are valid only for requests to https://api.groundcover.com. They do not support data ingestion or Grafana integration—those require dedicated tokens.

    API Keys vs Ingestion Keys

• Primary purpose: Ingestion Key - Write data (ingest); API Key - Read data / manage resources via REST

• Permissions capabilities: Ingestion Key - Write‑only + optional remote‑config read; API Key - Mirrors service‑account RBAC

• Visibility after creation: Ingestion Key - Always revealable; API Key - Shown once only

• Typical lifetime: Ingestion Key - Tied to integration lifecycle; API Key - Rotated for CI/CD automations

    Security Best Practices

    Store securely: Use secrets managers like AWS Secrets Manager or HashiCorp Vault. Never commit keys to source control.

    Follow least privilege: Assign the minimal required policies to service accounts and API keys. Avoid defaulting to admin-level access.

    Rotate regularly: Periodically generate new keys, update your systems, and revoke old ones to limit exposure.

    Revoke stale keys: Remove keys that are no longer in use or suspected to be compromised.

    Pull Logs

    Prompt:

    Expected behavior: The agent should call query_logs and show recent logs for that workload.

    Get K8s Resource Specs

    Prompt:

    Expected behavior: The agent should call get_k8s_object_yaml and return the YAML or a summary of it.

    Find Slow Workloads

    Prompt:

    Expected behavior: The agent should call get_workloads and return the relevant workloads with their P95 latency.

    Investigate Issues

    When something breaks, your agent can help investigate and make sense of it.

    Paste an Issue Link

    Prompt:

    Expected behavior: The agent should use query_monitors_issues, pull issue details, and kick off a relevant investigation using logs, traces, and metadata.

    Investigate Multiple Issues

    Prompt:

    Expected behavior: The agent should use query_monitors_issues to pull all related issues and start going through them one by one.

    Automate Coding & Debugging

    groundcover’s MCP can also be your coding sidekick. Instead of digging through tests and logs manually, deploy your changes and let the agent take over.

    Iterate Over Test Results

    Prompt:

    Expected behavior: The agent should update the code with log statements, deploy it, and use query_logs to trace and debug.

    Deploy & Verify

    Prompt:

    Expected behavior: The agent should assist with deployment, then check for issues, error logs, and traces via groundcover.

    Kubernetes requirements

    Kubernetes version

    groundcover supports any K8s version from v1.21.

groundcover may work on many other K8s flavors, but we might just not have had a chance to test them yet. Can't find yours in the list? Let us know over Slack.

    Kubernetes distributions

    K8s distribution
    Status
    Comments

    Kubernetes RBAC permissions

    For the installation to complete successfully, permissions to deploy the following objects are required:

    • StatefulSet

    • Deployment

    • DaemonSet (With privileged containers for loading our )

    • ConfigMap

    To learn more about groundcover's architecture and components visit our

    Outgoing traffic

    groundcover's portal pod sends HTTP requests to the cloud platform app.groundcover.com on port 443.

This unique architecture keeps the data inside the cluster and fetches it on-demand, keeping the data encrypted all the way without the need to open the cluster to incoming traffic via ingresses.

    Configure groundcover's MCP Server

    Set up your agent to talk to groundcover’s MCP server. Use OAuth for a quick login, or an API key for service accounts.

    The MCP server supports two methods:

    • OAuth (Recommended for IDEs)

    • API Key

    OAuth (recommended)

    OAuth is the default if your agent supports it.

    To get started, add the config below to your MCP client. Once you run it, your browser will open and prompt you to log in with your groundcover credentials.

    🧘 Pro tip: You can copy a ready-to-go command from the UI. Go to the sidebar → Click your profile picture → "Connect to our MCP"

    The tenant's UUID and your time zone will be auto-filled.

Please make sure to have node installed (to use npx)

    Configuration Example

    Cursor

    Claude Code

    API Key

    If your agent doesn’t support OAuth, or if you want to connect a service account, this is the way to go.

    Prerequisites

    1. Service‑account API key – create one or use an existing API Key. Learn more at .

    2. Your local time zone (IANA format, for example America/New_York or Asia/Jerusalem). See how to .

    Configuration Example

    Parameters

    • AUTH_HEADER – your groundcover API key.

    • TIMEZONE – your time zone in IANA format.

    Have a Multi‑backend setup?

    If you're using a multi-backend setup (OAuth or API Key), just add the following header to the args list:

    First, grab your backend ID (it’s basically the name):

    1. Open Data Explorer in groundcover.

    2. Click the Backend picker (top‑right) and copy the backend’s name.

    How to find your time zone

    OS
    command

    Client‑specific Guides

    Depending on your client, you can usually set up the MCP server through the UI - or just ask the client to add it for you. Here are quick links for common tools:

    Kernel requirements for eBPF sensor

    Intro

    groundcover’s eBPF sensor uses state-of-the-art kernel features to provide full coverage at low overhead. In order to do so it requires certain kernel features which are listed below.

groundcover may work on many other Linux kernels, but we might just not have had a chance to test them yet. Can't find yours in the list? Let us know over Slack.

    Kernel Version

    Version v5.3 or higher (anything since 2020).

    Linux Distributions

    Name
    Supported Versions

    Permissions

    Loading eBPF code requires running privileged containers. While this might seem unusual, there's nothing to worry about - eBPF is

    CO:RE support

Our sensor uses eBPF’s CO:RE feature in order to support the vast variety of Linux kernels and distributions detailed above. This feature requires the kernel to be compiled with BTF information (enabled using the CONFIG_DEBUG_INFO_BTF=y kernel compilation flag). This is the case for most common distributions nowadays.

    You can check if your kernel has CO:RE support by manually looking for the BTF file:
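For example, the BTF information is typically exposed at /sys/kernel/btf/vmlinux, so a quick check is:

# If this file exists, the kernel was built with BTF (CO:RE support)
ls -l /sys/kernel/btf/vmlinux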

If the file exists, congratulations! Your kernel supports CO:RE.

    What happens if my kernel is not supported?

If your system does not fit into any of the above - unfortunately, our eBPF sensor will not be able to run in your environment. However, this does not mean groundcover won’t collect any data. You will still be able to inspect your , see all and use with other data sources.

    Email via Zapier

    This guide shows how to route groundcover alerts to email using Zapier. Since groundcover supports webhook-based alerting, and Zapier can receive webhooks and send emails, you can easily set up this workflow without writing code.


    Prerequisites

    • A groundcover account with access to the Workflows tab.

    • A Zapier account (free plan is sufficient).

    • An email address where you want to receive alerts.


    Step 1: Create a Webhook Integration in groundcover

    1. Go to Settings → Integrations.

    2. Click Create Integration.

    3. Choose Webhook as the type.

    4. Enter a name like zapier_email_integration.


    Step 2: Create a Zapier Webhook-to-Email Workflow

    Create a Webhook Trigger

1. Go to Zapier.

    2. Click "Create Zap".

    3. Set Trigger:

      • App: Webhooks by Zapier


    Configure the Email Step

    1. Set Action:

      • App: Email by Zapier

      • Event: Send Outbound Email


    Step 3: Create a Workflow in groundcover

1. Go to the Workflows section in groundcover.

    2. Create a Notification Workflow with the integration we created in step 1.

3. Edit the workflow YAML and use the following structure:


    Step 4: Test the Flow

    1. Trigger a test alert in groundcover.

    2. Check Zapier to ensure the webhook was received.

    3. Confirm the email arrives with the right content.

    List Namespaces

    Retrieve a list of Kubernetes namespaces within a specified time range.

    Endpoint

    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    Headers

    Header
    Required
    Description

    Request Body

    Parameter
    Type
    Required
    Description

    Time Range Parameters

    • Format: ISO 8601 format with milliseconds: YYYY-MM-DDTHH:mm:ss.sssZ

    • Timezone: All timestamps must be in UTC (denoted by 'Z' suffix)

    Response

    The response contains an array of namespaces for the specified time period.

    Response Fields

    Field
    Type
    Description

    Examples

    Basic Request
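A minimal sketch of the request, using the headers and body parameters described above (timestamps, API key and backend ID are placeholders):

curl -X POST 'https://api.groundcover.com/api/k8s/v2/namespaces/list' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'X-Backend-Id: <YOUR_BACKEND_ID>' \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --data-raw '{"sources": [], "start": "2024-01-01T00:00:00.000Z", "end": "2024-01-02T00:00:00.000Z"}'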

    Response Example

    Time Range Usage

    Last 24 Hours
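A small sketch of computing the last-24-hours window in the required ISO 8601 / UTC format (GNU date syntax), which can then be passed as the start and end body parameters:

# Current time and 24 hours ago, in UTC with milliseconds
END=$(date -u +%Y-%m-%dT%H:%M:%S.000Z)
START=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S.000Z)
echo "start=$START end=$END"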

    Issues

    Quickly understand what requires your attention and drive your investigations

    The issues page is a useful place to start a troubleshooting or investigation flow from. It gathers together all active issues found in your Kubernetes environment.

    Issue Types

    • HTTP / gRPC Failures Capturing failed HTTP calls with Response Status Codes of:

      • 5XX — Internal Server Error

      • 429 — Too Many Requests

    • MySQL / PostgreSQL Failures

      Capturing failed SQL statement executions with Response Errors Codes such as:

      • 1146 — No Such Table

• Redis Failures Capturing any Error reported by the Redis serialization protocol (RESP), such as:

• ERR unknown command

    • Container Restarts Capturing all container restart events across the cluster, with Exit Codes such as:

      • 0 — Completed

• 137 — OOMKilled

    • Deployment Failures

      Capturing events such as:

• MinimumReplicasUnavailable — Deployment does not have minimum availability

    Issue Aggregation

Issues are auto-detected and aggregated - representing many identical repeating incidents. Aggregation helps cut through the noise quickly and reach insights like when a new type of issue started to appear, and when it was last seen.

    Issues are grouped by:

    • Type (HTTP, gRPC, Container Restart, etc..)

    • Status Code / Error Code (e.g HTTP 500, gRPC 13)

    • Workload name

    The smart aggregation mechanism will also identify query parameters, remove them, and group the stripped queries / API URIs into patterns. This allows users to easily identify and isolate the root cause of a problem.

    Troubleshooting with Issues

Each issue is assigned a velocity graph showing its behavior over time (like when it was first seen) and a live counter of its number of incidents.

By clicking on an issue, users can access the specific traces captured around the relevant issue. Each trace is related to the exact resource that was used (e.g. raw API URI, or SQL query), its latency and Status Code / Error Code.

Further clicking on a selected captured trace allows the user to investigate the root cause of the issue with the entire payload (body and headers) of the request and response, the information around the participating container, the application logs around the incident's time and the full context of the metrics around the incident.

    Writing Remap Transforms

    Below are the essentials relevant to writing remap transforms in groundcover. Extended information can be found in Vector's documentation.

    We support using all types of Vector transforms as pipeline steps.

    For testing VRL before deployment we recommend the VRL playground.

    Working with fields and attributes

When processing Vector events, field names need to be prefixed by . , a single period. For example, the content field in a log, representing the body of the log, is accessible using .content. Specifically in groundcover, attributes parsed from logs or associated with traces will be stored under string_attributes for string values, and under float_attributes for numerical values. Accessing attributes is possible by adding additional . as needed. For example, a JSON log that looks like this:

    Will be translated into an event with the following attributes:

    Fallible functions

Each of Vector's built-in functions can be either fallible or infallible. Fallible functions can throw an error when called, and require error handling, whereas infallible functions will never throw an error.

    When writing Vector transforms in VRL it's important to use error handling where needed. Below are the two ways error handling in Vector is possible - see more on .

    VRL code without proper error handling will throw an error during compilation, resulting in error logs in the Vector deployment.

    Option #1 - Handling the error

    Let's take a look at the following code.

The code above can either succeed in parsing the JSON, or fail to parse it. The err variable will contain an indication of the result status, and we can proceed accordingly.

    Option #2 - Aborting on error

    Let's take a look at this slightly different version of the code above:

    This time there's no error handling around, but ! was added after the function call.

    This method of error handling is called abort on error - it will fail the transform entirely if the function returns an error, and proceed normally otherwise.

    Choosing which method to use

    Both methods above are valid VRL for handling errors, and you must choose one or the other when handling fallible functions. However, they carry one big difference in terms of pipelines in groundcover:

    • Transforms which use option #1 (error handling) will not stop the pipeline in case of error - the following steps will continue to execute normally. This is useful when writing optional enrichment steps that could potentially fail with no issue.

    • Transforms which use option #2 (abort) will stop the pipeline in case of error - the event will not proceed to the other steps. This is mostly useful for mandatory steps which can't fail no matter what.

The default behavior above can be changed using the drop_on_error flag. When this flag is set to false, errors encountered will never stop the pipeline - both for method #1 and for method #2.

    This is useful for writing simpler code with less explicit error handling, as can be seen in this .

    Update Logs Pipeline Configuration

    Update the logs pipeline configuration.

    Endpoint

    POST /api/pipelines/logs/config

    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    Headers

    Examples

    Basic Request

    Update logs pipeline configuration with test pattern:

    Response Example

    🚨 CRITICAL WARNING: This endpoint COMPLETELY REPLACES the entire pipeline configuration and WILL DELETE ALL EXISTING RULES. Always backup your current configuration first by calling the GET endpoint.

    Backup Current Configuration First

    ALWAYS get your current configuration before making changes:
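For example, a minimal sketch of backing up the current configuration with curl before posting any changes (save the output somewhere safe):

# Fetch the current logs pipeline configuration and store it locally
# Additional headers (for example X-Backend-Id) may be required in multi-backend setups
curl 'https://api.groundcover.com/api/pipelines/logs/config' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Accept: application/json' \
  -o logs-pipeline-backup.json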

    Verify Configuration Update

    After updating the configuration, verify the patterns were added:

    This should return your updated configuration including the new test patterns.

    Related Documentation

    For detailed information about configuring and writing OTTL transformations, see:

    Get Logs Pipeline Configuration

    Retrieve the current logs pipeline configuration.

    Endpoint

    GET /api/pipelines/logs/config

    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    Headers

    Examples

    Basic Request

    Get current logs pipeline configuration:

    Response Example

    Related Documentation

    For detailed information about configuring and writing OTTL transformations, see:

    Managing Dashboards with Terraform

    Create & manage dashboards with Terraform

    Use Terraform to create, update, delete, and list groundcover dashboards as code. Managing dashboards with infrastructure‑as‑code (IaC) lets you version changes, review them in pull requests, promote the same definitions across environments, and detect drift between what’s applied and what’s running in your account.


    Ingestion Keys

    Secure, write‑focused credentials for streaming data into groundcover

    Ingestion Keys let sensors, integrations and browsers send observability data to your groundcover backend. They are the counterpart of API Keys, which are optimized for reading data or automating dashboards and monitors.


    Key types

    Delete Ingestion Key

    Delete an existing ingestion key. This operation permanently removes the key and cannot be undone.

    Endpoint

    DELETE /api/rbac/ingestion-keys/delete

    MS Teams

    To integrate groundcover with MS Teams, follow the steps below. Note that you’ll need at least a Business subscription of MS Teams to be able to create workflows.

    1. Create a webhook workflow for your dedicated Teams channel Go to Relevant Team -> Specific Channel -> "Workflows", and create a webhook workflow

2. The webhook workflow is associated with a URL which is used to trigger the MS Teams integration on groundcover - make sure to copy this URL

    3. Set Up the Webhook in groundcover

    Connect RUM

This capability is only available to organizations subscribed to our Enterprise plan.

    groundcover’s Real User Monitoring (RUM) SDK allows you to capture front-end performance data, user events, and errors from your web applications.

Start capturing RUM data by installing the browser SDK in your web app.

    This guide will walk you through installing the SDK, initializing it, identifying users, sending custom events, capturing exceptions, and configuring optional settings.

    incident.io

To integrate groundcover with incident.io, follow the steps below. Note that you’ll need a Pro incident.io account to view your incoming alerts.

    1. Generate an Alerts configuration for groundcover Log in to your account. Go to "On-call" -> "Alerts" -> "Configure" and add a new source.

    2. On the "Create Alert Source" screen the answer to the question "Where do your alerts come from?" should be "HTTP". Select this source and give it a unique name. Hit "continue".

    Slack App for Channel Routing

    groundcover supports sending notifications to Slack using a Slack App with bot tokens instead of static webhooks. This method allows dynamic routing of alerts to any channel by including the channel ID in the payload. In addition to routing, messages can be enriched with formatting, blocks, and mentions — for example including <@user_id> in the payload to directly notify specific team members. This provides a flexible and powerful alternative to fixed incoming webhooks for alerting.

    Make sure you created a .

    Use the following workflow as an example. You can later enrich your workflow with additional functionality.

    Here are a few tips for using the example workflow:

    1. In the consts section, the channels attribute defines the mapping between Slack channels and their IDs. Use a clear, readable label to identify each channel (for example, the channel’s actual name in Slack), and map it to the corresponding channel ID.

    groundcover Terraform Provider

    Overview

    Terraform is an infrastructure-as-code (IaC) tool for managing cloud and SaaS resources using declarative configuration. The groundcover Terraform provider enables you to manage observability resources such as policies, service accounts, API keys, and monitors as code—making them consistent, versioned, and automated.

    We've partnered up with Hashicorp as an official Terraform provider-

    Also available is our provider Github repository:

    Backup & Restore Metrics

    Learn how to backup and restore metrics into groundcover metrics storage

groundcover uses VictoriaMetrics as its underlying metrics storage solution. As such, groundcover integrates seamlessly with VictoriaMetrics' vmbackup and vmrestore tools.

    Doing incremental backups

    Using groundcover as Prometheus/Clickhouse database in a Self-hosted Grafana

    Exposing Data Sources for Managed inCloud Setup

groundcover exposes data sources for programmatic access via API, and integration with customer-owned Grafana instances.

Different steps are required for On-Prem deployments; contact us for additional info.

    Saved Views

    Save the view of any groundcover page exactly the way you like it, then jump back in a click.

A Saved View captures your current page layout: filters, columns, toggles, etc., so you and your team can reopen the page with the same context every time. Each groundcover page maintains its own catalogue of views, and every user can pick their personal Favorites.


    Where to find them

    On the pages: Traces, Logs, API Catalog, Events.

    Look for the Views selector next to the time‑picker. Click it to open the list, create a new view, or switch between existing ones.


    Real User Monitoring (RUM)

    Monitor front-end applications and connect it to your backend — all inside your cloud.

This capability is only available to organizations subscribed to our Enterprise plan.

    Capture real end-user experiences directly from their browsers and unify these insights with your backend observability data.

    ➡️ Check out our to your platform.

    Configuring Pipelines

    groundcover supports the configuration of logs and traces pipelines, to further process and customize the data being collected, using transforms. This enables full flexibility to manipulate the data as it flows into the platform.

    See for more information about how Vector is being used in the groundcover platform's architecture.

    mkdir groundcover-tf-example && cd groundcover-tf-example
    resource "grafana_folder" "goldensignals" {
      title = "Golden Signals"
    }
    
    resource "grafana_dashboard" "workloadgoldensignals" {
      config_json = file("workloadgoldensignals.json")
      folder = grafana_folder.goldensignals.id
    }
    curl -fsSL https://groundcover.com/install-groundcover-sensor.sh | sudo env API_KEY='{ingestion_Key}' GC_ENV_NAME='{selected_Env}' GC_DOMAIN='{inCloud_Site}' bash -s -- install
    curl -fsSL https://groundcover.com/install-groundcover-sensor.sh | sudo bash -s -- uninstall
    echo "# Local overrides to sensor configuration
    k8sLogs:
      scraper:
        dockerMaxLogSize: 102400
    " | sudo tee /etc/opt/groundcover/overrides.yaml && sudo systemctl restart groundcover-sensor
    curl 'https://api.groundcover.com/api/k8s/v3/clusters/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"sources":[]}'
    Use groundcover to get 5 logs from the workload news-service from the past 15 minutes.
    Use groundcover to get the spec of the chat-app deployment.
    Use groundcover to show the top 5 workloads by P95 latency.
    I got an alert for this critical groundcover issue. Can you investigate it?
    https://app.groundcover.com/monitors/issues?...
    I got multiple alerts in the staging-env namespace. Can you help me look into them using groundcover?
    Use groundcover to debug this code. For each test, print relevant logs with test_id, and dive into any error logs.
    Please deploy this service and verify everything works using groundcover.
    POST /api/k8s/v2/namespaces/list

Terraform Support | Legacy: ❌ | New: ✅

Issues Based on Traces/Logs/Metrics | Legacy: Limited | New: Full support

Revocation effect | Ingestion Key: Data stops flowing immediately | API Key: API calls fail

    requires configuration
    Requirements

    API Key

    The groundcover tenant API KEY is required for configuring the data source connection.

    You can obtain your API key from the API Keys page in the groundcover console.

    For this example we will use the key API-KEY-VALUE

    Setup

    Prometheus

    Grafana Data Source Configuration

Configure the Grafana Prometheus data source by following these steps while logged in as a Grafana Admin.

    1. Connections > Data Sources > + Add new data source

    2. Pick Prometheus

      1. Name: groundcover-prometheus

      2. Prometheus server URL: https://app.groundcover.com/api/prometheus

3. Custom HTTP Headers > Add Header

        1. Header: authorization

        2. Value: Bearer API-KEY-VALUE

      4. Performance

        1. Prometheus type: Prometheus

        2. Prometheus version: > 2.50.x

    3. Click "Save & test"

    "Successfully queried the Prometheus API" means the integration was configured correctly.

    Legacy Configuration

    The following configurations are deprecated but may still be in use in older setups.

    Datasources API Key

    The legacy datasources API key can be obtained by running: groundcover auth get-datasources-api-key

    ClickHouse

    ClickHouse datasource integration is deprecated and no longer supported for new installations.

Configure the Grafana ClickHouse data source by following these steps while logged in as a Grafana Admin.

    1. Connections > Data Sources > + Add new data source

    2. Pick ClickHouse

      1. Name: groundcover-clickhouse

      2. Server

        1. Server address: ds.groundcover.com

        2. Server port: 443

        3. Protocol: HTTP

        4. Secure Connection: ON

      3. HTTP Headers

        1. Forward Grafana HTTP Headers: ON

      4. Credentials

        1. Username: Leave empty

        2. Password: API-KEY-VALUE

      5. Additional Properties

        1. Default database: groundcover

    3. Click "Save & test"

    "Data source is working" means the integration was configured correctly.

    inCloud Managed
    Prometheus
    On-Prem
• 1040 — Too Many Connections

• 1064 — Syntax Error

  • Namespace

    OpenShift

    supported

    Rancher

    supported

    Self-managed

    supported

    minikube

    supported

    kind

    supported

    Rancher Desktop

    supported

    k0s

    supported

    k3s

    supported

    k3d

    supported

    microk8s

    supported

    AWS Fargate

    not supported

    Docker-desktop

    not supported

    Secret

  • PVC

  • EKS

    supported

    AKS

    supported

    GKE

    supported

    OKE

    supported

    eBPF sensor
    Architecture Section
    architecture

    Instructions for VS Code

    macOS

    sudo systemsetup -gettimezone

    Linux

timedatectl | grep "Time zone"

    Windows PowerShell

    Get-TimeZone

    node
    groundcover API keys
    find it below
    Instructions for Claude Desktop
    Instructions for Claude Web
    Instructions for Cursor
    Instructions for Windsurf

    Amazon Linux

    All off the shelf AMIs

    Google COS

    All off the shelf AMIs

    Azure Linux

    All off the shelf AMIs

    Talos

    1.7.3+

    Debian

    11+

    RedHat Enterprise Linux

    8.2+

    Ubuntu

    20.10+

    CentOS

    7.3+

    Fedora

    31+

    BottlerocketOS

    1.10+

    safe by design!
    CO:RE
    distributions
    k8s environment
    collected logs
    integrations

    Paste your Zapier Catch Hook URL (you’ll get this in Step 2 below).

  • Save the integration.

  • Event: Catch Hook

  • Copy the Webhook URL (e.g. https://hooks.zapier.com/hooks/catch/...) – you'll use this in groundcover.

  • Configure the email:
    • To: your email address

    • Subject:

    • Body:

    Zapier

Authorization | Yes | Bearer token with your API key

X-Backend-Id | Yes | Your backend identifier

Content-Type | Yes | Must be application/json

Accept | Yes | Must be application/json

sources | Array | No | Filter by data sources (empty array for all sources)

start | String | Yes | Start timestamp in ISO 8601 format (UTC)

end | String | Yes | End timestamp in ISO 8601 format (UTC)

namespaces | Array | Array of namespace names or namespace objects

    these docs
    drop_on_error
    log pipeline example
    Log Parsing with OpenTelemetry Pipelines
    Log Parsing with OpenTelemetry Pipelines
    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    Headers

    Request Body

    Parameter
    Type
    Required
    Description

    name

    string

    Yes

    Exact name of the ingestion key to delete

    Examples

    Delete by Name
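A minimal sketch of the request (the key name is a placeholder):

# Permanently delete the ingestion key named "my-old-sensor-key"
curl -X DELETE 'https://api.groundcover.com/api/rbac/ingestion-keys/delete' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data-raw '{"name": "my-old-sensor-key"}'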

    Response

    The endpoint returns an empty response body with HTTP 200 status code when the key is deleted successfully.

    Important Warnings

    🚨 PERMANENT DELETION: This operation permanently deletes the ingestion key and cannot be undone.

    ⚠️ Immediate Impact: Any services using this key will:

    • Receive 403 PERMISSION_DENIED errors

    • Stop sending data to groundcover immediately

    • Lose access to remote configuration (for sensor keys)

    Verification

    To verify the key was deleted, use the List Ingestion Keys endpoint:

    This should return an empty array [] if the key was successfully deleted.

    Safe Deletion Workflow

    1. Identify the key to delete using the List endpoint

    2. Update integrations to use a different key first

    3. Test integrations work with the new key

    4. Delete the old key using this endpoint

    5. Verify deletion using the List endpoint

    Best Practices

    • Always have a replacement key ready before deleting production keys

    • Test your rollover plan in staging environments first

    • Update all services using the key before deletion

    • Use descriptive names to avoid accidental deletion of wrong keys

    • Consider key rotation instead of permanent deletion for security incidents

    Related Documentation

    For comprehensive information about ingestion keys management, see:

    • Ingestion Keys

    • Head out to the integrations section: Settings -> Integrations, to create a new Webhook

• Start by giving your Webhook integration a name. This name will be used below in the provider block sample.

• Set the Webhook URL to the URL you copied from field (2)

    • Keep the HTTP method as POST

  • Create a Workflow: Go to Monitors --> Workflows --> Create Workflow, and paste the YAML configuration provided below.

  • Configure the provider blocks (there are two of them): In each provider block, replace {{ providers.your-teams-integration-name }} with your actual Webhook integration name (the one you created in step 3). For example, if you named your integration test-ms-teams, the config reference would be: {{ providers.test-ms-teams }}

  • The following example shows a pre-configured MS Teams workflow template. You can easily modify workflows to support different formats based on the MS Teams workflow schema.

    Sample code for your groundcover workflow:

    Install the SDK

    Initialize the SDK

    Initialization

    Configuration Parameters

    apiKey

    A dedicated Ingestion Key of type RUM (Settings -> Access -> Ingestion Keys)

    dsn

Your public groundcover endpoint, in the format https://example.platform.grcv.io, where example.platform.grcv.io is your ingress.site installation value.

    cluster

    Identifier for your cluster; helps filter RUM data by specific cluster.

    environment

    Environment label (e.g., production, staging) used for filtering data.

    appId

    Custom application identifier set by you; useful for filtering and segmenting data on a single application level later.

    Advanced Configuration

    You can customize SDK behavior (event sampling, data masking, enabled events). The following properties are customizable:

    You can pass the values by calling the init function:

    Or via the updateConfig function:

    Identify Users

    Link RUM data to specific users:

    Send Custom Events

    Instrument key user interactions:

    Capture Exceptions

    Manually track caught errors:

    Enterprise plan
    browser SDK

    port-forward groundcover's VictoriaMetrics service object

• Run the vmbackup utility. In this example we'll set the destination to an AWS S3 bucket, but more providers are supported

    vmbackup automatically uses incremental backup strategy if the destination contains an existing backup

    Restoring from backup

    • Scale down VictoriaMetrics statefulSet (VictoriaMetrics must be offline during restorations)

    • Get the VictoriaMetrics PVC name

    • Create the following Kubernetes Job manifest vm-restore.yaml

Make sure you replace {VICTORIA METRICS PVC NAME} with the PVC name fetched above

    • Deploy the job and wait for completion

    • Once completed, scale up groundcover's VictoriaMetrics instance

    VictoriaMetrics
    vmbackup
    vmrestore
    Install vmutils
    Using Vector transforms

    groundcover uses Vector as an aggregator and transformer deployed into each monitored environment. It is an open-source, highly performant service, capable of supporting many manipulations on the data flowing into groundcover's backend.

    Pipelines are configured using Vector transforms, where each transform defines one step in the pipeline. There are many types of transforms, and all of them can be natively used within the groundcover deployment to achieve full flexibility.

The most common transform is the remap transform - allowing you to write arbitrary logic using Vector's VRL syntax. There are many pre-defined functions to parse, filter and enrich data, and we recommend experimenting with it to fit your needs.

    For testing out VRL before deployment we recommend the VRL playground.
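For example, a remap transform that parses a JSON message and promotes one of its fields could look roughly like the sketch below (the .message and .level field names are illustrative); such a transform is what you plug into the pipeline steps described in the next section:

type: remap
source: |-
  # parse the raw JSON message and promote its "level" field (illustrative field names)
  .parsed, .err = parse_json(.message)
  if .err == null {
    .level = .parsed.level
  }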

    Deploying groundcover with Pipelines

groundcover's deployment supports adding a list of transforms for logs and traces independently. These steps are automatically appended to the default pipeline, eliminating the need to understand the inner workings of groundcover's setup. Instead, you only need to configure the steps you wish to execute, and after redeploying groundcover they take effect immediately.

    Each step requires two attributes:

    • name: must be unique across all pipelines

    • transform: the transform itself, passed as-is to Vector.

    Logs Pipeline

    The following is a template for a logs pipeline with two remap stages:

    View an example with real inputs ->

    Traces Pipeline

    The following is a template for a traces pipeline with one filter stage:

    View an example with real inputs ->

    Custom Logs to Events Pipeline

Logs to Events pipelines allow creating custom events from incoming logs. Unlike the logs and traces pipelines, they do not affect the original logs, and are meant to create parallel, distinct events for future analytics.

    The following is a template for a custom event pipeline with a filter stage and an extraction step.

The inputs field below connects the events pipeline to the default incoming logs pipelines.

    View an example with real inputs

    Vector
    this page
    {
      "mcpServers": {
        "groundcover": {
          "command": "npx",
          "args": [
            "-y",
            "[email protected]",
            "https://mcp.groundcover.com/api/mcp",
            "54278",
            "--header",
            "X-Timezone:<IANA_TIMEZONE>",
            "--header",
            "X-Tenant-Uuid:<TENANT_UUID>"
          ]
        }
      }
    }
    claude mcp add groundcover npx -- -y [email protected] https://mcp.groundcover.com/api/mcp 54278 --header X-Timezone:<IANA_TIMEZONE> --header X-Tenant-UUID:<TENANT_UUID>
    {
      "mcpServers": {
        "groundcover": {
          "command": "npx",
          "args": [
            "-y",
            "mcp-remote",
            "https://mcp.groundcover.com/api/mcp",
            "--header", "Authorization:${AUTH_HEADER}",
            "--header", "X-Timezone:${TIMEZONE}"
          ],
          "env": {
            "AUTH_HEADER": "Bearer <your_token>",
            "TIMEZONE": "<your_timezone>"
          }
        }
      }
    }
    "--header", "X-Backend-Id:<BACKEND_ID>"
    $ ls -la /sys/kernel/btf/vmlinux
    
-r--r--r--. 1 root root 3541561 Jun 2 18:16 /sys/kernel/btf/vmlinux
    🚨 New groundcover Alert 🚨
    🔔 Alert Title: {{alert_name}}
    💼 Severity: {{severity}}
    
    🔗 Links:
    - 🧹 Issue: {{issue_url}}
    - 📈 Monitor: {{monitor_url}}
    - 🔕 Silence: {{silence_url}}
    workflow:
      id: emails
      description: Sends alerts to Zapier webhook (for email)
      triggers:
      - type: alert
        name: emails
      consts:
        severity: keep.dictget( {{ alert.annotations }}, "_gc_severity", "info")
        title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
        issue_url: https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}
        monitor_url: https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}
        silence_url: https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join({{ consts.redacted_labels }}, "&", "matcher_"), " ", "+")
        redacted_labels: keep.dict_pop({{ alert.labels }}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "_gc_issue_header")
      actions:
  - name: <<THE_NAME_OF_YOUR_WORKFLOW>>
        provider:
          config: "{{ providers.<<THE_NAME_OF_YOUR_INTEGRATION>> }}"
          type: webhook
          with:
            body:
              alert_name: "{{ consts.title }}"
              severity: "{{ consts.severity }}"
              issue_url: "{{ consts.issue_url }}"
              monitor_url: "{{ consts.monitor_url }}"
              silence_url: "{{ consts.silence_url }}"
    curl 'https://api.groundcover.com/api/k8s/v2/namespaces/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"sources":[],"start":"2025-01-24T06:00:00.000Z","end":"2025-01-24T08:00:00.000Z"}'
    {
      "namespaces": [
        "groundcover",
        "monitoring",
        "kube-system",
        "default"
      ]
    }
    # Get current time and subtract 24 hours for start time
    start_time=$(date -u -v-24H '+%Y-%m-%dT%H:%M:%S.000Z')
    end_time=$(date -u '+%Y-%m-%dT%H:%M:%S.000Z')
    
    curl 'https://api.groundcover.com/api/k8s/v2/namespaces/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw "{\"sources\":[],\"start\":\"$start_time\",\"end\":\"$end_time\"}"
    {"name":"my-log","count":5} 
    .string_attributes.name --> "my-log"
    .float_attributes.count --> 5
    .parsed, .err = parse_json("{\"Hello\": \"World!\"}")
if .err == null {
  # do something with .parsed
    }
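# The "!" variant below uses VRL's infallible-call syntax: if parsing fails, the whole
# program errors for that event instead of setting .err (see drop_on_error behavior)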
    parsed = parse_json!("{\"Hello\": \"World!\"}")
    Authorization: Bearer <YOUR_API_KEY>
    Content-Type: application/json
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/pipelines/logs/config' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "ottlRules": [
          {
            "ruleName": "test_log_pattern",
            "conditions": [
              "workload == \"test-app\" or container_name == \"test-container\""
            ],
            "statements": [
              "set(cache, ExtractGrokPatterns(body, \"^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}\"))",
              "merge_maps(attributes, cache, \"insert\")",
              "set(attributes[\"parsed\"], true)"
            ],
            "statementsErrorMode": "skip",
            "conditionLogicOperator": "or"
          },
          {
            "ruleName": "json_parsing_test",
            "conditions": [
              "format == \"JSON\""
            ],
            "statements": [
              "set(parsed_json, ParseJSON(body))",
              "merge_maps(attributes, parsed_json, \"insert\")"
            ],
            "statementsErrorMode": "skip",
            "conditionLogicOperator": "and"
          }
        ]
      }'
    {
      "uuid": "59804867-6211-48ed-b34a-1fc33827aca6",
      "created_by": "itamar",
      "created_timestamp": "2025-08-31T13:33:27.364525Z",
      "value": "ottlRules:\n  - ruleName: test_log_pattern\n    conditions:\n      - workload == \"test-app\" or container_name == \"test-container\"\n    statements:\n      - set(cache, ExtractGrokPatterns(body, \"^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}\"))\n      - merge_maps(attributes, cache, \"insert\")\n      - set(attributes[\"parsed\"], true)\n    statementsErrorMode: skip\n    conditionLogicOperator: or"
    }
    curl -L \
      --url 'https://api.groundcover.com/api/pipelines/logs/config' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Accept: */*' > pipeline-backup.json
    curl -L \
      --url 'https://api.groundcover.com/api/pipelines/logs/config' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Accept: */*'
    Authorization: Bearer <YOUR_API_KEY>
    Accept: */*
    curl -L \
      --url 'https://api.groundcover.com/api/pipelines/logs/config' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Accept: */*'
    {
      "ottlRules": [
        {
          "ruleName": "nginx_access_logs",
          "conditions": [
            "workload == \"nginx\" or container_name == \"nginx\""
          ],
          "statements": [
            "set(cache, ExtractGrokPatterns(body, \"^%{IPORHOST:remote_ip} - %{DATA:remote_user} \\[%{HTTPDATE:timestamp}\\] \\\"%{WORD:method} %{DATA:path} HTTP/%{NUMBER:http_version}\\\" %{INT:status} %{INT:body_bytes}\"))",
            "merge_maps(attributes, cache, \"insert\")"
          ],
          "statementsErrorMode": "skip",
          "conditionLogicOperator": "or"
        },
        {
          "ruleName": "json_log_parsing",
          "conditions": [
            "format == \"JSON\""
          ],
          "statements": [
            "set(parsed_json, ParseJSON(body))",
            "merge_maps(attributes, parsed_json, \"insert\")"
          ],
          "statementsErrorMode": "skip",
          "conditionLogicOperator": "and"
        },
        {
          "ruleName": "error_log_enrichment",
          "conditions": [
            "level == \"error\" or level == \"ERROR\""
          ],
          "statements": [
            "set(attributes[\"severity\"], \"high\")",
            "set(attributes[\"needs_attention\"], true)"
          ],
          "statementsErrorMode": "skip",
          "conditionLogicOperator": "or"
        }
      ]
    }
    Authorization: Bearer <YOUR_API_KEY>
    Content-Type: application/json
    {
      "name": "string"
    }
    curl -L \
      --request DELETE \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/delete' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "old-test-key"
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "old-test-key"
      }'
    workflow:
      id: teams-webhook
      description: Sends an API to MS Teams alerts endpoint
      name: ms-teams-alerts-workflow
      triggers:
      - type: alert
        filters:
        - key: annotations.ms-teams-alerts-workflow
          value: enabled
      consts:
        silence_link: 'https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join(keep.dict_pop({{ alert.labels }}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder"), "&", "matcher_"), " ", "+")'
        monitor_link: 'https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}'
        title_link: 'https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}'
        description: keep.dictget( {{ alert.annotations }}, "_gc_description", '')
        redacted_labels: keep.join(keep.dict_pop({{alert.labels}}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "_gc_issue_header"), "-\n")
        title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
    
      actions:
      - if: '{{ alert.status }} == "firing"'
        name: teams-webhook-firing
        provider:
          config: ' {{ providers.your-teams-integration-name }} '
          type: webhook
          with:
            body:
              type: message
              attachments:
              - contentType: application/vnd.microsoft.card.adaptive
                content:
                  $schema: http://adaptivecards.io/schemas/adaptive-card.json
                  type: AdaptiveCard
                  version: "1.2"
                  body:
                  - type: TextBlock
                    text: "\U0001F6A8 Firing: {{ consts.title }}"
                    weight: bolder
                    size: large
                  - type: TextBlock
                    text: "[Investigate Issue]({{consts.title_link}})"
                    wrap: true
                  - type: TextBlock
                    text: "{{ consts.description }}"
                    wrap: true
                  - type: TextBlock
                    text: "[Silence]({{consts.silence_link}})"
                    wrap: true
                  - type: TextBlock
                    text: "[See monitor]({{consts.monitor_link}})"
                    wrap: true
                  - type: TextBlock
                    text: "{{ consts.redacted_labels }}"
                    wrap: true
      - if: '{{ alert.status }} != "firing"'
        name: teams-webhook-resolved
        provider:
          config: ' {{ providers.your-teams-integration-name }} '
          type: webhook
          with:
            body:
              type: message
              attachments:
              - contentType: application/vnd.microsoft.card.adaptive
                content:
                  $schema: http://adaptivecards.io/schemas/adaptive-card.json
                  type: AdaptiveCard
                  version: "1.2"
                  body:
                  - type: TextBlock
                    text: "\U0001F7E2 Resolved: {{ consts.title }}"
                    weight: bolder
                    size: large
                  - type: TextBlock
                    text: "[Investigate Issue]({{consts.title_link}})"
                    wrap: true
                  - type: TextBlock
                    text: "{{ consts.description }}"
                    wrap: true
                  - type: TextBlock
                    text: "[Silence]({{consts.silence_link}})"
                    wrap: true
                  - type: TextBlock
                    text: "[See monitor]({{consts.monitor_link}})"
                    wrap: true
                  - type: TextBlock
                    text: "{{ consts.redacted_labels }}"
                    wrap: true
      
    npm install @groundcover/browser
    # or
    yarn add @groundcover/browser
    import groundcover from '@groundcover/browser';
    
    groundcover.init({
      apiKey: 'your-ingestion-key',
      cluster: 'your-cluster',
      environment: 'production',
      dsn: 'your-dsn',
      appId: 'your-app-id',
    });
    export interface SDKOptions {
      batchSize: number;
      batchTimeout: number;
      eventSampleRate: number;
      sessionSampleRate: number;
      environment: string;
      debug: boolean;
      tracePropagationUrls: string[];
      beforeSend: (event: Event) => boolean;
      enabledEvents: Array<"dom" | "network" | "exceptions" | "logs" | "pageload" | "navigation" | "performance">;
      excludedUrls: [];
    }
    groundcover.init({
      apiKey: 'your-ingestion-key',
      cluster: 'your-cluster',
      environment: 'production',
      dsn: 'your-dsn',
      appId: 'your-app-id',
      options: {
        batchSize: 50,
        sessionSampleRate: 0.5, // 50% sessions sampled
    eventSampleRate: 0.5,
      },
    });
    groundcover.updateConfig({
       batchSize: 20,
    });
    groundcover.identifyUser({
      id: 'user-id',
      email: '[email protected]',
    });
    groundcover.sendCustomEvent({
      event: 'PurchaseCompleted',
      attributes: { orderId: 1234, amount: 99.99 },
    });
    try {
      performAction();
    } catch (error) {
      groundcover.captureException(error);
    }
    kubectl get svc -n groundcover | grep "victoria-metrics"
    # Identify the victoria-metrics service object name
    kubectl port-forward svc/{victoria-metrics-service-object-name} \
    -n groundcover 8428:8428
    ./vmbackup -credsFilePath={aws credentials path} \
    -storageDataPath=</path/to/victoria-metrics-data> \
    -snapshot.createURL=http://localhost:8428/snapshot/create \
    -dst=s3://<bucket>/<path/to/backup>
    kubectl scale sts {release name}-victoria-metrics --replicas=0
    kubectl get pvc -n groundcover | grep victoria-metrics
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: vm-restore
      annotations:
    eks.amazonaws.com/role-arn: XXXXX # role with permissions to read from the bucket
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: vm-restore
    spec:
      ttlSecondsAfterFinished: 600
      template:
        spec:
          serviceAccountName: vm-restore
          restartPolicy: OnFailure
          volumes:
          - name: vmstorage-volume
            persistentVolumeClaim:
              claimName: "{VICTORIA METRICS PVC NAME}"
          containers:
          - name: vm-restore
            image: victoriametrics/vmrestore
            imagePullPolicy: IfNotPresent
            volumeMounts:
            - mountPath: /storage
              name: vmstorage-volume
            command:
            - /bin/sh
            - -c
            - /vmrestore-prod -src=s3://<bucket>/<path/to/backup> -storageDataPath=/storage
    kubectl apply -f vm-restore.yaml -n groundcover
    kubectl scale sts {release name}-victoria-metrics --replicas=1
    vector:
      logsPipeline:
        extraSteps: 
        - name: stepA
          transform:
            type: remap
            source: |-
              ...
        - name: stepB
          transform:
            type: remap
            source: |-
              ...
    vector:
      tracesPipeline:
        extraSteps: 
        - name: stepA
          transform:
            type: filter
            condition: |- 
               ...
    vector:
      eventsPipelines:
        my_event_name:
          inputs:
            - logs_from_logs
            - json_logs
          extraSteps:
            - name: filter_step
              transform:
                type: filter
                condition: |-
                  ...
            - name: extraction_step
              transform:
                type: remap
                source: |-
                  ...
              
    Prerequisites
    • A groundcover account with permissions to create/edit Dashboards

    • A Terraform environment (groundcover provider >v1.1.1)

    • The groundcover Terraform provider configured with your API credentials

    See also: groundcover Terraform provider reference for provider configuration and authentication details.


    1) Creating a Dashboard via Terraform

    1.1) Create a dashboard directly from the UI

To create a dashboard using Terraform, you first need to create one manually in the UI so that you can export it in Terraform format.

    See Creating dashboards to learn more.

    1.2) Export the dashboard in Terraform format

You can export a Dashboard as a Terraform resource:

    1. Open the Dashboard.

    2. Click Actions → Export.

    3. Download or copy the Terraform tab’s content and paste it into your .tf file (see placeholder above).

    1.3) Add the dashboard resource to your Terraform configuration

The example below is a placeholder; paste your generated snippet or hand‑write your own.

    After saving this file as main.tf along with the provider details, type:


    2) Managing existing provisioned Dashboard

    2.1) "Provisioned" badge for IaC‑managed Dashboards

    Dashboards added via Terraform are marked as Provisioned in the UI so you can quickly distinguish IaC‑managed Dashboards from manually created ones, both from the Dashboard List and inside the Dashboard itself.

    2.2) Edit behavior for Provisioned Dashboards

    Provisioned Dashboards are read‑only by default to protect the source of truth in your Terraform code.

• To make a quick change, click Unlock dashboard. This allows editing directly in the UI; all changes are automatically saved, as always.

    • Important: Any changes can be overwritten the next time your provisioner runs terraform apply.

    • Safer alternative: Duplicate the Dashboard and edit the copy, then migrate those changes back into code.

    2.3) Editing dashboards via Terraform

Changing the resource and reapplying Terraform will update the Dashboard in groundcover.

    Deleting the resource from your code (and applying) will delete it from groundcover.

    See more examples on our Github repo.


    3) Importing existing Dashboards into Terraform

    Already have a Dashboard in groundcover? Bring it under Terraform management without recreating it:

    After importing, run terraform plan to view the state and align your config with what exists.


    Reference

• Creating dashboards – how to build widgets and layouts in the UI

    • groundcover Terraform provider documentation

    • groundcover Terraform provider Github repo – resource schema, arguments, and examples

    Send Real‑User‑Monitoring events using JS snippet embedded in web pages

    Third Party

Integrate 3rd-party data sources that push data (e.g. OpenTelemetry, AWS Firehose, FluentBit, etc.)

    *Only the Sensor has limited read capability in order to support pulling remote configuration such as OTTL parsing rules applied from the UI. RUM and Third Party have write-only configurations.


    Creating an Ingestion Key

It is recommended to create a dedicated Ingestion Key for every data source, so that keys can be managed and rotated appropriately, exposure and risk are minimized, and groundcover can identify the data source of all ingested data.

    1. Open Settings → Access → Ingestion Keys and click Create key.

    2. Give the key a clear, descriptive Name (for example k8s-prod‑eu‑central‑1).

    3. Select the Type that matches your integration.

    4. Click Click & Copy Key.

      1. Unlike API Keys, Ingestion Keys stay visible on the page. Treat every reveal as sensitive and follow the same secret‑handling practices.

5. Store the key securely, and continue to integrate your data source.


    Using an Ingestion Key

    Kubernetes sensor example

    OpenTelemetry integration (OTel/HTTP) example


    Viewing keys

    The Ingestion Keys table lets you:

    • Reveal the key at any time.

    • See who created the key and when.

    • Sort by Type or Creator to locate specific credentials quickly.


    Revoking a key

Click ⋮ → Revoke next to the key. Revocation permanently deletes the key (unlike API Keys, where revoking only disables the key):

    • The key will disappear from the list.

• Any service using it will receive 403 / PERMISSION_DENIED and will no longer be able to send data or pull the latest configuration.

    This operation cannot be undone — create a new key and update your deployments if you need access again.


    Ingestion Keys vs. API Keys

    Ingestion Key

    API Key

    Primary purpose

    Write data (ingest)

    Read data / manage resources via REST

    Permissions capabilities

    Write‑only + optional remote‑config read

    Mirrors service‑account RBAC

    Visibility after creation

    Always revealable

    Shown once only

    Typical lifetime

    Tied to integration lifecycle

    Rotated for CI/CD automations


    Best Practices

    • One key per integration – simplifies rotation and blast radius.

    • Store securely – AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault, Kubernetes Secrets.

    • Rotate regularly – create a new key, roll it out, then revoke the old one.

    • Monitor for 403 errors – a spike usually means a revoked or expired key.


    Sensor*

    Install the eBPF sensor on Kubernetes or Hosts/VMs

    RUM

incident.io will create your configuration now, from which you will need to copy the following items for the Webhook integration:
  • Set Up the Webhook in groundcover

    • Head out to the integrations section: Settings -> Integrations, to create a new Webhook

• Start by giving your Webhook integration a name. This name will be used below in the provider block sample.

• Set the Webhook URL to the URL you copied from field (1)

    • Keep the HTTP method as POST

    • Under headers add Authorization, and paste the "Bearer <token>" copied from field (2).

  • Create a Workflow: Go to Monitors --> Workflows --> Create Workflow, and paste the YAML configuration provided below. Note: the body section is a dictionary of keys that will be sent as a JSON payload to the incident.io platform

  • Configure the provider block: In the provider block, replace {{ providers.your-incident-io-integration-name }} with your actual Webhook integration name (the one you created in step 4). For example, if you named your integration test-incidentio, the config reference would be: {{ providers.test-incidentio }}

  • Required Parameters for Creating an Alert: When triggering an alert, the following keys are required:

    1. title - Alert title that can be pulled from groundcover as seen in the example

    2. status - One of "firing" or "resolved" that can also be pulled from groundcover as the example shows.

  • You can include additional parameters for richer context (optional):

    1. description

    2. deduplication_key - unique attribute used to group identical alerts; groundcover provides this through the fingerprint attribute

    3. metadata - Any additional metadata that you've configured within your monitor in groundcover. Note that this set should reflect your monitor definition in groundcover

  • The attributes shown in the YAML block of the metadata section below are an example only! Alert labels can only be attributes used in the group by section of the actual monitor

    Example code for your groundcover workflow:

    incident.io
    incident.io

    To locate a channel ID, open the channel in Slack, click the channel name at the top, and scroll to the About section. The channel ID is shown at the bottom of this section.


1. The channel name should be included in the monitor’s Metadata Labels, or you can fall back to a default. See the channel_id attribute in the workflow example, or the sketch below.

    2. Finally, replace the integration name in {{ providers.slack-routing-webhook }} with the actual name of the Webhook integration you created.
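As a rough sketch of that channel resolution (the channel metadata label and the fallback channel ID here are placeholders; the linked workflow example remains the reference), the channel_id const can read the label and fall back to a default:

consts:
  channel_id: keep.dictget( {{ alert.labels }}, "channel", "C0123456789")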

    webhook for a Slack App with Bot Tokens
    Supported Resources
    • groundcover_policy – Defines RBAC policies (roles and optional data scope filters) Role-Based Access Control (RBAC)

• groundcover_serviceaccount – Creates service accounts and attaches policies. Service Accounts

    • groundcover_apikey – Creates API keys for service accounts. API Keys

    • groundcover_monitor – Defines alerting rules and monitors.

    • groundcover_logspipeline - Defines Logs Pipeline configurations

    • groundcover_ingestionkey - Creates Ingestion keys.

• groundcover_dashboard - Defines Dashboards.

    Installation and Setup

    Requirements

    • Terraform ≥ 1.0 (Check required_version if specified in main.tf)

    • Go >= 1.21 (to build the provider plugin)

    • groundcover Account and API Key.

    Install the Provider

    Run terraform init to install the provider.
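A minimal required_providers block (a sketch; pin the provider version you actually use) looks like this, after which terraform init downloads the plugin:

terraform {
  required_providers {
    groundcover = {
      source = "groundcover-com/groundcover"
    }
  }
}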

    Configure the Provider

    Arguments

    • api_key (String, Required, Sensitive): Your groundcover API key. It is strongly recommended to configure this using the TF_VAR_groundcover_api_key environment variable rather than hardcoding it.

    • base_url (String, Optional): The base URL for the groundcover API. Defaults to https://api.groundcover.com if not specified.

    • backend_id (String, Required): Your Backend ID can be found in the API Keys screen in the groundcover UI (Under Settings -> Access):
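Putting these arguments together, a minimal provider configuration might look like the following sketch, with the API key supplied through the TF_VAR_groundcover_api_key environment variable rather than hardcoded:

variable "groundcover_api_key" {
  type      = string
  sensitive = true
}

provider "groundcover" {
  api_key    = var.groundcover_api_key
  backend_id = "<YOUR_BACKEND_ID>"
  # base_url = "https://api.groundcover.com" # optional, this is the default
}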

    Examples

    For full examples of all existing resources, see: https://github.com/groundcover-com/terraform-provider-groundcover/tree/main/examples/resources

    Creating a Read-Only Service Account and API Key

    https://registry.terraform.io/providers/groundcover-com/groundcover/latest
    https://github.com/groundcover-com/terraform-provider-groundcover
    Saving a view
    1. Configure the page until it looks exactly right—filters, columns, panels, etc.

    2. Click ➕Save View.

    3. Give the view a clear Name.

    4. Hit Save. The view is now listed and available to everyone in the project.

    Scope – Saved Views are per‑page. A Logs view appears only in Logs; a Traces view only in Traces.


    What a view stores

    Common to all pages

    Category
    Details

    Filters & facets

    All query filters plus facet open/closed state

    Columns

Chosen columns and their order, sort, and width

    Filter Panel & Added Facets

    Filter panel open/closed

    Facets added/removed

    Page‑specific additions

    Page
    Extra properties saved

    Logs

    • logs / patterns

    • textWrap

    • Insight show / hide

    Traces

    • traces / span

    • table / drilldown

    • textWrap

    API Catalog

    • protocol

    • Kafka role: Fetcher / Producer

    Events

    • textWrap


    Updating a view

    The Update View button appears only when you are the creator of the view. Click it to overwrite the view with your latest changes.

    Underneath every View you can see which user created it.


    Managing views (row operations)

    Action
    Who can do it?

    Edit / Rename

    Creator

    Delete

    Creator

    Star / Unstar

    Any user for themselves


    Searching, Sorting, and Filtering the list

Searching the Views looks them up by View name and creator.

The default sorting pins favorite views at the top, with the rest of the views below. Each group of views is sorted from A→Z.

    In addition, 3 filtering options are available:

    1. All Views - The entire workspace's views for a specific page

    2. My Favorites – The favorite views of the user for a specific page

    3. Created By Me - The views created by the user

    Overview

    Real User Monitoring (RUM) extends groundcover’s observability platform to the client side, providing visibility into actual user interactions and front-end performance. It tracks key aspects of your web application as experienced by real users, then correlates them with backend metrics, logs, and traces for a full-stack view of your system.

    Understand user experience - capture every interaction, page load, and performance metric from the end-user perspective to pinpoint front-end issues in real time.

    Resolve issues faster - seamlessly tie front-end events to backend traces and logs in one platform, enabling end-to-end troubleshooting of user journeys.

    Privacy first - groundcover’s Bring Your Own Cloud (BYOC) model ensures all RUM data stays in your own cloud environment. Sensitive user data never leaves your infrastructure, ensuring privacy and compliance without sacrificing insight.

    Collection

    groundcover RUM collects a wide range of data from users’ browsers through a lightweight JavaScript SDK. Once integrated into your web application, the SDK automatically gathers and sends the following telemetry from each user session to the groundcover platform:

    • Network requests: Every HTTP request initiated by the browser (such as API calls) is captured as a trace. Each client-side request can be linked with its corresponding server-side trace, giving you a complete picture of the request from the user’s click to the backend response.

    • Front-end logs: Client-side log messages (e.g., console.log outputs, warnings, and errors) are collected and forwarded to groundcover’s log management. This ensures that browser logs are stored alongside your application’s server logs for unified analysis.

• Exceptions: Uncaught JavaScript exceptions and errors are automatically captured with full stack traces and contextual data (browser type, URL, etc.). These front-end errors become part of groundcover monitors, letting you quickly identify and debug issues in the user’s environment.

• Performance metrics (Core Web Vitals): Key performance indicators such as page load time, along with Core Web Vitals like Largest Contentful Paint, Interaction to Next Paint, and Cumulative Layout Shift, are measured for each page view. groundcover RUM records these metrics to help you track real-world performance and detect slowdowns affecting users.

    • User interactions: RUM tracks user interactions such as clicks, keydown, and navigation events. By recording which elements users interact with and when, groundcover helps you reconstruct user flows and understand the sequence of actions leading up to any issue or performance problem.

    • Custom events: You can instrument your application to send custom events via the RUM SDK. This allows you to capture domain-specific actions or business events (for example, a checkout completion or a specific UI gesture) with associated metadata, providing deeper insight into user behavior beyond automatic captures.

    All collected data is streamed securely to your groundcover deployment. Because groundcover runs in your environment, RUM data (including potentially sensitive details from user sessions) is stored in the observability backend within your own cloud. From there, it is aggregated and indexed just like other telemetry, ready to be searched and analyzed in the groundcover UI.

    Full-Stack Visibility

    One of the core advantages of groundcover RUM is its native integration with backend observability data. Every front-end trace, log, or event captured via RUM is contextualized alongside server-side data:

    • Trace correlation: Client-side traces (from browser network requests) are automatically correlated with server-side traces captured by groundcover’s eBPF-based instrumentation. This means when a user triggers an API call, you can see the complete distributed trace that spans the browser and the backend services, all in one view.

    • Unified logging: Front-end log entries and error reports are ingested into the same backend as your server-side logs. In the groundcover Log Explorer, you can filter and search across logs from both client and server, using common fields (like timestamp, session ID, or trace ID) to connect events.

    • End-to-end troubleshooting: With full-stack data in one platform, you can pivot easily between a user’s session replay, the front-end events, and the backend metrics/traces involved. This end-to-end context significantly reduces the time to isolate whether an issue originated in the frontend (browser/UI) or the backend (services/infrastructure), helping teams pinpoint problems faster across the entire stack.

    By bridging the gap between the user’s browser and your cloud infrastructure, groundcover’s RUM capability ensures that no part of the user journey is invisible to your monitoring. This holistic view is critical for optimizing user experience and rapidly resolving issues that span multiple layers of your application.

    Sessions Explorer

    Once RUM data is collected, it becomes available in the groundcover platform via the Sessions Explorer — a dedicated view for inspecting and troubleshooting user sessions. The Sessions Explorer allows you to navigate through user journeys and understand how your users experience your application.

    Clicking on any session opens the Session View, where you can inspect a full timeline of the user’s experience. This view shows every key event captured during the session - including clicks, navigations, network requests, logs, custom events, and errors.

    Each event is displayed in sequence with full context like timestamps, URLs, and stack traces. The Session View helps you understand exactly what the user did and what the system reported at each step, making it easier to trace issues and user flows.

    Enterprise plan
    instructions on how to connect RUM

    Supported Technologies

    groundcover will work out-of-the-box on all protocols, encryption libraries and runtimes below - generating traces and metrics with zero code changes.

We're growing our coverage all the time. Can't find what you're looking for? Let us know over Slack.

    Supported protocols

    Protocol
    Status
    Comments

    Supported encryption libraries and runtimes

    groundcover seamlessly supports APM for encrypted communication - as long as it's listed below.

    Encryption Library/Runtime
    Status
    Comments

    Encryption is unsupported for binaries which have been compiled without debug symbols ("stripped"). Known cases:

    • Crossplane

    List Monitors

    Get a list of all configured monitors in the system with their identifiers, titles, and types.

    Endpoint

    POST /api/monitors/list

    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    Headers

    Header
    Required
    Description

    Request Body

    The request body supports filtering by sources:

    Parameters

    Parameter
    Type
    Required
    Description

    Response

    Response Schema

    Field Descriptions

    Field
    Type
    Description

    Monitor Types

    Type
    Description

    Examples

    Basic Request

    Get all monitors:
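A request in the same style as the other API examples in these docs might look like the sketch below (the header set is assumed to match the other endpoints; an empty sources array returns monitors from all sources):

curl -L \
  --request POST \
  --url 'https://api.groundcover.com/api/monitors/list' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Content-Type: application/json' \
  --header 'X-Backend-Id: <YOUR_BACKEND_ID>' \
  --data '{"sources":[]}'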

    Response Example

    Connect Kubernetes clusters

    Get up and running in minutes in Kubernetes

    Before installing groundcover in Kubernetes, please make sure your cluster meets the requirements.

    After ensuring your cluster meets the requirements, complete the login and workspace setup, then choose your preferred installation method:

    • UI

    • CLI

The coverage policy covers all nodes, excluding control plane and Fargate nodes.

    Creating helm values file

Sensor deployment requires installation values similar to these, stored in a values.yaml file

    {inCloud_Site} is your unique backend identifier, which is needed for the sensors to send data to your backend. This value will be sent to you by the groundcover team after inCloud Managed is set up.

    Installing using CLI

    Use groundcover CLI to automate the installation process. The main advantages of using this installation method are:

    • Auto-detection of cluster incompatibility issues

    • Tolerations setup automation

    • Tuning of resources according to cluster size

    • Supports passing helm overrides

    Read more .

    The CLI will automatically use existing ingestion keys or provision a new one if none exist

    Installing groundcover CLI

    Deploying groundcover using the CLI

    To upgrade groundcover to the latest version, simply re-run the groundcover deploy command with your desired overrides (such as -f values.yaml). The CLI will automatically detect and apply the latest available version during the deployment process.
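For example, assuming your overrides live in values.yaml, the deploy (or upgrade) run looks like this sketch:

groundcover deploy -f values.yaml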

    Installing using Helm

    Step 1 - Install groundcover CLI

    Step 2 - Generate Installation Key

    For more details about ingestion keys, refer to our .

    Step 3 - Add Sensor Ingestion Key to Values File

    Add the recently created sensor key to the values.yaml file provided by the groundcover team

    Step 4 - Add Helm Repository

    Step 5 - Install groundcover

    Initial installation:

    Upgrade groundcover:

    Installing using ArgoCD

    For CI/CD deployments using ArgoCD, refer to our .

    What can you do next?

    Check out our

    Uninstalling

    CLI

    Helm

    Creating dashboards

    Note: Only users with Write or Admin permissions can create and edit dashboards.

    How to create a new Dashboard in groundcover?

    1. Navigate to the Dashboard List and click on the Create New Dashboard button.

    2. Provide an indicative name for your dashboard and, optionally, a description.

    Steps to creating a Dashboard

    1. Create a new widget

    2. Choose a Widget Type

    3. Select a Widget Mode

    4. Build your query

    Optional:

    1. Add variables

    2. Apply variable(s) to the widget

    Create a new Widget

    Widgets can be added by clicking on the Create New Widget button.

    Choose a Widget Type

    Widgets are the main building blocks of dashboards. groundcover supports the following widget types:

    • Chart Widget: Visualize your data through various display types.

    • Textual Widget: Add context to your dashboard, such as headers or instructions for issue investigations.

    Since selecting a Textual Widget is the last step for this type of widget, the rest of this guide is relevant only to Chart Widgets.

    Select a Widget Mode

    • Metrics: Work with all your available metrics for advanced use cases and custom metrics.

    • Infra Metrics: Use expert-built, predefined queries for common infrastructure scenarios. Ideal for quick starts.

    • Logs: Query and visualize log data.

    • Traces: Query and visualize trace data similar to logs.

    Build your query

Once the Widget Mode is selected, build your query for the visualization.

    If you're unfamiliar with query building in groundcover, refer to the for full details on the different components.

    Choose a Display Type

    Type
    Configuration options
    Supported modes

    Variables

    Variables dynamically filter your entire dashboard or specific widgets with just one click. They consist of a key-value pair that you define once and reuse across multiple widgets.

    Our predefined variables cover most use cases, but if you’re missing an important one, let us know. Advanced variables are also on our roadmap.

    Adding a Variable

    1. Click on Add Variable.

    2. Select the variable key and values from the predefined list.

    3. Optionally, rename the variable or use the default name, then click Create.

    4. Once created, select the values to apply to this variable.

    Using a Variable

    Variables can be referenced in the Filter Bar of the Widget Creation Modal using their name.

    1. Create a variable (for example, select Clusters from the predefined list, and name it 'clusters')

2. While creating or editing a Chart Widget, add a reference to the variable using a dollar sign in the filter bar (for example, $clusters).

    3. The data will automatically filter by the variable's key with the selected values. If all values are selected, the filter will be followed by an asterisk (for example, cluster:*)

    Create Workflow

    Creates a new workflow for alert handling and notifications. Workflows define how alerts are processed and routed to various integrations like Slack, PagerDuty, webhooks, etc.

    Endpoint

    POST /api/workflows/create

    Authentication

    This endpoint requires API key authentication.

    Headers

    Header
    Value
    Description

    Request Body

    The request body should contain raw YAML defining the workflow configuration. The YAML structure should include:

    • id: Unique identifier for the workflow

    • description: Human-readable description

    • triggers: Array of trigger conditions

    Example Request

    Response

    Workflow YAML Structure

    Basic Structure

    Choosing Integration Providers

    To route alerts to a specific integration (Slack, PagerDuty, webhook, etc.), use the config field in the provider section to reference your configured integration by name.

    Example: Slack Integration

    Provider Configuration

    • config: '{{ providers.integration-name }}' - References a specific integration you've configured in groundcover

    • type - Specifies the integration type (slack, webhook, pagerduty, opsgenie)

    • Replace integration-name with your actual integration name.

    The integration name must match the name of an integration you've previously configured in your groundcover workspace.
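For instance, an action routed to a webhook integration named test-ms-teams (a placeholder name) would reference it like this:

actions:
- name: notify-webhook
  provider:
    config: '{{ providers.test-ms-teams }}'
    type: webhook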

    References

    For workflow examples and advanced configurations, see the .

    Log Patterns

    Log Patterns help you cut through log noise by grouping similar logs based on structure. Instead of digging through thousands of raw lines, you get a clean, high-level view of what’s actually going on

    Overview

Log Patterns in groundcover help you make sense of massive log volumes by grouping logs with similar structure. Instead of showing every log line, the platform automatically extracts the static skeleton and replaces dynamic values like timestamps, user IDs, or error codes with smart tokens.

    This lets you:

    • Cut through the noise

    • Spot recurring behaviors

    • Investigate anomalies faster

    How It Works

groundcover automatically detects variable parts of each log line and replaces them with placeholders to surface the repeating structure.

    Placeholder
    Description
    Example

    Requirements

    Log Patterns are collected directly on the sensor.

    Example

    Raw log:

    Patterned:

    Viewing Patterns

    1. Go to the Logs section.

    2. Switch from Records to Patterns using the toggle at the top.

    3. Patterns are grouped and sorted by frequency. You’ll see:

      • Log level (Error, Info, etc.)

    Value Distribution

    You can hover over any tag in a pattern to preview the distribution of values for that specific token. This feature provides a breakdown of sample values and their approximate frequency, based on sampled log data.

    This is especially useful when investigating common IPs, error codes, user identifiers, or other dynamic fields, helping you understand which values dominate or stand out without drilling into individual logs.

    For example, hovering over an <IP4> token will show a tooltip listing the most common IP addresses and their respective counts and percentages.

    Investigating Patterns

    • Click a pattern: Filters the Logs view to only show matching entries.

    • Use filters: Narrow things down by workload, level, format, or custom fields.

    • Suppress patterns: Hide noisy templates like health checks to stay focused on what matters.

    • Export patterns: Use the three-dot menu to copy the pattern for further analysis or alert creation.

    Role-Based Access Control (RBAC)

    This capability is only available to organizations subscribed to our Enterprise plan.

    Role-Based Access Control (RBAC) in groundcover gives you a flexible way to manage who can access certain features and data in the platform. By defining both default roles and policies, you ensure each team member only sees and does what their level of access permits. This approach strengthens security and simplifies onboarding, allowing administrators to confidently grant or limit access.

    Policies

    Policies are the foundational elements of groundcover’s RBAC. Each policy defines:

    1. A permission level – which actions the user can perform (Admin, Editor, or Viewer-like capabilities).

    2. A data scope – which clusters, environments, or namespaces the user can see.

    By assigning one or more policies to a user, you can precisely control both what they can do and where they can do it.

    Default Policies

    groundcover provides three default policies to simplify common use cases:

    1. Default Admin Policy

      • Permission: Admin

      • Data Scope: Full (no restrictions)

      • Behavior: Unlimited access to groundcover features and configurations.

    These default policies allow you to quickly onboard new users with typical Admin/Editor/Viewer capabilities. However, you can also create custom policies with narrower data scopes, if needed.

    Policy Structure

    A policy’s data scope can be defined in two modes: Simple or Advanced.

    1. Simple Mode

      • Uses AND logic across the specified conditions.

      • Applies the same scope to all entity types (e.g., logs, traces, events, workloads).

  • Example: “Cluster = Dev”

    When creating or editing a policy, you select permission (Admin, Editor, or Viewer) and a data scope mode (Simple or Advanced).

    Multiple Policies

    A user can be associated with multiple policies. When that occurs:

    1. Permission Merging

      • The user’s final permission level is the highest among all assigned policies.

      • Example: If one policy grants Editor and another grants Viewer, the user is effectively an Editor overall.

    2. Data Scope Merging

    A user may be assigned a policy granting the Editor role with a data scope relevant to specific clusters, and simultaneously be assigned another policy granting the Viewer role with a different data scope. The user's effective access is determined by the highest role across all assigned policies and by the union (OR) of scopes.


    In summary:

    • Policies define both permission (Admin, Editor, or Viewer) and data scope (clusters, environments, namespaces).

    • Default Policies (Admin, Editor, Viewer) provide no data restrictions, suitable for quick onboarding.

    • Custom Policies allow more granular restrictions, specifying exactly which entities a user can see or modify.

    This flexible system gives you robust control over observability data in groundcover, ensuring each user has precisely the access they need.

    Integration Examples with Workflows

    This page provides examples of how to integrate workflows with different notification systems and external services.

    Slack Notification

    This workflow sends a simple Slack message when triggered:
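A minimal sketch of such a workflow is shown below; the integration name and the message body are placeholders to adapt to your Slack integration:

workflow:
  id: slack-notification
  description: Sends a simple Slack message when an alert fires
  triggers:
  - type: alert
  actions:
  - name: notify-slack
    provider:
      config: '{{ providers.your-slack-integration-name }}'
      type: slack
      with:
        message: "🚨 {{ alert.alertname }} is {{ alert.status }}"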

    Metric Summary

    The Metric Summary page shows all metrics in your system with their cardinality, type, unit, and labels. It helps you spot high-cardinality metrics that might slow things down and understand what labels are available for building queries.

    What You'll Find Here

    This page displays every metric groundcover collects, along with how many unique label combinations each one has. You can use it to:

    • Quickly search for metrics by name

    Create Ingestion Key

    Create a new ingestion key.

    Endpoint

    POST /api/rbac/ingestion-keys/create

    Authentication

    List Ingestion Keys

    Get a list of ingestion keys with optional filtering by name, type, and remote configuration status.

    Endpoint

    POST /api/rbac/ingestion-keys/list

    Workflow Examples

    This page provides practical examples of workflows for different use cases and integrations.

    Triggers Examples

    Filter by Monitor Name

    This example shows how to create a workflow that only triggers for a specific monitor (by its name):
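A sketch of such a trigger is shown below. It assumes the monitor name is exposed on the alert labels as _gc_monitor_name (as the label lists elsewhere in these docs suggest); the exact filter key may differ in your setup:

triggers:
- type: alert
  filters:
  - key: labels._gc_monitor_name
    value: "My Monitor Name"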

    Migrate from Datadog

    Complete guide for migrating your Datadog setup to groundcover.

    Prerequisites

    Access required:

    • Admin role in groundcover

    Get Monitor

    Retrieve detailed configuration for a specific monitor by its UUID, including queries, thresholds, display settings, and evaluation parameters.

    Endpoint

    GET /api/monitors/{uuid}
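A request following the conventions of the other endpoints in these docs might look like this sketch (headers assumed to match the list endpoint):

curl -L \
  --url 'https://api.groundcover.com/api/monitors/<MONITOR_UUID>' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'X-Backend-Id: <YOUR_BACKEND_ID>'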

    resource "groundcover_dashboard" "llm_observability" {
      name             = "LLM Observability"
      description      = "Dashboard to monitor OpenAI and Anthropic usage"
      preset           = "{\"widgets\":[{\"id\":\"B\",\"type\":\"widget\",\"name\":\"Total LLM Calls\",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai span_type:anthropic | stats by(span_type) count() count_all_result | sort by (count_all_result desc) | limit 5\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\"}},{\"id\":\"D\",\"type\":\"widget\",\"name\":\"LLM Calls Rate\",\"queries\":[{\"id\":\"A\",\"expr\":\"sum(rate(groundcover_resource_total_counter{type=~\\\"openai|anthropic\\\",status_code=\\\"ok\\\"})) by (gen_ai_request_model)\",\"dataType\":\"metrics\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"time-series\",\"selectedChartType\":\"stackedBar\"}},{\"id\":\"E\",\"type\":\"widget\",\"name\":\"Average LLM Response Time\",\"queries\":[{\"id\":\"A\",\"expr\":\"avg(groundcover_resource_latency_seconds{type=~\\\"openai|anthropic\\\"}) by (type)\",\"dataType\":\"metrics\",\"step\":\"disabled\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\",\"step\":\"disabled\",\"selectedUnit\":\"Seconds\"}},{\"id\":\"A\",\"type\":\"widget\",\"name\":\"Total LLM Tokens Used\",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai span_type:anthropic | stats by(span_type) sum(gen_ai.response.usage.total_tokens) sum_result | sort by (sum_result desc) | limit 5\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\",\"step\":\"disabled\"}},{\"id\":\"C\",\"type\":\"widget\",\"name\":\"AVG Input Tokens Per LLM Call \",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai OR span_type:anthropic | stats by(span_type) avg(gen_ai.response.usage.input_tokens) avg_result | sort by (avg_result desc) | limit 5\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\"}},{\"id\":\"F\",\"type\":\"widget\",\"name\":\"AVG Output Tokens Per LLM Call \",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai OR span_type:anthropic | stats by(span_type) avg(gen_ai.response.usage.output_tokens) avg_result | sort by (avg_result desc) | limit 5\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\",\"step\":\"disabled\"}},{\"id\":\"G\",\"type\":\"widget\",\"name\":\"Top Used Models\",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai OR span_type:anthropic | stats by(gen_ai.request.model) count() count_all_result | sort by (count_all_result desc) | limit 100\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"bar\",\"step\":\"disabled\"}},{\"id\":\"H\",\"type\":\"widget\",\"name\":\"Total LLM Errors \",\"queries\":[{\"id\":\"A\",\"expr\":\"(span_type:openai OR span_type:anthropic) status:error | stats by(span_type) count() count_all_result | sort by (count_all_result desc) | limit 1\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\"}},{\"id\":\"I\",\"type\":\"widget\",\"name\":\"AVG TTFT Over Time by Model\",\"queries\":[{\"id\":\"A\",\"expr\":\"avg(groundcover_workload_latency_seconds{gen_ai_system=~\\\"openai|anthropic\\\",quantile=\\\"0.50\\\"}) by (gen_ai_request_model)\",\"dataType\":\"metrics\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"time-series\",\"selectedChartType\":\"line\",\"selectedUnit\":\"Seconds\"}},{\"id\":\"J\",\"type\":\"widget\",\"name\":\"Avg Output Tokens Per Second by Model\",\"queries\":[{\"id\":\"A\",\"expr\":\"avg(groundcover_gen_ai_response_usage_output_tokens{}) by 
(gen_ai_request_model)\",\"dataType\":\"metrics\",\"editorMode\":\"builder\"},{\"id\":\"B\",\"expr\":\"avg(groundcover_workload_latency_seconds{quantile=\\\"0.50\\\"}) by (gen_ai_request_model)\",\"dataType\":\"metrics\",\"editorMode\":\"builder\"},{\"id\":\"formula-A\",\"expr\":\"A / B\",\"dataType\":\"metrics-formula\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"time-series\",\"selectedUnit\":\"Number\"}}],\"layout\":[{\"id\":\"B\",\"x\":0,\"y\":0,\"w\":4,\"h\":6,\"minH\":4},{\"id\":\"D\",\"x\":0,\"y\":30,\"w\":24,\"h\":6,\"minH\":4},{\"id\":\"E\",\"x\":8,\"y\":0,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"A\",\"x\":16,\"y\":0,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"C\",\"x\":0,\"y\":12,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"F\",\"x\":8,\"y\":24,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"G\",\"x\":16,\"y\":24,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"H\",\"x\":4,\"y\":0,\"w\":4,\"h\":6,\"minH\":4},{\"id\":\"I\",\"x\":0,\"y\":18,\"w\":24,\"h\":6,\"minH\":4},{\"id\":\"J\",\"x\":0,\"y\":3,\"w\":24,\"h\":6,\"minH\":4}],\"duration\":\"Last 15 minutes\",\"variables\":{},\"spec\":{\"layoutType\":\"ordered\"},\"schemaVersion\":4}"
    }
    terraform plan
    terraform apply
    # Syntax
    terraform import groundcover_dashboard.<local_name> <dashboard_id>
    
    # Example
    terraform import groundcover_dashboard.service_overview dsh_1234567890
    helm upgrade --install groundcover groundcover/groundcover \
      --set global.groundcover_token=<INGESTION_KEY>,clusterId={cluster-name}
    exporters:
      otlphttp/groundcover:
        endpoint: https://{GROUNDCOVER_MANAGED_OPENTELEMETRY_ENDPOINT}
        headers: 
          apikey: {INGESTION_KEY}
    
    pipelines:
      traces:
        exporters:
        - otlphttp/groundcover
    workflow:
      id: webhook
      description: Sends an API to incident.io alerts endpoint
      triggers:
      - type: alert
      consts:
        description: keep.dictget( {{ alert.annotations }}, "_gc_description", '')
        issue: https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}
        monitor: https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}
        redacted_labels: keep.dict_pop({{alert.labels}}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "_gc_issue_header")
        silence: https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join({{ consts.redacted_labels }}, "&", "matcher_"), " ", "+")
        title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
      name: incident-io-alerts-workflow
      actions:
      - name: webhook
        provider:
          config: ' {{ providers.your-incident-io-integration-name }} '
          type: webhook
          with:
            body:
              title: '{{ alert.alertname }}'
              description: '{{ alert.description }}'
              deduplication_key: '{{ alert.fingerprint }}'
              status: '{{ alert.status }}'
              # To use metadata attributes that refer to alert.labels, the attributes 
              # must be used in the group by section of the monitor - the example below
              # assumes that cluster, namespace and workload were used for group by
              metadata:
                cluster: '{{ alert.labels.cluster }}'
                namespace: '{{ alert.labels.namespace }}'
                service: '{{ alert.labels.workload }}'
                severity: '{{ alert.annotations._gc_severity }}'
    workflow:
      id: slack-channel-routing-workflow
      description: workflow for all channels with dynamic routing
      triggers:
      - type: alert
        filters:
        - key: annotations.slack-channel-routing-workflow
          value: enabled
      name: slack-channel-routing-workflow
      consts:
        channels: '{"devops":"C0111111111", "alerts":"C0222222222", "incidents":"C0333333333"}'
        channel_id: keep.dictget( '{{ consts.channels }}', '{{ alert.labels.channel_id }}', 'C09G9AFHLTB')
        env: keep.dictget({{ alert.labels }}, 'env', 'no-env')
        upper_env: "keep.uppercase({{consts.env}})"
        severity: keep.dictget({{ alert.annotations }}, '_gc_severity', 'unknown-severity')
        summary: keep.dictget({{ alert.labels }}, 'summary', 'no-summary')
        slack_message: "<https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join(keep.dict_pop({{ alert.labels }}, \"_gc_monitor_id\", \"_gc_monitor_name\", \"_gc_severity\", \"backend_id\", \"grafana_folder\", \"_gc_issue_header\"), \"&\", \"matcher_\"), \" \", \"+\")|Silence> :no_bell: | \n<https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}|Investigate> :mag: | \n<https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}|See Monitor> :chart_with_upwards_trend:\n\n*Labels:*  \n- keep.join(keep.dict_pop({{alert.labels}}, \"_gc_monitor_id\", \"_gc_monitor_name\", \"_gc_severity\", \"backend_id\", \"grafana_folder\", \"_gc_issue_header\"), \"\\n- \")\n"
        title_link: "https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}"
        red_color: "#FF0000"
        green_color: "#008000"
        footer_url: "groundcover.com"
        footer_icon: "https://app.groundcover.com/favicon.ico"
      actions:
      - if: "{{ alert.status }} == 'firing'"
        name: webhook-alert
        provider:
          type: webhook
          config: "{{ providers.slack-routing-webhook }}"
          with:
            body:
              channel: "{{ consts.channel_id }}"
              attachments:
              - color: "{{ consts.red_color }}"
                footer: "{{ consts.footer_url }}"
                footer_icon: "{{ consts.footer_icon }}"
                text: "{{ consts.slack_message }}"
                title: "\U0001F6A8 Firing: {{ alert.alertname }} [{{ consts.upper_env}}]"
                title_link: "{{ consts.title_link }}"
                type: plain_text
      - if: "{{ alert.status }} != 'firing'"
        name: webhook-alert-resolved
        provider:
          type: webhook
          config: "{{ providers.slack-routing-webhook }}"
          with:
            body:
              channel: "{{ consts.channel_id }}"
              text: "\u2705 [RESOLVED][{{ consts.upper_env}}] {{ consts.severity }} {{ alert.alertname }}"
              attachments:
              - color: "{{ consts.green_color }}"
                text: "*Summary:* {{ consts.summary }}"
                fields:
                - title: "Environment"
                  value: "{{ consts.upper_env}}"
                  short: true
                footer: "{{ consts.footer_url }}"
                footer_icon: "{{ consts.footer_icon }}"
    terraform {
      required_providers {
        groundcover = {
          source  = "registry.terraform.io/groundcover-com/groundcover"
          version = ">= 0.0.0" # Replace with actual version constraint
        }
      }
    }
    provider "groundcover" {
      api_key  = "YOUR_API_KEY" # Required
      base_url = "https://api.groundcover.com" # Optional, change if using onprem/airgap deployment
      backend_id = "groundcover" # Your Backend ID can be found in the groundcover UI under Settings->Access->API Keys
    }
    resource "groundcover_policy" "read_only" {
      name        = "Read-Only Policy"
      description = "Grants read-only access"
      claim_role  = "ci-readonly-role"
      roles = {
        read = "read"
      }
    }
    
    resource "groundcover_serviceaccount" "ci_account" {
      name         = "ci-pipeline-account"
      description  = "Service account for CI"
      policy_uuids = [groundcover_policy.read_only.id]
    }
    
    resource "groundcover_apikey" "ci_key" {
      name               = "CI Key"
      description        = "Key for CI pipeline"
      service_account_id = groundcover_serviceaccount.ci_account.id
    }


    Revocation effect: revoking an ingestion key stops data flowing immediately, while revoking an API key causes API calls to fail.

    Monitors
    Ingestion Keys
    Dashboards
    Choose a Display Type

  • Pie - Select a data source and aggregation method. Supported data sources: Logs, Traces.

  • Time Series - Choose a Y-axis unit from the predefined list and select a visualization type: Stacked Bar or Line Chart. Supported data sources: Metrics, Infra Metrics, Logs, Traces.

  • Table - Define columns based on data fields or metrics and choose a Y-axis unit from the predefined list. Supported data sources: Metrics, Infra Metrics, Logs, Traces.

  • Stat - Select a Y-axis unit from the predefined list. Supported data sources: Metrics, Infra Metrics, Logs, Traces.

  • Top List - Choose a ranking metric and sort order in the Query Builder section. Supported data sources: Logs, Traces.

  • Save the widget

    | Header | Required | Description |
    |---|---|---|
    | Authorization | Yes | Bearer token with your API key |
    | Content-Type | Yes | Must be application/json |
    | Accept | Yes | Must be application/json |

    | Parameter | Type | Required | Description |
    |---|---|---|---|
    | sources | array | No | Source filters (empty array returns all monitors) |

    | Field | Type | Description |
    |---|---|---|
    | monitors | array | Array of monitor objects |
    | uuid | string | Unique identifier for the monitor |
    | title | string | Monitor name/description |
    | type | string | Monitor type (see monitor types below) |

    | Type | Description |
    |---|---|
    | "metrics" | Metrics-based monitoring |
    | "traces" | Distributed tracing monitoring |
    | "logs" | Log-based monitoring |
    | "events" | Event-based monitoring |
    | "infra" | Infrastructure monitoring |
    | "" (empty string) | General/unspecified monitoring |

    Automated detection of new versions and upgrade suggestions
    Helm
    ArgoCD
    See details here
    here
    ingestion key documentation
    ArgoCD deployment guide
    5 quick steps to get you started
  • actions: Array of actions to perform when triggered

  • name: Display name for the workflow

  • consts (optional): Constants and helper variables

    | Header | Value | Description |
    |---|---|---|
    | Authorization | Bearer <YOUR_API_KEY> | Your groundcover API key |
    | Content-Type | text/plain | The request body should be raw YAML |

    groundcover workflow examples documentation

  • Count and percentage of total logs

  • Pattern’s trend over time

  • Workload origin

  • The structured pattern itself

    | Token | Meaning | Example |
    |---|---|---|
    | <V> | Version | v0.32.0 |
    | <TM> | Time measure | 5.5ms |
    | <TS> | Timestamp | 2025-03-31T17:00:00Z |
    | <N> | Number | 404, 123 |
    | <IP4> | IPv4 Address | 192.168.0.1 |
    | <*> | Wildcard (text, path, etc.) | /api/v1/users/42 |

    Slack with Rich Formatting

    This workflow sends a formatted Slack message using Block Kit:

    PagerDuty Integration

    This workflow creates a PagerDuty incident:

    Opsgenie Integration

    This workflow creates an Opsgenie alert:

    • Alias is used to group identical events together in Opsgenie (alias key in the payload)

    • Severities must be mapped to Opsgenie valid severities (priority key in the payload)

    • Tags are a list of string values (tags key in the payload)

    Jira Ticket Creation

    This workflow creates a Jira ticket using webhook integration:

    Multiple Actions

    This workflow performs multiple actions for the same alert:

    Filter by Environment

    Execute only on the Prod environment. The "env" attribute needs to be part of the monitor context attributes (either by using it in the group by section or by explicitly adding it as a context label):

    Filter by Multiple Conditions

    This example shows how to combine multiple filters. In this case it will match events from the prod environment and also monitors that explicitly routed the workflow with the name "actual-name-of-workflow":

    Filter by Regex

    In this case we will use a regular expression to filter on events coming from the groundcover OR monitoring namespaces. Note that any regular expression can be used:

    Consts Examples

    The consts section is the best location to create pre-defined attributes and apply different transformations on the monitor's metadata for formatting the notification messaging.

    Map Severities

    Severities in your notification destination may not match the groundcover predefined severities. By using a dictionary, you can map any groundcover severity value to another, and extract it by using the actual monitor severity. Use the "keep.dictget" function to extract from a dictionary and apply a default in case the value is missing.

    Best Practice for Accessing Monitor Labels

    When accessing a context label via alert.labels, the workflow might crash if the label is not passed along by the monitor. The best practice is to pre-define labels in the consts section with a default value, using "keep.dictget" so the value is gracefully pulled from the labels object.

    Note: Label names that are dotted, like "cloud.region" in this example, cannot be referenced in the monitor itself and can only be retrieved using this technique of pulling the value with "keep.dictget" from the alert.labels object.

    Additional Useful Functions

    • keep.dict_pop({{alert.labels}}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "_gc_issue_header") - "Clean" a key-value dictionary from some irrelevant values (keys). In this case, the groundcover labels dictionary has some internal keys that you might not want to include in your notification content.

    • keep.join(["a", "b", "c"], ",") - Joins a list of elements into a string using a given delimiter. In this case the output is "a,b,c".

    Action Examples

    Conditional Statements

    Use "if" condition to apply logic on different actions.

    Create a separate block for a firing monitor (a resolved monitor can use different logic to change formatting of the notification):

    "If" statements can include and/or logic for multiple conditions:

    Notification by Specific Hours

    Use the function keep.is_business_hours combined with an "if" statement to trigger an action within specific hours only.

    In this example the action block will execute on Sundays (6) between 20-23 (8pm to 11pm) or on Mondays (0) between 0-1am:

    {
      "sources": []
    }
    {
      "monitors": [
        {
          "uuid": "string",
          "title": "string",
          "type": "string"
        }
      ]
    }
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/monitors/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data-raw '{"sources":[]}'
    {
      "monitors": [
        {
          "uuid": "xxxx-xxxx-xxxx-xxxx-xxxx",
          "title": "PVC usage above threshold (90%)",
          "type": "metrics"
        },
        {
          "uuid": "xxxx-xxxx-xxxx-xxxx-xxxx",
          "title": "HTTP API Errors Monitor",
          "type": "traces"
        },
        {
          "uuid": "xxxxx-xxxx-xxxx-xxxx-xxxx",
          "title": "Error Logs Monitor",
          "type": "logs"
        },
        {
          "uuid": "xxxxx-xxxx-xxxx-xxxx-xxxx",
          "title": "Node CPU Usage Average is Above 85%",
          "type": "infra"
        },
        {
          "uuid": "xxxx-xxxx-xxxx-xxxx-xxxx",
          "title": "Rolling Update Triggered",
          "type": "events"
        },
        {
          "uuid": "xxxx-xxxx-xxxx-xxxx-xxxx",
          "title": "Deployment Partially Not Ready - 5m",
          "type": "events"
        }
      ]
    }
    global:
      backend:
        enabled: false
      ingress:
        site: {inCloud_Site}
    
    clusterId: "your-cluster-name" # CLI will automatically detect cluster name
    sh -c "$(curl -fsSL https://groundcover.com/install.sh)"
    groundcover deploy -f values.yaml
    sh -c "$(curl -fsSL https://groundcover.com/install.sh)"
    groundcover auth get-ingestion-key sensor
    global:
      groundcover_token: {sensor_key}
      backend:
        enabled: false
      ingress:
        site: {inCloud_site}
        
    clusterId: "your-cluster-name"
    # Add groundcover Helm repository and fetch latest chart
    helm repo add groundcover https://helm.groundcover.com && helm repo update groundcover
    helm upgrade \
        groundcover \
        groundcover/groundcover \
        -i \
        --create-namespace \
        -n groundcover \
        -f values.yaml
    helm repo update groundcover && helm upgrade \
        groundcover \
        groundcover/groundcover \
        -n groundcover \
        -f values.yaml
    groundcover delete
    helm uninstall groundcover -n groundcover
    # delete the namespace in order to remove the PVCs as well
    kubectl delete ns groundcover
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/workflows/create' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: text/plain' \
      --data 'id: example-workflow
    description: Example workflow for API documentation
    triggers:
    - type: alert
      filters:
      - key: annotations.example-workflow
        value: enabled
    name: example-workflow
    consts:
      severity: keep.dictget({{ alert.annotations }}, "_gc_severity", "info")
      title: keep.dictget({{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
    actions:
    - name: webhook-action
      provider:
        type: webhook
        config: "{{ providers.webhook-provider }}"
        with:
          body:
            alert_name: "{{ consts.title }}"
            severity: "{{ consts.severity }}"'
    {
      "workflow_id": "xxxxx-xxxx-xxxxx-xxxx-xxxxx",
      "status": "created",
      "revision": 1
    }
    id: workflow-id
    description: Workflow description
    triggers:
      - type: alert
        filters:
          - key: filter-key
            value: filter-value
    name: workflow-name
    consts:
      variable_name: value
    actions:
      - name: action-name
        provider:
          type: provider-type
          config: provider-config
          with:
            action-specific-parameters
    actions:
    - if: '{{ alert.status }} == "firing"'
      name: slack-action-firing
      provider:
        config: '{{ providers.integration-name }}'
        type: slack
        with:
          attachments:
          - color: '#FF0000'
            footer: 'groundcover.com'
            footer_icon: 'https://app.groundcover.com/favicon.ico'
            text: 'Alert details here'
            title: 'Firing: {{ alert.alertname }}'
            ts: keep.utcnowtimestamp()
            type: plain_text
          message: ' '
    192.168.0.1 - - [30/Mar/2025:12:00:01 +0000] "GET /api/v1/users/123 HTTP/1.1" 200
    <IP4> - - [<TS>] "<*> HTTP/<N>.<N>" <N>
    workflow: 
      id: slack-notification
      description: Send Slack notification for alerts
      triggers:
        - type: alert
      actions:
        - name: slack-notification
          provider:
            type: slack
            config: '{{ providers.slack_webhook }}'
            with:
              message: "Alert: {{ alert.alertname }} - Status: {{ alert.status }}"
    workflow: 
      id: slack-rich-notification
      description: Send formatted Slack notification
      triggers:
        - type: alert
      actions:
        - name: slack-rich-notification
          provider:
            type: slack
            config: '{{ providers.slack_webhook }}'
            with:
              blocks:
              - type: header
                text:
                  type: plain_text
                  text: ':rotating_light: {{ alert.alertname }} :rotating_light:'
                  emoji: true
              - type: divider
              - type: section
                fields:
                - type: mrkdwn
                  text: |-
                    *Cluster:*
                    {{ alert.labels.cluster}}
                - type: mrkdwn
                  text: |-
                    *Namespace:*
                    {{ alert.labels.namespace}}
                - type: mrkdwn
                  text: |-
                    *Status:*
                    {{ alert.status}}
     workflow:
      id: pagerduty-incident-workflow
      description: Create PagerDuty incident for alerts
      name: pagerduty-incident-workflow
      triggers:
      - type: alert
        filters:
        - key: annotations.pagerduty-incident-workflow
          value: enabled
      consts:
        severities: '{"S1": "critical","S2": "error","S3": "warning","S4": "info","critical": "critical","error": "error","warning": "warning","info": "info"}'
        severity: keep.dictget( '{{ consts.severities }}', '{{ alert.annotations._gc_severity }}', 'info')
        description: keep.dictget( {{ alert.annotations }}, "_gc_description", "")
        title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", '{{ alert.alertname }}')
        redacted_labels: keep.dict_pop({{ alert.labels }}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder")
        env: keep.dictget( {{ alert.labels }}, "env", "- no env -")
        namespace: keep.dictget( {{ alert.labels }}, "namespace", "- no namespace -")
        workload: keep.dictget( {{ alert.labels }}, "workload", "- no workload -")
        pod: keep.dictget( {{ alert.labels }}, "podName", "- no pod -")
        issue: https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}
        monitor: https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}
        silence: https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join({{ consts.redacted_labels }}, "&", "matcher_"), " ", "+")
      actions:
      - name: pagerduty-alert
        provider:
          config: '{{ providers.pagerduty-integration-name }}'
          type: pagerduty
          with:
            title: '{{ consts.title }}'
            severity: '{{ consts.severity }}'
            dedup_key: '{{alert.fingerprint}}'
            custom_details:
              01_environment: '{{ consts.env }}'
              02_namespace: '{{ consts.namespace }}'
              03_service_name: '{{ consts.workload }}'
              04_pod: '{{ consts.pod }}'
              05_labels: '{{ consts.redacted_labels }}'
              06_monitor: '{{ consts.monitor }}'
              07_issue: '{{ consts.issue }}'
              08_silence: '{{ consts.silence }}'
    
    workflow:
      id: Opsgenie Example
      description: "Opsgenie workflow"
      triggers:
      - type: alert
        filters:
        - key: annotations.Opsgenie Example
          value: enabled
      consts:
        description: keep.dictget( {{ alert.annotations }}, "_gc_description", "")
        redacted_labels: keep.dict_pop({{ alert.labels }}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "CampaignName")
        severities: '{"S1": "P1","S2": "P2","S3": "P3","S4": "P4","critical": "P1","error": "P2","warning": "P3","info": "P4"}'
        severity: keep.dictget({{ consts.severities }}, {{ alert.annotations._gc_severity }}, "P3")
        title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", '{{ alert.alertname }}')
        region: keep.dictget( {{ alert.labels }}, "cloud.region", "")
        TenantID: keep.dictget( {{ alert.labels }}, "tenantID", "")
      name: Opsgenie Example
      actions:
      - if: '{{ alert.status }} == "firing"'
        name: opsgenie-alert
        provider:
          config: "{{ providers.Opsgenie }}"
          type: opsgenie
          with:
            alias: '{{ alert.fingerprint }}'
            description: '{{ consts.description }}'
            details: '{{ consts.redacted_labels }}'
            message: '{{ consts.title }}'
            priority: '{{ consts.severity }}'
            source: groundcover
            tags:
            - '{{ alert.alertname }}'
            - '{{ consts.TenantID }}'
            - '{{ consts.region }}'
      
    workflow:
      id: jira-ticket-creation
      description: Create Jira ticket for alerts
      triggers:
        - type: alert
      consts:
        description: keep.dictget({{ alert.annotations }}, "_gc_description", '')
        title: keep.dictget({{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
      actions:
        - name: jira-ticket
          provider:
            type: webhook
            config: '{{ providers.jira_webhook }}'
            with:
              body:
                fields:
                  description: '{{ consts.description }}'
                  issuetype:
                    id: 10001
                  project:
                    id: 10000
                  summary: '{{ consts.title }}'
    workflow:
      id: multi-action-workflow
      description: Perform multiple actions for critical alerts
      triggers:
        - type: alert
          filters:
            - key: severity
              value: critical
      actions:
        - name: slack-notification
          provider:
            type: slack
            config: '{{ providers.slack_webhook }}'
            with:
              message: "Critical alert: {{ alert.alertname }}"
        - name: pagerduty-incident
          provider:
            type: pagerduty
            config: '{{ providers.pagerduty_prod }}'
            with:
              title: "Critical: {{ alert.alertname }}"
        - name: jira-ticket
          provider:
            type: webhook
            config: '{{ providers.jira_webhook }}'
            with:
              body:
                fields:
                  summary: "Critical Alert: {{ alert.alertname }}"
                  description: "Critical alert triggered in {{ alert.labels.namespace }}"
                  issuetype:
                    id: 10001
                  project:
                    id: 10000
    workflow: 
      id: specific-monitor-workflow
      description: Workflow triggered only by Workload Pods Crashed Monitor
      triggers:
        - type: alert
          filters:
            - key: alertname
              value: Workload Pods Crashed Monitor
    workflow: 
      id: prod-only-workflow
      description: Workflow triggered only by production environment alerts
      triggers:
        - type: alert
          filters:
            - key: env
              value: prod
    workflow: 
      id: multi-filter-workflow
      description: Workflow triggered by critical alerts in production
      triggers:
        - type: alert
          filters:
            - key: env
              value: prod
            - key: annotations.actual-name-of-workflow
              value: enabled
    workflow: 
      id: regex-filter-workflow
      description: Workflow triggered by alerts from groundcover or monitoring namespaces
      triggers:
        - type: alert
          filters:
            - key: namespace
              value: r"(groundcover|monitoring)"
    workflow:
      id: severity-mapping-example
      description: Example of mapping severities using consts
      triggers:
        - type: alert
      consts:
        severities: '{"S1": "P1","S2": "P2","S3": "P3","S4": "P4","critical": "P1","error": "P2","warning": "P3","info": "P4"}'
        severity: keep.dictget({{ consts.severities }}, {{ alert.annotations._gc_severity }}, "P3")
    workflow:
      id: labels-best-practice-example
      description: Example of safely accessing monitor labels
      triggers:
        - type: alert
      consts:
        region: keep.dictget({{ alert.labels }}, "cloud.region", "")
    workflow:
      id: conditional-actions-example
      description: Example of conditional actions based on alert status
      triggers:
        - type: alert
      actions:
        - if: '{{ alert.status }} == "firing"'
          name: slack-action-firing
          provider:
            config: '{{ providers.groundcover-alerts-dev }}'
            type: slack
            with:
              attachments:
              - color: '{{ consts.red_color }}'
                footer: '{{ consts.footer_url }}'
                footer_icon: '{{ consts.footer_icon }}'
                text: '{{ consts.slack_message }}'
                title: 'Firing: {{ alert.alertname }}'
                title_link: '{{ consts.title_link }}'
                ts: keep.utcnowtimestamp()
                type: plain_text
              message: ' '
    workflow:
      id: multi-condition-actions-example
      description: Example of multiple conditions in actions
      triggers:
        - type: alert
      actions:
        - if: '{{ alert.status }} == "firing" and {{ alert.labels.namespace }} == "namespace1"'
          name: slack-action-firing
          provider:
            config: '{{ providers.groundcover-alerts-dev }}'
            type: slack
            with:
              attachments:
              - color: '{{ consts.red_color }}'
                footer: '{{ consts.footer_url }}'
                footer_icon: '{{ consts.footer_icon }}'
                text: '{{ consts.slack_message }}'
                title: 'Firing: {{ alert.alertname }}'
                title_link: '{{ consts.title_link }}'
                ts: keep.utcnowtimestamp()
                type: plain_text
              message: ' '
    workflow:
      id: time-based-notification-example
      description: Example of time-based conditional actions
      triggers:
        - type: alert
      actions:
        - if: '({{ alert.status }} == "firing" and (keep.is_business_hours(timezone="America/New_York", business_days=[6], start_hour=20, end_hour=23) or keep.is_business_hours(timezone="America/New_York", business_days=[0], start_hour=0, end_hour=1)))'
          name: time-based-notification
          provider:
            type: slack
            config: '{{ providers.slack_webhook }}'
            with:
              message: "Time-sensitive alert: {{ alert.alertname }}"

    | Protocol / Library | Support | Notes |
    |---|---|---|
    | Redis | supported | |
    | DNS | supported | |
    | Kafka | supported | |
    | MongoDB | supported | v3.6+ |
    | AMQP | supported | AMQP 0-9-1 |
    | GraphQL | supported | |
    | AWS S3 | supported | |
    | AWS SQS | supported | |
    | HTTP | supported | |
    | gRPC | supported | |
    | MySQL | supported | |
    | PostgreSQL | supported | |
    | crypto/tls (golang) | supported | |
    | OpenSSL (c, c++, Python) | supported | |
    | NodeJS | supported | |
    | JavaSSL | supported | |

    Java 11+ is supported. Requires

    Default Editor Policy

    • Permission: Editor

    • Data Scope: Full (no restrictions)

    • Behavior: Full creation/editing capabilities on observability data, but no user or system management.

  • Default Viewer Policy

    • Permission: Viewer

    • Data Scope: Full (no restrictions)

    • Behavior: Read-only access to all data in groundcover.

  • Example: "Cluster = Dev AND Environment = QA," restricting all logs, traces, events, etc. to the Dev cluster and QA environment.
  • Advanced Mode

    • Lets you define different scopes for each data entity (logs, traces, events, workloads, etc.).

    • Each scope can use OR logic among conditions, allowing more fine-grained control.

    • Example:

      • Logs: “Cluster = Dev OR Prod,”

      • Traces: “Namespace = abc123,”

      • Events: “Environment = Staging OR Prod”

  • Data scopes merge via OR logic, broadening the user's overall data access.

  • Example: Policy A => "Cluster = A," Policy B => "Environment = B," so final scope is "Cluster A OR Environment B."

  • This applies to all data types including logs, traces, events, workloads, and metrics.

  • Multiple Policies can co-exist, merging permission levels and data scopes via OR logic across all data types.

    Identify metrics with high cardinality that could impact performance

  • See what labels are available for each metric

  • Find the right metrics and labels when building dashboards and monitors

  • Page Layout

    Search Bar

    Use the search field at the top to filter metrics by name. The search is case-insensitive and matches partial names.

    Example searches:

    • groundcover_workload finds all workload-related metrics

    • latency finds all latency metrics

    • cpu finds all CPU metrics

    Cardinality Chart

    This chart shows total cardinality across all your metrics over the selected time range. You can track trends over time and spot sudden spikes that might indicate a problem.

    The cardinality chart shows the last 24 hours in 3-hour intervals

    Metrics Table

    The table shows details for each metric:

    | Column | Description |
    |---|---|
    | Metric Name | The full metric name, like groundcover_workload_latency_seconds |
    | Type | Counter, Gauge, Summary, or Histogram |
    | Unit | What the metric is measured in (Seconds, Bytes, Number, etc.) |
    | Cardinality | Number of unique label combinations in the past 24 hours. Higher numbers mean more unique time series |
    | Percentage | What percentage of total cardinality this metric represents |
    | Description | What the metric measures |

    Metric Details Drawer

    Click any row to open a detailed drawer on the right. The drawer shows:

    • The metric name at the top

    • Navigation arrows to move between metrics

    • A cardinality chart specific to this metric

    • Two tabs with more information

    Details Tab

    Basic information about the metric:

    | Field | Description |
    |---|---|
    | Description | What the metric measures |
    | Unit | Unit of measurement |
    | Cardinality | Current unique label combination count |
    | Type | Counter, Gauge, Summary, or Histogram |

    Labels Tab

    All labels available for this metric:

    | Column | Description |
    |---|---|
    | Label | The label name, like namespace or pod_name |
    | Cardinality | How many unique values this label has |
    | Values Preview | Up to 3 example values |

    Use this information when building queries or identifying which labels contribute most to cardinality.
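
    For example, once you know a metric's type and labels, you can scope a query to just the dimensions you care about. A minimal sketch (the namespace value is illustrative; the metric name and quantile label appear elsewhere in this document):

    avg(groundcover_workload_latency_seconds{namespace="my-namespace", quantile="0.50"}) by (workload)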

    How to Use This Page

    Finding High-Cardinality Metrics

    1. Go to Explore > Metric Summary

    2. Click the Cardinality column header to sort

    3. Metrics at the top have the most unique label combinations

    4. Click on one to see which labels drive the cardinality in the Labels tab

    Discovering Available Metrics

    1. Type a keyword in the search bar (http, database, cpu, etc.)

    2. Browse the filtered results

    3. Click a metric to see its description and labels

    4. Note the metric name and labels for your dashboards or monitors

    Understanding Metric Structure

    1. Click any metric in the table

    2. Check the Details tab for basic information

    3. Switch to the Labels tab to see all available dimensions

    4. Use the Values Preview to see what values each label can have

    This endpoint requires API Key authentication via the Authorization header.

    Headers

    Request Body

    Required and optional fields for creating an ingestion key:

    | Parameter | Type | Required | Description |
    |---|---|---|---|
    | name | string | Yes | Unique name for the ingestion key (must be lowercase with hyphens) |
    | type | string | Yes | Key type ("sensor", "thirdParty", "rum") |
    | tags | array | No | |

    Examples

    Create Basic Sensor Key
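
    A minimal sketch of such a request with curl. The endpoint path below is a placeholder - substitute the ingestion keys creation endpoint from your API reference; the headers and body fields follow the parameters documented above:

    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/<INGESTION_KEYS_CREATE_PATH>' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --header 'Accept: application/json' \
      --data-raw '{"name": "production-k8s-sensor", "type": "sensor"}'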

    Create Third-Party Integration Key

    Create RUM Key with Configuration

    Response

    Response Schema

    Response Example

    Key Types

    | Type | Description | Default remoteConfig |
    |---|---|---|
    | "sensor" | Keys for groundcover sensors and agents | true |
    | "thirdParty" | Keys for third-party integrations (OpenTelemetry, etc.) | false |
    | "rum" | Keys for Real User Monitoring data ingestion | false |

    Verification

    To verify the key was created successfully, use the List Ingestion Keys endpoint:

    Naming Requirements

    • Names must be lowercase with hyphens as separators

    • No capital letters, spaces, or special characters (except hyphens)

    • Examples of valid names: production-k8s-sensor, otel-staging-api, rum-frontend

    • Examples of invalid names: Production-K8s, OTEL_API, rum frontend

    Related Documentation

    For comprehensive information about ingestion keys, including usage and management, see:

    • Ingestion Keys

    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    Headers

    Request Body

    Optional filters for ingestion keys:

    | Parameter | Type | Required | Description |
    |---|---|---|---|
    | name | string | No | Filter by exact key name |
    | type | string | No | Filter by key type ("sensor", "thirdParty", "rum") |
    | remoteConfig | boolean | No | |

    Examples

    Get All Ingestion Keys

    Filter by Type

    Get only sensor keys:
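
    A minimal sketch of such a request (the endpoint path is a placeholder - substitute the list ingestion keys endpoint from your API reference):

    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/<INGESTION_KEYS_LIST_PATH>' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --header 'Accept: application/json' \
      --data-raw '{"type": "sensor"}'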

    Filter by Name and Remote Config

    Response Example

    Response Schema

    | Field | Type | Description |
    |---|---|---|
    | id | string | Unique identifier for the ingestion key (UUID) |
    | name | string | Human-readable name for the key |
    | createdBy | string | Email of the user who created the key |
    | creationDate | string | ISO 8601 timestamp of key creation |

    Related Documentation

    For comprehensive information about ingestion keys, including creation, usage, and best practices, see:

    • Ingestion Keys

    Datadog API key and application key with read permissions

    Datadog application key scopes:

    • dashboards_read - List and retrieve dashboards

    • monitors_read - View monitors

    • metrics_read - Query timeseries data

    • integrations_read - View AWS, GCP, Azure integrations

    Create a migration project

    Navigate to Settings → Migrations.

    1. Click Start on the Datadog card

    2. Enter a project name (e.g., "Production Migration", "US5 Migration")

    3. Click Create

    Tip: Use descriptive names. You can run multiple migration projects for different environments or teams.

    Fetch assets from Datadog

    Provide your Datadog credentials:

    Datadog site

    The domain of your Datadog console. The options are:

    • US1 - app.datadoghq.com

    • US3 - us3.datadoghq.com

    • US5 - us5.datadoghq.com

    • EU1 - app.datadoghq.eu

    • AP1 - ap1.datadoghq.com

    You can find your Datadog site by looking at your console's URL.

    API key

    A regular Datadog API key. Find this under Organization Settings → API Keys.

    Application key

    Create one under Organization Settings → Application Keys with the required scopes listed above.

    Important: groundcover does not store these keys. Assets are fetched, then the keys are discarded.

    Click Fetch Assets. This typically takes 10 seconds depending on the number of assets.

    Review migration summary

    Once fetched, you see:

    • Progress overview: Total assets discovered and their support status

    • Asset cards: Monitors, Dashboards, Data Sources, etc

    • Support breakdown: How many assets are fully supported, partial, or unsupported

    The overview shows everything we found in your Datadog account and what we'll bring over.

    Migrate data sources

    Before migrating monitors and dashboards, we set up your data sources.

    What we detect

    groundcover automatically discovers:

    • AWS integrations: CloudWatch metrics, account configurations

    • GCP integrations: Cloud Monitoring metrics, project setups

    • Azure integrations: Azure Monitor metrics, subscription details

    Tip: Migrate all data sources first. This prevents missing data issues when monitors go live.

    Migrate monitors

    Once data sources are ready, migrate your monitors.

    Monitor status indicators

    • ✓ Supported: Fully compatible. Migrate as-is.

    • ⚠ Partial: Migrates with warnings. Review before installing.

    • ✗ Unsupported: Requires manual attention.

    Review warnings

    For monitors with warnings, click View Warnings:

    • See what adjustments were made

    • Understand query translations

    • Get recommendations for post-migration verification

    Warnings don't block migration — they inform you of changes so you can verify behavior.

    Migrate monitors

    Single monitor:

    1. Preview the monitor

    2. Click Migrate

    3. Monitor installs immediately

    Bulk migrate:

    1. Select multiple monitors using checkboxes

    2. Click Migrate Selected

    3. All install in parallel

    Migrated monitors appear instantly in Monitors → Monitor List.

    Migrate dashboards

    Dashboards preserve:

    • Layout and widget positions

    • Query logic and filters

    • Time ranges and visualization settings

    • Colors and formatting

    Check out the dashboard preview to confirm the migration worked and that all your assets came through successfully.

    Migrate dashboards

    Click Migrate to install. Dashboards appear under Dashboards immediately.

    Tip: Migrate critical dashboards first. Verify queries return expected data before bulk migrating.

    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    Headers

    | Header | Required | Description |
    |---|---|---|
    | Authorization | Yes | Bearer token with your API key |
    | Content-Type | Yes | Must be application/json |
    | Accept | Yes | Must be application/json |

    Path Parameters

    | Parameter | Type | Required | Description |
    |---|---|---|---|
    | uuid | string | Yes | The unique identifier of the monitor to retrieve |

    Field Descriptions

    | Field | Type | Description |
    |---|---|---|
    | title | string | Monitor name/title |
    | display.header | string | Alert header template with variable substitution |
    | display.resourceHeaderLabels | array | Labels shown in resource headers |
    | display.contextHeaderLabels | array | Labels shown in context headers |

    Examples

    Basic Request

    Get monitor configuration by UUID:
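
    A minimal sketch of the request shape (the path is a placeholder - use the get-monitor endpoint from your API reference, passing the monitor UUID as the path parameter):

    curl -L \
      --request GET \
      --url 'https://api.groundcover.com/<GET_MONITOR_PATH>/<uuid>' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --header 'Accept: application/json'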

    Response Example - Metrics Monitor

    Traces

    Our traces philosophy

    Traces are a powerful observability pillar, providing granular insights into microservice interactions. Traditionally, they were hard to implement, requiring coordination of multiple teams and constant code changes, making this critical aspect very challenging to maintain.

    groundcover's eBPF sensor disrupts the famous tradeoff, empowering developers to gain full visibility into their applications, effortlessly and without any code changes.

    The platform supports two kinds of traces:

    eBPF traces

    These traces are automatically generated for every service in your stack. They are available out-of-the-box and within seconds of installation. These traces always include critical information such as:

    • All services that took part in the interaction (both client and server)

    • Accessed resource

    • Full payloads, including:

      • All headers

    3rd-party traces

    These can be ingested into the platform, allowing you to leverage existing instrumentation to create a single pane of glass for all of your traces.

    Traces are stored in groundcover's ClickHouse deployment, ensuring top-notch performance at any scale.

    For more details about ingesting 3rd-party traces, see the relevant integration documentation.

    Sampling

    groundcover further disrupts the customary traces experience by reinventing the concept of sampling. This innovation differs between the different types of traces:

    eBPF traces

    These are generated by using 100% of the data, always processing every request being made, on every scale. However, the groundcover platform utilizes smart sampling to only store a fraction of the traces, while still generating an accurate picture. In general, sampling is performed according to these rules:

    • Requests with unusually high or low latencies, measured per resource

    • Requests which returned an error response (e.g 500 status code for HTTP)

    • "Normal" requests which form the baseline for each resource

    Lastly, sampling decisions are made on the node itself, without having to send or save any redundant traces.

    Certain aspects of our sampling algorithm are configurable.

    3rd-party traces

    Various mechanisms control the sampling performed over 3rd party traces. Read more here:

    When integrating 3rd-party traces, it is often wise to configure some sampling mechanism according to the specific use case.

    Additional Context

    Each trace is enriched with additional information to give as much context as possible for the service which generated the trace. This includes:

    • Container information - image, environment variables, pod name

    • Logs generated by the service around the time of the trace

    • Metrics of the resource around the time of the trace

    • Kubernetes events relevant to the service

    Distributed Tracing

    One of the advantages of ingesting 3rd-party traces is the ability to leverage their distributed tracing feature. groundcover natively displays the full trace for ingested traces in the Traces page.

    Trace Attributes

    Trace Attributes enable advanced filtering and search capabilities. groundcover supports attributes across all trace types. This encompasses a diverse range of protocols such as HTTP, MongoDB, PostgreSQL, and others, as well as varied sources including eBPF or manual instrumentations (for example - OpenTelemetry).

    groundcover enriches your original traces and generates meaningful metadata as key-value pairs. This metadata includes critical information, such as protocol type, http.path, db.statement, and similar attributes, aligning with OTel conventions. Furthermore, groundcover seamlessly incorporates this metadata from spans received through supported manual instrumentations. For an in-depth understanding of attributes in OTel, please refer to the OpenTelemetry documentation (external link).

    Each attribute can be effortlessly integrated into your filters and search queries. You can add them directly from the trace side-panel with a simple click or input them manually into the search bar.

    Example: if you want to filter all HTTP traces that contain the path "/products", the query would be formatted as: @http.path:"/products". For a comprehensive guide on the query syntax, see the Syntax table below.

    Trace Tags

    Trace Tags enable advanced filtering and search capabilities. groundcover supports tags across all trace types. This encompasses a diverse range of protocols such as HTTP, MongoDB, PostgreSQL, and others, as well as varied sources including eBPF or manual instrumentations (for example - OpenTelemetry).

    Tags are powerful metadata components, structured as key-value pairs. They offer insightful information about the resource generating the span, like container.image.name, host.name and more.

    Tags include metadata enriched by our sensor, plus additional metadata provided by manual instrumentations (such as OpenTelemetry traces). Utilizing these tags enhances the understanding and context of your traces, allowing for more comprehensive analysis and easier filtering by the relevant information.

    Each tag can be effortlessly integrated into your filters and search queries. You can add them directly from the trace side-panel with a simple click or input them manually into the search bar.

    Example: if you want to filter all traces from mysql containers, the query would be formatted as: container.image.name:mysql. For a comprehensive guide on the query syntax, see the Syntax table below.

    Search and filter

    The Trace Explorer integrates dynamic filters and a versatile search functionality to enhance your trace data analysis. You can filter traces using specific criteria, including trace status, workload, namespace and more, as well as limit your search to a specific time range.

    Traces Pipelines

    groundcover natively supports setting up trace pipelines. This allows for full flexibility in the processing and manipulation of the traces being collected - parsing additional patterns by regex, renaming attributes, and more.

    Controlling retention

    groundcover allows full control over the retention of your traces.

    Custom Configuration

    Tracing can be customized in several ways:

    SQL Based Monitors

    Sometimes there are use cases that involve complex queries and conditions for triggering a monitor. This might go beyond the built-in query logic that is provided within the groundcover logs page query language.

    An example for such a use case could be the need to compare some logs to the same ones in a past period. This is not something that is regularly available for log search but can definitely be something to alert on. If the number of errors for a group of logs dramatically changes from a previous week, this could be an event to alert and investigate.

    For such use cases you can harness the powerful ClickHouse SQL language to create an SQL based monitor within groundcover.

    ClickHouse within groundcover

    Log and Trace telemetry data is stored within a ClickHouse database.

    You can directly query this data using SQL statements and create powerful monitors.

    To create and test your SQL queries, use the Grafana Explore page within the groundcover app.

    Select the ClickHouse@groundcover datasource with the SQL Editor option to start crafting your SQL queries

    Start with show tables; to see all the available tables for your queries: logs and traces would be popular choices (table names are case sensitive).

    Query best practices

    While testing your queries always use LIMIT to limit your results to a small set of data.

    To apply the Grafana timeframe on your queries make sure to add the following conditions:

    Logs: WHERE $__timeFilter(timestamp)

    Traces: WHERE $__timeFilter(start_timestamp)

    Note: When querying logs with SQL, it's crucial to use efficient filters to prevent timeouts and enhance performance. Implementing primary filters like cluster, workload, namespace, and env will significantly speed up queries. Always integrate these filters when writing your queries to avoid inefficient queries.
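
    For example, a test query that follows these practices scopes by the Grafana timeframe, adds primary filters, and caps the result set (the cluster and namespace values are illustrative):

    SELECT *
    FROM logs
    WHERE $__timeFilter(timestamp)
      AND cluster = 'my-cluster'
      AND namespace = 'my-namespace'
    LIMIT 100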

    Filtering on attributes and tags

    Traces and Logs have rich context that is normally stored in dedicated columns in json format. Accessing the context for filtering and retrieving values is a popular need when querying the data.

    To get to the relevant context item, either in the attributes or tags you can use the following syntax:

    WHERE string_attributes['host_name'] = 'my.host'

    WHERE string_tags['cloud.name'] = 'aws'

    WHERE float_attributes['headline_count'] = 4

    WHERE float_tags['container.size'] = 22.4

    To use the float context, ensure that the relevant attributes or tags are indeed numeric. To do that, check the relevant log in JSON format and verify that the referenced field is not wrapped with quotes (for example, a numeric field like headline_count appears without quotes).

    SQL Query structure for a monitor

    In order to be able to use an SQL query to create a monitor you must make sure the query returns no more than a single numeric field - this is the monitored field on which the threshold is placed.

    The query can also contain any number of "group by" fields that are passed to the monitor as context labels.

    Here is an example of an SQL query that can be used for a monitor.
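
    The following is a minimal sketch of such a query; the level column and the tenantID attribute key are assumed names - adjust them to your own data:

    -- error logs in the last 10 minutes vs. the same 10-minute window one week ago
    SELECT
        string_attributes['tenantID'] AS tenantID,
        env,
        countIf(timestamp >= now() - INTERVAL 10 MINUTE)
            / greatest(countIf(timestamp >= now() - INTERVAL 7 DAY - INTERVAL 10 MINUTE
                               AND timestamp < now() - INTERVAL 7 DAY), 1) AS error_ratio
    FROM logs
    WHERE level = 'error'
      AND (timestamp >= now() - INTERVAL 10 MINUTE
           OR (timestamp >= now() - INTERVAL 7 DAY - INTERVAL 10 MINUTE
               AND timestamp < now() - INTERVAL 7 DAY))
    GROUP BY tenantID, env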

    In this query, the threshold field is a ratio between a value measured in the last 10 minutes and the same value measured a week earlier.

    tenantID and env are the group by labels that are passed to the monitor as context labels.


    Here is another query example (check the percentage of errors in a set of logs):
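
    A minimal sketch (the level column is an assumed name):

    -- percentage of error logs out of all logs in the last hour, per cluster/namespace/workload
    SELECT
        cluster,
        namespace,
        workload,
        countIf(level = 'error') * 100.0 / count() AS error_percentage
    FROM logs
    WHERE timestamp >= now() - INTERVAL 1 HOUR
    GROUP BY cluster, namespace, workload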

    A single numeric value is calculated and grouped by cluster, namespace and workload

    Applying the SQL query as a monitor

    Applying an SQL query can only happen in YAML mode. You can use the following YAML template to add your query:

    1. Give your monitor a name and a description

    2. Paste your SQL query in the expression field

    3. Set the threshold value and the relevant operator - in this example this is "lower than" 0.5 (< 0.5)

    4. Set your workflow name in the annotations section

    Trigger an alert when no logs are coming from a Linux host

    Use YAML mode to add the following template to your monitor.

    In this example we are creating a list of Linux hosts that were sending logs in the last 24 hours and then checking if there were any logs collected from those hosts in the last 5 minutes.
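
    A minimal sketch of such a query (the host attribute key is an assumed name):

    -- hosts that sent logs in the last 24 hours but none in the last 5 minutes
    SELECT
        string_attributes['host_name'] AS host,
        countIf(timestamp >= now() - INTERVAL 5 MINUTE) AS logs_last_5m
    FROM logs
    WHERE timestamp >= now() - INTERVAL 24 HOUR
    GROUP BY host
    HAVING logs_last_5m = 0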

    This monitor can be used e.g. to catch when the host is down.

    It is helpful to add an indication in the monitor name that it is SQL based - for example, add an [SQL] prefix or suffix to the monitor name.

    Synthetics (Beta)

    Synthetics allow you to proactively monitor the health, availability, and performance of your endpoints. By simulating requests from your infrastructure, you can verify that your critical services are reachable and returning correct data, even when no real user traffic is active.

    Overview

    groundcover Synthetics execute checks from your installed groundcover backend, working on inCloud deployments only.

    • Source: Checks run from within your backend; if you have multiple groundcover backends, you can select the specific backend to use. Region selection for running tests from specific locations will be supported in the future.

    • Supported Protocols: Currently, Synthetics supports HTTP/HTTPS tests. Support for additional protocols, including gRPC, ICMP (Ping), DNS, and dedicated SSL monitors, is coming soon.

    • Alerting: Creating a Synthetic test automatically creates a corresponding Monitor, which you can use to get alerted on failed synthetic tests. The monitor is uneditable.

    • Trace Integration: We generate traces for all synthetic tests, which you can see as first-class citizens in the groundcover platform. You can query these traces by using source:synthetics in the Traces page.

    Creating a Synthetic Test

    Navigate to Monitors > Synthetics and click + Create Synthetic Test .

    Only Editors can create, edit, or delete synthetic tests.

    Request Configuration

    Define the endpoint and parameters for the test.

    • Synthetic Test Name: A descriptive name for the test.

    • Target: Select the method (GET, POST, etc.) and URL. Include HTTP scheme as well, for example: https://api.groundcover.com/api/backends/list

      • Tip: Use Import from cURL to paste a command and auto-fill these fields.

    Assertions (Validation Logic)

    Assertions are the rules that determine if a test passed or failed. You can add multiple assertions to a single test. If any assertion fails, the entire check is marked as failed.

    Available Assertion Fields

    The "Field" determines which part of the response groundcover inspects.

    Assertion Operators

    The "Operator" defines the logic applied to the Field.

    Custom Labels

    Add custom labels; these labels will exist on the traces generated by checks, and you can use them to filter traces.

    Auto-Generated Monitors

    When you create a Synthetic Test, groundcover eliminates the need to manually configure separate alert rules. A Monitor is automatically generated and permanently bound to your test.

    • Managed Logic: The monitor's threshold and conditions are derived directly from your Synthetic Test's assertions. If the test fails (e.g., status code != 200), the Monitor automatically enters a "Failing" state.

    • Lifecycle: This monitor handles the lifecycle of the alert, transitioning between Pending, Firing (when the test fails), and Resolved (when the test passes).

    Note: To prevent configuration drift, these auto-generated monitors are read-only. You cannot edit their query logic directly; you simply edit the Synthetic Test itself.

    Create a Grafana alert

    Alerts in groundcover leverage a fully integrated Grafana interface. To learn how you can create alerts using Grafana Terraform, follow this guide.

    Setup an alert based on metrics

    Setting up an alert in groundcover involves defining conditions based on data collected in the platform, such as metrics, traces, logs, or Kubernetes events. This guide will walk you through the process of creating an alert based on metrics. More guides will follow to include all different types of data.

    Step 1: Access the Alerts section

    1. Log in to groundcover and navigate to the Alerts section by clicking on it in the left navigation menu.

    2. Once in the Alerts section, click on Alerting in the inner menu on the left.

      • If you can't see the inner menu, click on the 3 bars next to "Home" in the upper left corner.

    3. Click on Alert Rules

    Step 2: Give the alert a name and define query and conditions

    1. Type a name for your alert. It's recommended to use a name that will make it easy for you to understand its function later.

    2. Select the data source:

      1. ClickHouse: For alerts based on your traces, logs, and Kubernetes events.

      2. Prometheus: For alerts based on metrics (includes APM metrics, infrastructure metrics, and custom metrics from your environment)

    Note: You can click on "Run queries" to see the results of this query.

    Step 3: Define expressions - Reduce & Threshold

1. In the Reduce section, open the "Function" dropdown menu and choose the type of value you want to use.

      • Min - the lowest value

      • Max - the highest value

      • Mean - the average of the values

    Step 4: Set evaluation behavior

    1. Click on "+ New folder" and type a name for the folder in which this rule will be stored. You can choose any name, but it's recommended to use a name that will make it easy for you to find the relevant evaluation groups, should you want to use them again in future alerts.

    2. Click on "+ New evaluation group" and type a name for this evaluation group. The same recommendation applies here too.

      In the Evaluation interval textbox, type how often the rule should be evaluated to see if it matches the conditions set in Step 3. Then, click "Create". Note: For the Evaluation interval, use the format (number)(unit), where units are:

      • s = seconds

    Evaluation interval = how often do you want to check if the alert should fire

    Pending period = how long do you want this to be true before it fires

    As an example, you can define the alert to fire only if the Mean percentage of memory used by a node is above 90% in the past 2 minutes (Pending period = 2m) and you want to check if that's true every 30 seconds (Evaluation interval = 30s).
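For comparison, the same two settings appear as the evaluationInterval block in the monitor YAML used elsewhere in this document. A minimal sketch of the values from the example above (evaluate every 30 seconds, fire only after the condition has held for 2 minutes) might look like this:

evaluationInterval:
  interval: 30s
  pendingFor: 2m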

    Step 5: Choose contact point

If you already have a contact point set up, simply select it from the dropdown menu at the bottom of the "Configure labels and notifications" section. If not, click on the blue "View or create contact points" link, which will open a new tab.

    Click on the blue "Add contact point" button

    This will get you to the Contact points screen. Then:

    1. Type a name for the contact point

    2. From the dropdown menu, choose which system you want to use to push the alert to.

3. The information required to push the alert will change based on the system you select. Follow the on-screen instructions (for example, if email is selected, you'll need to enter the email address(es) for that contact).

    4. Click "Save contact point"

    You can now close this tab to go back to the alert rule screen.

    Next to the link you clicked to create this new contact point, you'll find a dropdown menu, where you can select the contact point you just created.

    Step 6: Add annotations

    Under "Add annotations", you have two free text boxes that give you the option to add any information that can be useful to you and/or the recipient(s) of this alert, such as a summary that reminds you of the alert's functionality or purpose, or next step instructions when this alert fires.

    Step 7: Save and exit

Once everything is ready, click the blue "Save rule and exit" button at the upper right of the screen, which will bring you back to the Alert rules screen. You will now be able to see your alert and its status - normal (green), pending (yellow), or firing (red) - along with the Evaluation interval (blue).

Configuring an alert from an existing dashboard

    1. Log in to your groundcover account and navigate to the dashboard that you want to create an alert from.

2. Locate the Grafana panel that you want to create an alert from, click on the panel's header, and select Edit.

    3. Click on the alert tab as seen in the image below. Select the Manage alerts option from the dropdown menu.

    4. Click on the New Alert Rule button.

    Note: only time series panels support alert creation.

1. An alert is composed of three parts, which you will configure on the screen you are taken to:

  • Expression - the query that defines the alert input itself

  • Reduction - the value that should be extracted from the expression above

      • Threshold

1. Select a folder - if needed, you can navigate to the Dashboards tab in the left nav and create a new folder

2. Select an evaluation group, or type text to create a new group as shown below

    1. Click "Save and Exit" on top right hand side of screen to create alert

    2. Ensure your notification is configured to have alerts sent to end users. See "Configuring Slack Contact Point" section below if needed.

    Note: Make sure to test the alert to ensure that it is working as expected. You can do this by triggering the conditions that you defined and verifying that the alert is sent to the specified notification channels.

    List Workflows

    Get a list of all configured alert workflows with their complete definitions, provider integrations, execution status, and YAML configurations.

    Endpoint

    POST /api/workflows/list

    Search & Filter

    Search and filter

    To help you slice and dice your data, you can use our dynamic filters (left panel) and/or our powerful querying capabilities:

    1. Query Builder - Supports key:value pairs, as well as free text search. The Query Builder works in tandem with our filters.

    Authorization: Bearer <YOUR_API_KEY>
    Content-Type: application/json
    {
      "name": "string",
      "type": "sensor|thirdParty|rum"
    }
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/create' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "production-k8s-sensor",
        "type": "sensor"
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/create' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "otel-collector-prod",
        "type": "thirdParty",
        "tags": ["otel", "production"]
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/create' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "frontend-rum-monitoring",
        "type": "rum",
        "tags": ["rum", "frontend", "web"]
      }'
    {
      "id": "string",
      "name": "string", 
      "createdBy": "string",
      "creationDate": "string",
      "key": "string",
      "type": "string",
      "tags": ["string"]
    }
    {
      "id": "12345678-1234-1234-1234-123456789abc",
      "name": "production-k8s-sensor",
      "createdBy": "[email protected]",
      "creationDate": "2025-08-31T14:09:15Z",
      "key": "gcik_AEBAAAE4_XXXXXXXXX_XXXXXXXXX_XXXXXXXX",
      "type": "sensor",
      "remoteConfig": true,
      "tags": []
    }
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "production-k8s-sensor"
      }'
    Authorization: Bearer <YOUR_API_KEY>
    Content-Type: application/json
    {
      "name": "string",
      "type": "sensor|thirdParty|rum",
      "remoteConfig": boolean
    }
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{}'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "type": "sensor"
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "my-sensor-key",
        "remoteConfig": true
      }'
    [
      {
        "id": "12345678-1234-1234-1234-123456789abc",
        "name": "production-sensor-key",
        "createdBy": "[email protected]",
        "creationDate": "2025-08-31T11:48:18Z",
        "key": "gcik_AEBAAAD4_XXXXXXXXX_XXXXXXXXX_XXXXXXXX",
        "type": "sensor",
        "remoteConfig": true,
        "tags": []
      },
      {
        "id": "87654321-4321-4321-4321-987654321def",
        "name": "my-sensor-key",
        "createdBy": "[email protected]",
        "creationDate": "2025-08-31T11:48:18Z",
        "key": "gcik_AEBAAAC7_XXXXXXXXX_XXXXXXXXX_XXXXXXXX",
        "type": "sensor",
        "remoteConfig": true,
        "tags": []
      },
      {
        "id": "abcdefab-cdef-abcd-efab-cdefabcdefab",
        "name": "third-party-integration",
        "createdBy": "[email protected]",
        "creationDate": "2025-08-31T11:48:18Z",
        "key": "gcik_AEBAAAHP_XXXXXXXXX_XXXXXXXXX_XXXXXXXX",
        "type": "thirdParty",
        "remoteConfig": false,
        "tags": []
      }
    ]
    curl -L \
      --url 'https://api.groundcover.com/api/monitors/xxxx-xxxx-xxx-xxxx-xxxx' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Accept: application/json'
    title: 'PVC usage above threshold (90%)'
    display:
      header: PV usage above 90% threshold - {{ alert.labels.cluster }}, {{ alert.labels.name }}
      contextHeaderLabels:
      - cluster
      - namespace
      - env
    severity: S2
    measurementType: state
    model:
      queries:
      - dataType: metrics
        name: threshold_input_query
        pipeline:
          function:
            name: last_over_time
            pipelines:
            - function:
                name: avg_by
                pipelines:
                - metric: groundcover_pvc_usage_percent
                args:
                - cluster
                - env
                - name
                - namespace
            args:
            - 1m
        conditions:
        - key: name
          origin: root
          type: string
          filters:
          - op: not_match
            value: object-storage-cache-groundcover-incloud-clickhouse-shard.*
      thresholds:
      - name: threshold_1
        inputName: threshold_input_query
        operator: gt
        values:
        - 90
    noDataState: OK
    evaluationInterval:
      interval: 1m0s
      pendingFor: 1m0s

    Labels

    Preview of available labels (click a row to see all labels)

tags (array) - Array of tags to associate with the key

remoteConfig (boolean) - Filter by remote configuration status

key (string) - The actual ingestion key (starts with gcik_)
type (string) - Key type ("sensor", "thirdParty", "rum")
remoteConfig (boolean) - Whether remote configuration is enabled
tags (array) - Array of tags associated with the key

display.description (string) - Monitor description
severity (string) - Alert severity level (e.g., "S1", "S2", "S3")
measurementType (string) - Type of measurement ("state", "event")
model.queries (array) - Query configurations for data retrieval
model.thresholds (array) - Threshold configurations for alerting
executionErrorState (string) - State when execution fails ("OK", "ALERTING")
noDataState (string) - State when no data is available ("OK", "ALERTING")
evaluationInterval.interval (string) - How often to evaluate the monitor
evaluationInterval.pendingFor (string) - How long to wait before alerting
isPaused (boolean) - Whether the monitor is currently paused

    enabling the groundcover Java agent
  • All query parameters

  • All bodies - for both the request and response

  • CPU and Memory utilization of the service and the node it is scheduled on

    HTTP Settings
• Follow redirects: Whether the test should follow 3xx responses. When disabled, the test returns the 3xx response as the result used for assertions.

    • Allow insecure: Disables SSL/TLS certificate verification. Use this only for internal testing or self-signed certificates. Not recommended for production endpoints as it exposes you to Man-in-the-Middle attacks.

• HTTP Version: Select the protocol version: one of HTTP/1.0, HTTP/1.1, or HTTP/2.0. The default is HTTP/1.1.

  • Timing

    • Interval: Frequency of the check (e.g., every 60s).

    • Timeout: Max duration to wait before marking the test as failed. Timeout must be less than interval.

  • Payload: Select the body type if your request requires data (e.g., POST/PUT).

    • Options: None, JSON, Text, Raw.

  • Headers & Auth:

    • Authentication (Bearer tokens, API keys) will be released soon.

• Headers: You can add custom headers by passing keys and values.

Assertion Fields (Field - Description):

statusCode - Checks the HTTP response code (e.g., 200, 404, 500).
responseHeader - Checks for the presence or value of a specific response header (e.g., Content-Type).
jsonBody - Inspects specific keys or values within a JSON response payload.
body - Checks the raw text body of the response.
responseTime - Checks the response time of the request.

Assertion Operators (Operator - Function - Example Use Case):

is equal to - Exact match. Case-sensitive. - statusCode is equal to 200
is not equal to - Ensures a value does not appear. - statusCode is not equal to 500
contains - Checks if a substring exists within the target. - body contains "error"
starts with - Verifies the beginning of a string. - "status"
ends with - Verifies the end of a string. - "success"
matches regex - Validates against a Regular Expression. - jsonBody matches regex user_id: \d+
exists - Checks that a field or header is present, regardless of value. - set-cookie exists in response headers
does not exist - Checks that a field is absent. - jsonBody (error_message) does not exist
is one of - Checks against a list of acceptable values. - statusCode is one of 200, 201, 202

Zero Maintenance: You do not need to edit this monitor's query. Any changes you make to the Synthetic Test (such as changing the target URL or assertions) are automatically synced to the Monitor.

    Set the check interval and the pending time

  • Save the monitor

  • Grafana Explore
    YAML mode
    SELECT * FROM logs LIMIT 10;
    with engineStatusLastWeek as (
  select string_attributes['tenantID'] tenantID, string_attributes['env'] env, max(float_attributes['engineStatus.numCylinders']) cylinders
      from logs
      where timestamp >= now() - interval 7 days
        and workload = 'engine-processing'
        and string_attributes['tenantID'] != ''
      group by tenantID, env
    ),
    engineStatusNow as (
      select string_attributes['tenantID'] tenantID, string_attributes['env'] env, min(float_attributes['engineStatus.numCylinders']) cylinders
      from logs
      where timestamp >= now() - interval 10 minutes
        and workload = 'engine-processing'
        and string_attributes['tenantID'] != ''
      group by tenantID, env
    )
    select n.tenantID, n.env, n.cylinders/lw.cylinders AS threshold
    from engineStatusNow n
    left join engineStatusLastWeek lw using (tenantID)
    where n.cylinders/lw.cylinders <= 0.5
    SELECT cluster, namespace, workload, 
        round( 100.0 * countIf(level = 'error') / 
        nullIf(count(), 0), 2 ) AS error_ratio_pct 
    FROM "groundcover"."logs" 
    WHERE timestamp >= now() - interval '10 minute' AND 
    namespace IN ('refurbished', 'interface') GROUP BY cluster, namespace, workload
    title: "[SQL] Monitor name"
    display:
      header: Monitor description
    severity: S2
    measurementType: event
    model:
      queries:
        - name: threshold_input_query
          expression: "[YOUR SQL QUERY GOES HERE]"
          datasourceType: clickhouse
          queryType: instant
      thresholds:
        - name: threshold_1
          inputName: threshold_input_query
          operator: lt
          values:
            - 0.5
    annotations:
      [Workflow Name]: enabled
    executionErrorState: OK
    noDataState: OK
    evaluationInterval:
      interval: 3m
      pendingFor: 2m
    isPaused: false
    
    title: Host not sending logs more than 5 minutes
    display:
      header: Host "{{host}}" is not sending logs for more than 5 minutes
    severity: S2
    measurementType: event
    model:
      queries:
        - name: threshold_input_query
          expression: "
          WITH
              (
              SELECT groupArray(DISTINCT host)
              FROM logs
              WHERE timestamp >= now() - INTERVAL 24 HOUR
              AND env_type = 'host'
              ) AS all_hosts
          SELECT
              host,
              coalesce(log_count, 0) AS log_count
          FROM
              (
              SELECT arrayJoin(all_hosts) AS host
              ) AS h
              LEFT JOIN
                  (
                  SELECT host, count(*) AS log_count
                  FROM logs
                  WHERE timestamp >= now() - INTERVAL 5 MINUTE
                  AND env_type = 'host'
                  GROUP BY host
                  ) AS l
          USING (host)
          ORDER BY host
          "
          datasourceType: clickhouse
          queryType: instant
      thresholds:
        - name: threshold_1
          inputName: threshold_input_query
          operator: lt
          values:
            - 10
    annotations:
      {Put Your Workflow Name Here}: enabled
    executionErrorState: Error
    noDataState: NoData
    evaluationInterval:
      interval: 5m
      pendingFor: 0s
    isPaused: false

    Then click on the blue "+ New alert rule" button in the upper right.

  • Click on "Select metric"

    • Note: Make sure you are in "Builder" view (see screenshot) to see this option.

  • Click on "Metrics explorer"

• Start typing the name of the metric you want this alert to be based on. Note that the Metrics explorer will start displaying matches as you type, so you can find your metric even if you don't remember its exact name. You can also check out our list of Metrics & Labels.

  • Once you see your metric in the list, click on "Select" in that row.

  • Sum - the sum of all values

  • Count - the number of values in the result

  • Last - the last value

  • In the Threshold section, type a value and choose whether you want the alert to fire when the query result is above or below that value. You can also select a range of values.

  • m = minutes

  • h = hours

  • d = days

• w = weeks

• In the Pending period box, type how long you want the conditions to be met before the alert fires.

• Threshold - the value to measure against the reduction output to see if an alert should be triggered
  • Verify expression value and enter reduction and threshold values in line with your alerting expectation

    Create Alert button referenced in point 4 below
    Expression, reduction, and threshold entry screen
    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    Headers

Authorization (required) - Bearer token with your API key
Content-Type (required) - Must be application/json
Accept (required) - Must be application/json

    Request Body

    This endpoint does not require a request body for the POST method.

    Field Descriptions

workflows (array) - Array of workflow objects
id (string) - Unique workflow identifier (UUID)
name (string) - Workflow name
description (string) - Workflow description

    Examples

    Basic Request

    Get all workflows:

    Response Example

Advanced Query - Currently available only in our Logs section; enables more complex queries, including nested condition support and explicit use of a variety of operators.

    To further focus your results, you can also restrict the results to specific time windows using the time picker on the upper right of the screen.

    Query Builder

The Query Builder is the default search option wherever search is available. It supports advanced autocomplete of keys and values, as well as a discovery mode that works across the values in your data to teach users the data model.

    The following syntaxes are available for you to use in Query Builder:

    Syntax
    Description
    Examples
    Sections

    key:value

    Search attributes:

Both groundcover built-ins and custom attributes.

    Use * for wildcard search. Note: Multiple filters for the same key act as 'OR' conditions, whereas multiple filters for different keys act as 'AND' conditions.

    namespace:prod-us namespace:prod-*

    Logs Traces K8s Events API Catalog Issues

    term

    Free text: Search for single-word terms. Tip: Expand your search results by using wildcards.

    Exception DivisionBy*

    Logs

    "term"

    Phrase Search (case-insensitive): Enclose terms within double quotes to find results containing the exact phrase. Note: Using double quotes does not work with * wildcards.

    "search term"

    How to use filters

    Filters are very easy to add and remove, using the filters menu on the left bar. You can combine filters with the Query Builder, and filters applied using the left menu will also be added to the Query Builder in text format.

    • Select / deselect a single filter - click on the checkbox on the left of the filter. (You can also deselect a filter by clicking the 'x' next to the text format of the filter on the search bar).

    • Deselect all but one filter (within a filter category, such as 'Level' or 'Format') - hover over the filter you want to leave on, then click on "ONLY".

      • You can switch between filters you want to leave on by hovering on another filter and clicking "ONLY" again.

      • To turn all other filters in that filter category back on, hover over the filter again and click "ALL".

    • Clear all filters within a filters category - click on the funnel icon next to the category name.

    • Clear all filters currently applied - click on the funnel icon next to the number of results.

    Advanced Query

    Advanced Query is currently available only in the Logs section.

    Filters are not available in Advanced Query mode.

    The following syntaxes are available for you to use in Advanced Query:

    Syntax
    Description
    Examples
    Sections

    key:value

    Filters: Use golden filters to narrow down your search. Note: Multiple filters for the same key act as 'OR' conditions, whereas multiple filters for different keys act as 'AND' conditions.

    level:error

    Logs

    @key:value

    Attributes: Search within the content of attributes. Note: Multiple filters for the same key act as 'OR' conditions, whereas multiple filters for different keys act as 'AND' conditions.

    @transaction.id:123

    Logs

    term

    Free text (exact match): Search for single-word terms. Tip: Expand your search results by using wildcards.

    term

    Additional examples of how to use Advanced Query mode:

Find all logs with level 'error' or 'warning', in 'json' or 'logfmt' format, where the status code is 500 or 503, the request path contains '/api/v1/', and exclude logs where the user agent is 'vmagent' or 'curl':

level:(error or warning) format:(json or logfmt) status_code:(500 or 503) @request.path:~"/api/v1/" NOT user_agent:(vmagent or curl)

Find logs where the bytes transferred are greater than 10000, the request method is POST, the host is not '10.1.11.65', and the namespace is 'production' or 'staging':

bytes:>10000 @request.method:POST NOT host:10.1.11.65 namespace:(production or staging)

Find logs from pods starting with 'backend-' in 'cluster-prod', where the level is 'error', the status code is not 200 or 204, and the request protocol is 'HTTP/2.0':

pod:~backend- cluster:cluster-prod level:error NOT status_code:(200 or 204) @request.protocol:"HTTP/2.0"

Find logs where the 'user_agent' field is empty or does not exist, the request path starts with '/admin', and the status code is greater than 400:

user_agent:"" @request.path:~"/admin" status_code:>400

Find logs in 'json' format from hosts starting with 'ip-10-1-', where the level is 'unknown', the container name contains 'redis', excluding logs with bytes transferred equal to 0:

format:json host:~"ip-10-1-" level:unknown container:~redis NOT bytes:0

Find logs where the time is '18/Sep/2024:07:25:46 +0000', the request method is GET, the status code is less than 200 or greater than 299, and the host is '10.1.11.65':

@time:"18/Sep/2024:07:25:46 +0000" @request.method:GET (status_code:<200 status_code:>299) host:10.1.11.65

Find logs where the level is 'info', the format is 'clf', the namespace is 'production', the pod name contains 'web', and exclude logs where the user agent is 'vmagent':

level:info format:clf namespace:production pod:~web NOT user_agent:vmagent

Find logs where the container name does not exist, the cluster is 'cluster-prod', the request path starts with '/internal', and the request protocol is 'HTTP/1.1':

container:"" cluster:cluster-prod @request.path:~"/internal" @request.protocol:"HTTP/1.1"

Find logs where the bytes transferred are greater than 5000, the request method is PUT or DELETE, the status code is 403 or 404, and the host is not '10.1.11.65':

bytes:>5000 @request.method:(PUT or DELETE) status_code:(403 or 404) NOT host:10.1.11.65

Find logs where the format is 'unknown', the level is not 'error', the user agent is 'curl', and the pod name starts with 'test-':

format:unknown NOT level:error user_agent:curl pod:~test-

    Switching between Query Builder and Advanced Query modes

    By default, the search bar will be displayed in Query Builder mode. Use the button on the right of the search bar to switch back and forth between the Query Builder and Advanced Query.

    Switch to Advanced Query mode
    Switch to Query Builder mode

    Application Metrics

    Our metrics philosophy

    The groundcover platform generates 100% of its metrics from the actual data. There are no sample rates or complex interpolations to make up for partial coverage. Our measurements represent the real, complete flow of data in your environment.

Stream processing allows us to construct the majority of the metrics on the very node where the raw transactions are recorded. This means the raw data is turned into numbers the moment it becomes possible, removing the need to store or send it elsewhere.

    Metrics are stored in groundcover's victoria-metrics deployment, ensuring top-notch performance on every scale.

    LLM Observability

    Overview

    LLM Observability is the practice of monitoring, analyzing, and troubleshooting interactions with Large Language Models (LLMs) across distributed systems. It focuses on capturing data regarding prompt content, response quality, performance latency, and token costs.

    groundcover provides a unified view of your GenAI traffic by combining two powerful data collection methods: zero-instrumentation eBPF tracing and native OpenTelemetry ingestion.

    Create a new Monitor

    Learn how to create and configure monitors using the Wizard, Monitor Catalog, or Import options. The following guide will help you set up queries, thresholds, and alert routing for effective monitoring.

You can create monitors using our web application by following this guide, use our API (see: ), or use our Terraform provider (see: ).

    In the Monitors section (left navigation bar), navigate to the Issues page or the Monitor List page to create a new Monitor. Click on the “Create Monitor” button at the top right and select one of the following options from the dropdown:

    List Nodes with Resource Information

    Endpoint

    POST /api/k8s/v2/nodes/info-with-resources

    Authentication

    List Clusters

    Retrieve a list of Kubernetes clusters with their resource usage metrics, metadata, and health information.

    Endpoint

    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/workflows/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Accept: */*'
    {
      "workflows": [
        {
          "id": "12345678-1234-1234-1234-123456789abc",
          "name": "ms-teams-alerts-workflow",
          "description": "Sends an API to MS Teams alerts endpoint",
          "created_by": "[email protected]",
          "creation_time": "2025-07-02T09:42:13.334103Z",
          "triggers": [
            {
              "type": "alert"
            }
          ],
          "interval": 0,
          "last_execution_time": null,
          "last_execution_status": null,
          "providers": [
            {
              "type": "webhook",
              "id": "provider123456789abcdef",
              "name": "teams-integration",
              "installed": true
            },
            {
              "type": "webhook",
              "id": null,
              "name": "backup-teams-integration",
              "installed": false
            }
          ],
          "workflow_raw_id": "teams-webhook",
          "workflow_raw": "id: teams-webhook\ndescription: Sends an API to MS Teams alerts endpoint\ntriggers:\n- type: alert\n  filters:\n  - key: annotations.ms-teams-alerts-workflow\n    value: enabled\nname: ms-teams-alerts-workflow\n...",
          "revision": 11,
          "last_updated": "2025-07-03T08:57:09.881806Z",
          "invalid": false,
          "last_execution_started": null
        },
        {
          "id": "87654321-4321-4321-4321-987654321def",
          "name": "webhook-alerts-workflow",
          "description": "Workflow for sending alerts to custom webhook",
          "created_by": "[email protected]",
          "creation_time": "2025-06-19T12:49:37.630392Z",
          "triggers": [
            {
              "type": "alert"
            }
          ],
          "interval": 0,
          "last_execution_time": null,
          "last_execution_status": null,
          "providers": [
            {
              "type": "webhook",
              "id": "webhook987654321fedcba",
              "name": "custom-webhook",
              "installed": true
            }
          ],
          "workflow_raw_id": "webhook-alerts",
          "workflow_raw": "id: webhook-alerts\ndescription: Workflow for sending alerts to custom webhook\n...",
          "revision": 2,
          "last_updated": "2025-06-19T12:51:24.643393Z",
          "invalid": false,
          "last_execution_started": null
        }
      ]
    }

created_by (string) - Email of the workflow creator
creation_time (string) - Workflow creation timestamp (ISO 8601)
triggers (array) - Array of trigger configurations
triggers[].type (string) - Trigger type (e.g., "alert")
interval (number) - Execution interval (typically 0 for alert-triggered workflows)
last_execution_time (string/null) - Last execution timestamp
last_execution_status (string/null) - Last execution status ("success", "error", etc.)
providers (array) - Array of integration provider configurations
providers[].type (string) - Provider type (see provider types below)
providers[].id (string/null) - Provider configuration ID
providers[].name (string) - Provider display name
providers[].installed (boolean) - Whether provider is installed and configured
workflow_raw_id (string) - Raw workflow identifier
workflow_raw (string) - Complete YAML workflow definition
revision (number) - Workflow version number
last_updated (string) - Last update timestamp (ISO 8601)
invalid (boolean) - Whether workflow configuration is invalid
last_execution_started (string/null) - When last execution started

    Logs

    -key:value

    Exclude: Specify terms or filters to omit from your search; applies to each distinct search.

    -key:value -term -"search term"

    Logs Traces K8s Events API Catalog Issues

    *:value

    Search all attributes:

    Search any attribute for a value, you can use double quotes for exact match and wildcards.

    *:error *:"POST /api/search" *:erro*

    Logs Traces Issues

    Logs

    " "

    Phrase Search (case-insensitive): Enclose terms within double quotes to find results containing the exact phrase.

    "search term"

    Logs

    ~

    Wildcard: Search for partial matches. Note: Wildcards must be added before the search term or value, and will always be treated as a partial match search.

    key:~val

    @key:~val

    ~term

    ~"search phrase"

    Logs

    NOT !

    Exclude: Specify terms or filters to omit from your search; applies to each distinct search.

    !key:value NOT @key:value NOT term !"search term"

    Logs

    key:""

    Identify cases where key does not exist or is empty

    pid:""

    Logs

    key:=# key:># key:<#

    Search for key:pair values where the value is equal, greater than, or smaller than, a specified number.

    threadPriority:>5

    Logs

    key:(val1 or val2)

    Search for key:value pairs using a list of values.

    level:(error or info)

    Logs

    query1 or query2

    Use OR operator to display matches on either queries

    level:error or format:json

    Logs

    query1 and query2

    Use AND operator to display matches on both queries

    level:error and format:json

    Logs

    "Search term prefix"*

    Exact phrase prefix search

    "Error 1064 (42"*

    Logs

    Golden signals

    In the world of excessive data, it's important to have a rule of thumb for knowing where to start looking. For application metrics, we rely on our golden signals.

    The following metrics are generated for each resource being aggregated:

    • Requests per second (RPS)

    • Errors rate

    • Latencies (p50 and p95)

    The golden signals are then displayed in two important ways: Workload and Resource aggregations.

    See below for the full list of generated workload and resource golden metrics.

Resource aggregations are highly granular metrics, providing insights into individual APIs.

    Workload aggregations are designed to show an overview of each service, enabling a higher level inspection. These are constructed using all of the resources recorded for each service.

    Controlling retention

    groundcover allows full control over the retention of your metrics. Learn more here.

    List of available metrics

    Below you will find the full list of our APM metrics, as well as the labels we export for each. These labels are designed with high granularity in mind for maximal insight depth. All of the metrics listed are available out of the box after installing groundcover, without any further setup.

    We fully support the ingestion of custom metrics to further expand the visibility into your environment.

    We also allow for building custom dashboards, enabling full freedom in deciding how to display your metrics - building on groundcover's metrics below plus every custom metric ingested.

    Our labels

clusterId - Name identifier of the K8s cluster
region - Cloud provider region name
namespace - K8s namespace
workload_name - K8s workload (or service) name

Summary-based metrics have an additional quantile label, representing the percentile. Available values: ["0.5", "0.95", "0.99"].

    groundcover uses a set of internal labels which are not relevant in most use-cases. Find them interesting? Let us know over Slack!

    issue_id entity_id resource_id query_id aggregation_id parent_entity_id perspective_entity_id perspective_entity_is_external perspective_entity_issue_id perspective_entity_name perspective_entity_namespace perspective_entity_resource_id

    Golden Signals Metrics

    In the lists below, we describe error and issue counters. Every issue flagged by groundcover is an error; but not every error is flagged as an issue.

    Resource metrics

groundcover_resource_total_counter - total amount of resource requests (Counter)
groundcover_resource_error_counter - total amount of requests with error status codes (Counter)
groundcover_resource_issue_counter - total amount of requests which were flagged as issues (Counter)
groundcover_resource_success_counter - total amount of resource requests with OK status codes

    Workload metrics

groundcover_workload_total_counter - total amount of requests handled by the workload (Counter)
groundcover_workload_error_counter - total amount of requests handled by the workload with error status codes (Counter)
groundcover_workload_issue_counter - total amount of requests handled by the workload which were flagged as issues (Counter)
groundcover_workload_success_counter - total amount of requests handled by the workload with OK status codes

    Storage usage metrics

groundcover_pvc_read_bytes_total - total amount of bytes read by the workload from the PVC (Counter)
groundcover_pvc_write_bytes_total - total amount of bytes written by the workload to the PVC (Counter)
groundcover_pvc_reads_total - total amount of read operations done by the workload from the PVC (Counter)
groundcover_pvc_writes_total - total amount of write operations done by the workload to the PVC

    Kafka specific metrics

groundcover_client_offset - client last message offset (for producer the last offset produced, for consumer the last requested offset) (Gauge)
groundcover_workload_client_offset - client last message offset (for producer the last offset produced, for consumer the last requested offset), aggregated by workload (Gauge)
groundcover_calc_lagged_messages - current lag in messages (Gauge)
groundcover_workload_calc_lagged_messages - current lag in messages, aggregated by workload (Gauge)
    eBPF-Based Tracing - Zero Instrumentation

    groundcover automatically detects and traces LLM API calls without requiring SDKs, wrappers, or code modification.

    The sensor captures traffic at the kernel level, extracting key data points and transforming requests into structured spans and metrics. This allows for instant visibility into third-party providers without altering application code. This method captures:

    • Payloads: Full prompt and response bodies (supports redaction).

    • Usage: Token counts (input, output, total).

    • Metadata: Model versions, temperature, and parameters.

    • Performance: Latency and completion time.

    • Status: Error messages and finish reasons.

Requirement: Out-of-the-box LLM tracing for OpenAI and Anthropic is available starting from sensor version 1.9.563. Bedrock is available starting from sensor version 1.11.158.

    OpenTelemetry Instrumentation Support

    In addition to auto-detection, groundcover supports the ingestion of traces generated by manual OpenTelemetry instrumentation.

    If your applications are already instrumented using OpenTelemetry SDKs (e.g., using the OpenTelemetry Python or JavaScript instrumentation for OpenAI/LangChain), groundcover will seamlessly ingest, process, and visualize these spans alongside your other telemetry data.

    Generative AI Span Structure

    When groundcover captures traffic via eBPF, it automatically transforms the data into structured spans that adhere to the OpenTelemetry GenAI Semantic Conventions.

    This standardization allows LLM traces to correlate with existing application telemetry. Below are the attributes captured for each eBPF-generated LLM span:

gen_ai.system - The Generative AI provider (e.g., openai)
gen_ai.request.model - The model name requested by the client (e.g., gpt-4)
gen_ai.response.model - The name of the model that generated the response (e.g., gpt-4-0613)
gen_ai.response.usage.input_tokens - Tokens consumed by the input (prompt)

    Generative AI Metrics

    groundcover automatically generates rate, errors, duration and usage metrics from the LLM traces. These metrics adhere to OpenTelemetry GenAI conventions and are enriched with Kubernetes context (cluster, namespace, workload, etc).

groundcover_workload_gen_ai_response_usage_input_tokens - Input token count, aggregated by K8s workload
groundcover_workload_gen_ai_response_usage_output_tokens - Output token count, aggregated by K8s workload
groundcover_workload_gen_ai_response_usage_total_tokens - Total token usage, aggregated by K8s workload
groundcover_gen_ai_response_usage_input_tokens - Global input token count (cluster-wide)
groundcover_gen_ai_response_usage_output_tokens - Global output token count (cluster-wide)
groundcover_gen_ai_response_usage_total_tokens - Global total token usage (cluster-wide)

    Available Labels:

    Metrics can be filtered by: workload, namespace, cluster, gen_ai_request_model, gen_ai_system, client, server, and status_code.
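As an illustration of these labels in use, here is a minimal sketch of a Prometheus-based monitor query fragment in the monitor YAML format used elsewhere in this document. It assumes the token metrics behave as counters; the time window and threshold value are hypothetical:

queries:
  - name: threshold_input_query
    # hypothetical example: hourly OpenAI token usage per workload
    expression: sum by (workload) (increase(groundcover_workload_gen_ai_response_usage_total_tokens{gen_ai_system="openai"}[1h]))
    queryType: instant
    datasourceType: prometheus
thresholds:
  - name: threshold_1
    inputName: threshold_input_query
    operator: gt
    values:
      - 1000000 # hypothetical hourly token budget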

    Configuration

    Obfuscation Configuration

    LLM payloads often contain sensitive data (PII, secrets). By default, groundcover collects full payloads to aid in debugging. You can configure the agent to obfuscate specific fields within the prompts or responses using the httphandler configuration in your values.yaml.

    See Sensitive data obfuscation for full details on obfuscation in groundcover.

    By default groundcover does not obfuscate LLM payloads.

    Obfuscating Request Prompts

This configuration will obfuscate request prompts while keeping metadata such as model, tokens, etc.

    Obfuscating Response Prompts

This configuration will obfuscate response data while keeping metadata such as model, tokens, etc.

    Supported Providers

    groundcover currently supports the following providers via auto-detection:

    • OpenAI (Chat Completion API)

    • Anthropic (Chat Completion API)

    • AWS Bedrock APIs

    For providers not listed above, manual OpenTelemetry instrumentation can be used to send data to groundcover.

    Using the Monitor Wizard

    Overview

    The Monitor Wizard is a guided, user-friendly approach to creating and configuring monitors tailored to your observability needs. By breaking down the process into simple steps, it ensures consistency and accuracy.

    Section 1: Information

    Set up the basic information for the monitor.

    Monitor Title (Required):

    Add a title for the monitor. The title will appear in notifications and in the Monitor List page.

    Give the Monitor a clear, short name, that describes its function at a high level.

    Examples:

    • “Workload High API Error Rate”

    • “Workload Pods High Memory”

    The title will appear in the monitors page table and be accessible in workflows and alerts.

    Description (Optional):

Add a description for your monitor. The description will appear when viewing the monitor details, and you can also use it in your alerts.

    Section 2: Query

    Select the data source, build the query and define thresholds for the monitor.

    If you're unfamiliar with query building in groundcover, refer to the Query Builder section for full details on the different components.

    • Data Source (Required):

      • Select the type of data (Metrics, Infra Metrics, Logs, Traces, or Events).

    • Query Functionality:

      • Choose how to process the data (e.g., average, count).

• Add aggregation clauses if applicable; you MUST use aggregations if you want to add labels to your issues (see the sketch after this list).

      • Examples: cluster, workload, container_name

    • Time Window (Required):

      • Specify the period over which data is aggregated.

      • Example: “Over the last 5 minutes.”

    • Threshold Conditions (Required):

      • Define when the monitor triggers. You can use:

        • Greater Than - Trigger when the value exceeds X.

        • Lower Than - Trigger when the value falls below X.

    • Visualization Type (Optional):

      • Preview data using Stacked Bar or Line Chart for better clarity while building the monitor.
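To make the options above concrete, here is a minimal sketch of how a query with aggregation labels and a threshold maps onto the monitor YAML model used by the Import option later in this guide. The metric, time window, and threshold value are illustrative only:

model:
  queries:
    - name: threshold_input_query
      # illustrative example: errors per workload over the last 5 minutes, aggregated by cluster and workload
      expression: sum by (cluster, workload) (increase(groundcover_workload_error_counter[5m]))
      queryType: instant
      datasourceType: prometheus
  thresholds:
    - name: threshold_1
      inputName: threshold_input_query
      operator: gt
      values:
        - 100 # illustrative threshold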

    Section 3: Display

    Customize how the Monitor’s Issues will appear. This section also includes a live preview of the way it will appear in the Issues page.

    Ensure that the labels you wish to use dynamically (e.g., span_name, workload) are defined in the query configuration step (Section 2: Query)

    Issue Header (required):

    Define a name for issues that this Monitor will raise. It's useful to use labels that can include information from the query.

    For example, adding {{ alert.labels.statusCode }} to the header will inject the status code to the name of the issue - this becomes especially useful when one Monitor raises multiple issues and you want to quickly understand their content without having to open each one.

    Examples:

    • “HTTP API Error {{ alert.labels.status_code }}” -> HTTP API Error 500

    • “Workload {{ alert.labels.workload }} Pod Restart” -> Workload frontend Pod Restart

• “{{ alert.labels.customer }} APIs High Latency” -> org.com APIs High Latency

    If you do choose to use templated dynamic values, make sure they exist as monitor query labels.

    Severity (required):

    Use severity to categorize alerts by importance.

    Select a severity level (S1-S4).

    Context Labels (optional):

If you want to use labels here, you MUST add them to the query aggregation.

These Labels will be displayed and filterable in the Monitors > Issues page.

We recommend using up to 5 Labels for the best experience.

    Section 4: Metadata Labels

Organize and categorize monitors; you can use these to route issues using advanced workflows.

    • Labels (optional):

      • Add key-value pairs for metadata.

    Section 5: Evaluation Settings

    Define how often the monitor evaluates its conditions.

    Evaluation Interval (Required):

    Specify how often the monitor evaluates the query

    Example: “Evaluate every 1 minute.”

    Pending Period (Required):

    This ensures that transient conditions do not trigger alerts, reducing false positives. For example, setting this to 10 minutes ensures the condition must persist for at least 10 minutes before firing.

    If the query conditions were not met during the evaluation duration, the issue's pending period will reset to normal.

Example: “Wait for 10 minutes before alerting.”

    Section 6: Routing

    Set up how issues from this monitor will be routed.

    Select Workflow (Optional):

Route alerts to existing workflows only; this means that other workflows will not process them. Use this, for example, to send alerts for a critical application to a destination such as Slack or PagerDuty.

    No Routing (Optional):

This means that any workflow (without filters) will process the issue.
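For reference, routing to a specific workflow relies on the workflow's trigger filters matching the monitor's annotations, as shown in the List Workflows response example earlier in this document. A minimal sketch of such a trigger filter, where the workflow name my-oncall-workflow is hypothetical:

triggers:
  - type: alert
    filters:
      - key: annotations.my-oncall-workflow
        value: enabled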

    Quick tips to create effective Monitors

    Use the Monitor Catalog as much as you can

    Whenever possible, use our carefully crafted monitors from the Monitor Catalog. This will save you time, ensure the Monitors are built effectively, and help you align your alerting strategy with best practices. If you can't find one that perfectly matches your needs, use them as your starting point and edit their properties to customize them to your needs.

    Give the Monitor a short and clear title

    Give the Monitor a clear, short name, that describes its function at a high level.

    Examples:

    • “Workload High API Error Rate”

    • “Workload Pods High Memory”

    The title will appear in the monitors page table and be accessible in workflows and alerts.

    Use a Descriptive Issue Header

Choose a clear name for the Issue header, offering a bit more detail and a more specific description than the Monitor title. A Header is a specific property of an issue, so you can add templated dynamic values here. For example, you can use dynamic label values in the header name.

    Examples:

    • “HTTP API Error {{ alert.labels.status_code }}”,

    • “Workload {{ alert.labels.workload }} Pod Restart”

    • “{{ alert.labels.customer }} APIs High Latency”.

    If you do choose to use templated dynamic values, make sure they exist as monitor query labels.

    Use up to 3 Resource Labels

We recommend using up to 3 ResourceHeaderLabels. These labels should give your team the context of what the subject of the issue is.

    Examples:

    span_name , pod_name

    ResourceHeaderLabels appear as a secondary header in Issues tables across the platform.

    Use up to 3 Context Labels

    We recommend using up to 3 ContextHeaderLabels. These labels should give your team the context of where the issue happened.

    Examples:

    cluster, namespace , workload

ContextHeaderLabels appear on Issues tables across the platform, next to your issues.

    Using the Import option

    This is an advanced feature, please use it with caution.

    In the "Import Bulk Monitors" you can add multiple monitors using an array of Monitors that follows the Monitor YAML structure.

    Example of importing multiple monitors

    Click on "Create Monitors" to create them.


    This endpoint requires API key authentication.

    Headers

Authorization: Bearer <YOUR_API_KEY> - Your groundcover API key
Content-Type: application/json - Request body format
X-Backend-Id: <YOUR_BACKEND_ID> - Your backend identifier

    Request Body

start (string, required) - Start time in ISO 8601 UTC format
end (string, required) - End time in ISO 8601 UTC format
sources (array, optional) - see Sources Structure (Cluster Filter) below

    Sources Structure (Cluster Filter)

    Example Request

    Response

    Response Fields

nodes (array) - Array of node objects
nodes[].uid (string) - Unique identifier for the node
nodes[].name (string) - Node name
nodes[].cluster (string) - Cluster name

    Filter Operations

eq - Equals
ne - Not equals
gt - Greater than
lt - Less than
contains - Contains substring

    Common Use Cases

    Get All Nodes

    Filter by Specific Cluster

    Headers
Authorization (required) - Bearer token with your API key
X-Backend-Id (required) - Your backend identifier
Content-Type (required) - Must be application/json
Accept (required) - Must be application/json

    Request Body

sources (Array, optional) - Filter by data sources (empty array for all sources)

    Response

    The response contains an array of clusters with detailed resource usage and metadata.

    Response Fields

clusters (Array) - Array of cluster objects
totalCount (Integer) - Total number of clusters

Cluster Object Fields

name (String) - Cluster name
env (String) - Environment (e.g., "prod", "ga", "beta", "alpha", "latest")
creationTimestamp (String) - When the cluster was created (ISO 8601)
cloudProvider (String) - Cloud provider (e.g., "AWS", "GCP", "Azure")

CPU Metrics

cpuUsage (Integer) - Current CPU usage in millicores
cpuLimit (Integer) - CPU limits set on resources in millicores
cpuAllocatable (Integer) - Total allocatable CPU in millicores
cpuRequest (Integer) - Total CPU requests in millicores

Memory Metrics

memoryUsage (Integer) - Current memory usage in bytes
memoryLimit (Integer) - Memory limits set on resources in bytes
memoryAllocatable (Integer) - Total allocatable memory in bytes
memoryRequest (Integer) - Total memory requests in bytes

Pod Information

pods (Object) - Pod counts by status (e.g., {"Running": 157, "Succeeded": 4})

    Examples

    Basic Request

    Response Example

    List Deployments

    Get a list of Kubernetes deployments with status information, replica counts, and operational conditions for a specified time range.

    Endpoint

    POST /api/k8s/v2/deployments/list

    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    Headers

    Header
    Required
    Description

    Request Body

    The request body requires a time range and supports filtering by fields:

    Parameters

    Parameter
    Type
    Required
    Description

    Response

    Response Schema

    Field Descriptions

    Field
    Type
    Description

    Common Condition Types

    Type
    Description

    Common Condition Reasons

    Reason
    Description

    Examples

    Basic Request

    Get deployments for a specific time range:

    Filter by Namespace

    Get deployments from specific namespaces:

    Response Example

    Time Range Guidelines

    • Use ISO 8601 UTC format for timestamps

    • Typical time ranges: 1-24 hours for operational monitoring

    • Maximum recommended range: 7 days

    • Format: YYYY-MM-DDTHH:MM:SS.sssZ

    List Workloads

    Retrieve a list of Kubernetes workloads with their performance metrics, resource usage, and metadata.

    Endpoint

    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    Headers

    Header
    Required
    Description

    Request Body

    Parameter
    Type
    Required
    Default
    Description

    Response

    The response contains a paginated list of workloads with their metrics and metadata.

    Response Fields

    Field
    Type
    Description

    Workload Object Fields

    Field
    Type
    Description

    Examples

    Basic Request

    Response Example

    Pagination

    To retrieve all workloads, use pagination by incrementing the skip parameter:

    Fetching All Results

    Pagination Logic

    To fetch all results programmatically:

    1. Start with skip=0 and limit=100 (or your preferred page size)

    2. Check the total field in the response

    3. Continue making requests, incrementing skip by your limit value

    Example calculation:

    • If total is 6314 and limit is 100

    • You need ⌈6314/100⌉ = 64 requests

    • Last request: skip=6300, limit=100 (returns 14 items)

    values.yaml
    httphandler:
      obfuscationConfig:
        keyValueConfig:
          enabled: true
          mode: "ObfuscateSpecificValues"
          specificKeys:
            - "messages"
            - "inputText"
            - "prompt"
    values.yaml
    httphandler:
      obfuscationConfig:
        keyValueConfig:
          enabled: true
          mode: "ObfuscateSpecificValues"
          specificKeys:
            - "choices"
            - "output"
            - "content"
            - "outputs"
            - "results"
            - "generation"
    monitors:
    - title: K8s Cluster High Memory Requests Monitor
      display:
        header: K8s Cluster High Memory Requests
        description: Alerts when a K8s Cluster's total Container Memory Requests exceeds 90% of the Allocatable Memory of all the Nodes for 5 minutes    
        contextHeaderLabels:
          - env
          - cluster
      severity: S1
      measurementType: state
      model:
        queries:
          - name: threshold_input_query
            expression: avg_over_time( (((sum(groundcover_node_rt_mem_requests_bytes{}) by (cluster, env)) / (sum(groundcover_node_rt_allocatable_mem_bytes{}) by (cluster, env))) * 100)[5m] )
            queryType: instant
            datasourceType: prometheus
        thresholds:
          - name: threshold_1
            inputName: threshold_input_query
            operator: gt
            values:
              - 90
      noDataState: OK
      evaluationInterval:
        interval: 1m
        pendingFor: 0s
    - title: K8s PVC Pending For 5 Minutes Monitor
      display:
        header: K8s PVC Pending Over 5 Minutes
        description: This monitor triggers an alert when a PVC remains in a Pending state for more than 5 minutes.
        contextHeaderLabels:
          - cluster
          - namespace
          - persistentvolumeclaim
      severity: S2
      measurementType: state
      model:
        queries:
          - name: threshold_input_query
            expression: last_over_time(max(groundcover_kube_persistentvolumeclaim_status_phase{phase="Pending"}) by (cluster, namespace, persistentvolumeclaim)[1m])
            queryType: instant
            datasourceType: prometheus
        thresholds:
          - name: threshold_1
            inputName: threshold_input_query
            operator: gt
            values:
              - 0
      executionErrorState: OK
      noDataState: OK
      evaluationInterval:
        interval: 1m
        pendingFor: 5m
    {
      "key": "cluster",
      "type": "string",
      "origin": "root", 
      "filters": [
        {
          "op": "eq",
          "value": "cluster-name"
        }
      ]
    }
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/k8s/v2/nodes/info-with-resources' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --header 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data '{
        "start": "2025-01-27T12:00:00.000Z",
        "end": "2025-01-27T14:00:00.000Z",
        "sources": [
          {
            "key": "cluster",
            "type": "string",
            "origin": "root",
            "filters": [
              {
                "op": "eq",
                "value": "my-cluster"
              }
            ]
          }
        ],
        "limit": 100,
        "nameFilter": ""
      }'
    {
      "nodes": [
        {
          "uid": "node-uid",
          "name": "node-name",
          "cluster": "cluster-name",
          "env": "environment-name",
          "creationTimestamp": "2025-01-01T10:00:00Z",
          "labels": {
            "kubernetes.io/arch": "amd64",
            "kubernetes.io/os": "linux",
            "node.kubernetes.io/instance-type": "t3.medium"
          },
          "addresses": [
            {
              "type": "InternalIP",
              "address": "10.0.1.100"
            },
            {
              "type": "ExternalIP", 
              "address": "203.0.113.100"
            }
          ],
          "nodeInfo": {
            "kubeletVersion": "v1.24.0",
            "kubeProxyVersion": "v1.24.0",
            "operatingSystem": "linux",
            "architecture": "amd64",
            "containerRuntimeVersion": "containerd://1.6.0",
            "kernelVersion": "5.4.0-91-generic",
            "osImage": "Ubuntu 20.04.3 LTS"
          },
          "capacity": {
            "cpu": "2",
            "memory": "8Gi",
            "pods": "110"
          },
          "allocatable": {
            "cpu": "1940m",
            "memory": "7Gi", 
            "pods": "110"
          },
          "usage": {
            "cpu": "500m",
            "memory": "3Gi"
          },
          "ready": true,
          "conditions": [
            {
              "type": "Ready",
              "status": "True",
              "lastTransitionTime": "2025-01-01T10:05:00Z",
              "reason": "KubeletReady",
              "message": "kubelet is posting ready status"
            }
          ]
        }
      ]
    }
    {
      "start": "2025-01-27T12:00:00.000Z",
      "end": "2025-01-27T14:00:00.000Z",
      "limit": 100
    }
    {
      "start": "2025-01-27T12:00:00.000Z",
      "end": "2025-01-27T14:00:00.000Z",
      "sources": [
        {
          "key": "cluster",
          "type": "string", 
          "origin": "root",
          "filters": [{"op": "eq", "value": "production-cluster"}]
        }
      ],
      "limit": 100
    }
    POST /api/k8s/v3/clusters/list
    curl 'https://api.groundcover.com/api/k8s/v3/clusters/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"sources":[]}'
    {
      "clusters": [
        {
          "name": "production-cluster",
          "env": "prod",
          "cpuUsage": 126640,
          "cpuLimit": 289800,
          "cpuAllocatable": 302820,
          "cpuRequest": 187975,
          "cpuUsageAllocatablePercent": 41.82,
          "cpuRequestAllocatablePercent": 62.07,
          "cpuUsageRequestPercent": 67.37,
          "cpuUsageLimitPercent": 43.70,
          "cpuLimitAllocatablePercent": 95.70,
          "memoryUsage": 242994409472,
          "memoryLimit": 604262891520,
          "memoryAllocatable": 1227361431552,
          "memoryRequest": 495549677568,
          "memoryUsageAllocatablePercent": 19.80,
          "memoryRequestAllocatablePercent": 40.38,
          "memoryUsageRequestPercent": 49.04,
          "memoryUsageLimitPercent": 40.21,
          "memoryLimitAllocatablePercent": 49.23,
          "nodesCount": 6,
          "pods": {
            "Running": 109,
            "Succeeded": 3
          },
          "issueCount": 1,
          "creationTimestamp": "2021-11-01T14:37:31Z",
          "cloudProvider": "AWS",
          "kubernetesVersion": "v1.30.14-eks-931bdca"
        }
      ],
      "totalCount": 116
    }
    POST /api/k8s/v3/workloads/list

    100

    gen_ai.response.usage.output_tokens

    Tokens generated in the response

    100

    gen_ai.response.usage.total_tokens

    Total token usage for the interaction

    200

    gen_ai.response.finish_reason

    Reason the model stopped generating

    stop ; length

    gen_ai.response.choice_count

    Target number of candidate completions

    3

    gen_ai.response.system_fingerprint

    Fingerprint to track backend environment changes

    fp_44709d6fcb

    gen_ai.response.tools_used

    Number of tools used in API call

    2

    gen_ai.request.temperature

    The temperature setting

    0.0

    gen_ai.request.max_tokens

    Maximum tokens allowed for the request

    100

    gen_ai.request.top_p

    The top_p sampling setting

    1.0

    gen_ai.request.stream

    Boolean indicating if streaming was enabled

    false

    gen_ai.response.message_id

    Unique ID of the message created by the server

    gen_ai.error.code

    The error code for the response

    gen_ai.error.message

    A human-readable description of the error

    gen_ai.error.type

    Describes a class of error the operation ended with

    timeout; java.net.UnknownHostException; server_certificate_invalid; 500

    gen_ai.operation.name

    The name of the operation being performed

    chat; generate_content; text_completion

    gen_ai.request.message_count

    Count of messages in API response

    1

    gen_ai.request.system_prompt

    Boolean flag whether system prompt was used in request prompts

    true

    gen_ai.request.tools_used

    Boolean flag whether any tools were used in requests

    true

    Source filters (e.g., cluster filters)

    limit

    integer

    No

    Maximum number of nodes to return (default: 100)

    nodes[].env

    string

    Environment name

    nodes[].creationTimestamp

    string

    Node creation time in ISO 8601 format

    nodes[].labels

    object

    Node labels key-value pairs

    nodes[].addresses

    array

    Node IP addresses (internal/external)

    nodes[].nodeInfo

    object

    Node system information

    nodes[].capacity

    object

    Total node resource capacity

    nodes[].allocatable

    object

    Allocatable resources (capacity minus system reserved)

    nodes[].usage

    object

    Current resource usage

    nodes[].ready

    boolean

    Node readiness status

    nodes[].conditions

    array

    Node condition details

    kubernetesVersion

    String

    Kubernetes version

    nodesCount

    Integer

    Number of nodes in the cluster

    issueCount

    Integer

    Number of issues detected

    cpuUsageAllocatablePercent

    Float

    CPU usage as percentage of allocatable

    cpuRequestAllocatablePercent

    Float

    CPU requests as percentage of allocatable

    cpuUsageRequestPercent

    Float

    CPU usage as percentage of requests

    cpuUsageLimitPercent

    Float

    CPU usage as percentage of limits

    cpuLimitAllocatablePercent

    Float

    CPU limits as percentage of allocatable

    memoryUsageAllocatablePercent

    Float

    Memory usage as percentage of allocatable

    memoryRequestAllocatablePercent

    Float

    Memory requests as percentage of allocatable

    memoryUsageRequestPercent

    Float

    Memory usage as percentage of requests

    memoryUsageLimitPercent

    Float

    Memory usage as percentage of limits

    memoryLimitAllocatablePercent

    Float

    Memory limits as percentage of allocatable

    pod_name

    K8s pod name

    container_name

    K8s container name

    container_image

    K8s container image name

    remote_namespace

    Remote K8s namespace (other side of the communication)

    remote_service_name

    Remote K8s service name (other side of the communication)

    remote_container_name

    Remote K8s container name (other side of the communication)

    type

    The protocol in use (HTTP, gRPC, Kafka, DNS etc.)

    role

    Role in the communication (client or server)

    clustered_path

    HTTP / gRPC aggregated resource path (e.g. /metrics/*)

    http, grpc

    method

    HTTP / gRPC method (e.g. GET)

    http, grpc

    response_status_code

    Return status code of an HTTP / gRPC request (e.g. 200 in HTTP)

    http, grpc

    dialect

    SQL dialect (MySQL or PostgreSQL)

    mysql, postgresql

    response_status

    Return status code of a SQL query (e.g. 42P01 for undefined table)

    mysql, postgresql

    client_type

    Kafka client type (Fetcher / Producer)

    kafka

    topic

    Kafka topic name

    kafka

    partition

    Kafka partition identifier

    kafka

    error_code

    Kafka return status code

    kafka

    query_type

    Type of DNS query (e.g. AAAA)

    dns

    response_return_code

    Return status code of a DNS resolution request (e.g. Name Error)

    dns

    method_name, method_class_name

    Method code for the operation

    amqp

    response_method_name, response_method_class_name

    Method code for the operation's response

    amqp

    exit_code

    K8s container termination exit code

    container_state, container_crash

    state

    K8s container current state (Running, Waiting or Terminated)

    container_state

    state_reason

    K8s container state transition reason (e.g. CrashLoopBackOff or OOMKilled)

    container_state

    crash_reason

    K8s container crash reason (e.g. Error, OOMKilled)

    container_crash

    pvc_name

    K8s PVC name

    storage

    Counter

    groundcover_resource_latency_seconds

    resource latency [sec]

    Summary
    Counter

    groundcover_workload_latency_seconds

    resource latency across all of the workload APIs [sec]

    Summary
    Counter

    groundcover_pvc_read_latency

    latency of read operation by the workload from the PVC, in microseconds

    Summary

    groundcover_pvc_write_latency

    latency of write operation by the workload to the PVC, in microseconds

    Summary

    groundcover_calc_lag_seconds

    current lag in time [sec]

    Gauge

    groundcover_workload_calc_lag_seconds

    current lag in time, aggregated by workload [sec]

    Gauge

    Within Range - Trigger when the value is between X and Y.

  • Outside Range - Trigger when the value is not between X and Y.

  • Example: “Trigger if disk space usage is greater than 10%.”

  • Monitor Wizard
    Monitor Catalog
    Import

    sources

    array

    No

    Source filters

    creationTime

    string

    Deployment creation timestamp in ISO 8601 format

    cluster

    string

    Kubernetes cluster name

    env

    string

    Environment name (e.g., "prod", "staging")

    available

    integer

    Number of available replicas

    desired

    integer

    Number of desired replicas

    ready

    integer

    Number of ready replicas

    conditions

    array

    Array of deployment condition objects

    conditions[].type

    string

    Condition type (e.g., "Available", "Progressing")

    conditions[].status

    string

    Condition status ("True", "False", "Unknown")

    conditions[].lastProbeTime

    string/null

    Last time the condition was probed

    conditions[].lastHeartbeatTime

    string/null

    Last time the condition was updated

    conditions[].lastTransitionTime

    string

    Last time the condition transitioned

    conditions[].reason

    string

    Machine-readable reason for the condition

    conditions[].message

    string

    Human-readable message explaining the condition

    warnings

    array

    Array of warning messages (usually empty)

    id

    string

    Unique identifier for the deployment

    resourceVersion

    integer

    Kubernetes resource version

    Authorization

    Yes

    Bearer token with your API key

    X-Backend-Id

    Yes

    Your backend identifier

    Content-Type

    Yes

    Must be application/json

    Accept

    Yes

    Must be application/json

    start

    string

    Yes

    Start time in ISO 8601 UTC format (e.g., "2025-08-24T07:21:36.944Z")

    end

    string

    Yes

    End time in ISO 8601 UTC format (e.g., "2025-08-24T08:51:36.944Z")

    namespaces

    array

    No

    deployments

    array

    Array of deployment objects

    name

    string

    Deployment name

    namespace

    string

    Kubernetes namespace

    workloadName

    string

    Associated workload name

    Available

    Deployment has minimum availability

    Progressing

    Deployment is making progress towards desired state

    MinimumReplicasAvailable

    Deployment has minimum number of replicas available

    NewReplicaSetAvailable

    New ReplicaSet has successfully progressed

    Array of namespace names to filter by (e.g., ["groundcover", "default"])

    Integer

    No

    0

    Number of workloads to skip for pagination

    order

    String

    No

    "desc"

    Sort order: "asc" or "desc"

    sortBy

    String

    No

    "rps"

    Field to sort by (e.g., "rps", "cpuUsage", "memoryUsage")

    sources

    Array

    No

    []

    Filter by data sources

    namespace

    String

    Kubernetes namespace

    workload

    String

    Workload name

    kind

    String

    Kubernetes resource kind (e.g., "ReplicaSet", "StatefulSet", "DaemonSet")

    resourceVersion

    Integer

    Kubernetes resource version

    ready

    Boolean

    Whether the workload is ready

    podsCount

    Integer

    Number of pods in the workload

    p50

    Float

    50th percentile response time in seconds

    p95

    Float

    95th percentile response time in seconds

    p99

    Float

    99th percentile response time in seconds

    rps

    Float

    Requests per second

    errorRate

    Float

    Error rate as a decimal (e.g., 0.004 = 0.4%)

    cpuLimit

    Integer

    CPU limit in millicores (0 = no limit)

    cpuUsage

    Float

    Current CPU usage in millicores

    memoryLimit

    Integer

    Memory limit in bytes (0 = no limit)

    memoryUsage

    Integer

    Current memory usage in bytes

    issueCount

    Integer

    Number of issues detected

    Stop when skip >= total

    Authorization

    Yes

    Bearer token with your API key

    X-Backend-Id

    Yes

    Your backend identifier

    Content-Type

    Yes

    Must be application/json

    Accept

    Yes

    Must be application/json

    conditions

    Array

    No

    []

    Filter conditions for workloads

    limit

    Integer

    No

    100

    Maximum number of workloads to return (1-1000)

    total

    Integer

    Total number of workloads available

    workloads

    Array

    Array of workload objects

    uid

    String

    Unique identifier for the workload

    envType

    String

    Environment type (e.g., "k8s")

    env

    String

    Environment name (e.g., "prod", "ga", "alpha")

    cluster

    String

    Kubernetes cluster name

    skip

    {
      "start": "2025-08-24T07:21:36.944Z",
      "end": "2025-08-24T08:51:36.944Z", 
      "namespaces": ["groundcover"],
      "sources": []
    }
    {
      "deployments": [
        {
          "name": "string",
          "namespace": "string", 
          "workloadName": "string",
          "creationTime": "2023-08-30T18:27:01Z",
          "cluster": "string",
          "env": "string",
          "available": 1,
          "desired": 1,
          "ready": 1,
          "conditions": [
            {
              "type": "string",
              "status": "string",
              "lastProbeTime": null,
              "lastHeartbeatTime": null,
              "lastTransitionTime": "string",
              "reason": "string",
              "message": "string"
            }
          ],
          "warnings": [],
          "id": "string",
          "resourceVersion": 0
        }
      ]
    }
    curl 'https://api.groundcover.com/api/k8s/v2/deployments/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"start":"2025-08-24T07:21:36.944Z","end":"2025-08-24T08:51:36.944Z","namespaces":[],"sources":[]}'
    curl 'https://api.groundcover.com/api/k8s/v2/deployments/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"start":"2025-08-24T07:21:36.944Z","end":"2025-08-24T08:51:36.944Z","namespaces":["groundcover","monitoring"],"sources":[]}'
    {
      "deployments": [
        {
          "name": "db-manager",
          "namespace": "groundcover",
          "workloadName": "db-manager",
          "creationTime": "2023-08-30T18:27:01Z",
          "cluster": "karma-cluster",
          "env": "prod",
          "available": 1,
          "desired": 1,
          "ready": 1,
          "conditions": [
            {
              "type": "Available",
              "status": "True",
              "lastProbeTime": null,
              "lastHeartbeatTime": null,
              "lastTransitionTime": "2025-08-22T06:18:27Z",
              "reason": "MinimumReplicasAvailable",
              "message": "Deployment has minimum availability."
            },
            {
              "type": "Progressing",
              "status": "True",
              "lastProbeTime": null,
              "lastHeartbeatTime": null,
              "lastTransitionTime": "2023-08-30T18:27:01Z",
              "reason": "NewReplicaSetAvailable",
              "message": "ReplicaSet \"db-manager-867bc8f5b8\" has successfully progressed."
            }
          ],
          "warnings": [],
          "id": "f3b1f4a5-f38a-4c63-a7c0-9333fcbf1906",
          "resourceVersion": 747039184
        }
      ]
    }
    curl 'https://api.groundcover.com/api/k8s/v3/workloads/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"conditions":[],"limit":100,"order":"desc","skip":0,"sortBy":"rps","sources":[]}'
    {
      "total": 6314,
      "workloads": [
        {
          "uid": "824b00bf-db68-47b5-8a53-9abd98bf7c0a",
          "envType": "k8s",
          "env": "ga",
          "cluster": "akamai-lk41ok",
          "namespace": "groundcover-incloud",
          "workload": "groundcover-incloud-vector",
          "kind": "ReplicaSet",
          "resourceVersion": 651723275,
          "ready": true,
          "podsCount": 5,
          "p50": 0.0005824280087836087,
          "p95": 0.005730729550123215,
          "p99": 0.0327172689139843,
          "rps": 5526.0027359781125,
          "errorRate": 0,
          "cpuLimit": 0,
          "cpuUsage": 50510.15252730218,
          "memoryLimit": 214748364800,
          "memoryUsage": 46527352832,
          "issueCount": 0
        }
      ]
    }
    # First batch (0-99)
    curl 'https://api.groundcover.com/api/k8s/v3/workloads/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"conditions":[],"limit":100,"order":"desc","skip":0,"sortBy":"rps","sources":[]}'
    
    # Second batch (100-199)
    curl 'https://api.groundcover.com/api/k8s/v3/workloads/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"conditions":[],"limit":100,"order":"desc","skip":100,"sortBy":"rps","sources":[]}'
    
    # Continue incrementing skip by 100 until you reach the total count

    Query Metrics

    Execute PromQL queries against groundcover metrics data. Two endpoints are available: instant queries for point-in-time values and range queries for time-series data over specific periods.

    Endpoints

    Instant Query

    GET /api/prometheus/api/v1/query

    Execute an instant PromQL query to get metric values at a single point in time.

    Range Query

    POST /api/metrics/query-range

    Execute a PromQL query over a time range to get time-series data.

    Authentication

    Both endpoints require API Key authentication via the Authorization header.

    Instant Query Endpoint

    Request

    GET /api/prometheus/api/v1/query

    Headers

    Query Parameters

    Parameter
    Type
    Required
    Description

    Understanding the Time Parameter

    The time parameter specifies exactly one timestamp at which to evaluate your PromQL expression. This is NOT a time range:

    • With time: "What was the disk usage at 2025-10-21T09:21:44.398Z?"

    • Without time: "What is the disk usage right now?"

    Important: This is different from range queries which return time-series data over a period.

    Instant vs Range Queries - Key Differences

    Aspect
    Instant Query
    Range Query

    Example Comparison:

    • Instant: time=2025-10-21T09:00:00Z → Returns disk usage at exactly 9:00 AM

    • Range: start=2025-10-21T08:00:00Z&end=2025-10-21T09:00:00Z → Returns hourly disk usage trend

    Practical Example:

    Response

    The endpoint returns a Prometheus-compatible response format:

    Response Fields

    Field
    Type
    Description

    Range Query Endpoint

    Request

    POST /api/metrics/query-range

    Headers

    Request Body

    Request Parameters

    Parameter
    Type
    Required
    Description

    Response

    The range query returns a custom format optimized for time-series data:

    Response Fields

    Field
    Type
    Description

    Each data point in the velocity array contains:

    • Timestamp: Unix timestamp as integer

    • Value: Metric value as string

    Examples

    Instant Query Examples

    Current Values (No Time Parameter)

    Get current average disk space usage (evaluated at request time):

    Historical Point-in-Time Values

    Get disk usage at a specific moment in the past:

    Note: This returns the disk usage value at exactly 2025-10-21T09:21:44.398Z, not a range from that time until now.

    Range Query Examples

    24-Hour Disk Usage Trend

    Get disk usage over the last 24 hours with 30-minute resolution:

    High-Resolution CPU Monitoring

    Monitor CPU usage over 1 hour with 1-minute resolution:

    Query Optimization

    1. Use appropriate time ranges: Avoid querying excessively large time ranges

    2. Choose optimal step sizes: Balance resolution with performance

    3. Filter early: Use label filters to reduce data volume

    4. Aggregate wisely: Use grouping to reduce cardinality
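
    As a hedged sketch of these guidelines, the following range query filters on the cluster label up front and aggregates by cluster to keep cardinality low (the label value "production-cluster" is illustrative; substitute one of your own clusters):

    # Hedged sketch: filter early on a label and aggregate to reduce cardinality.
    # "production-cluster" is an illustrative value; replace it with a real cluster name.
    curl 'https://app.groundcover.com/api/metrics/query-range' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      --data-raw '{
        "promql": "avg(groundcover_node_rt_disk_space_used_percent{cluster=\"production-cluster\"}) by (cluster)",
        "start": "2025-10-21T08:00:00.000Z",
        "end": "2025-10-21T09:00:00.000Z",
        "step": "5m"
      }'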

    Time Handling

    1. Use RFC3339 format: Always use ISO 8601 timestamps

    2. Account for timezones: Timestamps are in UTC

    3. Align step boundaries: Choose steps that align with data collection intervals

    4. Handle clock skew: Allow for small time differences in distributed systems

    Rate Limiting

    • Concurrent queries: Limit concurrent requests to avoid overwhelming the API

    • Query complexity: Complex queries may take longer and consume more resources

    • Data retention: Historical data availability depends on your retention policy

    Response

    Single value with timestamp

    Array of timestamp-value pairs

    data.result[].value

    array

    [timestamp, value] tuple with Unix timestamp and string value

    stats.seriesFetched

    string

    Number of time series processed

    stats.executionTimeMsec

    number

    Query execution time in milliseconds

    step

    string

    Yes

    Query resolution step (e.g., "30s", "1m", "5m", "1h")

    query

    string

    Yes

    PromQL query string (URL encoded)

    time

    string

    No

    Single point in time to evaluate the query (RFC3339 format). Default: current time

    Purpose

    Get value at one specific moment

    Get time-series data over a period

    Time Parameter

    Single timestamp (time)

    Start and end timestamps (start, end)

    Result

    One data point

    Multiple data points over time

    Use Case

    "What is the current CPU usage?"

    status

    string

    Query execution status ("success" or "error")

    data.resultType

    string

    Type of result data ("vector", "matrix", "scalar", "string")

    data.result

    array

    Array of metric results

    data.result[].metric

    object

    Metric labels as key-value pairs

    promql

    string

    Yes

    PromQL query expression

    start

    string

    Yes

    Range start time in RFC3339 format

    end

    string

    Yes

    velocities

    array

    Array of time-series data objects

    velocities[].velocity

    array

    Array of [timestamp, value] data points

    velocities[].metric

    object

    Metric labels as key-value pairs

    promql

    string

    Echo of the executed PromQL query

    "Show me CPU usage over the last hour"

    Range end time in RFC3339 format

    Authorization: Bearer <YOUR_API_KEY>
    Accept: application/json
    # Get current value (no time parameter)
    # Returns: {"value":[1761040224,"18.45"]} - timestamp is "right now"
    curl '...query=avg(groundcover_node_rt_disk_space_used_percent{})' 
    
    # Get historical value (with time parameter) 
    # Returns: {"value":[1761038504,"18.44"]} - timestamp is exactly what you specified
    curl '...query=avg(groundcover_node_rt_disk_space_used_percent{})&time=2025-10-21T09:21:44.398Z'
    {
      "status": "success",
      "data": {
        "resultType": "vector",
        "result": [
          {
            "metric": {},
            "value": [1761038504.398, "18.442597642017"]
          }
        ]
      },
      "stats": {
        "seriesFetched": "12",
        "executionTimeMsec": 0
      }
    }
    Authorization: Bearer <YOUR_API_KEY>
    Accept: application/json
    Content-Type: application/json
    {
      "promql": "string",
      "start": "string",
      "end": "string", 
      "step": "string"
    }
    {
      "velocities": [
        {
          "velocity": [
            [1760950800, "21.534558665381155"],
            [1760952600, "21.532404350483848"],
            [1760954400, "21.57135294176692"]
          ],
          "metric": {}
        }
      ],
      "promql": "avg(groundcover_node_rt_disk_space_used_percent{})"
    }
    curl 'https://app.groundcover.com/api/prometheus/api/v1/query?query=avg%28groundcover_node_rt_disk_space_used_percent%7B%7D%29' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>'
    curl 'https://app.groundcover.com/api/prometheus/api/v1/query?query=avg%28groundcover_node_rt_disk_space_used_percent%7B%7D%29&time=2025-10-21T09%3A21%3A44.398Z' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>'
    curl 'https://app.groundcover.com/api/metrics/query-range' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      --data-raw '{
        "promql": "avg(groundcover_node_rt_disk_space_used_percent{})",
        "start": "2025-10-20T09:19:22.475Z",
        "end": "2025-10-21T09:19:22.475Z",
        "step": "1800s"
      }'
    curl 'https://app.groundcover.com/api/metrics/query-range' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      --data-raw '{
        "promql": "avg(groundcover_cpu_usage_percent) by (cluster)",
        "start": "2025-10-21T08:00:00.000Z",
        "end": "2025-10-21T09:00:00.000Z", 
        "step": "1m"
      }'

    Monitor YAML structure

    While we strongly suggest building monitors using our Wizard or Catalog, groundcover supports building and editing your Monitors using YAML. If you choose to do so, the following provides the necessary definitions.

    Monitor fields explained

    In this section, you'll find a breakdown of the key fields used to define and configure Monitors within the groundcover platform. Each field plays a critical role in how a Monitor behaves, what data it tracks, and how it responds to specific conditions. Understanding these fields will help you set up effective Monitors to track performance, detect issues, and provide timely alerts.

    Below is a detailed explanation of each field, along with examples to illustrate their usage, ensuring your team can manage and respond to incidents efficiently.

    Field
    Explanation
    Example

    Monitor YAML Examples

    Traces Based Monitors

    MySQL Query Errors Monitor

    gRPC API Errors Monitor

    Log Based Monitors

    High Error Log Rate Monitor

    Query Monitors Summary

    Get a comprehensive list of monitor configurations with detailed execution status, alert states, performance metrics, and complete query definitions. This endpoint provides real-time monitoring data for all configured monitors.

    Endpoint

    POST /api/monitors/summary/query

    ["cluster", "namespace", "pod_name"]

    Labels

    A set of pre-defined labels that are attached to Issues generated by the selected Monitor. Labels can be static, or dynamic using the Monitor's query results.

    team: sre_team

    ExecutionErrorState

    Defines the actions that take place when a Monitor encounters query execution errors.

    Valid options are Alerting, OK and Error.

    • When Alerting is set, query execution errors will result in a firing issue.

    • When Error is set, query execution errors will result in an error state.

    • When OK is set, query execution errors will do neither of the above. This is the default setting.

    NoDataState

    This defines what happens when queries in the Monitor return empty datasets.

    Valid options are: NoData, Alerting, OK

    • When NoData is set, the monitor instance's state will be No Data.

    • When Alerting is set, the monitor instance's state will be Pending and will then change to Alerting once the pending period of the monitor ends.

    • When OK is set, the monitor instance's state will be Normal. This is the default setting.

    Interval

    Defines how frequently the Monitor evaluates the conditions. Common intervals could be 1m, 5m, etc.

    PendingFor

    Defines the period of consecutive intervals during which the threshold condition must be met to trigger the alert.

    Trigger

    Defines the condition under which the Monitor fires. This is the threshold definition for the Monitor, with op (operator) and value.

    op: gt, value: 5

    Model

    Describes the queries, thresholds and data processing of the Monitor. It can have the following fields:

    • Queries: A list of one or more queries to run; each query can be SQL over ClickHouse, PromQL over VictoriaMetrics, or a SqlPipeline. Each query has a name for reference in the Monitor.

    • Thresholds: The threshold definition of your Monitor. A threshold has a name, an inputName for data input, an operator (one of gt, lt, within_range, outside_range), and an array of values which are the threshold values.

    measurementType

    Describes how issues of this Monitor are presented. Some Monitors count events and others track a state, and they are displayed differently in our dashboards.

    • state - Presents issues in a line chart.

    • event - Presents issues in a bar chart, counting events.

    Title

    A string that defines the human-readable name of the Monitor. The title is what you will see in the list of all existing Monitors in the Monitors section.

    Description

    Additional information about the Monitor.

    Severity

    When triggered, this will show the severity level of the Monitor's issue. You can set any severity you want here.

    s1 for Critical

    s2 for High

    s3 for Medium

    s4 for Low

    Header

    This is the header of the generated issues from the Monitor.

    A short string describing the condition that is being monitored. You can also use this as a pattern using labels from your query.

    “HTTP API Error {{ alert.labels.return_code}}”

    ResourceHeaderLabels

    A list of labels that help you identify the resources that are related to the Monitor. These appear as a secondary header in all Issues tables across the platform.

    ["span_name", "kind"] for monitors on protocol issues.

    ContextHeaderLabels

    A list of contextual labels that help you identify the location of the issue. These appear as a subset of the Issue's labels and are displayed on all Issues tables across the platform.

    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    Headers

    Request Body

    The request body supports filtering, pagination, sorting, and time range parameters:

    Request Parameters

    Parameter
    Type
    Required
    Description

    conditions

    array

    No

    Array of filter conditions for monitors (empty array returns all)

    limit

    integer

    No

    Maximum number of monitors to return (default: 200)

    skip

    integer

    No

    Sorting Options

    Sort Field
    Description

    "lastFiringStart"

    Last time monitor started firing alerts

    "title"

    Monitor title alphabetically

    "severity"

    Monitor severity level

    "createdAt"

    Monitor creation date

    "updatedAt"

    Last modification date

    "state"

    Current monitor state

    Filter Conditions

    The conditions array accepts filter objects for targeted monitor queries:

    Response

    The endpoint returns a JSON object containing an array of detailed monitor configurations:

    Response Fields

    Top-Level Fields

    Field
    Type
    Description

    hasMonitors

    boolean

    Whether any monitors exist in the system

    monitors

    array

    Array of monitor configuration objects

    Monitor Object Fields

    Field
    Type
    Description

    uuid

    string

    Unique monitor identifier

    title

    string

    Monitor display name

    description

    string

    Monitor description

    severity

    string

    Alert severity level ("S1", "S2", "S3", "S4")

    Examples

    Basic Request

    Get all monitors with default pagination:

    Filter by Time Range

    Get monitors within a specific time window:

    Pagination Example

    Get the second page of results:

    Sort by Creation Date

    Get monitors sorted by newest first:

    Response Example

    title: MySQL Query Errors Monitor
    display:
      header: MySQL Error {{ alert.labels.statusCode }}
      description: This monitor detects MySQL Query errors.
      resourceHeaderLabels:
        - span_name
        - role
      contextHeaderLabels:
        - cluster
        - namespace
        - workload
    severity: S3
    measurementType: event
    model:
      queries:
        - name: threshold_input_query
          dataType: traces
          sqlPipeline:
            selectors:
              - key: _time
                origin: root
                type: string
                processors:
                  - op: toStartOfInterval
                    args:
                      - 1 minutes
                alias: bucket_timestamp
              - key: statusCode
                origin: root
                type: string
                alias: statusCode
              - key: span_name
                origin: root
                type: string
                alias: span_name
              - key: cluster
                origin: root
                type: string
                alias: cluster
              - key: namespace
                origin: root
                type: string
                alias: namespace
              - key: role
                origin: root
                type: string
                alias: role
              - key: workload
                origin: root
                type: string
                alias: workload
              - key: "*"
                origin: root
                type: string
                processors:
                  - op: count
                alias: logs_total
            groupBy:
              - key: _time
                origin: root
                type: string
                processors:
                  - op: toStartOfInterval
                    args:
                      - 1 minutes
              - key: statusCode
                origin: root
                type: string
                alias: statusCode
              - key: span_name
                origin: root
                type: string
                alias: span_name
              - key: cluster
                origin: root
                type: string
                alias: cluster
              - key: namespace
                origin: root
                type: string
                alias: namespace
              - key: role
                origin: root
                type: string
                alias: role
              - key: workload
                origin: root
                type: string
                alias: workload
            orderBy:
              - selector:
                  key: bucket_timestamp
                  origin: root
                  type: string
                direction: ASC
            limit:
            filters:
              operator: and
              conditions:
                - filters:
                    - op: match
                      value: mysql
                  key: eventType
                  origin: root
                  type: string
                - filters:
                    - op: match
                      value: error
                  key: status
                  origin: root
                  type: string
                - filters:
                    - op: match
                      value: eBPF
                  key: source
                  origin: root
                  type: string
          instantRollup: 1 minutes
      thresholds:
        - name: threshold_1
          inputName: threshold_input_query
          operator: gt
          values:
            - 0
    executionErrorState: OK
    noDataState: OK
    evaluationInterval:
      interval: 1m
      pendingFor: 0s
    labels:
      team: infra
    title: gRPC API Errors Monitor
    display:
      header: gRPC API Error {{ alert.labels.statusCode }}
      description: This monitor detects gRPC API errors by identifying responses with a non-zero status code.
      resourceHeaderLabels:
        - span_name
        - role
      contextHeaderLabels:
        - cluster
        - namespace
        - workload
    severity: S3
    measurementType: event
    model:
      queries:
        - name: threshold_input_query
          dataType: traces
          sqlPipeline:
            selectors:
              - key: _time
                origin: root
                type: string
                processors:
                  - op: toStartOfInterval
                    args:
                      - 1 minutes
                alias: bucket_timestamp
              - key: statusCode
                origin: root
                type: string
                alias: statusCode
              - key: span_name
                origin: root
                type: string
                alias: span_name
              - key: cluster
                origin: root
                type: string
                alias: cluster
              - key: namespace
                origin: root
                type: string
                alias: namespace
              - key: role
                origin: root
                type: string
                alias: role
              - key: workload
                origin: root
                type: string
                alias: workload
              - key: "*"
                origin: root
                type: string
                processors:
                  - op: count
                alias: logs_total
            groupBy:
              - key: _time
                origin: root
                type: string
                processors:
                  - op: toStartOfInterval
                    args:
                      - 1 minutes
              - key: statusCode
                origin: root
                type: string
                alias: statusCode
              - key: span_name
                origin: root
                type: string
                alias: span_name
              - key: cluster
                origin: root
                type: string
                alias: cluster
              - key: namespace
                origin: root
                type: string
                alias: namespace
              - key: role
                origin: root
                type: string
                alias: role
              - key: workload
                origin: root
                type: string
                alias: workload
            orderBy:
              - selector:
                  key: bucket_timestamp
                  origin: root
                  type: string
                direction: ASC
            limit:
            filters:
              operator: and
              conditions:
                - filters:
                    - op: match
                      value: grpc
                  key: eventType
                  origin: root
                  type: string
                - filters:
                    - op: ne
                      value: "0"
                  key: statusCode
                  origin: root
                  type: string
                - filters:
                    - op: match
                      value: error
                  key: status
                  origin: root
                  type: string
                - filters:
                    - op: match
                      value: eBPF
                  key: source
                  origin: root
                  type: string
          instantRollup: 1 minutes
      thresholds:
        - name: threshold_1
          inputName: threshold_input_query
          operator: gt
          values:
            - 0
    executionErrorState: OK
    noDataState: OK
    evaluationInterval:
      interval: 1m
      pendingFor: 0s
    title: High Error Log Rate Monitor
    severity: S4
    display:
      header: High Log Error Rate
      description: This monitor will trigger an alert when we have a rate of error logs.
      resourceHeaderLabels:
        - workload
      contextHeaderLabels:
        - cluster
        - namespace
    evaluationInterval:
      interval: 1m
      pendingFor: 0s
    model:
      queries:
        - name: threshold_input_query
          dataType: logs
          sqlPipeline:
            selectors:
              - key: _time
                origin: root
                type: string
                processors:
                  - op: toStartOfInterval
                    args:
                      - 1 minutes
                alias: bucket_timestamp
              - key: workload
                origin: root
                type: string
                alias: workload
              - key: namespace
                origin: root
                type: string
                alias: namespace
              - key: cluster
                origin: root
                type: string
                alias: cluster
              - key: "*"
                origin: root
                type: string
                processors:
                  - op: count
                alias: logs_total
            groupBy:
              - key: _time
                origin: root
                type: string
                processors:
                  - op: toStartOfInterval
                    args:
                      - 1 minutes
              - key: workload
                origin: root
                type: string
                alias: workload
              - key: namespace
                origin: root
                type: string
                alias: namespace
              - key: cluster
                origin: root
                type: string
                alias: cluster
            orderBy:
              - selector:
                  key: bucket_timestamp
                  origin: root
                  type: string
                direction: ASC
            limit:
            filters:
              conditions:
                - filters:
                    - op: match
                      value: error
                  key: level
                  origin: root
                  type: string
              operator: and
          instantRollup: 1 minutes
      thresholds:
        - name: threshold_1
          inputName: threshold_input_query
          operator: gt
          values:
            - 150
    noDataState: OK
    measurementType: event
    Authorization: Bearer <YOUR_API_KEY>
    Content-Type: application/json
    Accept: text/event-stream
    {
      "conditions": [],
      "limit": 200,
      "skip": 0,
      "maxInstances": 10,
      "order": "desc",
      "sortBy": "lastFiringStart",
      "start": "2025-10-12T08:19:18.582Z",
      "end": "2025-10-12T09:19:18.582Z"
    }
    {
      "conditions": [
        {
          "field": "severity",
          "operator": "equals",
          "value": "S1"
        },
        {
          "field": "state",
          "operator": "in",
          "values": ["Alerting", "Normal"]
        }
      ]
    }
    {
      "hasMonitors": true,
      "monitors": [
        {
          "uuid": "string",
          "title": "string",
          "description": "string",
          "severity": "string",
          "measurementType": "string",
          "state": "string",
          "alertingCount": 0,
          "model": {
            "queries": [],
            "thresholds": []
          },
          "interval": {
            "interval": "string",
            "for": "string"
          },
          "executionErrorState": "string",
          "noDataState": "string",
          "isPaused": false,
          "createdBy": 0,
          "createdByEmail": "string",
          "createdAt": "string",
          "updatedAt": "string",
          "lastStateStart": "string",
          "lastFiringStart": "string",
          "firstFiringStart": "string",
          "lastResolved": "string",
          "minEvaluationDurationSeconds": 0.0,
          "avgEvaluationDurationSeconds": 0.0,
          "maxEvaluationDurationSeconds": 0.0,
          "lastEvaluationError": "string",
          "lastEvaluationTimestamp": "string",
          "silenced": false,
          "fullySilenced": false,
          "silence_uuids": []
        }
      ]
    }
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/monitors/summary/query' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --header 'Accept: text/event-stream' \
      --data-raw '{
        "conditions": [],
        "limit": 200,
        "skip": 0,
        "maxInstances": 10,
        "order": "desc",
        "sortBy": "lastFiringStart"
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/monitors/summary/query' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --header 'Accept: text/event-stream' \
      --data-raw '{
        "conditions": [],
        "limit": 100,
        "skip": 0,
        "maxInstances": 10,
        "order": "desc",
        "sortBy": "lastFiringStart",
        "start": "2025-10-12T08:00:00.000Z",
        "end": "2025-10-12T10:00:00.000Z"
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/monitors/summary/query' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --header 'Accept: text/event-stream' \
      --data-raw '{
        "conditions": [],
        "limit": 50,
        "skip": 50,
        "maxInstances": 10,
        "order": "desc",
        "sortBy": "title"
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/monitors/summary/query' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --header 'Accept: text/event-stream' \
      --data-raw '{
        "conditions": [],
        "limit": 100,
        "skip": 0,
        "maxInstances": 10,
        "order": "desc",
        "sortBy": "createdAt"
      }'
    {
      "hasMonitors": true,
      "monitors": [
        {
          "uuid": "12345678-1234-1234-1234-123456789abc",
          "title": "Example_Latency_Monitor",
          "description": "",
          "template": "Example_Latency_Monitor",
          "severity": "S2",
          "measurementType": "event",
          "header": "Example_Latency_Monitor",
          "resourceLabels": ["workload"],
          "contextLabels": ["namespace", "cluster"],
          "category": "",
          "interval": {
            "interval": "5m0s",
            "for": "1m0s"
          },
          "model": {
            "queries": [
              {
                "dataType": "traces",
                "name": "threshold_input_query",
                "sqlPipeline": {
                  "selectors": [
                    {
                      "key": "workload",
                      "origin": "root",
                      "type": "string",
                      "alias": "workload"
                    },
                    {
                      "key": "namespace",
                      "origin": "root",
                      "type": "string",
                      "alias": "namespace"
                    },
                    {
                      "key": "cluster",
                      "origin": "root",
                      "type": "string",
                      "alias": "cluster"
                    }
                  ]
                },
                "instantRollup": "5 minutes"
              }
            ],
            "reducers": null,
            "thresholds": [
              {
                "name": "threshold_1",
                "inputName": "threshold_input_query",
                "operator": "gt",
                "values": [502]
              }
            ],
            "query": "SELECT workload, namespace, cluster, count(*) AS logs_total FROM traces WHERE (start_timestamp < toStartOfInterval(NOW(), INTERVAL '5 MINUTE') AND start_timestamp >= (toStartOfInterval(NOW(), INTERVAL '5 MINUTE') - INTERVAL '5 minutes')) GROUP BY workload, namespace, cluster",
            "type": "traces"
          },
          "reducer": "",
          "trigger": {
            "op": "gt",
            "value": 502
          },
          "labelsMapping": {
            "owner": "example-user"
          },
          "executionErrorState": "",
          "noDataState": "OK",
          "isPaused": false,
          "createdBy": 12345,
          "createdByEmail": "[email protected]",
          "createdAt": "2025-03-14T20:42:36.949847Z",
          "updatedAt": "2025-09-21T12:17:00.130801Z",
          "relativeTimerange": {},
          "silences": [],
          "monitorId": "12345678-1234-1234-1234-123456789abc",
          "state": "Alerting",
          "lastStateStart": "0001-01-01T00:00:00Z",
          "lastFiringStart": null,
          "firstFiringStart": null,
          "lastResolved": null,
          "alertingCount": 11,
          "silenced": false,
          "fullySilenced": false,
          "silence_uuids": [],
          "minEvaluationDurationSeconds": 7.107210216,
          "avgEvaluationDurationSeconds": 7.1096896183047775,
          "maxEvaluationDurationSeconds": 7.119120884,
          "lastEvaluationError": "",
          "lastEvaluationTimestamp": "2025-10-12T09:15:50Z"
        }
      ]
    }


    Number of monitors to skip for pagination (default: 0)

    maxInstances

    integer

    No

    Maximum instances per monitor result (default: 10)

    order

    string

    No

    Sort order: "asc" or "desc" (default: "desc")

    sortBy

    string

    No

    Field to sort by (see sorting options below)

    start

    string

    No

    Start time for filtering (ISO 8601 format)

    end

    string

    No

    End time for filtering (ISO 8601 format)

    | Field | Type | Description |
    | --- | --- | --- |
    | measurementType | string | Monitor type ("state", "event") |
    | state | string | Current monitor state ("Normal", "Alerting", "Paused") |
    | alertingCount | integer | Number of active alerts |
    | model | object | Monitor configuration with queries and thresholds |
    | interval | object | Evaluation timing configuration |
    | executionErrorState | string | State when execution fails |
    | noDataState | string | State when no data is available |
    | isPaused | boolean | Whether monitor is currently paused |
    | createdBy | integer | Creator user ID |
    | createdByEmail | string | Creator email address |
    | createdAt | string | Creation timestamp (ISO 8601) |
    | updatedAt | string | Last update timestamp (ISO 8601) |
    | lastStateStart | string | When current state began |
    | lastFiringStart | string | When monitor last started alerting |
    | firstFiringStart | string | When monitor first started alerting |
    | lastResolved | string | When monitor was last resolved |
    | minEvaluationDurationSeconds | float | Fastest query execution time |
    | avgEvaluationDurationSeconds | float | Average query execution time |
    | maxEvaluationDurationSeconds | float | Slowest query execution time |
    | lastEvaluationError | string | Last execution error message |
    | lastEvaluationTimestamp | string | Last evaluation timestamp |
    | silenced | boolean | Whether monitor is silenced |
    | fullySilenced | boolean | Whether monitor is completely silenced |
    | silence_uuids | array | Array of silence rule identifiers |
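    To make the field reference above concrete, here is a minimal sketch that walks a returned monitor object. The sample values are taken from the example response above; the name of the envelope key holding the monitor list ("monitors" below) is an assumption, so adjust it to match the actual response.

    ```python
    # Sample payload inlined so the snippet runs on its own; values come from the
    # example response above. The "monitors" envelope key is an assumption.
    payload = {
        "monitors": [
            {
                "monitorId": "12345678-1234-1234-1234-123456789abc",
                "state": "Alerting",
                "alertingCount": 11,
                "isPaused": False,
                "lastEvaluationTimestamp": "2025-10-12T09:15:50Z",
            }
        ]
    }

    for monitor in payload["monitors"]:
        status = "Paused" if monitor["isPaused"] else monitor["state"]
        print(
            monitor["monitorId"],
            status,                              # "Normal", "Alerting" or "Paused"
            monitor["alertingCount"],            # number of active alerts
            monitor["lastEvaluationTimestamp"],  # ISO 8601 timestamp of last evaluation
        )
    ```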

    Metrics & Labels

    Kubernetes Infrastructure Metrics & Labels

    Node CPU, Memory and Disk

    Labels

    `type`, `clusterId`, `region`, `node_name`

    Metrics

    Name
    Description
    Unit
    Type

    Storage Usage

    Labels

    `type`, `clusterId`, `region`, `name`, `namespace`

    Metrics

    Name
    Description
    Unit
    Type

    Network Usage

    Labels

    `clusterId`, `workload_name`, `namespace`, `container_name`, `remote_service_name`, `remote_namespace`, `remote_is_external`, `availability_zone`, `region`, `remote_availability_zone`, `remote_region`

    Notes:

    • `is_loopback` and `remote_is_external` are special labels that indicate the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g. a managed service outside of the cluster (external).

      • In both cases, the `remote_service_name` and `remote_namespace` labels will be empty.
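    For illustration, a Network Usage series for traffic to a managed service outside the cluster might carry a label set like the hypothetical one below; the label names come from the list above, while the values are invented for the example.

    ```python
    # Hypothetical example only - label names come from the Network Usage list
    # above, the values are invented for illustration.
    external_call_labels = {
        "clusterId": "prod-cluster",
        "workload_name": "checkout",
        "namespace": "shop",
        "container_name": "checkout",
        "region": "us-east-1",
        "availability_zone": "us-east-1a",
        "remote_is_external": "true",
        "remote_service_name": "",   # empty because the remote side is external
        "remote_namespace": "",      # empty because the remote side is external
    }
    ```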

    Metrics

    Name
    Description
    Unit
    Type

    Kubernetes Resources

    Labels

    `type`, `resource`, `condition`, `status`, `clusterId`, `region`, `namespace`, `workload_name`, `deployment`, `unit`

    Metrics

    Name
    Description
    Unit
    Type

    Container Metrics & Labels

    Container CPU

    Labels

    `type`, `clusterId`, `region`, `namespace`, `node_name`, `workload_name`, `pod_name`, `container_name`, `container_image`

    Metrics

    Name
    Description
    Unit
    Type

    Container Memory

    Labels

    `type`, `clusterId`, `region`, `namespace`, `node_name`, `workload_name`, `pod_name`, `container_name`, `container_image`

    Metrics

    Name
    Description
    Unit
    Type

    Container I/O

    Labels

    `type`, `clusterId`, `region`, `namespace`, `node_name`, `workload_name`, `pod_name`, `container_name`, `container_image`

    Metrics

    Name
    Description
    Unit
    Type

    Container Network

    Labels

    `type`, `clusterId`, `region`, `namespace`, `node_name`, `workload_name`, `pod_name`, `container_name`, `container_image`

    Metrics

    Name
    Description
    Unit
    Type

    Container Status

    Labels

    `type`, `clusterId`, `region`, `namespace`, `node_name`, `workload_name`, `pod_name`, `container_name`, `container_image`

    Metrics

    Name
    Description
    Unit
    Type

    Host Resources Metrics & Labels

    Host CPU

    Labels

    `clusterId`, `env`, `region`, `host_name`, `cloud_provider`, `env_type`

    Metrics

    Name
    Description
    Unit
    Type

    Host Memory

    Labels

    `clusterId`, `env`, `region`, `host_name`, `cloud_provider`, `env_type`

    Metrics

    Name
    Description
    Unit
    Type

    Host Disk

    Labels

    `clusterId`, `env`, `region`, `host_name`, `cloud_provider`, `env_type`. Optional: `device_name`

    Metrics

    Name
    Description
    Unit
    Type

    Host I/O

    Labels

    `clusterId`, `env`, `region`, `host_name`, `cloud_provider`, `env_type`. Optional: `device_name`

    Metrics

    Name
    Description
    Unit
    Type

    Host Filesystem

    Labels

    `clusterId`, `env`, `region`, `host_name`, `cloud_provider`, `env_type`, `device_name`, `file_system`, `mountpoint`

    Metrics

    Name
    Description
    Unit
    Type

    Host File Handles

    Labels

    `clusterId`, `env`, `region`, `host_name`, `cloud_provider`, `env_type`

    Metrics

    Name
    Description
    Unit
    Type

    Host Network

    Labels

    `clusterId`, `env`, `region`, `host_name`, `cloud_provider`, `env_type`, `device`

    Metrics

    Name
    Description
    Unit
    Type

    Application Metrics & Labels

    Label name
    Description
    Relevant types

    Summary-based metrics have an additional `quantile` label representing the percentile. Available values: ["0.5", "0.95", "0.99"].

    We also use a set of internal labels which are not relevant in most use cases. Find them interesting? Let us know over Slack!

    `issue_id`, `entity_id`, `resource_id`, `query_id`, `aggregation_id`, `parent_entity_id`

    Golden Signals (Errors & Issues)

    In the lists below, we describe error and issue counters. Every issue flagged by the platform is an error; but not every error is flagged as an issue.
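    As a quick illustration of that relationship, the workload counters listed further down this page can be combined as follows; the sample numbers below are invented.

    ```python
    # Invented sample values for one workload over some time window, using the
    # counters documented later on this page.
    total_requests = 1000   # groundcover_workload_total_counter
    error_requests = 40     # groundcover_workload_error_counter
    issue_requests = 12     # groundcover_workload_issue_counter

    # Issues are a subset of errors, so the issue count can never exceed the error count.
    assert issue_requests <= error_requests

    error_rate = error_requests / total_requests
    issue_rate = issue_requests / total_requests
    print(f"error rate: {error_rate:.1%}, issue rate: {issue_rate:.1%}")
    ```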

    Resource metrics

    Name
    Description
    Unit
    Type

    Workload metrics

    Name
    Description
    Unit
    Type

    Kafka specific metrics

    Name
    Description
    Unit
    Type

    groundcover_node_used_disk_space

    Current used disk space in the current node

    Bytes

    Gauge

    groundcover_node_free_disk_space

    Free disk space in the current node

    Bytes

    Gauge

    groundcover_node_total_disk_space

    Total disk space in the current node

    Bytes

    Gauge

    groundcover_node_used_percent_disk_space

    Percentage of used disk space in the current node

    Percentage

    Gauge

    groundcover_pvc_usage_percent

    Percentage of used Persistent Volume Claim (PVC) storage

    Percentage

    Gauge

    groundcover_pvc_read_bytes_total

    Total bytes read by the workload from the Persistent Volume Claim (PVC)

    Bytes

    Counter

    groundcover_pvc_write_bytes_total

    Total bytes written by the workload to the Persistent Volume Claim (PVC)

    Bytes

    Counter

    groundcover_pvc_reads_total

    Total read operations performed by the workload from the Persistent Volume Claim (PVC)

    Number

    Counter

    groundcover_pvc_writes_total

    Total write operations performed by the workload to the Persistent Volume Claim (PVC)

    Number

    Counter

    groundcover_pvc_read_latency

    Latency of read operations from the Persistent Volume Claim (PVC) by the workload

    Seconds

    Summary

    groundcover_pvc_write_latency

    Latency of write operations to the Persistent Volume Claim (PVC) by the workload

    Seconds

    Summary

    groundcover_pvc_read_latency_count

    Count of read operations latency for the Persistent Volume Claim (PVC)

    Number

    Counter

    groundcover_pvc_read_latency_sum

    Sum of read operation latencies for the Persistent Volume Claim (PVC)

    Seconds

    Counter

    groundcover_pvc_read_latency_summary

    Summary of read operations latency for the Persistent Volume Claim (PVC)

    Milliseconds

    Counter

    groundcover_pvc_write_latency_count

    Count of write operations sampled for latency on the Persistent Volume Claim (PVC)

    Number

    Counter

    groundcover_pvc_write_latency_sum

    Sum of write operation latencies for the Persistent Volume Claim (PVC)

    Seconds

    Counter

    groundcover_pvc_write_latency_summary

    Summary of write operations latency for the Persistent Volume Claim (PVC)

    Milliseconds

    Counter
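    The `_count` and `_sum` pairs above follow the usual summary-metric convention, so an average latency can be derived from them. A minimal sketch, assuming that convention and invented sample values:

    ```python
    # Invented sample values for groundcover_pvc_read_latency_count and
    # groundcover_pvc_read_latency_sum over the same time window.
    read_latency_count = 1250   # number of sampled read operations
    read_latency_sum = 93.75    # total read latency in seconds

    # Average read latency over the window, assuming the usual sum/count
    # convention for summary metrics.
    avg_read_latency = read_latency_sum / read_latency_count
    print(f"average PVC read latency: {avg_read_latency * 1000:.1f} ms")  # 75.0 ms
    ```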
    `is_cross_az`, `protocol`, `role`, `server_port`, `encryption`, `transport_protocol`, `is_loopback`

    • `is_cross_az` means the traffic was sent and/or received between two different availability zones. This is a helpful flag to quickly identify this special kind of communication.

      • The actual zones are detailed in the `availability_zone` and `remote_availability_zone` labels.

    groundcover_network_connections_closed_total

    Total connections closed by the workload

    Number

    Counter

    groundcover_network_connections_opened_failed_total

    Total number of failed network connection attempts by the workload

    Number

    Counter

    groundcover_network_connections_refused_failed_total

    Connection attempts refused per workload

    Number

    Counter

    groundcover_network_connections_opened_refused_total

    Total number of network connections refused by the workload

    Number

    Counter

    groundcover_network_rx_ops_total

    Total number of read operations issued by the workload

    Number

    Counter

    groundcover_network_tx_ops_total

    Total number of write operations issued by the workload

    Number

    Counter

    groundcover_kube_daemonset_status_number_available

    Number of available Pods for the DaemonSet

    Number

    Gauge

    groundcover_kube_daemonset_status_number_misscheduled

    Number of Pods running on nodes they should not be scheduled on

    Number

    Gauge

    groundcover_kube_daemonset_status_number_ready

    Number of ready Pods for the DaemonSet

    Number

    Gauge

    groundcover_kube_daemonset_status_number_unavailable

    Number of unavailable Pods for the DaemonSet

    Number

    Gauge

    groundcover_kube_daemonset_status_observed_generation

    Most recent generation observed for the DaemonSet

    Number

    Gauge

    groundcover_kube_daemonset_status_updated_number_scheduled

    Number of Pods updated and scheduled by the DaemonSet

    Number

    Gauge

    groundcover_kube_deployment_created

    Creation timestamp of the Deployment

    Seconds

    Gauge

    groundcover_kube_deployment_metadata_generation

    Sequence number representing a specific generation of the Deployment

    Number

    Gauge

    groundcover_kube_deployment_spec_paused

    Whether the Deployment is paused

    Number

    Gauge

    groundcover_kube_deployment_spec_replicas

    Desired number of replicas for the Deployment

    Number

    Gauge

    groundcover_kube_deployment_spec_strategy_rollingupdate_max_unavailable

    Maximum number of unavailable Pods during a rolling update for the Deployment

    Number

    Gauge

    groundcover_kube_deployment_status_condition

    Current condition of the Deployment (labeled by type and status)

    Number

    Gauge

    groundcover_kube_deployment_status_observed_generation

    Most recent generation observed for the Deployment

    Number

    Gauge

    groundcover_kube_deployment_status_replicas

    Number of replicas for the Deployment

    Number

    Gauge

    groundcover_kube_deployment_status_replicas_available

    Number of available replicas for the Deployment

    Number

    Gauge

    groundcover_kube_deployment_status_replicas_ready

    Number of ready replicas for the Deployment

    Number

    Gauge

    groundcover_kube_deployment_status_replicas_unavailable

    Number of unavailable replicas for the Deployment

    Number

    Gauge

    groundcover_kube_deployment_status_replicas_updated

    Number of updated replicas for the Deployment

    Number

    Gauge

    groundcover_kube_horizontalpodautoscaler_spec_max_replicas

    Maximum number of replicas configured for the HPA

    Number

    Gauge

    groundcover_kube_horizontalpodautoscaler_spec_min_replicas

    Minimum number of replicas configured for the HPA

    Number

    Gauge

    groundcover_kube_horizontalpodautoscaler_spec_target_metric

    Configured HPA target metric value

    Number

    Gauge

    groundcover_kube_horizontalpodautoscaler_status_condition

    Current condition of the Horizontal Pod Autoscaler (labeled by type and status)

    Number

    Gauge

    groundcover_kube_horizontalpodautoscaler_status_current_replicas

    Current number of replicas managed by the HPA

    Number

    Gauge

    groundcover_kube_horizontalpodautoscaler_status_desired_replicas

    Desired number of replicas as calculated by the HPA

    Number

    Gauge

    groundcover_kube_horizontalpodautoscaler_status_target_metric

    Current observed value of the HPA target metric

    Number

    Gauge

    groundcover_kube_job_complete

    Whether the Job has completed successfully

    Number

    Gauge

    groundcover_kube_job_failed

    Whether the Job has failed

    Number

    Gauge

    groundcover_kube_job_spec_completions

    Desired number of successfully finished Pods for the Job

    Number

    Gauge

    groundcover_kube_job_spec_parallelism

    Desired number of Pods running in parallel for the Job

    Number

    Gauge

    groundcover_kube_job_status_active

    Number of actively running Pods for the Job

    Number

    Gauge

    groundcover_kube_job_status_completion_time

    Completion time of the Job as Unix timestamp

    Seconds

    Gauge

    groundcover_kube_job_status_failed

    Number of failed Pods for the Job

    Number

    Gauge

    groundcover_kube_job_status_start_time

    Start time of the Job as Unix timestamp

    Seconds

    Gauge

    groundcover_kube_job_status_succeeded

    Number of succeeded Pods for the Job

    Number

    Gauge

    groundcover_kube_node_created

    Creation timestamp of the Node

    Seconds

    Gauge

    groundcover_kube_node_spec_taint

    Node taint information (labeled by key, value and effect)

    Number

    Gauge

    groundcover_kube_node_spec_unschedulable

    Whether a node can schedule new pods

    Number

    Gauge

    groundcover_kube_node_status_allocatable

    The amount of resources allocatable for pods (after reserving some for system daemons)

    Number

    Gauge

    groundcover_kube_node_status_capacity

    The total amount of resources available for a node

    Number

    Gauge

    groundcover_kube_node_status_condition

    The condition of a cluster node

    Number

    Gauge

    groundcover_kube_persistentvolume_capacity_bytes

    Capacity of the PersistentVolume

    Bytes

    Gauge

    groundcover_kube_persistentvolume_status_phase

    Current phase of the PersistentVolume

    Number

    Gauge

    groundcover_kube_persistentvolumeclaim_access_mode

    Access mode of the PersistentVolumeClaim

    Number

    Gauge

    groundcover_kube_persistentvolumeclaim_status_phase

    Current phase of the PersistentVolumeClaim

    Number

    Gauge

    groundcover_kube_pod_container_resource_limits

    The number of requested limit resource by a container. It is recommended to use the `kube_pod_resource_limits` metric exposed by kube-scheduler instead, as it is more precise.

    Number

    Gauge

    groundcover_kube_pod_container_resource_requests

    The number of requested request resource by a container. It is recommended to use the `kube_pod_resource_requests` metric exposed by kube-scheduler instead, as it is more precise.

    Number

    Gauge

    groundcover_kube_pod_container_status_last_terminated_exitcode

    The last termination exit code for the container

    Number

    Gauge

    groundcover_kube_pod_container_status_last_terminated_reason

    The last termination reason for the container

    Number

    Gauge

    groundcover_kube_pod_container_status_ready

    Describes whether the container's readiness check succeeded

    Number

    Gauge

    groundcover_kube_pod_container_status_restarts_total

    The number of container restarts per container

    Number

    Counter

    groundcover_kube_pod_container_status_running

    Describes whether the container is currently in running state

    Number

    Gauge

    groundcover_kube_pod_container_status_terminated

    Describes whether the container is currently in terminated state

    Number

    Gauge

    groundcover_kube_pod_container_status_terminated_reason

    Describes the reason the container is currently in terminated state

    Number

    Gauge

    groundcover_kube_pod_container_status_waiting

    Describes whether the container is currently in waiting state

    Number

    Gauge

    groundcover_kube_pod_container_status_waiting_reason

    Describes the reason the container is currently in waiting state

    Number

    Gauge

    groundcover_kube_pod_created

    Creation timestamp of the Pod

    Seconds

    Gauge

    groundcover_kube_pod_init_container_resource_limits

    The number of CPU cores requested limit by an init container

    Bytes

    Gauge

    groundcover_kube_pod_init_container_resource_requests

    Requested resources by init container (labeled by resource and unit)

    Number

    Gauge

    groundcover_kube_pod_init_container_resource_requests_memory_bytes

    Requested memory by init containers

    Bytes

    Gauge

    groundcover_kube_pod_init_container_status_last_terminated_reason

    The last termination reason for the init container

    Number

    Gauge

    groundcover_kube_pod_init_container_status_ready

    Describes whether the init container's readiness check succeeded

    Number

    Gauge

    groundcover_kube_pod_init_container_status_restarts_total

    The number of restarts for the init container

    Number

    Gauge

    groundcover_kube_pod_init_container_status_running

    Describes whether the init container is currently in running state

    Number

    Gauge

    groundcover_kube_pod_init_container_status_terminated

    Describes whether the init container is currently in terminated state

    Number

    Gauge

    groundcover_kube_pod_init_container_status_terminated_reason

    Describes the reason the init container is currently in terminated state

    Number

    Gauge

    groundcover_kube_pod_init_container_status_waiting

    Describes whether the init container is currently in waiting state

    Number

    Gauge

    groundcover_kube_pod_init_container_status_waiting_reason

    Describes the reason the init container is currently in waiting state

    Number

    Gauge

    groundcover_kube_pod_spec_volumes_persistentvolumeclaims_readonly

    Whether the PersistentVolumeClaim is mounted as read-only in the Pod

    Number

    Gauge

    groundcover_kube_pod_status_phase

    The pod's current phase

    Number

    Gauge

    groundcover_kube_pod_status_ready

    Describes whether the pod is ready to serve requests

    Number

    Gauge

    groundcover_kube_pod_status_scheduled

    Describes the status of the scheduling process for the pod

    Number

    Gauge

    groundcover_kube_pod_status_unschedulable

    Whether the Pod is unschedulable

    Number

    Gauge

    groundcover_kube_pod_tolerations

    Pod tolerations configuration

    Number

    Gauge

    groundcover_kube_replicaset_spec_replicas

    Desired number of replicas for the ReplicaSet

    Number

    Gauge

    groundcover_kube_replicaset_status_fully_labeled_replicas

    Number of fully labeled replicas for the ReplicaSet

    Number

    Gauge

    groundcover_kube_replicaset_status_observed_generation

    Most recent generation observed for the ReplicaSet

    Number

    Gauge

    groundcover_kube_replicaset_status_ready_replicas

    Number of ready replicas for the ReplicaSet

    Number

    Gauge

    groundcover_kube_replicaset_status_replicas

    Number of replicas for the ReplicaSet

    Number

    Gauge

    groundcover_kube_resourcequota

    Resource quota information (labeled by resource and type: hard/used)

    Number

    Gauge

    groundcover_kube_resourcequota_created

    Creation timestamp of the ResourceQuota as Unix seconds

    Seconds

    Gauge

    groundcover_kube_statefulset_metadata_generation

    Sequence number representing a specific generation of the StatefulSet

    Number

    Gauge

    groundcover_kube_statefulset_replicas

    Desired number of replicas for the StatefulSet

    Number

    Gauge

    groundcover_kube_statefulset_status_current_revision

    Current revision of the StatefulSet

    Number

    Gauge

    groundcover_kube_statefulset_status_observed_generation

    Most recent generation observed for the StatefulSet

    Number

    Gauge

    groundcover_kube_statefulset_status_replicas

    Number of replicas for the StatefulSet

    Number

    Gauge

    groundcover_kube_statefulset_status_replicas_available

    Number of available replicas for the StatefulSet

    Number

    Gauge

    groundcover_kube_statefulset_status_replicas_current

    Number of current replicas for the StatefulSet

    Number

    Gauge

    groundcover_kube_statefulset_status_replicas_ready

    Number of ready replicas for the StatefulSet

    Number

    Gauge

    groundcover_kube_statefulset_status_replicas_updated

    Number of updated replicas for the StatefulSet

    Number

    Gauge

    groundcover_kube_statefulset_status_update_revision

    Update revision of the StatefulSet

    Number

    Gauge

    groundcover_kube_job_duration

    Time elapsed between the start and completion time of the Job, or current time if the Job is still running

    Seconds

    Gauge

    groundcover_kube_pod_uptime

    Time elapsed since the Pod was created

    Seconds

    Gauge

    groundcover_container_cpu_request_usage_percent

    CPU usage rate out of request (usage/request)

    Percentage

    Gauge

    groundcover_container_cpu_throttled_percent

    Percentage of CPU throttling for the container

    Percentage

    Gauge

    groundcover_container_cpu_throttled_periods

    Total number of throttled CPU periods for the container

    Number

    Counter

    groundcover_container_cpu_throttled_rate_millis

    Rate of CPU throttling for the container

    mCPU

    Gauge

    groundcover_container_cpu_throttled_seconds_total

    Total CPU throttling time for K8s container

    Seconds

    Counter

    groundcover_container_cpu_usage_percent

    CPU usage rate (usage/limit)

    Percentage

    Gauge

    groundcover_container_m_cpu_usage_seconds_total

    Total CPU usage time in milli-CPUs for the container

    mCPU

    Counter

    groundcover_container_m_cpu_usage_system_seconds_total

    Total CPU time spent in system mode for the container

    Seconds

    Counter

    groundcover_container_m_cpu_usage_user_seconds_total

    Total CPU time spent in user mode for the container

    Seconds

    Counter

    groundcover_container_cpu_limit_m_cpu

    K8s container CPU limit

    mCPU

    Gauge

    groundcover_container_cpu_request_m_cpu

    K8s container requested CPU allocation

    mCPU

    Gauge

    groundcover_container_cpu_pressure_full_avg10

    Average percentage of time all non-idle tasks were stalled on CPU over 10 seconds

    Percentage

    Gauge

    groundcover_container_cpu_pressure_full_avg300

    Average percentage of time all non-idle tasks were stalled on CPU over 300 seconds

    Percentage

    Gauge

    groundcover_container_cpu_pressure_full_avg60

    Average percentage of time all non-idle tasks were stalled on CPU over 60 seconds

    Percentage

    Gauge

    groundcover_container_cpu_pressure_full_total

    Total time all non-idle tasks were stalled waiting for CPU

    Microseconds

    Counter

    groundcover_container_cpu_pressure_some_avg10

    Average percentage of time at least some tasks were stalled on CPU over 10 seconds

    Percentage

    Gauge

    groundcover_container_cpu_pressure_some_avg300

    Average percentage of time at least some tasks were stalled on CPU over 300 seconds

    Percentage

    Gauge

    groundcover_container_cpu_pressure_some_avg60

    Average percentage of time at least some tasks were stalled on CPU over 60 seconds

    Percentage

    Gauge

    groundcover_container_cpu_pressure_some_total

    Total time at least some tasks were stalled waiting for CPU

    Microseconds

    Counter

    groundcover_container_memory_kernel_usage_bytes

    Kernel memory usage for the container

    Bytes

    Gauge

    groundcover_container_memory_limit_bytes

    K8s container memory limit

    Bytes

    Gauge

    groundcover_container_memory_major_page_faults

    Total number of major page faults for the container

    Number

    Counter

    groundcover_container_memory_oom_events

    Total number of out-of-memory (OOM) events for the container

    Number

    Counter

    groundcover_container_memory_page_faults

    Total number of page faults for the container

    Number

    Counter

    groundcover_container_memory_request_bytes

    K8s container requested memory allocation

    Bytes

    Gauge

    groundcover_container_memory_request_used_percent

    Memory usage rate out of request (usage/request)

    Percentage

    Gauge

    groundcover_container_memory_rss_bytes

    Current memory resident set size (RSS)

    Bytes

    Gauge

    groundcover_container_memory_swap_usage_bytes

    Swap memory usage for the container

    Bytes

    Gauge

    groundcover_container_memory_usage_bytes

    Current memory usage for the container

    Bytes

    Gauge

    groundcover_container_memory_usage_peak_bytes

    Peak memory usage for the container

    Bytes

    Gauge

    groundcover_container_memory_used_percent

    Memory usage rate (usage/limit)

    Percentage

    Gauge

    groundcover_container_memory_pressure_full_avg10

    Average percentage of time all non-idle tasks were stalled on memory over 10 seconds

    Percentage

    Gauge

    groundcover_container_memory_pressure_full_avg300

    Average percentage of time all non-idle tasks were stalled on memory over 300 seconds

    Percentage

    Gauge

    groundcover_container_memory_pressure_full_avg60

    Average percentage of time all non-idle tasks were stalled on memory over 60 seconds

    Percentage

    Gauge

    groundcover_container_memory_pressure_full_total

    Total time all non-idle tasks were stalled waiting for memory

    Microseconds

    Counter

    groundcover_container_memory_pressure_some_avg10

    Average percentage of time at least some tasks were stalled on memory over 10 seconds

    Percentage

    Gauge

    groundcover_container_memory_pressure_some_avg300

    Average percentage of time at least some tasks were stalled on memory over 300 seconds

    Percentage

    Gauge

    groundcover_container_memory_pressure_some_avg60

    Average percentage of time at least some tasks were stalled on memory over 60 seconds

    Percentage

    Gauge

    groundcover_container_memory_pressure_some_total

    Total time at least some tasks were stalled waiting for memory

    Microseconds

    Counter

    groundcover_container_io_write_ops_total

    Total number of write operations by the container

    Number

    Counter

    groundcover_container_disk_delay_seconds

    K8s container disk I/O delay

    Seconds

    Counter

    groundcover_container_io_pressure_full_avg10

    Average percentage of time all non-idle tasks were stalled on I/O over 10 seconds

    Percentage

    Gauge

    groundcover_container_io_pressure_full_avg300

    Average percentage of time all non-idle tasks were stalled on I/O over 300 seconds

    Percentage

    Gauge

    groundcover_container_io_pressure_full_avg60

    Average percentage of time all non-idle tasks were stalled on I/O over 60 seconds

    Percentage

    Gauge

    groundcover_container_io_pressure_full_total

    Total time all non-idle tasks were stalled waiting for I/O

    Microseconds

    Counter

    groundcover_container_io_pressure_some_avg10

    Average percentage of time at least some tasks were stalled on I/O over 10 seconds

    Percentage

    Gauge

    groundcover_container_io_pressure_some_avg300

    Average percentage of time at least some tasks were stalled on I/O over 300 seconds

    Percentage

    Gauge

    groundcover_container_io_pressure_some_avg60

    Average percentage of time at least some tasks were stalled on I/O over 60 seconds

    Percentage

    Gauge

    groundcover_container_io_pressure_some_total

    Total time at least some tasks were stalled waiting for I/O

    Microseconds

    Counter

    groundcover_container_network_tx_bytes_total

    Total bytes transmitted by the container

    Bytes

    Counter

    groundcover_container_network_tx_dropped_total

    Total number of transmitted packets dropped by the container

    Number

    Counter

    groundcover_container_network_tx_errors_total

    Total number of errors encountered while transmitting packets

    Number

    Counter

    groundcover_host_cpu_usage_percent

    Percentage of used CPU in the current host

    Percentage

    Gauge

    groundcover_host_cpu_num_cores

    Number of CPU cores on the host

    Number

    Gauge

    groundcover_host_cpu_user_spent_seconds_total

    Total time spent in user mode

    Seconds

    Counter

    groundcover_host_cpu_user_spent_percent

    Percentage of CPU time spent in user mode

    Percentage

    Gauge

    groundcover_host_cpu_system_spent_seconds_total

    Total time spent in system mode

    Seconds

    Counter

    groundcover_host_cpu_system_spent_percent

    Percentage of CPU time spent in system mode

    Percentage

    Gauge

    groundcover_host_cpu_idle_spent_seconds_total

    Total time spent idle

    Seconds

    Counter

    groundcover_host_cpu_idle_spent_percent

    Percentage of CPU time spent idle

    Percentage

    Gauge

    groundcover_host_cpu_iowait_spent_seconds_total

    Total time spent waiting for I/O to complete

    Seconds

    Counter

    groundcover_host_cpu_iowait_spent_percent

    Percentage of CPU time spent waiting for I/O

    Percentage

    Gauge

    groundcover_host_cpu_nice_spent_seconds_total

    Total time spent on niced processes

    Seconds

    Counter

    groundcover_host_cpu_steal_spent_seconds_total

    Total time spent in involuntary wait (stolen by hypervisor)

    Seconds

    Counter

    groundcover_host_cpu_stolen_spent_percent

    Percentage of CPU time stolen by the hypervisor

    Percentage

    Gauge

    groundcover_host_cpu_irq_spent_seconds_total

    Total time spent handling hardware interrupts

    Seconds

    Counter

    groundcover_host_cpu_softirq_spent_seconds_total

    Total time spent handling software interrupts

    Seconds

    Counter

    groundcover_host_cpu_interrupt_spent_percent

    Percentage of CPU time spent handling interrupts

    Percentage

    Gauge

    groundcover_host_cpu_guest_spent_seconds_total

    Total time spent running guest processes

    Seconds

    Counter

    groundcover_host_cpu_guest_spent_percent

    Percentage of CPU time spent running guest processes

    Percentage

    Gauge

    groundcover_host_cpu_guest_nice_spent_seconds_total

    Total time spent running niced guest processes

    Seconds

    Counter

    groundcover_host_cpu_context_switches_total

    Total number of context switches in the current host

    Number

    Counter

    groundcover_host_cpu_load_avg1

    CPU load average over 1 minute

    Number

    Gauge

    groundcover_host_cpu_load_avg5

    CPU load average over 5 minutes

    Number

    Gauge

    groundcover_host_cpu_load_avg15

    CPU load average over 15 minutes

    Number

    Gauge

    groundcover_host_cpu_load_norm1

    Normalized CPU load over 1 minute

    Number

    Gauge

    groundcover_host_cpu_load_norm5

    Normalized CPU load over 5 minutes

    Number

    Gauge

    groundcover_host_cpu_load_norm15

    Normalized CPU load over 15 minutes

    Number

    Gauge

    groundcover_host_mem_free_bytes

    Free memory in the current host

    Bytes

    Gauge

    groundcover_host_mem_available_bytes

    Available memory in the current host

    Bytes

    Gauge

    groundcover_host_mem_cached_bytes

    Cached memory in the current host

    Bytes

    Gauge

    groundcover_host_mem_buffers_bytes

    Buffer memory in the current host

    Bytes

    Gauge

    groundcover_host_mem_shared_bytes

    Shared memory in the current host

    Bytes

    Gauge

    groundcover_host_mem_slab_bytes

    Slab memory in the current host

    Bytes

    Gauge

    groundcover_host_mem_sreclaimable_bytes

    Reclaimable slab memory in the current host

    Bytes

    Gauge

    groundcover_host_mem_page_tables_bytes

    Page tables memory in the current host

    Bytes

    Gauge

    groundcover_host_mem_commit_limit_bytes

    Memory commit limit in the current host

    Bytes

    Gauge

    groundcover_host_mem_committed_as_bytes

    Committed address space memory in the current host

    Bytes

    Gauge

    groundcover_host_mem_swap_cached_bytes

    Cached swap memory in the current host

    Bytes

    Gauge

    groundcover_host_mem_swap_total_bytes

    Total swap memory in the current host

    Bytes

    Gauge

    groundcover_host_mem_swap_free_bytes

    Free swap memory in the current host

    Bytes

    Gauge

    groundcover_host_mem_swap_used_bytes

    Used swap memory in the current host

    Bytes

    Gauge

    groundcover_host_mem_swap_in_bytes_total

    Swap in bytes in the current host

    Bytes

    Counter

    groundcover_host_mem_swap_out_bytes_total

    Swap out bytes in the current host

    Bytes

    Counter

    groundcover_host_mem_swap_free_percent

    Percentage of free swap memory in the current host

    Percentage

    Gauge

    groundcover_host_mem_usable_percent

    Percentage of usable (available) memory in the current host

    Percentage

    Gauge

    groundcover_host_disk_space_used_percent

    Percentage of used disk space in the current host

    Percentage

    Gauge

    groundcover_host_disk_read_time_ms_total

    Total time spent reading from disk per device in the current host

    Milliseconds

    Counter

    groundcover_host_disk_write_time_ms_total

    Total time spent writing to disk per device in the current host

    Milliseconds

    Counter

    groundcover_host_disk_read_count_total

    Total number of disk reads per device in the current host

    Number

    Counter

    groundcover_host_disk_write_count_total

    Total number of disk writes per device in the current host

    Number

    Counter

    groundcover_host_disk_merged_read_count_total

    Total number of merged disk reads per device in the current host

    Number

    Counter

    groundcover_host_disk_merged_write_count_total

    Total number of merged disk writes per device in the current host

    Number

    Counter

    groundcover_host_io_write_await_ms

    Average time for write requests to be served per device in the current host

    Milliseconds

    Gauge

    groundcover_host_io_await_ms

    Average time for I/O requests to be served per device in the current host

    Milliseconds

    Gauge

    groundcover_host_io_avg_request_size

    Average I/O request size per device in the current host

    Kilobytes

    Gauge

    groundcover_host_io_service_time_ms

    Average service time for I/O requests per device in the current host

    Milliseconds

    Gauge

    groundcover_host_io_avg_queue_size_kb

    Average I/O queue size per device in the current host

    Kilobytes

    Gauge

    groundcover_host_io_utilization_percent

    Percentage of time the device was busy serving I/O requests in the current host

    Percentage

    Gauge

    groundcover_host_io_block_in_total

    Total number of block in the current host

    Number

    Counter

    groundcover_host_io_block_out_total

    Total number of block out in the current host

    Number

    Counter

    groundcover_host_fs_used_percent

    Percentage of used filesystem space in the current host

    Percentage

    Gauge

    groundcover_host_fs_inodes_total

    Total inodes in the filesystem

    Number

    Gauge

    groundcover_host_fs_inodes_used

    Used inodes in the filesystem

    Number

    Gauge

    groundcover_host_fs_inodes_free

    Free inodes in the filesystem

    Number

    Gauge

    groundcover_host_fs_inodes_used_percent

    Percentage of used inodes in the filesystem

    Percentage

    Gauge

    groundcover_host_fs_file_handles_allocated

    Total number of file handles allocated in the current host

    Number

    Gauge

    groundcover_host_fs_file_handles_allocated_unused

    Number of allocated but unused file handles in the current host

    Number

    Gauge

    groundcover_host_fs_file_handles_in_use

    Number of file handles currently in use in the current host

    Number

    Gauge

    groundcover_host_fs_file_handles_max

    Maximum number of file handles available in the current host

    Number

    Gauge

    groundcover_host_fs_file_handles_used_percent

    Percentage of file handles in use in the current host

    Percentage

    Gauge

    groundcover_host_net_transmit_packets_total

    Total packets transmitted on network interface

    Number

    Counter

    groundcover_host_net_receive_dropped_total

    Total number of received packets dropped on network interface

    Number

    Counter

    groundcover_host_net_receive_errors_total

    Total number of receive errors on network interface

    Number

    Counter

    groundcover_host_net_transmit_dropped_total

    Total number of transmitted packets dropped on network interface

    Number

    Counter

    groundcover_host_net_transmit_errors_total

    Total number of transmit errors on network interface

    Number

    Counter

    | Label name | Description | Relevant types |
    | --- | --- | --- |
    | `pod_name` | K8s pod name | All |
    | `container_name` | K8s container name | All |
    | `container_image` | K8s container image name | All |
    | `remote_namespace` | Remote K8s namespace (other side of the communication) | All |
    | `remote_service_name` | Remote K8s service name (other side of the communication) | All |
    | `remote_container_name` | Remote K8s container name (other side of the communication) | All |
    | `type` | The protocol in use (HTTP, gRPC, Kafka, DNS etc.) | All |
    | `sub_type` | The sub type of the protocol (GET, POST, etc.) | All |
    | `role` | Role in the communication (client or server) | All |
    | `clustered_resource_name` | The clustered name of the resource, depends on the protocol | All |
    | `status_code` | "ok", "error" or "unset" | All |
    | `server` | The server workload/name | All |
    | `client` | The client workload/name | All |
    | `server_namesapce` | The server namespace | All |
    | `client_namespace` | The client namespace | All |
    | `server_is_external` | Indicates whether the server is external | All |
    | `client_is_external` | Indicates whether the client is external | All |
    | `is_encrypted` | Indicates whether the communication is encrypted | All |
    | `is_cross_az` | Indicates whether the communication is cross availability zone | All |
    | `clustered_path` | HTTP / gRPC aggregated resource path (e.g. /metrics/*) | http, grpc |
    | `method` | HTTP / gRPC method (e.g. GET) | http, grpc |
    | `response_status_code` | Return status code of an HTTP / gRPC request (e.g. 200 in HTTP) | http, grpc |
    | `dialect` | SQL dialect (MySQL or PostgreSQL) | mysql, postgresql |
    | `response_status` | Return status code of a SQL query (e.g. 42P01 for undefined table) | mysql, postgresql |
    | `client_type` | Kafka client type (Fetcher / Producer) | kafka |
    | `topic` | Kafka topic name | kafka |
    | `partition` | Kafka partition identifier | kafka |
    | `error_code` | Kafka return status code | kafka |
    | `query_type` | Type of DNS query (e.g. AAAA) | dns |
    | `response_return_code` | Return status code of a DNS resolution request (e.g. Name Error) | dns |
    | `exit_code` | K8s container termination exit code | container_state, container_crash |
    | `state` | K8s container current state (Running, Waiting or Terminated) | container_state |
    | `state_reason` | K8s container state transition reason (e.g. CrashLoopBackOff or OOMKilled) | container_state |
    | `crash_reason` | K8s container crash reason (e.g. Error, OOMKilled) | container_crash |
    | `pvc_name` | K8s PVC name | storage |

    `perspective_entity_id`, `perspective_entity_is_external`, `perspective_entity_issue_id`, `perspective_entity_name`, `perspective_entity_namespace`, `perspective_entity_resource_id`

    | Name | Description | Unit | Type |
    | --- | --- | --- | --- |
    | groundcover_resource_success_counter | total amount of resource requests with OK status codes | Number | Counter |
    | groundcover_resource_latency_seconds | resource latency | Seconds | Summary |
    | groundcover_workload_success_counter | total amount of requests handled by the workload with OK status codes | Number | Counter |
    | groundcover_workload_latency_seconds | resource latency across all of the workload APIs | Seconds | Summary |
    | groundcover_node_allocatable_cpum_cpu | Allocatable CPU in the current node | mCPU | Gauge |
    | groundcover_node_allocatable_mem_bytes | Allocatable memory in the current node | Bytes | Gauge |
    | groundcover_node_mem_used_percent | Percentage of used memory in the current node | Percentage | Gauge |
    | groundcover_pvc_usage_bytes | Persistent Volume Claim (PVC) usage | Bytes | Gauge |
    | groundcover_pvc_capacity_bytes | Persistent Volume Claim (PVC) capacity | Bytes | Gauge |
    | groundcover_pvc_available_bytes | Available Persistent Volume Claim (PVC) space | Bytes | Gauge |
    | groundcover_network_rx_bytes_total | Total bytes received by the workload | Bytes | Counter |
    | groundcover_network_tx_bytes_total | Total bytes sent by the workload | Bytes | Counter |
    | groundcover_network_connections_opened_total | Total connections opened by the workload | Number | Counter |
    | groundcover_kube_cronjob_status_active | Number of active CronJob executions | Number | Gauge |
    | groundcover_kube_daemonset_status_current_number_scheduled | Number of Pods currently scheduled by the DaemonSet | Number | Gauge |
    | groundcover_kube_daemonset_status_desired_number_scheduled | Desired number of Pods scheduled by the DaemonSet | Number | Gauge |
    | groundcover_container_cpu_usage_rate_millis | CPU usage rate | mCPU | Gauge |
    | groundcover_container_cpu_cfs_periods_total | Total number of elapsed CPU CFS scheduler enforcement periods for the container | Number | Counter |
    | groundcover_container_cpu_delay_seconds | K8s container CPU delay | Seconds | Counter |
    | groundcover_container_memory_working_set_bytes | Current memory working set | Bytes | Gauge |
    | groundcover_container_mem_working_set_bytes | Working set memory usage for the container | Bytes | Gauge |
    | groundcover_container_memory_cache_usage_bytes | Memory cache usage for the container | Bytes | Gauge |
    | groundcover_container_io_read_bytes_total | Total bytes read by the container | Bytes | Counter |
    | groundcover_container_io_read_ops_total | Total number of read operations by the container | Number | Counter |
    | groundcover_container_io_write_bytes_total | Total bytes written by the container | Bytes | Counter |
    | groundcover_container_network_rx_bytes_total | Total bytes received by the container | Bytes | Counter |
    | groundcover_container_network_rx_dropped_total | Total number of received packets dropped by the container | Number | Counter |
    | groundcover_container_network_rx_errors_total | Total number of errors encountered while receiving packets | Number | Counter |
    | groundcover_container_uptime_seconds | Uptime of the container | Seconds | Gauge |
    | groundcover_container_crash_count | Total count of container crashes | Number | Counter |
    | groundcover_host_uptime_seconds | Uptime of the current host | Seconds | Gauge |
    | groundcover_host_cpu_capacity_m_cpu | CPU capacity in the current host | mCPU | Gauge |
    | groundcover_host_cpu_usage_m_cpu | CPU usage in the current host | mCPU | Gauge |
    | groundcover_host_mem_capacity_bytes | Memory capacity in the current host | Bytes | Gauge |
    | groundcover_host_mem_used_bytes | Memory used in the current host | Bytes | Gauge |
    | groundcover_host_mem_used_percent | Percentage of used memory in the current host | Percentage | Gauge |
    | groundcover_host_disk_space_used_bytes | Used disk space in the current host | Bytes | Gauge |
    | groundcover_host_disk_space_free_bytes | Free disk space in the current host | Bytes | Gauge |
    | groundcover_host_disk_space_total_bytes | Total disk space in the current host | Bytes | Gauge |
    | groundcover_host_io_read_kb_per_sec | Disk read throughput per device in the current host | Kilobytes per second | Gauge |
    | groundcover_host_io_write_kb_per_sec | Disk write throughput per device in the current host | Kilobytes per second | Gauge |
    | groundcover_host_io_read_await_ms | Average time for read requests to be served per device in the current host | Milliseconds | Gauge |
    | groundcover_host_fs_used_bytes | Used filesystem space in the current host | Bytes | Gauge |
    | groundcover_host_fs_free_bytes | Free filesystem space in the current host | Bytes | Gauge |
    | groundcover_host_fs_total_bytes | Total filesystem space in the current host | Bytes | Gauge |
    | groundcover_host_fs_file_handles_allocated | Total number of file handles allocated in the current host | Number | Gauge |
    | groundcover_host_fs_file_handles_allocated_unused | Number of allocated but unused file handles in the current host | Number | Gauge |
    | groundcover_host_fs_file_handles_in_use | Number of file handles currently in use in the current host | Number | Gauge |
    | groundcover_host_net_receive_bytes_total | Total bytes received on network interface | Bytes | Counter |
    | groundcover_host_net_transmit_bytes_total | Total bytes transmitted on network interface | Bytes | Counter |
    | groundcover_host_net_receive_packets_total | Total packets received on network interface | Number | Counter |

    | Label name | Description | Relevant types |
    | --- | --- | --- |
    | `clusterId` | Name identifier of the K8s cluster | All |
    | `region` | Cloud provider region name | All |
    | `namespace` | K8s namespace | All |
    | `workload_name` | K8s workload (or service) name | All |

    | Name | Description | Unit | Type |
    | --- | --- | --- | --- |
    | groundcover_resource_total_counter | total amount of resource requests | Number | Counter |
    | groundcover_resource_error_counter | total amount of requests with error status codes | Number | Counter |
    | groundcover_resource_issue_counter | total amount of requests which were flagged as issues | Number | Counter |
    | groundcover_workload_total_counter | total amount of requests handled by the workload | Number | Counter |
    | groundcover_workload_error_counter | total amount of requests handled by the workload with error status codes | Number | Counter |
    | groundcover_workload_issue_counter | total amount of requests handled by the workload which were flagged as issues | Number | Counter |
    | groundcover_workload_client_offset | client last message offset (for producer the last offset produced, for consumer the last requested offset), aggregated by workload | | Gauge |
    | groundcover_workload_calc_lagged_messages | current lag in messages, aggregated by workload | Number | Gauge |
    | groundcover_workload_calc_lag_seconds | current lag in time, aggregated by workload | Seconds | Gauge |