Introduction

groundcover is a full stack, cloud-native observability platform, developed to break industry paradigms - from making instrumentation a thing of the past to decoupling cost from data volumes.

The groundcover platform consolidates all your traces, metrics, logs, and Kubernetes events into a single pane of glass, allowing you to identify issues faster than ever before and conduct granular investigations for quick remediation and long-term prevention.

Our pricing is not impacted by the volume of data generated by the environments you monitor, so you can dare to start monitoring environments that had been blind spots until now - such as your Dev and Staging clusters. This, in turn, gives you visibility into all your environments, making it much more likely that you'll identify issues in the early stages of development rather than in your live product.

groundcover introduces game-changing concepts to observability:

eBPF sensor

eBPF (extended Berkeley Packet Filter) is a groundbreaking technology that has significantly impacted the Linux kernel, offering a new way to safely and efficiently extend its capabilities.

By powering our sensor with eBPF, groundcover unlocks unprecedented granularity on your cloud environment, while also practically eliminating the need for human involvement in the installation and deployment process. Our unique sensor collects data directly from the Linux kernel with near-zero impact on CPU and memory.

Advantages of our eBPF sensor:

  • Zero instrumentation: groundcover's eBPF sensor gathers granular observability data without the need for integrating an SDK or changing your applications' code in any way. This enables all your logs, metrics, traces, and other observability data to flow automatically into the platform. In minutes, you gain full visibility into application and infrastructure health, performance, resource usage, and more.

  • Minimal resource footprint: groundcover's sensor is installed on a dedicated node in each monitored cluster, operating separately from the applications it is monitoring. Without interfering with the applications' primary functions, the groundcover platform operates with near-zero impact on your resources, maintaining the applications' performance and avoiding unexpected overhead on the infrastructure.

  • A new level of insight granularity: With direct access to the Linux kernel, our eBPF sensor enables the collection of data straight from the source. This guarantees that the data is clean, unaltered, and precise. It also offers unique insights on your application and infrastructure, such as the ability to view the full payloads of traces or to analyze network performance over time.

Bring-your-own-cloud (BYOC) architecture

The one-of-a-kind architecture on which groundcover was built eliminates any requirement to stream your logs, metrics, traces, and other monitoring data outside of your environment and into a third party's cloud. By leveraging integrations with best-of-breed technologies, including ClickHouse and Victoria Metrics, all your observability data is stored locally, with the option of being fully managed by groundcover.

Advantages of our BYOC architecture:

  • By separating the data plane from the control plane, you get the advantages of a SaaS solution, without its security and privacy challenges.

  • With multiple deployment models available, you also get to choose the level of security and privacy your organization needs, up to the highest standards (FedRAMP-level).

  • Automated deployment, maintenance & resource optimization with our deployment option.

This concept is unique to groundcover and takes a while to grasp. Read about our BYOC architecture in more detail in the dedicated section.

Learn about groundcover BYOC (currently available only on a paid plan), which enables you to deploy groundcover's control plane inside your own environment and delegate the entire setup and management of the groundcover platform.

Disruptive pricing model

Enabled by our unique BYOC architecture, groundcover's vision is to revolutionize the industry by offering a pricing model that is unheard of anywhere else. Our fully transparent pricing model is based only on the number of nodes being monitored and the costs of hosting the groundcover backend in your environment. The volume of logs, metrics, traces, and all other observability data does not affect your cost. This results in savings of 60-90% compared to SaaS platforms.

In addition, none of our subscription tiers ever limits your access to features and capabilities.

Advantages of our nodes-based pricing model:

  • Cost is predictable and transparent, becoming an enabler of growth and expansion.

  • The ability to deploy groundcover in data-intensive environments enables the monitoring of Dev and Staging clusters, which promotes early identification of issues.

  • No cardinality or retention limits

Read our latest customer stories to learn how organizations of varying sizes dramatically reduce their observability costs by migrating to groundcover:

Nobl9 expands monitoring to cover production e2e, including testing and staging environments

Replacing Datadog with groundcover cut Nobl9's observability costs in half while improving log coverage, providing deeper granularity on traces with eBPF, and enabling operational growth and scalability.

Tracr eliminates blind spots with native-K8s observability and eBPF tracing

Tracr migrated from a fragmented observability stack to groundcover, gaining deep Kubernetes visibility, automated eBPF tracing, and a cost-effective monitoring solution. This transition streamlined troubleshooting, expanded observability across teams, and enhanced the reliability of their blockchain infrastructure.

Stream processing

groundcover applies a stream processing approach to collect and control the continuous flow of data to gain immediate insights, detect anomalies, and respond to changing conditions. Unlike batch processing, where data is collected over a period and then analyzed, stream processing analyzes the data as it flows through the system.

Our platform uses a distributed stream processing engine that enables it to ingest huge amounts of data (such as logs, traces and Kubernetes events) in real time. It also processes all that data and instantly generates complex insights (such as metrics and context) based on it.

As a result, the volume of raw data stored dramatically decreases which, in turn, further reduces the overall cost of observability.

Capabilities

Log Management

Designed for high scalability and rapid query performance, groundcover's Log Management enables quick and efficient log analysis across all your environments. Each log is enriched with actionable context and correlated with relevant metrics and traces, providing a comprehensive view for fast troubleshooting.

Infrastructure Monitoring

The groundcover platform provides cloud-native infrastructure monitoring, enabling automatic collection and real-time monitoring of infrastructure health and efficiency.

Application Performance Monitoring (APM)

Gain end-to-end observability into your applications' performance, and identify and resolve issues instantly - all with zero code changes.

Real User Monitoring (RUM)

Real User Monitoring (RUM) extends groundcover’s observability platform to the client side, providing visibility into actual user interactions and front-end performance. It tracks key aspects of your web application as experienced by real users, then correlates them with backend metrics, logs, and traces for a full-stack view of your system.

FAQ

How much does groundcover cost?

groundcover's unique pricing model is the first to decouple data volumes from the cost of owning and operating the solution. For example, subscribing to our Enterprise plan costs $30 per node/host per month.

Overall, the cost of owning and operating groundcover is based on two factors:

  • The number of nodes (hosts) you are running in the environments you are monitoring

  • The costs of hosting groundcover's backend in your environment

Check out our TCO calculator to simulate your total cost of ownership for groundcover.

Can I use groundcover across multiple clusters?

Definitely. As you deploy groundcover, each cluster is automatically assigned the unique name it holds inside your cloud environment. You can browse and select all your clusters in one place in our UI.

What K8s flavors are supported?

groundcover has been tested and validated on the most common K8s distributions. See the full list in the Requirements section.

What protocols are supported?

    groundcover supports the most common protocols in most K8s production environments out-of-the-box. See full list here.

What types of data does groundcover collect?

    groundcover's kernel-level eBPF sensor automatically collects your logs, application metrics (such as latency, throughput, error rate and much more), infrastructure metrics (such as deployment updates, container crashes etc.), traces, and Kubernetes events. You can control which data is left out of the automatic collection using data obfuscation.

Where is my data being stored?

    groundcover stores all the data it collects inside your environment, using the state-of-the-art storage services of ClickHouse and Victoria Metrics, with the option to offload data to object storage such as S3 for long-term retention. See our Architecture section for more details.

Is my data secure?

    groundcover stores the data it collects in-cluster, inside your environment without ever leaving the cluster to be stored anywhere else.

    Our SaaS UI experience stores only information related to the account, user access and general K8s metadata used for governance (like the number of nodes per cluster, the name given to the cluster etc.).

    All the information served to the UI experience is encrypted all the way to the in-cluster data sources. groundcover has no access to your collected data, which is accessible only to an authenticated user from your organization. groundcover does collect telemetry information (opt-out is of course possible) which includes metrics about the performance of the deployment (e.g. resource consumption metrics) and logs reported from the groundcover components running in the cluster.

    All telemetry information is anonymized, and contains no data related to your environment.

    Regardless, groundcover is SOC2 and ISO 27001 compliant and follows best practices.

How can I invite my team to my workspace?

    If you used your business email to create your groundcover account, you can invite your team to your workspace by clicking on the purple "Invite" button on the upper menu. This will open a pop-up where you can enter the emails of the people you want to invite. You also have an option to copy and share your private link.

    Note: The Admin of the account (i.e. the person that created it) can also invite users outside of your email domain. Non-admin users can only invite users that share the same email domain. If you used a private email, you can only share the link to your workspace by clicking the "Share" button on the top bar.

    Read more about invites in our quick start guide.

Is groundcover open source?

groundcover's CLI tool is currently open source, alongside more of our projects like Murre and Caretta. We're working on releasing more parts of our solution as open source very soon. Stay tuned on our GitHub page!

What operating system (OS) do I need to use groundcover?

groundcover's sensor uses eBPF, which means it can only be deployed on a Kubernetes cluster running on Linux.

    Installing using the CLI command is currently only supported on Linux and Mac.

    You can install using the Helm command from any operating system.

    Once installed, accessing the groundcover platform is possible from any web browser, on any operating system.


Application Performance Monitoring (APM)

Gain end-to-end observability into your applications' performance, and identify and resolve issues instantly - all with zero code changes.

Overview

The groundcover platform collects data all across your stack using the power of eBPF instrumentation. Our proprietary eBPF sensor is installed in seconds and provides 100% coverage of application metrics and traces with zero code changes or configuration.

Resolve faster - By seamlessly correlating traces with application metrics, logs, and infrastructure events, groundcover's APM enables you to detect and resolve root issues faster.
Improve user experience - Optimize your application performance and resource utilization faster than ever before, avoid downtime, and make poor end-user experience a thing of the past.

Collection

Our revolutionary eBPF sensor, Flora, is deployed as a DaemonSet in your Kubernetes cluster. This approach allows us to inspect every packet that each service sends or receives, achieving 100% coverage. No sampling rates, no relying on statistical luck - all requests and responses are observed.

    This approach would not be feasible without a resource-efficient eBPF-powered sensor. eBPF not only extends the ability to pinpoint issues - it does so with much less overhead than any other method. eBPF can be used to analyze traffic originating from every programming language and SDK - even for encrypted connections!

Click here for a full list of supported technologies

Reconstruction

After being collected by our eBPF code, the traffic is classified according to its protocol - identified directly from the underlying traffic or from the library it originated from. Connections are reconstructed, and we can generate transactions - HTTP requests and responses, SQL queries and responses, etc.

Enrichment

To provide as much context as possible, each transaction is enriched with extensive metadata. Examples include the pods that took part in the transaction (both client and server), the nodes on which these pods are scheduled, and the state of the container at the time of the request.

It is important to emphasize the granularity with which this process takes place - every single transaction observed is fully enriched. This allows us to perform more advanced aggregations.

Aggregation

After being enriched with as much context as possible, the transactions are grouped into meaningful aggregations. These can be defined by the workloads involved, the protocols detected, and the resources accessed in the operations. These aggregations mostly come into play when displaying golden signals.

Exporting

After collecting the data, contextualizing it, and putting it together in meaningful aggregations, we can now create metrics and traces that provide meaningful insights into the services' behaviors.

Metrics

Learn how groundcover's application metrics work in the Application Metrics section.

Traces

Learn how groundcover's application traces work in the Traces section.


    Log Management

    Stream, store, and query your logs at any scale, for a fixed cost.

Overview

    Our Log Management solution is built for high scale and fast query performance so you can analyze logs quickly and effectively from all your cloud environments.

Gain context - Each log is enriched with actionable context and correlated with relevant metrics and traces in one single view, so you can find what you're looking for and troubleshoot faster.

Centralize to maximize - The groundcover platform can act as a limitless, centralized log management hub. Your subscription costs are completely unaffected by the amount of logs you choose to store or query. It's entirely up to you to decide.

    Monitor Catalog page

    Explore and select pre-built Monitors from the catalog to quickly set up observability for your environment. Customize and deploy Monitors in just a few clicks.

Overview

The Monitor Catalog is a library of pre-built templates for efficiently creating new Monitors. Browse and select one or more Monitors to quickly configure your environment with a single click. The Catalog groups Monitors into "Packs" based on different use cases.


Key Features

Batch Monitor Creation

You can select as many Monitors as you wish and add them all in one click. Select a complete pack or multiple Monitors from different packs, then click "Create Monitor". All selected Monitors will be created automatically. You can always edit them later.

Single Monitor Creation

You can also create a single Monitor from the Catalog. When you hover over a Monitor, a "Wizard" button appears. Clicking it directs you to the Monitor Creation Wizard, where you can review and edit the Monitor before creating it.


    CPU architectures

    The following architectures are fully supported for all groundcover workloads:

    • x86

    • ARM

    Service Accounts

    A service account is a non-human identity for API access, governed by RBAC and supporting multiple API keys.

Summary

    Service accounts in groundcover are non-human identities used for programmatic access to the API. They’re ideal for CI pipelines, automation, and backend services, and are governed by groundcover’s RBAC system.

Identity and Permissions

    A service account has a name and email, but it cannot be used to log into the UI or via SSO. Instead, it functions purely for API access. Each account must have at least one RBAC policy assigned, which defines its permission level (Admin, Editor, Viewer) and data scope. Multiple policies can be attached to broaden access; effective permissions are the union of all policies.

Creation and Management

Only Admins can create, update, or delete service accounts. This can be done via the UI (Settings → Access → Service Accounts) or the API. During creation, Admins define the name, email, and initial policies. You can edit a service account, changing its email address and assigned policies, but you can't rename it.

API Key Association

    A service account can have multiple API keys. This makes it easy to rotate credentials or issue distinct keys for different use cases. All keys are tied to the same account and carry its permissions. Any action taken using a key is logged as performed by the associated service account.


Collection

Seamless log collection

    groundcover ensures a seamless log collection experience with our proprietary eBPF sensor, which automatically collects and aggregates all logs in all formats - including JSON, plain text, NGINX logs, and more. All this without any configuration needed.

    This sensor is deployed as a DaemonSet, running a single pod on each node within your Kubernetes cluster. This configuration enables the groundcover platform to automatically collect logs from all of your pods, across all namespaces in your cluster. This means that once you've installed groundcover, no further action is needed on your part for log collection. The logs collected by each sensor instance are then channeled to the OTel Collector.
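For instance, once installed you should see one sensor pod per node (a quick sanity-check sketch; the groundcover namespace here is an assumption - adjust it to wherever you installed the platform):

kubectl get daemonset -n groundcover

kubectl get pods -n groundcover -o wide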

OTel Collector: A vendor-agnostic way to receive, process and export telemetry data.

    Acting as the central processing hub, the OTel Collector is a vendor-agnostic tool that receives logs from various sensor pods. It processes, enriches, and forwards the data into groundcover's ClickHouse database, where all log data from your cluster is securely stored.

Logs Attributes

Logs Attributes enable advanced filtering capabilities and are currently supported for the following formats:

    • JSON

    • Common Log Format (CLF) - like those from NGINX and Kong

    • logfmt

    groundcover automatically detects the format of these logs, extracting key:value pairs from the original log records as Attributes.

    Each attribute can be added to your filters and search queries.

Example: filtering logs in a supported format on a request path field of "/status" looks as follows: @request.path:"/status". The full syntax can be found here.

Configuration

groundcover offers the flexibility to craft tailored collection filtering rules: you can choose to set up filters and collect only the logs that are essential for your analysis, avoiding unnecessary data noise. For guidance on configuring your filters, explore our Customize Logs Collection section.

    You also have the option to define the retention period for your logs in the ClickHouse database. By default, logs are retained for 3 days. To adjust this period to your preferences, visit our Customize Retention section for instructions.

Log Explorer

    Once logs are collected and ingested, they are available within the groundcover platform in the Log Explorer, which is designed for quick searches and seamless exploration of your logs data. Using the Log Explorer you can troubleshoot and explore your logs with advanced search capabilities and filters, all within a clear and fast interface.

Search and filter

The Log Explorer integrates dynamic filters and versatile search functionality that enable you to quickly and easily identify the right data. You can filter logs by selecting one or multiple criteria, including log level, workload, namespace, and more, and can limit your search to a specific time range.

    Learn more about how to use our search syntaxes

Log Pipelines

groundcover natively supports setting up log pipelines using Vector transforms. This allows for full flexibility in processing and manipulating the logs being collected - parsing additional patterns by regex, renaming attributes, and much more.

    Learn more about how to configure log pipelines
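As an illustrative sketch of what such a pipeline can express (the transform name, input name, and where this block lives in your groundcover configuration are assumptions - see the guide above for the real structure), a Vector remap transform that extracts a numeric latency attribute from log lines could look like:

transforms:
  parse_latency:
    type: remap
    inputs:
      - logs
    source: |
      # Pull a numeric latency value out of messages like "handled request in 37ms"
      parsed, err = parse_regex(.message, r'in (?P<latency_ms>\d+)ms')
      if err == null {
        .latency_ms = to_int!(parsed.latency_ms)
      }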


    Supported Technologies

    groundcover will work out-of-the-box on all protocols, encryption libraries and runtimes below - generating traces and metrics with zero code changes.

We're growing our coverage all the time. Can't find what you're looking for? Let us know over Slack.

Supported protocols

| Protocol | Status | Comments |
| --- | --- | --- |
| Redis | supported |  |
| DNS | supported |  |
| Kafka | supported |  |
| MongoDB | supported | v3.6+ |
| AMQP | supported | AMQP 0-9-1 |
| GraphQL | supported |  |
| AWS S3 | supported |  |
| AWS SQS | supported |  |
| HTTP | supported |  |
| gRPC | supported |  |
| MySQL | supported |  |
| PostgreSQL | supported |  |

Supported encryption libraries and runtimes

groundcover seamlessly supports APM for encrypted communication - as long as it's listed below.

| Encryption Library/Runtime | Status | Comments |
| --- | --- | --- |
| crypto/tls (golang) | supported |  |
| OpenSSL (C, C++, Python) | supported |  |
| NodeJS | supported |  |
| JavaSSL | supported | Java 11+ is supported. Requires … |

Note: Encryption is unsupported for binaries which have been compiled without debug symbols ("stripped"). Known cases:

    • Crossplane

    groundcover MCP

    Supercharge your AI agents with o11y superpowers using the groundcover MCP server. Bring logs, traces, metrics, events, K8s resources, and more directly into your agent’s context — and troubleshoot side by side with your AI assistant to solve issues in minutes.

    Status: Work in progress. We keep adding tools and polishing the experience. Got an idea or question? Ping us on Slack!

What is MCP?

    MCP (Model Context Protocol) is an open standard that enables AI tools to access external data sources like APIs, observability platforms, and documentation directly within their working context. MCP servers expose these resources in a consistent way, so the AI agent can query and use them as needed to streamline workflows.

    groundcover’s MCP server brings your live observability data into the picture - making your agents smarter, faster, and more accurate.

How Can groundcover's MCP Help You and Your Agent?

    By connecting your agent to groundcover’s MCP server, you enable it to:

    • Query live logs, traces, and metrics for a workload, container, or issue.

    • Run root cause analysis (RCA) on issues, right inside your IDE or chat interface.

    • Auto-debug code with observability context built in.

    • Monitor deployed code and validate fixes without switching tools.

See examples in our getting-started prompts and the real-world use cases below.

Install groundcover's MCP Server

Setup is quick and agent-friendly. We support both OAuth (recommended) and API key flows.

Head to Configure groundcover's MCP Server for setup instructions and client-specific guides.

Real-world Use Cases

These are patterns we've seen in the wild. Agents use groundcover to debug, monitor, and close the loop.

Test → Logs → Fix

Cursor generates tests, tags each with a test_id, logs them, and then uses groundcover to instantly fetch related log lines.

Investigate Issues via Cursor

Got a monitor firing? Drop the alert into Cursor. The agent runs a quick RCA, queries groundcover, and even suggests a patch based on recent logs and traces.

Support Workflow

A support rep gets an error ID → uses MCP to query groundcover → jumps straight to the root cause by exploring traces and logs around the error.

The Autonomous Loop

An agent picks up a ticket, writes tests, ships code to staging, monitors it with groundcover, checks logs and traces, and verifies the fix end to end. Yes, really. Full loop. Almost no hands.

    Installation & Updating

    Multiple ways to connect your infrastructure and applications to groundcover

    groundcover is designed to support data ingestion from multiple sources, giving you comprehensive observability across your entire stack. Choose the installation method that best fits your infrastructure and monitoring needs.

Available Installation Options

Kubernetes Clusters

    Connect your Kubernetes clusters using groundcover's eBPF-based sensor for automatic instrumentation and deep observability.

    • Connect Kubernetes clusters - Deploy groundcover's sensor to monitor containerized workloads, infrastructure, and applications with zero code changes (see the sketch below)
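For orientation, a Kubernetes install is typically a short Helm flow along these lines (a sketch under assumptions - the repository URL, release name, and namespace are illustrative, and the real flow also involves an API key; follow the linked guide for the exact, up-to-date commands):

helm repo add groundcover https://helm.groundcover.com

helm repo update

helm install groundcover groundcover/groundcover --namespace groundcover --create-namespace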

Standalone Linux Hosts

    Monitor individual Linux servers, virtual machines, or cloud instances outside of Kubernetes.

    • Connect Linux hosts - Install groundcover on standalone Linux hosts such as EC2 instances, bare metal servers, or VMs

Real User Monitoring (RUM)

    Gain visibility into frontend performance and user experience with client-side monitoring.

    • Connect RUM - Monitor real user interactions, page loads, and frontend performance in web applications

External Data Sources

    Integrate with existing observability tools and send data from your current monitoring stack.

    • Ship from OpenTelemetry - Forward traces, metrics, and logs from existing OpenTelemetry collectors

    • Ship from Datadog Agent - Send data from Datadog agents while maintaining your existing setup

Getting Started

1. Login and Create a Workspace - Set up your groundcover account and workspace

2. Review Requirements - Ensure your environment meets the necessary prerequisites

3. Choose your installation method - Select the option that matches your infrastructure setup

4. Follow the 5 quick steps - Get oriented with groundcover's interface and features

Need Help?

If you're unsure which installation method is right for you, or if you have specific requirements, check our FAQ or reach out to our support team.

    Requirements

    To ensure a seamless experience with groundcover, it's important to confirm that your environment meets the necessary requirements. Please review the detailed requirements for Kubernetes, our eBPF sensor, and the necessary hardware and resources to guarantee optimal performance.

Kubernetes requirements

    groundcover supports a wide range of Kubernetes versions and distributions, including popular platforms like EKS, AKS, and GKE.

    Learn more ->

Kernel requirements for eBPF sensor

Our state-of-the-art eBPF sensor leverages advanced kernel features to deliver comprehensive monitoring with minimal overhead, requiring specific Linux kernel versions, permissions, and CO-RE support.

Hardware and resource requirements

    groundcover fully supports both x86 and ARM processors, ensuring compatibility across diverse environments.

ClickHouse resources

groundcover operates ClickHouse to support many of its core features. This requires allocating suitable resources to the deployment, which groundcover takes care of according to your data usage.

    Monitor List page

    View, filter, and manage all monitors in one place, and quickly identify issues or create new monitors.

The Monitor List is the central hub for managing and monitoring all active and configured Monitors. It provides a clear, filterable table view of your Monitors, with their current status and key details such as creation date, severity, and live issues. Use this page to review your Monitors' performance, identify issues, and take appropriate action.

Key Features

Monitors Table

    • Displays the following columns:

      • Name: Title of the monitor.

      • Creation Date: When the monitor was created.

      • Live Issues: Number of live issues currently firing.

      • Status: Whether the Monitor is "Firing" (alerts active) or "Normal" (no alerts).

    Tip: Click on a Monitor name to view its detailed configuration and performance metrics.

Create Monitor

You can create a new Monitor by clicking Create Monitor, then choosing between the different options: Monitor Wizard, Monitor Catalog, or Import. For further guidance, check out our guide.

Filters Panel

    Use filters to narrow down monitors by:

    • Severity: S1, S2, S3, or custom severity levels.

    • Status: Alerting or Normal.

    • Silenced: Exclude silenced monitors.

    Tip: Toggle multiple filters to refine your view.

Search Bar

    Quickly locate monitors by typing a name, status, category, or other keywords.

Cluster and Environment Filters

    Located at the top-right corner, use these to focus on monitors for specific clusters or environments.

    Monitors

Monitors offer the ability to define custom alerts, which you can configure using groundcover data and custom metrics.

What is a Monitor

    A Monitor defines a set of rules and conditions that track the state of your system. When a monitor's conditions are met, it triggers an issue that is displayed on the Issues page and can be used for alerting using your integrations and workflows.

    Easily create a new Monitor by using our guide.

    Query Logs

The following pages provide examples and guidance on how to query logs in groundcover for different use cases:

    • The basics of querying logs – Learn how to run simple log queries and understand the core query structure.

    • Pagination in log queries – Learn how to paginate through large result sets efficiently.

    • Advanced log querying use cases – Explore more advanced query patterns and techniques for complex scenarios.

    5 quick steps to get you started

Once installed, we recommend following these steps to help you quickly get the most out of groundcover's unique observability platform.

1. Get a complete view of your workloads

    The "Home page" of the groundcover app is our Workloads page. From here, you can get a service-centric view,

    Full Webhook Examples

    This section contains comprehensive examples of webhook integrations with various third-party services. These examples provide step-by-step instructions for setting up complete workflows with external systems.

Available Examples

• incident.io - Integrate with incident.io for incident management

• MS Teams - Send notifications to Microsoft Teams channels

• Email via Zapier - Route alerts to email using Zapier

• Slack App with Bot Tokens - Route alerts to different Slack channels with a single Webhook

Each example includes:

  • Prerequisites and setup requirements

  • Step-by-step configuration instructions

  • Complete workflow YAML configurations

  • Integration-specific considerations and best practices

These examples demonstrate advanced webhook usage patterns and can serve as templates for other webhook integrations.

    Migrations

    Automated migration from legacy vendors. Bring over your monitors, dashboards, data sources, and all the data you need - with automatic mapping and zero downtime.

Overview

    groundcover is the first observability platform to ship a one-click migration tool from legacy vendors. The migration flow automatically discovers, translates, and installs your observability setup into groundcover.

    Goal: Move your entire observability stack with zero manual work. We don't just migrate assets - we bring the data and handle all the mapping for you.

    Metrics and Logs API

    This page describes the available API endpoints for querying logs and metrics in groundcover, including how to authenticate and structure requests for general data retrieval.

Authentication

Authentication is performed using an API key generated in the API Keys section of the groundcover console.

All API requests must include the API key in the Authorization header using the following format:

Authorization: Bearer <YOUR_API_KEY>

Raw Prometheus API for Metrics

Use the following endpoint to query the groundcover Prometheus API:

https://app.groundcover.com/api/prometheus/api/v1/query

To see a usage example of how to query metrics in groundcover, see: Query metrics examples

For complete Prometheus REST API documentation and available operations, see: Prometheus HTTP API

Logs API

Use the following endpoint to query the groundcover logs API:

https://app.groundcover.com/api/logs/v2/query

To see usage examples of how to query logs in groundcover, see: Query logs examples

Legacy API (Clickhouse)

The following configurations are deprecated but may still be in use in older setups.

Note: The legacy datasources use a different API key than the one described above. It can be obtained by running: groundcover auth get-datasources-api-key

You can query the legacy API to execute SQL statements directly on the Clickhouse database.

Try the following structure:

curl https://ds.groundcover.com/ \
    -H "X-ClickHouse-Key: <DS-API-KEY-VALUE>" \
    --data "SELECT count() FROM logs LIMIT 1 FORMAT JSON"


    Dashboards

    Learn how to build custom dashboards using groundcover

    groundcover’s dashboards are designed to personalize your data visualization and maximize the value of your existing data. Dashboards are perfect for creating investigation flows for critical monitors, displaying the data you care about in a way that suits you and your team, and crafting insights from the data on groundcover.

Easily create a new Dashboard using our guide.

Key Features

• Multi-Mode Query Bar: The Query Bar is central to dashboards and supports multiple modes fully integrated with native pages and Monitors. Currently, the modes include Metrics, Infra Metrics, Logs, and Traces. Learn more in the Query Builder section.

• Variables: Built-in variables allow you to filter data quickly based on a predefined list crafted by groundcover.

• Widget Types: Two widget types are currently supported:

  • Chart Widget: Displays data visually.

  • Textual Widget: Adds context to your dashboards.

• Display Types: Five display types are supported for data visualization: Time Series, Table, Stat, Top List, and Pie. Read more in the Widget Types section.

    Insights

    Quickly understand your data with groundcover

    groundcover insights give you a clear snapshot of notable events in your data. Currently, the platform supports Error Anomalies, with more insight types on the way.

Error Anomalies

Error Anomalies instantly highlight workloads, containers, or environments experiencing unusual spikes in Error or Critical logs, as well as Traces marked with an error status. These anomalies are detected using statistical algorithms, continuously refined through user feedback for accuracy.

Each insight surfaces trends based on the entity's error signals (e.g., workload, container, etc.):

On Logs, anomalies are based on logs filtered by level:error or level:critical, and grouped by:

• workload

• container

• namespace

• environment

• cluster

On Traces, anomalies are based on traces filtered by status:error, and grouped by a more granular set of dimensions:

• protocol_type

• return_code

• role (client/server)

• workload

• container

• namespace

• environment

• cluster

    Drilldown

The Drilldown view helps you quickly identify and highlight the most informative attributes - those that stand out and help you pinpoint anomalies or bottlenecks.

Distribution Mode

    In this mode, groundcover showcases the top attributes found in your traces or logs data. Each attribute displays up to four values with the highest occurrence across the selected traces.

You can click any value to add or exclude it as a filter and continue drilling down interactively.

How attributes are selected

We use statistical scoring based on:

• Entropy: how diverse the values of an attribute are.

• Presence ratio: how often the attribute appears across the selected traces.

Attributes that are both common and have high entropy are prioritized.

    Dashboards

groundcover enables access to an embedded Grafana within the groundcover platform's interface. This enables you to easily import and continue using your existing Grafana dashboards and alerts.

The following guides will help you set up and import your visualizations from Grafana:

• Create a Grafana dashboard

• Build alerts & dashboards with Grafana Terraform provider

• Using groundcover as a Prometheus/Clickhouse database in a self-hosted Grafana

    Alerts

groundcover enables access to an embedded Grafana within the groundcover platform's interface. This enables you to easily import and continue using your existing Grafana dashboards and alerts.

    API Examples

Welcome to the API examples section. Here, you'll find practical demonstrations of how to interact with our API endpoints using cURL commands. Each example is designed to help you quickly understand how to use the API.

Structure of the Examples

• cURL-based examples: Every example shows the exact cURL command you can copy and run directly in your terminal.

• Endpoint-specific demonstrations: We walk through different API endpoints one by one, highlighting the required parameters and common use cases.

• Request & Response clarity: Each section contains both the request (what you send) and the response (what you get back) to illustrate expected behavior.

Prerequisites

Before running any of the examples, make sure you have:

1. API Key

2. Backend ID
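For example, a minimal instant query against the Prometheus-compatible metrics endpoint could look like this (a sketch - up is just a placeholder query, and depending on your setup you may also need to pass your Backend ID as described in the API documentation):

curl -G "https://app.groundcover.com/api/prometheus/api/v1/query" \
    -H "Authorization: Bearer <YOUR_API_KEY>" \
    --data-urlencode "query=up"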

What we migrate

Monitors

    Includes alert conditions, thresholds, and evaluation windows.

Dashboards

    Complete dashboard migration with preserved layouts:

    • All widget types groundcover supports

    • Query translations

    • Time ranges and filters

    • Visual settings and arrangements

Data Sources

    We detect what you're using in Datadog and help you set it up in groundcover. One-click migration for integrations is coming soon.

Data & Mapping

    We don't just copy configurations - we ensure the data flows:

    • Automatic metric mapping: Datadog metric names translated to groundcover equivalents

    • Label translation: Tags become labels with intelligent mapping

    • Query conversion: Datadog query syntax converted to groundcover

    • Data validation: We verify all referenced metrics and data sources exist

Supported providers

Datadog

    Full migration support available now.

    Migrate from Datadog →

Other providers

    Additional vendors coming soon.

How it works

    Three steps. No scripts. No downtime.

    1. Fetch & discover: Provide API keys. groundcover pulls your monitors, dashboards, and data sources.

    2. Automatic setup: We install data sources, map metrics, and prepare everything.

    3. Migrate assets: Review and migrate monitors and dashboards with one click.

    API keys are not stored.

Access

    Settings → Migrations (Admin role required)

What's next

    The migration flow is structured to support additional asset types:

    • Data source configurations (available now)

    • Log pipelines (coming soon)

    • Advanced metric mappings (coming soon)



Grafana Service Account Token

Step 1 - Generate a Grafana Service Account Token

• Make sure you have the groundcover CLI installed:

groundcover version

• Generate a service account token:

groundcover auth generate-service-account-token

Note: Service Account Tokens are only accessible once, so make sure you keep them somewhere safe. Running the command again will generate a new service account token.

Note: Only groundcover tenant admins can generate Service Account Tokens.

Step 2 - Use the Grafana Terraform provider

• Make sure you have Terraform installed

• Use the official Grafana Terraform provider with the following attributes:

terraform {
  required_providers {
    grafana = {
      source = "grafana/grafana"
    }
  }
}

provider "grafana" {
  url  = "https://app.groundcover.com/grafana"
  auth = "{service account token}"
}

Continue to create Alerts and Dashboards in Grafana; see: Build alerts & dashboards with Grafana Terraform provider.

You can read more about what you can achieve with the Grafana Terraform provider in the official docs.


    Issues page

    View and analyze monitor issues with detailed timelines, metadata, and context to quickly identify and resolve problems in your environment.

    The Issues page provides a detailed view of active and resolved issues triggered by Monitors. This page helps users investigate, analyze, and resolve problems in their environment by visualizing issue trends and providing in-depth context through an issue drawer.

Issues drawer

Clicking an issue in the Issues List opens the Issue drawer, which provides an in-depth view of the Monitor and its triggered issue. Where possible, you can also navigate to related entities like the workload, node, or pod.

Details tab

    Displays metadata about the issue, including:

    • Monitor Name: Name of the Monitor that triggered the issue, including a link to it.

    • Description: Explains what the Monitor tracks and why it was triggered.

    • Severity: Shows the assigned severity level (e.g., S3).

Events tab

    Displays the Kubernetes events related to the selected issue within the timeframe selected in the Time Picker dropdown (upper right of the issue drawer).

Traces tab

    When creating a Monitor using a traces query, the Traces tab will display the matching traces generated within the timeframe selected in the Time Picker dropdown (upper right of the issue drawer). Click on "View in Traces" to navigate to the Traces section with all relevant filters automatically applied.

Logs tab

    When creating a monitor using a log query, the Logs tab will display the matching logs generated within the timeframe selected in the Time Picker dropdown (upper right of the issue drawer). Click on "View in Logs" to navigate to the Logs section with all relevant filters automatically applied.

Map tab

    A focused visualization of the interactions between workloads related to the selected issue.

    Remote Access & APIs

    groundcover has various authentication key types for remotely interacting with our platform, whether to ingest observability data or to automate actions via our APIs:

1. API Keys - An API key in groundcover provides secure, programmatic access to the API on behalf of a service account. It inherits that account's permissions and should be stored safely. This is also the key you need when working with groundcover's Terraform provider. See:

  1. groundcover's APIs documentation.

  2. groundcover's Terraform provider.

2. Ingestion Keys - Ingestion Keys let sensors, integrations, and browsers send observability data to your groundcover backend. These keys are the counterpart of API Keys, which are optimized for reading data or automating dashboards and monitors.

3. Datasources API Key - A key used to connect to groundcover as a datasource, querying Clickhouse and VictoriaMetrics directly.

4. Grafana Service Account Token - Used to remotely create and configure Grafana Alerts & Dashboards via Terraform.

    Workflows

    Workflows are YAML-based configurations that are executed whenever a monitor is triggered. They enable you to integrate with third-party systems, apply custom logic to format and transform data, and set different conditions to handle your monitoring alerts intelligently.

Workflow components

Triggers

    Triggers apply filtering conditions that determine whether a specific workflow is executed. In groundcover, the trigger type is always set to "alert".

    Example: This trigger ensures that only monitors fired with telemetry data from the Prod environment will actually execute the workflow. Note that the "env" attribute needs to be provided as a context label from the monitor:
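A sketch of what such a trigger can look like in the workflow YAML (field names and nesting are illustrative - check the workflow examples for the exact syntax):

triggers:
  - type: alert
    filters:
      - key: env
        value: prod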

    Note: Workflows are "pull based" which means they will try to match monitors even when these monitors did not explicitly add a specific workflow. Therefore, the filters need to accurately define the condition to be used for a monitor.

Consts

    Consts is a section where you can declare predefined attributes based on data provided with the monitor context. A set of functions is available to transform existing data and format it for propagation to third-party integrations. Consts simplify access to data that is needed in the actions section.

    Example: The following example shows how to map a predefined set of severity values to the monitor severity as defined in groundcover - here, any potential severity in groundcover is translated into one of P1-P5 values.

The function keep.dictget gets a value from a map (dictionary) using a specific key. If the key is not found, P3 will be used as the default severity:
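An illustrative consts block for this mapping (the severity names and exact quoting are a sketch, not verbatim syntax):

consts:
  priority: keep.dictget({"critical": "P1", "high": "P2", "medium": "P3", "low": "P4", "info": "P5"}, "{{ alert.severity }}", "P3")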

Actions

Actions specify what happens when a workflow is triggered, and typically interface with external systems (like sending a Slack message). A workflow can define an array of actions; they can be executed conditionally, and each includes the integration in its config part as well as a payload block that typically depends on the exact integration used for the notification.

    Actions include:

    1. Provider part (provider:) - Configures the integration to be used

    2. Payload part (with:) - Contains the data to submit to the integration based on its actual structure

Example: In this example you can see a typical Slack notification. Note that the actual integration is referenced through the 'providers' context attribute. The integration name is the exact string used when the integration was created (in this case "groundcover-alerts").
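A minimal sketch of such an action (the message template and field layout are illustrative):

actions:
  - name: slack-alert
    provider:
      type: slack
      config: "{{ providers.groundcover-alerts }}"
      with:
        message: "Monitor {{ alert.name }} fired with severity {{ alert.severity }}"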

    Silences page

    Manage and create silences to suppress Monitor notifications during maintenance or specific periods, reducing noise and focusing on critical issues.

Overview

The Silences page lists all Silences you and your team have created for your Monitors. In this section, you can also create and manage your Silence rules to suppress notifications and Issue noise for a specified period of time. Silences are a great way to reduce alert fatigue, which can lead to missing important issues, and to stay focused on the most critical issues during specific operational scenarios such as scheduled maintenance.

Create a new Silence

    Follow these simple steps to create a new Silence.

Section 1: Schedule

    Specify the timeframe for the silence rule. Note that the starting point doesn't have to be now, and can also be any time in the future.

    Below the From / Until boxes, you'll see a Silence summary, showing its approximate length (rounded down to full days) and starting date.

Section 2: Matchers

    Define the criteria for Monitors or Issues to be silenced.

    1. Click Add Matcher to specify match conditions (e.g., cluster, namespace, span_name).

    2. Combine multiple matchers for more granular control.

    Example: Silence all Monitors in the "demo" namespace.

Section 3: Affected Active Issues

    Preview the issues currently affected by the Silence rule, based on any defined Matchers. This list contains only actively firing Issues.

Tip: Use this preview to see the list of impacted issues and adjust your Matchers before finalizing the Silence.

Section 4: Comment

    Add notes or context for the Silence rule. These comments help you and other users understand the purpose of the rule.

    API Keys

    An API key in groundcover provides secure, programmatic access to the API on behalf of a service account. It inherits that account’s permissions and should be stored safely.


Binding and Permissions

    Each API key is tied to a specific service account. It inherits the permissions defined by that account’s RBAC policies. Optionally, the key can be limited to a subset of those policies for more granular access control. An API key can never exceed the permissions of its parent service account.

Creation and Storage

    Only Admins can create or revoke API keys. To create an API key:

    1. Navigate to the Settings page using the settings button located in the bottom left corner

    2. Select "Access" from the sidebar menu

    3. Click on the "API Keys" tab

When a key is created, its value is shown once - store it securely in a secret manager or encrypted environment variable. If lost, a new key must be issued.

Authentication and Usage

    To use an API key, send it in the Authorization header as bearer token:
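Authorization: Bearer <YOUR_API_KEY>

For example (the endpoint path is deliberately elided here - substitute a real one from groundcover's API documentation):

curl -H "Authorization: Bearer <YOUR_API_KEY>" "https://api.groundcover.com/..."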

    The key authenticates as the service account, and all API permissions are enforced accordingly.

API Key authentication works with https://api.groundcover.com/ only.

Validity and Revocation

    API keys do not expire automatically. Revoking a key immediately disables its access.

Scope of Use

API keys are valid only for requests to https://api.groundcover.com. They do not support data ingestion or Grafana integration - those require dedicated tokens.

API Keys vs Ingestion Keys

Security Best Practices

    Store securely: Use secrets managers like AWS Secrets Manager or HashiCorp Vault. Never commit keys to source control.

    Follow least privilege: Assign the minimal required policies to service accounts and API keys. Avoid defaulting to admin-level access.

    Rotate regularly: Periodically generate new keys, update your systems, and revoke old ones to limit exposure.

    Revoke stale keys: Remove keys that are no longer in use or suspected to be compromised.

    Create a new Workflow

Creation

    Creating new workflows is currently supported through the groundcover app in two ways from the Monitors menu:

1. "Create Notification Workflow" button - The quick way

    This provides a guided approach to create a workflow. When creating a notification workflow, you will be asked to give your workflow a name, add filters, and select the specific integration to use.

    Filter Rules By Labels - Add key-value attributes to ensure your workflow executes under specific conditions only - for example, env = prod only.

    Delivery Destinations - Select one or more integrations to be used for notifications with this workflow.

Scope - When The Workflow Will Run - This setting allows you to limit the workflow's execution to monitors that explicitly route their triggers to this workflow, as opposed to "Handle all issues", which catches triggers from any monitor.

    Once you create a workflow using this option, you can later edit the workflow to apply any configuration or logic by using the editor option (see next).

    hashtag
    2. "Create Workflow" button

    Clicking the button will open up a text editor where you can add your workflow definition in YAML format by applying any valid configuration, logic, and functionality.

Note: Make sure to create your integration prior to creating the workflow, as workflows require an existing integration.

    hashtag
    View

Upon successful creation, the workflow will be active immediately, and a new workflow record will appear in the underlying table.

For each existing workflow, you can see the following fields:

    • Name: Your defined workflow name

    • Description: If you've added a description of the workflow

    • Creator: Workflow creator email

• Creation Date: Workflow creation date

• Last Execution Time: Timestamp of last workflow execution (depends on workflow trigger type)

• Last Execution Status: Last execution status (failure or success)

    hashtag
    Editing

    From the right side of each workflow record in the display, you can access the menu (three dots) and click "Edit Workflow". This will open the editor so you can modify the YAML to conform to the available functionality. See examples below.

Example workflow:

triggers:
  - type: alert
    filters:
    - key: env
      value: prod
consts:
    severities: '{"S1": "P1","S2": "P2","S3": "P3","S4": "P4","critical": "P1","error": "P2","warning": "P3","info": "P4"}'
    severity: keep.dictget({{ consts.severities }}, {{ alert.annotations._gc_severity }}, "P3")
actions:
- name: slack-action-firing
  provider:
    config: '{{ providers.groundcover-alerts }}'
    type: slack
    with:
      attachments:
      - color: '{{ consts.red_color }}'
        footer: '{{ consts.footer_url }}'
        text: '{{ consts.slack_message }}'
        title: 'Firing: {{ alert.alertname }}'
        type: plain_text
      message: ' '

    Create a Grafana dashboard

    circle-info

    The following guide explains how to build dashboards within the groundcover platform using our fully integrated Grafana interface. To learn how you can create dashboards using Grafana Terraform, follow this guide.

    A dashboard is a great tool for visually tracking, analyzing, and displaying key performance metrics, which enable you to monitor the health of your infrastructure and applications.

    hashtag
    Creating a new dashboard

    1️⃣ Go to the Dashboards tab in the groundcover app, and click New and then New Dashboard.

    2️⃣ Create your first panel by clicking Add a new panel.

    3️⃣ In the New panel view, go to the Query tab.

4️⃣ Choose your data source by pressing -- Grafana -- on the data source selector. You will see the metrics collected from each of your clusters as a Prometheus data source called Prometheus@<cluster-name>.

    5️⃣ Create your first Query in the PromQL query interface.
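
For example, a simple first query using one of groundcover's built-in metrics (the metric name below is taken from the VictoriaMetrics example elsewhere in these docs; substitute any metric you want to chart):

sum(rate(groundcover_resource_total_counter{type="http"}[5m]))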

    circle-info

Learn more about Grafana panelsarrow-up-right and PromQL queriesarrow-up-right to improve your skills. For any help in creating your custom dashboard, don't hesitate to join our Slack support channelarrow-up-right.

    circle-info

Tips:

• Learn more about the supported metrics you can use to build dashboards in the sections under Infrastructure Monitoring and Application Metrics.

• groundcover has a set of example dashboards in the Dashboards by groundcover folder which can get you started. These dashboards are read-only, but you can see the PromQL query behind each panel by right-clicking the panel and selecting Explore.

    Real User Monitoring (RUM)

    Monitor front-end applications and connect it to your backend — all inside your cloud.

    circle-info

This capability is only available to BYOC and on-prem deployments. Check out our pricing pagearrow-up-right for more information about subscription plans and the available deployment modes.

    Capture real end-user experiences directly from their browsers and unify these insights with your backend observability data.

➡️ Check out our instructions on how to connect RUM to your platform.

    Get Logs Pipeline Configuration

    Retrieve the current logs pipeline configuration.

    hashtag
    Endpoint

    GET /api/pipelines/logs/config


    Login and Create a Workspace

    Get started with groundcover

    circle-check

    This is the first step to start with groundcover for all types of plans 🚀

    hashtag
    Sign up to groundcover

    Datasource API Keys

    groundcover provides a robust user interface that allows you to view and analyze all your observability data from inside the platform. However, there may be cases in which you need to query the data from outside our platform using API communication.

Our proprietary eBPF sensor automatically captures granular observability data, which is stored via our integrations with two best-of-breed technologies: VictoriaMetrics for metrics storage, and ClickHouse for storage of logs, traces, and Kubernetes events.

Read more about our architecture.

    hashtag
    Generate the API key

    Update Logs Pipeline Configuration

    Update the logs pipeline configuration.

    hashtag
    Endpoint

    POST /api/pipelines/logs/config


    Getting-started Prompts

    Once your MCP server is connected, you can dive right in.

    Here are a few prompts to try. They work out of the box with agents like Cursor, Claude, or VS Code:

    💡 Starting your request with “Use groundcover” is a helpful nudge - it pushes the agent toward MCP tools and context.

    hashtag
    Basic Prompts to Try

    Delete Workflow

Delete an existing workflow using its workflow ID.

    hashtag
    Endpoint

    DELETE /api/workflows/{id}


    Build alerts & dashboards with Grafana Terraform provider

    hashtag
    Configure the Grafana Terraform provider

For instructions on how to generate a Grafana Service Account Token and use it in the Grafana Terraform provider, see: Grafana Service Account Token.



hashtag
Authentication

    This endpoint requires API Key authentication via the Authorization header.

    hashtag
    Headers

Authorization: Bearer <YOUR_API_KEY>
Accept: */*

    hashtag
    Examples

    hashtag
    Basic Request

    Get current logs pipeline configuration:

curl -L \
  --url 'https://api.groundcover.com/api/pipelines/logs/config' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Accept: */*'

    hashtag
    Response Example

{
  "ottlRules": [
    {
      "ruleName": "nginx_access_logs",
      "conditions": [
        "workload == \"nginx\" or container_name == \"nginx\""
      ],
      "statements": [
        "set(cache, ExtractGrokPatterns(body, \"^%{IPORHOST:remote_ip} - %{DATA:remote_user} \\[%{HTTPDATE:timestamp}\\] \\\"%{WORD:method} %{DATA:path} HTTP/%{NUMBER:http_version}\\\" %{INT:status} %{INT:body_bytes}\"))",
        "merge_maps(attributes, cache, \"insert\")"
      ],
      "statementsErrorMode": "skip",
      "conditionLogicOperator": "or"
    },
    {
      "ruleName": "json_log_parsing",
      "conditions": [
        "format == \"JSON\""
      ],
      "statements": [
        "set(parsed_json, ParseJSON(body))",
        "merge_maps(attributes, parsed_json, \"insert\")"
      ],
      "statementsErrorMode": "skip",
      "conditionLogicOperator": "and"
    },
    {
      "ruleName": "error_log_enrichment",
      "conditions": [
        "level == \"error\" or level == \"ERROR\""
      ],
      "statements": [
        "set(attributes[\"severity\"], \"high\")",
        "set(attributes[\"needs_attention\"], true)"
      ],
      "statementsErrorMode": "skip",
      "conditionLogicOperator": "or"
    }
  ]
}

    hashtag
    Related Documentation

    For detailed information about configuring and writing OTTL transformations, see:

    • Log Parsing with OpenTelemetry Pipelinesarrow-up-right

Run the following command in your CLI, and select your tenant:

    groundcover auth get-datasources-api-key

    hashtag
    Querying ClickHouse

    Example for querying ClickHouse database using POST HTTP Request:

curl 'https://ds.groundcover.com/' \
        --header "X-ClickHouse-Key: ${API_KEY}" \
        --data "SELECT count() from traces where start_timestamp > now() - interval '15 minutes' "

    hashtag
    Command parameters

• X-ClickHouse-Key (header): The API key you retrieved from the groundcover CLI. Replace ${API_KEY} with your actual API key, or set API_KEY as an environment variable.

• SELECT count() FROM traces WHERE start_timestamp > now() - interval '15 minutes' (data): The SQL query to execute. This query counts the number of traces whose start_timestamp falls within the last 15 minutes.

    Learn more about the ClickHouse query language herearrow-up-right.

    hashtag
    Querying VictoriaMetrics

    Example for querying the VictoriaMetrics database using the query_rangearrow-up-right API:

curl 'https://ds.groundcover.com/datasources/prometheus/api/v1/query_range' \
    --get \
    --header "apikey: ${API_KEY}" \
    --data 'query=sum(rate(groundcover_resource_total_counter{type="http"}))' \
    --data 'start=1715760000' \
    --data 'end=1715763600'

    hashtag
    Command parameters

• apikey (header): The API key you retrieved from the groundcover CLI. Replace ${API_KEY} with your actual API key, or set API_KEY as an environment variable.

• query (data): The PromQL query to execute. In this case, it calculates the sum of the rate of groundcover_resource_total_counter with the type set to http.

    • start (data): The start timestamp for the query range in Unix time (seconds since epoch). Example: 1715760000.

    • end (data): The end timestamp for the query range in Unix time (seconds since epoch). Example: 1715763600.

    Learn more about the promql syntax herearrow-up-right.

    Learn more about VictoriaMetrics HTTP API herearrow-up-right.

hashtag
Authentication

    This endpoint requires API Key authentication via the Authorization header.

    hashtag
    Headers

Authorization: Bearer <YOUR_API_KEY>
Content-Type: application/json

    hashtag
    Examples

    hashtag
    Basic Request

    Update logs pipeline configuration with test pattern:

curl -L \
  --request POST \
  --url 'https://api.groundcover.com/api/pipelines/logs/config' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "ottlRules": [
      {
        "ruleName": "test_log_pattern",
        "conditions": [
          "workload == \"test-app\" or container_name == \"test-container\""
        ],
        "statements": [
          "set(cache, ExtractGrokPatterns(body, \"^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}\"))",
          "merge_maps(attributes, cache, \"insert\")",
          "set(attributes[\"parsed\"], true)"
        ],
        "statementsErrorMode": "skip",
        "conditionLogicOperator": "or"
      },
      {
        "ruleName": "json_parsing_test",
        "conditions": [
          "format == \"JSON\""
        ],
        "statements": [
          "set(parsed_json, ParseJSON(body))",
          "merge_maps(attributes, parsed_json, \"insert\")"
        ],
        "statementsErrorMode": "skip",
        "conditionLogicOperator": "and"
      }
    ]
  }'

    hashtag
    Response Example

{
  "uuid": "59804867-6211-48ed-b34a-1fc33827aca6",
  "created_by": "itamar",
  "created_timestamp": "2025-08-31T13:33:27.364525Z",
  "value": "ottlRules:\n  - ruleName: test_log_pattern\n    conditions:\n      - workload == \"test-app\" or container_name == \"test-container\"\n    statements:\n      - set(cache, ExtractGrokPatterns(body, \"^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}\"))\n      - merge_maps(attributes, cache, \"insert\")\n      - set(attributes[\"parsed\"], true)\n    statementsErrorMode: skip\n    conditionLogicOperator: or"
}

    🚨 CRITICAL WARNING: This endpoint COMPLETELY REPLACES the entire pipeline configuration and WILL DELETE ALL EXISTING RULES. Always backup your current configuration first by calling the GET endpoint.

    hashtag
    Backup Current Configuration First

    ALWAYS get your current configuration before making changes:

curl -L \
  --url 'https://api.groundcover.com/api/pipelines/logs/config' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Accept: */*' > pipeline-backup.json

    hashtag
    Verify Configuration Update

    After updating the configuration, verify the patterns were added:

curl -L \
  --url 'https://api.groundcover.com/api/pipelines/logs/config' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Accept: */*'

    This should return your updated configuration including the new test patterns.

    hashtag
    Related Documentation

    For detailed information about configuring and writing OTTL transformations, see:

    • Log Parsing with OpenTelemetry Pipelinesarrow-up-right

hashtag
Authentication

    This endpoint requires API key authentication.

    hashtag
    Headers

| Header | Value | Description |
| --- | --- | --- |
| Authorization | Bearer <YOUR_API_KEY> | Your groundcover API key |
| Accept | */* | Accept any response format |

    hashtag
    Path Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | The UUID of the workflow to delete |

    hashtag
    Example Request

curl -L \
  --request DELETE \
  --url 'https://api.groundcover.com/api/workflows/{id}' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Accept: */*'

    hashtag
    Response

    hashtag
    Success Response

    Status Code: 200 OK

{
  "message": "OK"
}

    hashtag
    Notes

    • Once a workflow is deleted, it cannot be recovered

    • The deletion is immediate and permanent

    • All associated workflow executions and history are also removed

    • The API returns HTTP 200 status code for both successful deletions and "not found" cases

    hashtag
    Overview

    Real User Monitoring (RUM) extends groundcover’s observability platform to the client side, providing visibility into actual user interactions and front-end performance. It tracks key aspects of your web application as experienced by real users, then correlates them with backend metrics, logs, and traces for a full-stack view of your system.

    Understand user experience - capture every interaction, page load, and performance metric from the end-user perspective to pinpoint front-end issues in real time.

    Resolve issues faster - seamlessly tie front-end events to backend traces and logs in one platform, enabling end-to-end troubleshooting of user journeys.

    Privacy first - groundcover’s Bring Your Own Cloud (BYOC) model ensures all RUM data stays in your own cloud environment. Sensitive user data never leaves your infrastructure, ensuring privacy and compliance without sacrificing insight.

    hashtag
    Collection

    groundcover RUM collects a wide range of data from users’ browsers through a lightweight JavaScript SDK. Once integrated into your web application, the SDK automatically gathers and sends the following telemetry from each user session to the groundcover platform:

    • Network requests: Every HTTP request initiated by the browser (such as API calls) is captured as a trace. Each client-side request can be linked with its corresponding server-side trace, giving you a complete picture of the request from the user’s click to the backend response.

    • Front-end logs: Client-side log messages (e.g., console.log outputs, warnings, and errors) are collected and forwarded to groundcover’s log management. This ensures that browser logs are stored alongside your application’s server logs for unified analysis.

• Exceptions: Uncaught JavaScript exceptions and errors are automatically captured with full stack traces and contextual data (browser type, URL, etc.). These front-end errors become part of groundcover monitors, letting you quickly identify and debug issues in the user’s environment.

• Performance metrics (Core Web Vitals): Key performance indicators like page load time, along with Core Web Vitals such as Largest Contentful Paint, Interaction to Next Paint, and Cumulative Layout Shift, are measured for each page view. groundcover RUM records these metrics to help you track real-world performance and detect slowdowns affecting users.

    • User interactions: RUM tracks user interactions such as clicks, keydown, and navigation events. By recording which elements users interact with and when, groundcover helps you reconstruct user flows and understand the sequence of actions leading up to any issue or performance problem.

    • Custom events: You can instrument your application to send custom events via the RUM SDK. This allows you to capture domain-specific actions or business events (for example, a checkout completion or a specific UI gesture) with associated metadata, providing deeper insight into user behavior beyond automatic captures.

    All collected data is streamed securely to your groundcover deployment. Because groundcover runs in your environment, RUM data (including potentially sensitive details from user sessions) is stored in the observability backend within your own cloud. From there, it is aggregated and indexed just like other telemetry, ready to be searched and analyzed in the groundcover UI.

    hashtag
    Full-Stack Visibility

    One of the core advantages of groundcover RUM is its native integration with backend observability data. Every front-end trace, log, or event captured via RUM is contextualized alongside server-side data:

    • Trace correlation: Client-side traces (from browser network requests) are automatically correlated with server-side traces captured by groundcover’s eBPF-based instrumentation. This means when a user triggers an API call, you can see the complete distributed trace that spans the browser and the backend services, all in one view.

    • Unified logging: Front-end log entries and error reports are ingested into the same backend as your server-side logs. In the groundcover Log Explorer, you can filter and search across logs from both client and server, using common fields (like timestamp, session ID, or trace ID) to connect events.

    • End-to-end troubleshooting: With full-stack data in one platform, you can pivot easily between a user’s session replay, the front-end events, and the backend metrics/traces involved. This end-to-end context significantly reduces the time to isolate whether an issue originated in the frontend (browser/UI) or the backend (services/infrastructure), helping teams pinpoint problems faster across the entire stack.

    By bridging the gap between the user’s browser and your cloud infrastructure, groundcover’s RUM capability ensures that no part of the user journey is invisible to your monitoring. This holistic view is critical for optimizing user experience and rapidly resolving issues that span multiple layers of your application.

    hashtag
    Sessions Explorer

    Once RUM data is collected, it becomes available in the groundcover platform via the Sessions Explorer — a dedicated view for inspecting and troubleshooting user sessions. The Sessions Explorer allows you to navigate through user journeys and understand how your users experience your application.

    Clicking on any session opens the Session View, where you can inspect a full timeline of the user’s experience. This view shows every key event captured during the session - including clicks, navigations, network requests, logs, custom events, and errors.

    Each event is displayed in sequence with full context like timestamps, URLs, and stack traces. The Session View helps you understand exactly what the user did and what the system reported at each step, making it easier to trace issues and user flows.

The first thing you need to do to start using groundcover is sign uparrow-up-right using your email address (no credit card required for the free tier account). Signing up is only possible on a computer; it is not possible on a mobile phone or tablet. It is highly recommended to use your corporate email address, as it will make it easier to use other features, such as inviting your colleagues to your workspace. However, signing up using Gmail, Outlook, or any other public domain is also possible.

    hashtag
    Workspace Selection

    When signing inarrow-up-right to groundcover for the first time, the platform automatically detects your organization based on the domain you used to sign in. If your organization already has existing workspaces available, the workspace selection screen will be displayed, where you can choose which of the existing workspaces you would like to join, or if you want to create a new workspace.

    Available workspaces will be displayed only if either of the following applies:

    • You have been invited to join existing workspaces and haven't joined them yet

    • Someone has previously created a workspace that has auto-join enabled for the email domain that you used to sign in (applicable for corporate email domains only)

    hashtag
    To join an existing workspace:

    1. Click the Join button next to the desired workspace

    2. You will be added as a user to that workspace with the user privileges that were assigned by default or those that were assigned to you specifically when the invite was sent.

    3. You will automatically be redirected to that workspace.

    hashtag
    To create a new workspace:

    circle-info

    You will only see the option to create a new workspace if you are the first person from your organization to join groundcover.

    1. Click the Create a new workspace button

    2. Specify a workspace name

    3. Choose whether to enable auto-join (those settings can be changed later)

    4. Click continue

    hashtag
    Workspace Auto-joining

Workspace owners and admins can allow teammates who log in with the same email domain to join the Workspace they created automatically, without admin approval. This capability is called "Auto-join". It is disabled by default, but can be switched on during the workspace setup process, or at any time in the workspace settings.

    circle-exclamation

    If you logged in with a public email domain (Gmail, Yahoo, Proton, etc.) and are creating a new Workspace, you will not be able to switch on Auto-join for that Workspace.

    MCP supports complex, multi-step flows, but starting simple is the best way to ramp up.

    hashtag
    Pull Logs

    Prompt:

Use groundcover to get 5 logs from the workload news-service from the past 15 minutes.

    Expected behavior: The agent should call query_logs and show recent logs for that workload.

    hashtag
    Get K8s Resource Specs

    Prompt:

Use groundcover to get the spec of the chat-app deployment.

    Expected behavior: The agent should call get_k8s_object_yaml and return the YAML or a summary of it.

    hashtag
    Find Slow Workloads

    Prompt:

Use groundcover to show the top 5 workloads by P95 latency.

    Expected behavior: The agent should call get_workloads and return the relevant workloads with their P95 latency.

    hashtag
    Investigate Issues

    When something breaks, your agent can help investigate and make sense of it.

    hashtag
    Paste an Issue Link

    Prompt:

I got an alert for this critical groundcover issue. Can you investigate it?
https://app.groundcover.com/monitors/issues?...

    Expected behavior: The agent should use query_monitors_issues, pull issue details, and kick off a relevant investigation using logs, traces, and metadata.

    hashtag
    Investigate Multiple Issues

    Prompt:

I got multiple alerts in the staging-env namespace. Can you help me look into them using groundcover?

    Expected behavior: The agent should use query_monitors_issues to pull all related issues and start going through them one by one.

    hashtag
    Automate Coding & Debugging

    groundcover’s MCP can also be your coding sidekick. Instead of digging through tests and logs manually, deploy your changes and let the agent take over.

    hashtag
    Iterate Over Test Results

    Prompt:

Use groundcover to debug this code. For each test, print relevant logs with test_id, and dive into any error logs.

    Expected behavior: The agent should update the code with log statements, deploy it, and use query_logs to trace and debug.

    hashtag
    Deploy & Verify

    Prompt:

Please deploy this service and verify everything works using groundcover.

    Expected behavior: The agent should assist with deployment, then check for issues, error logs, and traces via groundcover.

    Dashboard provisioning example
    • Create a directory for the terraform assets

mkdir groundcover-tf-example && cd groundcover-tf-example

    • Create a main.tf file within the directory that contains the terraform provider configuration mentioned in step 2

• Create the following dashboards.tf file. This example declares a new Golden Signals folder, and within it a Workload Golden Signals dashboard:

resource "grafana_folder" "goldensignals" {
  title = "Golden Signals"
}

resource "grafana_dashboard" "workloadgoldensignals" {
  config_json = file("workloadgoldensignals.json")
  folder = grafana_folder.goldensignals.id
}

• Add the workloadgoldensignals.json file to the directory as well

• Run terraform init to initialize the Terraform context

• Run terraform plan; you should see a long output describing the assets that are going to be created. The last line should state Plan: 2 to add, 0 to change, 0 to destroy.

• Run terraform apply to execute the changes. You should now see a new folder in your Grafana dashboards screen with the newly created dashboard.

• Run terraform destroy to revert the changes

Here is a short video demonstrating the process:

    You can read more about what you can achieve with the Grafana Terraform provider in the official docsarrow-up-right


    Traces

    hashtag
    Our traces philosophy

    Traces are a powerful observability pillararrow-up-right, providing granular insights into microservice interactions. Traditionally, they were hard to implement, requiring coordination of multiple teams and constant code changes, making this critical aspect very challenging to maintain.

groundcover's eBPF sensor disrupts this tradeoff, empowering developers to gain full visibility into their applications, effortlessly and without any code changes.

    The platform supports two kinds of traces:

    hashtag
    eBPF traces

    These traces are automatically generated for every service in your stack. They are available out-of-the-box and within seconds of installation. These traces always include critical information such as:

    • All services that took part in the interaction (both client and server)

    • Accessed resource

    • Full payloads, including:

  • All headers

  • All query parameters

  • All bodies - for both the request and response

    hashtag
    3rd-party traces

These can be ingested into the platform, allowing you to leverage existing instrumentation to create a single pane of glass for all of your traces.

Traces are stored in groundcover's ClickHouse deployment, ensuring top-notch performance at every scale.

    circle-info

For more details about ingesting 3rd party traces, see the supported datasources page.

    hashtag
    Sampling

groundcover further disrupts the customary traces experience by reinventing the concept of sampling. The approach differs between the two types of traces:

    hashtag
    eBPF traces

These are generated using 100% of the data, always processing every request being made, at every scale. However, the groundcover platform utilizes smart sampling to store only a fraction of the traces, while still generating an accurate picture. In general, sampling is performed according to these rules:

    • Requests with unusually high or low latencies, measured per resource

• Requests which returned an error response (e.g. a 500 status code for HTTP)

    • "Normal" requests which form the baseline for each resource

Lastly, stream processing is utilized to make the sampling decisions on the node itself, without having to send or save any redundant traces.

    circle-info

Certain aspects of our sampling algorithm are configurable - read more here.

    hashtag
    3rd-party traces

    Various mechanisms control the sampling performed over 3rd party traces. Read more here:

• OpenTelemetry

• Datadog

    circle-exclamation

    When integrating 3rd-party traces, it is often wise to configure some sampling mechanism according to the specific use case.

    hashtag
    Additional Context

    Each trace is enriched with additional information to give as much context as possible for the service which generated the trace. This includes:

    • Container information - image, environment variables, pod name

    • Logs generated by the service around the time of the trace

• Kubernetes events relevant to the service

• CPU and Memory utilization of the service and the node it is scheduled on

• Golden Signals of the resource around the time of the trace

    hashtag
    Distributed Tracing

    One of the advantages of ingesting 3rd-party traces is the ability to leverage their distributed tracing feature. groundcover natively displays the full trace for ingested traces in the Traces page.

    hashtag
    Trace Attributes

Trace Attributes enable advanced filtering and search capabilities. groundcover supports attributes across all trace types. This encompasses a diverse range of protocols, such as HTTP, MongoDB, PostgreSQL, and others, as well as varied sources, including eBPF and manual instrumentations (for example, OpenTelemetry).

groundcover enriches your original traces and generates meaningful metadata as key-value pairs. This metadata includes critical information, such as protocol type, http.path, db.statement, and similar attributes, aligning with OTel conventions. Furthermore, groundcover seamlessly incorporates this metadata from spans received through supported manual instrumentations. For an in-depth understanding of attributes in OTel, please refer to the OTel Attributes Documentationarrow-up-right (external link to the OpenTelemetry website).

    Each attribute can be effortlessly integrated into your filters and search queries. You can add them directly from the trace side-panel with a simple click or input them manually into the search bar.

Example: if you want to filter all HTTP traces that contain the path "/products", the query would be formatted as: @http.path:"/products". For a comprehensive guide on the query syntax, see the Syntax table below.

    hashtag
    Trace Tags

Trace Tags enable advanced filtering and search capabilities. groundcover supports tags across all trace types. This encompasses a diverse range of protocols, such as HTTP, MongoDB, PostgreSQL, and others, as well as varied sources, including eBPF and manual instrumentations (for example, OpenTelemetry).

    Tags are powerful metadata components, structured as key-value pairs. They offer insightful information about the resource generating the span, like: container.image.name ,host.name and more.

Tags include metadata enriched by our sensor, as well as additional metadata provided by manual instrumentations (such as OpenTelemetry traces). Utilizing these Tags enhances the understanding and context of your traces, allowing for more comprehensive analysis and easier filtering by the relevant information.

    Each tag can be effortlessly integrated into your filters and search queries. You can add them directly from the trace side-panel with a simple click or input them manually into the search bar.

Example: if you want to filter all traces from mysql containers, the query would be formatted as: container.image.name:mysql. For a comprehensive guide on the query syntax, see the Syntax table below.

    hashtag
    Search and filter

The Trace Explorer integrates dynamic filters and a versatile search functionality to enhance your trace data analysis. You can filter traces using specific criteria, including trace status, workload, namespace and more, as well as limit your search to a specific time range.

Learn more about how to use our search syntaxes

    hashtag
    Traces Pipelines

groundcover natively supports setting up traces pipelines using Vector transforms.arrow-up-right This allows for full flexibility in the processing and manipulation of traces being collected - parsing additional patterns by regex, renaming attributes, and many more.

Learn more about how to configure traces pipelines

    hashtag
    Controlling retention

groundcover allows full control over the retention of your traces. Read here to learn more.

    hashtag
    Custom Configuration

    Tracing can be customized in several ways:

• Configuring which protocols should be traced

• Configuring obfuscation for sensitive payload data

• Configuring the sampling mechanism

    Kubernetes requirements

    hashtag
    Kubernetes version

    groundcover supports any K8s version from v1.21.

    circle-info

groundcover may work on many other K8s flavors that we just haven't had a chance to test yet. Can't find yours in the list? Let us know over Slack.arrow-up-right

    hashtag
    Kubernetes distributions

| K8s distribution | Status | Comments |
| --- | --- | --- |
| EKS | supported | |
| AKS | supported | |
| GKE | supported | |
| OKE | supported | |
| OpenShift | supported | |
| Rancher | supported | |
| Self-managed | supported | |
| minikube | supported | |
| kind | supported | |
| Rancher Desktop | supported | |
| k0s | supported | |
| k3s | supported | |
| k3d | supported | |
| microk8s | supported | |
| AWS Fargate | not supported | |
| Docker-desktop | not supported | |

    hashtag
    Kubernetes RBAC permissions

    For the installation to complete successfully, permissions to deploy the following objects are required:

    • StatefulSet

    • Deployment

• DaemonSet (with privileged containers for loading our eBPF sensor)

• ConfigMap

• Secret

• PVC

To learn more about groundcover's architecture and components, visit our Architecture Section.

    hashtag
    Outgoing traffic

    groundcover's portal pod sends HTTP requests to the cloud platform app.groundcover.com on port 443.

This unique architecture keeps the data inside the cluster and fetches it on-demand, keeping the data encrypted all the way without the need to open the cluster for incoming traffic via ingresses.

    Slack App for Channel Routing

    groundcover supports sending notifications to Slack using a Slack App with bot tokens instead of static webhooks. This method allows dynamic routing of alerts to any channel by including the channel ID in the payload. In addition to routing, messages can be enriched with formatting, blocks, and mentions — for example including <@user_id> in the payload to directly notify specific team members. This provides a flexible and powerful alternative to fixed incoming webhooks for alerting.

    Make sure you created a webhook for a Slack App with Bot Tokensarrow-up-right.

    Use the following workflow as an example. You can later enrich your workflow with additional functionality.

    Here are a few tips for using the example workflow:

1. In the consts section, the channels attribute defines the mapping between Slack channels and their IDs. Use a clear, readable label to identify each channel (for example, the channel’s actual name in Slack), and map it to the corresponding channel ID.

2. To locate a channel ID, open the channel in Slack, click the channel name at the top, and scroll to the About section. The channel ID is shown at the bottom of this section.

3. The channel name should be included in the monitor’s Metadata Labels, or you can fall back to a default. See the channel_id attribute in the workflow example.

4. Finally, replace the integration name in {{ providers.slack-routing-webhook }} with the actual name of the Webhook integration you created.
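
A minimal sketch of the consts mapping described in tips 1-3 (the channel names, IDs, and the alert label used for the lookup are placeholders; the keep.dictget fallback pattern mirrors the severity mapping shown in other workflow examples in these docs):

consts:
  # Readable channel name -> Slack channel ID (placeholder values)
  channels: '{"alerts-prod": "C0123ABCD45", "alerts-staging": "C0678EFGH90"}'
  # Resolve the target channel from a monitor label, falling back to a default ID
  channel_id: keep.dictget({{ consts.channels }}, {{ alert.labels.channel }}, "C0123ABCD45")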

    MS Teams

    To integrate groundcover with MS Teams, follow the steps below. Note that you’ll need at least a Business subscription of MS Teams to be able to create workflows.

1. Create a webhook workflow for your dedicated Teams channel. Go to the relevant Team -> specific channel -> "Workflows", and create a webhook workflow.

2. The webhook workflow is associated with a URL, which is used to trigger the MS Teams integration in groundcover - make sure to copy this URL.

    3. Set Up the Webhook in groundcover

  • Head to the integrations section: Settings -> Integrations, to create a new Webhook integration

  • Start by giving your Webhook integration a name. This name will be used below in the provider block sample.

      • Set the Webhook URL

4. Create a Workflow. Go to Monitors --> Workflows --> Create Workflow, and paste the YAML configuration provided below.

5. Configure the provider blocks (there are two of them). In each provider block, replace {{ providers.your-teams-integration-name }} with your actual Webhook integration name (the one you created in step 3). For example, if you named your integration test-ms-teams, the config reference would be: {{ providers.test-ms-teams }}

    circle-info

    The following example shows a pre-configured MS Teams workflow template. You can easily modify workflows to support different formats based on the MS Teams workflow schema.

    Sample code for your groundcover workflow:

    Search & Filter

    hashtag
    Search and filter

    To help you slice and dice your data, you can use our dynamic filters (left panel) and/or our powerful filter bar, which supports key:value pairs, as well as free text search. The Query Builder works in tandem with our filters.

    To further focus your results, you can also restrict the results to specific time windows using the time picker on the upper right of the screen.

    hashtag
    Query Builder

The Query Builder is the default search option wherever search is available. It supports advanced autocomplete of keys and values, and a discovery mode that scans across values in your data to teach users the data model.

    The following syntaxes are available for you to use in Query Builder:

| Syntax | Description | Examples | Sections |
| --- | --- | --- | --- |
| key:value | Search attributes: both groundcover built-ins and custom attributes. Use * for wildcard search. Note: multiple filters for the same key act as 'OR' conditions, whereas multiple filters for different keys act as 'AND' conditions. | namespace:prod-us namespace:prod-* | Logs, Traces, K8s Events, API Catalog, Issues |
| -key:value | Exclude: specify terms or filters to omit from your search; applies to each distinct search. | -key:value -term -"search term" | Logs, Traces, K8s Events, API Catalog, Issues |
| *:value | Search all attributes: search any attribute for a value; you can use double quotes for exact match and wildcards. | *:error *:"POST /api/search" *:erro* | Logs, Traces, Issues |
| term | Free text: search for single-word terms. Tip: expand your search results by using wildcards. | Exception DivisionBy* | Logs |
| "term" | Phrase search (case-insensitive): enclose terms within double quotes to find results containing the exact phrase. Note: double quotes do not work with * wildcards. | "search term" | Logs |

    hashtag
    How to use filters

    Filters are very easy to add and remove, using the filters menu on the left bar. You can combine filters with the Query Builder, and filters applied using the left menu will also be added to the Query Builder in text format.

    • Select / deselect a single filter - click on the checkbox on the left of the filter. (You can also deselect a filter by clicking the 'x' next to the text format of the filter on the search bar).

    • Deselect all but one filter (within a filter category, such as 'Level' or 'Format') - hover over the filter you want to leave on, then click on "ONLY".

      • You can switch between filters you want to leave on by hovering on another filter and clicking "ONLY" again.

  • To turn all other filters in that filter category back on, hover over the filter again and click "ALL".

• Clear all filters within a filter category - click on the funnel icon next to the category name.

• Clear all filters currently applied - click on the funnel icon next to the number of results.

    incident.io

    To integrate groundcover with incident.ioarrow-up-right, follow the steps below. Note that you’ll need a Pro incident.io account to view your incoming alerts.

1. Generate an Alerts configuration for groundcover. Log in to your incident.ioarrow-up-right account, go to "On-call" -> "Alerts" -> "Configure", and add a new source.

    2. On the "Create Alert Source" screen the answer to the question "Where do your alerts come from?" should be "HTTP". Select this source and give it a unique name. Hit "continue".

3. incident.io will now create your configuration, from which you will need to copy the following items for the Webhook integration

    4. Set Up the Webhook in groundcover

  • Head to the integrations section: Settings -> Integrations, to create a new Webhook integration

  • Start by giving your Webhook integration a name. This name will be used below in the provider block sample.

5. Create a Workflow. Go to Monitors --> Workflows --> Create Workflow, and paste the YAML configuration provided below. Note: the body section is a dictionary of keys that will be sent as a JSON payload to the incident.io platform.

6. Configure the provider block. In the provider block, replace {{ providers.your-incident-io-integration-name }} with your actual Webhook integration name (the one you created in step 4). For example, if you named your integration test-incidentio, the config reference would be: {{ providers.test-incidentio }}

7. Required parameters for creating an alert. When triggering an alert, the following keys are required:

      1. title - Alert title that can be pulled from groundcover as seen in the example

      2. status - One of "firing" or "resolved" that can also be pulled from groundcover as the example shows.

    8. You can include additional parameters for richer context (optional):

      1. description

      2. deduplication_key - unique attribute used to group identical alerts, groundcover provides this through the fingerprint attribute
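
A minimal sketch of such a body section (the alert field paths are illustrative; adapt them to the fields available in your monitor's alerts):

body:
  # Required: alert title and status ("firing" or "resolved")
  title: '{{ alert.alertname }}'
  status: '{{ alert.status }}'
  # Optional: richer context and grouping of identical alerts
  description: '{{ alert.annotations.description }}'
  deduplication_key: '{{ alert.fingerprint }}'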

    circle-info

The attributes shown in the yaml block of the metadata section below are an example only! Alert labels can only be attributes used in the group by section of the actual monitor.

    Example code for your groundcover workflow:

    Log Patterns

Log Patterns help you cut through log noise by grouping similar logs based on structure. Instead of digging through thousands of raw lines, you get a clean, high-level view of what’s actually going on.

    hashtag
    Overview

Log Patterns in groundcover help you make sense of massive log volumes by grouping logs with similar structure. Instead of showing every log line, the platform automatically extracts the static skeleton of each line and replaces dynamic values like timestamps, user IDs, or error codes with smart tokens.

    This lets you:

    • Cut through the noise

    • Spot recurring behaviors

    • Investigate anomalies faster

    hashtag
    How It Works

groundcover automatically detects the variable parts of each log line and replaces them with placeholders to surface the repeating structure.

    Placeholder
    Description
    Example

    hashtag
    Requirements

    Log Patterns are collected directly on the sensor.

    hashtag
    Example

    Raw log:
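
(An illustrative example; the exact tokens groundcover emits may vary by field type.)

192.168.1.10 - GET /api/users/4821 completed in 132ms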

    Patterned:
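
<IP4> - GET /api/users/<NUM> completed in <NUM>ms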

    hashtag
    Viewing Patterns

    1. Go to the Logs section.

    2. Switch from Records to Patterns using the toggle at the top.

    3. Patterns are grouped and sorted by frequency. You’ll see:

    hashtag
    Value Distribution

    You can hover over any tag in a pattern to preview the distribution of values for that specific token. This feature provides a breakdown of sample values and their approximate frequency, based on sampled log data.

    This is especially useful when investigating common IPs, error codes, user identifiers, or other dynamic fields, helping you understand which values dominate or stand out without drilling into individual logs.

    For example, hovering over an <IP4> token will show a tooltip listing the most common IP addresses and their respective counts and percentages.

    hashtag
    Investigating Patterns

    • Click a pattern: Filters the Logs view to only show matching entries.

    • Use filters: Narrow things down by workload, level, format, or custom fields.

    • Suppress patterns: Hide noisy templates like health checks to stay focused on what matters.

    Delete Ingestion Key

    Delete an existing ingestion key. This operation permanently removes the key and cannot be undone.

    hashtag
    Endpoint

    DELETE /api/rbac/ingestion-keys/delete

    hashtag
    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    hashtag
    Headers

    hashtag
    Request Body

    Parameter
    Type
    Required
    Description

    hashtag
    Examples

    hashtag
    Delete by Name
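
A sketch of the request, assuming the key is identified by a name field in the JSON body (see the Request Body table above for the exact parameters):

curl -L \
  --request DELETE \
  --url 'https://api.groundcover.com/api/rbac/ingestion-keys/delete' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{"name": "my-old-key"}'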

    hashtag
    Response

    The endpoint returns an empty response body with HTTP 200 status code when the key is deleted successfully.

    hashtag
    Important Warnings

    🚨 PERMANENT DELETION: This operation permanently deletes the ingestion key and cannot be undone.

    ⚠️ Immediate Impact: Any services using this key will:

    • Receive 403 PERMISSION_DENIED errors

    • Stop sending data to groundcover immediately

    • Lose access to remote configuration (for sensor keys)

    hashtag
    Verification

    To verify the key was deleted, use the List Ingestion Keys endpoint:
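
For example (a sketch; the list endpoint's exact path and request shape are documented on the List Ingestion Keys page):

curl -L \
  --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{"name": "my-old-key"}'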

    This should return an empty array [] if the key was successfully deleted.

    hashtag
    Safe Deletion Workflow

    1. Identify the key to delete using the List endpoint

    2. Update integrations to use a different key first

3. Test integrations work with the new key

4. Delete the old key only once nothing depends on it

    hashtag
    Best Practices

    • Always have a replacement key ready before deleting production keys

    • Test your rollover plan in staging environments first

    • Update all services using the key before deletion

    hashtag
    Related Documentation

    For comprehensive information about ingestion keys management, see:

• Ingestion Keys

    Issues

    Quickly understand what requires your attention and drive your investigations

    The issues page is a useful place to start a troubleshooting or investigation flow from. It gathers together all active issues found in your Kubernetes environment.

    hashtag
    Issue Types

    • HTTP / gRPC Failures Capturing failed HTTP calls with Response Status Codes of:

      • 5XX — Internal Server Error

      • 429 — Too Many Requests

    • MySQL / PostgreSQL Failures

      Capturing failed SQL statement executions with Response Errors Codes such as:

      • 1146 — No Such Table

  • 1040 — Too Many Connections

  • 1064 — Syntax Error

    • Redis Failures Capturing any reported Error by the Redis serialization protocol (RESP), such as:

  • ERR unknown command

    • Container Restarts Capturing all container restart events across the cluster, with Exit Codes such as:

      • 0 — Completed

  • 137 — OOMKilled

    • Deployment Failures

      Capturing events such as:

  • MinimumReplicasUnavailable — Deployment does not have minimum availability

    hashtag
    Issue Aggregation

Issues are auto-detected and aggregated - each representing many identical repeating incidents. Aggregation helps cut through the noise quickly and reach insights like when a new type of issue started to appear, and when it was last seen.

    Issues are grouped by:

    • Type (HTTP, gRPC, Container Restart, etc..)

    • Status Code / Error Code (e.g HTTP 500, gRPC 13)

    • Workload name

• Namespace

    The smart aggregation mechanism will also identify query parameters, remove them, and group the stripped queries / API URIs into patterns. This allows users to easily identify and isolate the root cause of a problem.

    hashtag
    Troubleshooting with Issues

Each issue is assigned a velocity graph showing its behavior over time (like when it was first seen) and a live counter of its number of incidents.

By clicking on an issue, users can access the specific traces captured around the relevant issue. Each trace is related to the exact resource that was used (e.g. raw API URI, or SQL query), its latency, and Status Code / Error Code.

Clicking further into a selected captured trace lets the user investigate the root cause of the issue with the entire payload (body and headers) of the request and response, the information about the participating container, the application logs around the incident's time, and the full context of the metrics around the incident.

    Migrating from Issues to Monitors Issues Page

    The legacy Issues page is being deprecated in favor of a fully customizable, monitor-based experience that gives you more control over what constitutes an issue in your environment.

While the new page introduces powerful capabilities, no core functionality is being removed; the key change is that the old auto-created issue rules will no longer be automatically generated. Instead, you’ll define your own monitors, or choose from a rich catalog of prebuilt ones.

All the existing issues in the legacy page can be easily added as monitors via the Monitors Catalog's "Starter Pack". See Getting Started below for more info.

    hashtag
    Why migrate to the new Issues experience?

    The new Issues page is built on top of the Monitors engine, enabling deeper customization and automation:

    1. Define what qualifies as an issue

  Use filters in monitor definitions to include or exclude workloads, namespaces, HTTP status codes, clusters, and more, tailoring it to your context.

2. Silence issues with precision

  Silence issues based on any label, such as status_code, cluster, or workload, to reduce noise and keep focus.

3. Clean, scoped issue view

  Only see issues relevant to your environment, based on your configured monitors and silencing rules, no clutter.

4. Get alerted on new issues

  Trigger alerts through your preferred integrations (Slack, PagerDuty, Webhooks, etc.) when a new issue is detected.

5. Define custom issues using all your data

  Build monitors using metrics, traces, logs, and events, and correlate them to uncover complex problems.

    hashtag
    What’s Changing?

    Aspect
    Legacy Issues Page
    New Issues Page

    hashtag
    Getting Started

All the built-in rules you’re used to are already available in the Monitors Catalog, and you can add them all with a single click.

Adding the monitors in the "Starter Pack" will match all the existing issues in the legacy page.

    Head to:

    Monitors → Create Monitor -> Monitor Catalog → Recommended Monitors

    circle-info

    Only users with Editor/Admin roles can create monitors

    hashtag
    Learn More

    Using groundcover as Prometheus/Clickhouse database in a Self-hosted Grafana

    Exposing Data Sources for BYOC installations

    groundcover BYOC exposes Prometheusarrow-up-right data sources for programmatic access via API, and integration with customer owned Grafana instances.

    circle-exclamation

Different steps are required for On-Prem deployments; contact us for additional info.

    hashtag
    Requirements

    hashtag
    API Key

The groundcover tenant API key is required for configuring the data source connection.

You can obtain your API key from the API Keys page in the groundcover console. Create a new API key, ensuring you assign it to a service account that is bound to the appropriate RBAC policy.

For this example, we will use the key API-KEY-VALUE.

    hashtag
    Setup

    hashtag
    Prometheus

    hashtag
    Grafana Data Source Configuration

Configure the Grafana Prometheus data source by following these steps, logged in as a Grafana Admin.

    1. Connections > Data Sources > + Add new data source

    2. Pick Prometheus

      1. Name: groundcover-prometheus

  2. Prometheus server URL: https://app.groundcover.com/api/prometheus

• Custom HTTP Headers > Add Header

  1. Header: authorization

  2. Value: Bearer API-KEY-VALUE

• Performance

  1. Prometheus type: Prometheus

  2. Prometheus version: > 2.50.x

• Click "Save & test"

    circle-info

    "Successfully queried the Prometheus API" means the integration was configured correctly.

    hashtag
    Legacy Configuration

    The following configurations are deprecated but may still be in use in older setups.

    hashtag
    Datasources API Key

    circle-exclamation

    The legacy datasources API key can be obtained by running: groundcover auth get-datasources-api-key

    hashtag
    ClickHouse

    circle-exclamation

    ClickHouse datasource integration is deprecated and no longer supported for new installations.

Configure the Grafana ClickHouse data source by following these steps, logged in as a Grafana Admin.

    1. Connections > Data Sources > + Add new data source

    2. Pick ClickHouse

      1. Name: groundcover-clickhouse

• Server

  1. Server address: ds.groundcover.com

  2. Server port: 443

  3. Protocol: HTTP

  4. Secure Connection: ON

• HTTP Headers

  1. Forward Grafana HTTP Headers: ON

• Credentials

  1. Username: Leave empty

  2. Password: API-KEY-VALUE

• Additional Properties

  1. Default database: groundcover

• Click "Save & test"

    circle-info

    "Data source is working" means the integration was configured correctly.

    Alert Structure

    Fields description in the alert you can use in your workflows

    hashtag
    Structure

| Field Name | Description | Example |
| --- | --- | --- |
| labels | Map of key:values derived from the monitor definition. | { "workload": "frontend", "namespace": "prod" } |

    hashtag
    Usage

When crafting your workflows, you can use any of the fields above via templating in any workflow field. Encapsulate your fields using double opening and closing curly brackets.

    hashtag
    Examples

    hashtag
    Using Label Values

    You can access label values by alert.labels.*
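
For example, to include label values from the alert above in a notification field (the label names depend on your monitor's group-by definition):

'Firing workload: {{ alert.labels.workload }} in namespace {{ alert.labels.namespace }}'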

    Migrate from Datadog

    Complete guide for migrating your Datadog setup to groundcover.

    hashtag
    Prerequisites

    Access required:

    • Admin role in groundcover

    Connect RUM

    circle-info

    This capability is only available to organizations subscribed to our .

    groundcover’s Real User Monitoring (RUM) SDK allows you to capture front-end performance data, user events, and errors from your web applications.

Start capturing RUM data by installing the RUM SDK in your web app.

    This guide will walk you through installing the SDK, initializing it, identifying users, sending custom events, capturing exceptions, and configuring optional settings.

    Configure groundcover's MCP Server

    Set up your agent to talk to groundcover’s MCP server. Use OAuth for a quick login, or an API key for service accounts.

    The MCP server supports two methods:

• OAuth (Recommended for IDEs)

• API key (for service accounts)

    Kernel requirements for eBPF sensor

    hashtag
    Intro

groundcover’s eBPF sensor uses state-of-the-art kernel features to provide full coverage at low overhead. To do so, it requires certain kernel features, which are listed below.
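
To check the kernel version running on a node (a quick first step when verifying the requirements below):

uname -r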

    circle-info

    Managing Dashboards with Terraform

    hashtag
    Create & manage dashboards with Terraform

    Use Terraform to create, update, delete, and list groundcover dashboards as code. Managing dashboards with infrastructure‑as‑code (IaC) lets you version changes, review them in pull requests, promote the same definitions across environments, and detect drift between what’s applied and what’s running in your account.


    Ingestion Keys

    Secure, write‑focused credentials for streaming data into groundcover

    Ingestion Keys let sensors, integrations and browsers send observability data to your groundcover backend. They are the counterpart of API Keys, which are optimized for reading data or automating dashboards and monitors.


    hashtag
    Key types

    Connect Linux hosts

    Linux hosts sensor

    circle-exclamation

Note: The Linux host sensor is only available to BYOC and on-prem deployments. Check out our pricing pagearrow-up-right for more information about subscription plans and the available deployment modes.

    hashtag
    Supported Environments

    Create Workflow

    Creates a new workflow for alert handling and notifications. Workflows define how alerts are processed and routed to various integrations like Slack, PagerDuty, webhooks, etc.

    hashtag
    Endpoint

    POST /api/workflows/create


    List Namespaces

    Retrieve a list of Kubernetes namespaces within a specified time range.

    hashtag
    Endpoint

    hashtag
    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    Use groundcover to get 5 logs from the workload news-service from the past 15 minutes.
    Use groundcover to get the spec of the chat-app deployment.
    Use groundcover to show the top 5 workloads by P95 latency.
    I got an alert for this critical groundcover issue. Can you investigate it?
    https://app.groundcover.com/monitors/issues?...
    I got multiple alerts in the staging-env namespace. Can you help me look into them using groundcover?
    Use groundcover to debug this code. For each test, print relevant logs with test_id, and dive into any error logs.
    Please deploy this service and verify everything works using groundcover.
    mkdir groundcover-tf-example && cd groundcover-tf-example
    resource "grafana_folder" "goldensignals" {
      title = "Golden Signals"
    }
    
    resource "grafana_dashboard" "workloadgoldensignals" {
      config_json = file("workloadgoldensignals.json")
      folder = grafana_folder.goldensignals.id
    }

    Prometheus server URL: https://app.groundcover.com/api/prometheus

  • Customer HTTP Headers > Add Header

    1. Header: authorization

    2. Value: Bearer API-KEY-VALUE

  • Performance

    1. Prometheus type: Prometheus

    2. Prometheus version: > 2.50.x

  • Click "Save & test"

  • Server

    1. Server address: ds.groundcover.com

    2. Server port: 443

    3. Protocol: HTTP

    4. Secure Connection: ON

  • HTTP Headers

    1. Forward Grafana HTTP Headers: ON

  • Credentials

    1. Username: Leave empty

    2. Password: API-KEY-VALUE

  • Additional Properties

    1. Default database: groundcover

  • Click "Save & test"

  • API Keys page

• 1040 — Too Many Connections

  • 1064 — Syntax Error

  • 137 — OOMKilled

    Namespace

• All headers

  • All query parameters

  • All bodies - for both the request and response

  • Kubernetes events relevant to the service

  • CPU and Memory utilization of the service and the node it is scheduled on


OpenShift: supported
    Rancher: supported
    Self-managed: supported
    minikube: supported
    kind: supported
    Rancher Desktop: supported
    k0s: supported
    k3s: supported
    k3d: supported
    microk8s: supported
    AWS Fargate: not supported
    Docker-desktop: not supported

• ConfigMap

  • Secret

  • PVC

• EKS: supported

  • AKS: supported

  • GKE: supported

  • OKE: supported


-key:value

    Exclude: Specify terms or filters to omit from your search; applies to each distinct search.

    Example: -key:value -term -"search term"

    Applies to: Logs, Traces, K8s Events, API Catalog, Issues

    *:value

    Search all attributes: Search any attribute for a value. You can use double quotes for exact match, and wildcards.

    Example: *:error *:"POST /api/search" *:erro*

    Applies to: Logs, Traces, Issues

    To turn all other filters in that filter category back on, hover over the filter again and click "ALL".

  • Clear all filters within a filters category - click on the funnel icon next to the category name.

  • Clear all filters currently applied - click on the funnel icon next to the number of results.

• key:value

    Search attributes: Works with both groundcover built-in and custom attributes.

    Use * for wildcard search. Note: Multiple filters for the same key act as 'OR' conditions, whereas multiple filters for different keys act as 'AND' conditions.

    Example: namespace:prod-us namespace:prod-*

    Applies to: Logs, Traces, K8s Events, API Catalog, Issues

term

    Free text: Search for single-word terms. Tip: Expand your search results by using wildcards.

    Example: Exception DivisionBy*

    Applies to: Logs

    "term"

    Phrase Search (case-insensitive): Enclose terms within double quotes to find results containing the exact phrase. Note: Using double quotes does not work with * wildcards.

    Example: "search term"

    Applies to: Logs

• Silence known issues, to reduce noise and keep focus.
  • Clean, scoped issue view

    Only see issues relevant to your environment, based on your configured monitors and silencing rules, no clutter.

  • Get alerted on new issues

    Trigger alerts through your preferred integrations (Slack, PagerDuty, Webhooks, etc.) when a new issue is detected.

  • Define custom issues using all your data

    Build monitors using metrics, traces, logs, and events, and correlate them to uncover complex problems.

  • Manage everything as code

    Use Terraform to manage monitors and issues at scale, ensuring consistency and auditability.

Feature comparison (auto-created rules vs. user-defined monitors):

    • Terraform Support: ❌ / ✅

    • Issues Based on Traces/Logs/Metrics: Limited / Full support

    • Issue Source: Auto-created rules / User-defined monitors

    • Custom Filtering: ❌ / ✅

    • Silencing by Labels: ❌ / ✅

    • Alerts via Integrations: ❌ / ✅

    Related pages: Monitors Catalog, Create a new Monitor, Issues page, Silences page

Set the Webhook URL to the url you copied from field (2)
  • Keep the HTTP method as POST

  • Webhookarrow-up-right
    Set the Webhook URL to the url you copied from field (1)
  • Keep the HTTP method as POST

  • Under headers add Authorization, and paste the "Bearer <token>" copied from field (2).

• metadata - Any additional metadata that you've configured within your monitor in groundcover. Note that this set should accurately reflect your monitor definition in groundcover.

    Webhookarrow-up-right

<V>: Version (example: v0.32.0)

    <TM>: Time measure (example: 5.5ms)

• Log level (Error, Info, etc.)

  • Count and percentage of total logs

  • Pattern’s trend over time

  • Workload origin

  • The structured pattern itself

  • Export patterns: Use the three-dot menu to copy the pattern for further analysis or alert creation.

<TS>: Timestamp (example: 2025-03-31T17:00:00Z)

    <N>: Number (examples: 404, 123)

    <IP4>: IPv4 Address (example: 192.168.0.1)

    <*>: Wildcard (text, path, etc.) (example: /api/v1/users/42)
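
    For example, a raw log line and the pattern it collapses into (an illustrative pair using the tokens above; exact tokenization depends on your logs):

    2025-03-31T17:00:00Z ERROR request 123 from 192.168.0.1 failed after 5.5ms
    <TS> ERROR request <N> from <IP4> failed after <TM>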

• Delete the old key using this endpoint
  • Verify deletion using the List endpoint

  • Use descriptive names to avoid accidental deletion of wrong keys
  • Consider key rotation instead of permanent deletion for security incidents

• name (string, required): Exact name of the ingestion key to delete

    Ingestion Keysarrow-up-right

status: Current status of the alert

    • firing - Active alert indicating an ongoing issue.

    • resolved - The issue has been resolved, and the alert is no longer active.

    • suppressed - Alert is suppressed.

    • pending - No data or insufficient data to determine the alert state.

    lastReceived: Timestamp when the alert was last received. Example: this alert's timestamp.

    firingStartTime: Start time of the firing alert. Example: the first timestamp of the current firing state.

    source: Sources generating the alert. Example: grafana

    fingerprint: Unique fingerprint of the alert; this is a hash of the labels. Example: 02f5568d4c4b5b7f

    alertname: Name of the monitor. Example: Workload Pods Crashed Monitor

    _gc_severity: The defined severity of the alert. Examples: S3, error

    trigger: Trigger condition of the workflow. Values: alert / manual / interval

    values: A map containing two values that can be used: the numeric value that triggered the alert (threshold_input_query) and the actual threshold that was defined for the alert (threshold_1). Example: "values": { "threshold_1": 0, "threshold_input_query": 99.507 }

    hashtag
    Install the SDK

    hashtag
    Initialize the SDK

    hashtag
    Initialization

    hashtag
    Configuration Parameters

apiKey: A dedicated Ingestion Key of type RUM (Settings -> Access -> Ingestion Keys).

    dsn: Your public groundcover endpoint, in the format https://example.platform.grcv.io, where example.platform.grcv.io is your ingress.site installation value.

    cluster: Identifier for your cluster; helps filter RUM data by specific cluster.

    environment: Environment label (e.g., production, staging) used for filtering data.

    appId: Custom application identifier set by you; useful later for filtering and segmenting data at the single-application level.

    hashtag
    Advanced Configuration

    You can customize SDK behavior (event sampling, data masking, enabled events). The following properties are customizable:

    You can pass the values by calling the init function:

    Or via the updateConfig function:

    hashtag
    Identify Users

    Link RUM data to specific users:

    hashtag
    Send Custom Events

    Instrument key user interactions:

    hashtag
    Capture Exceptions

    Manually track caught errors:

    Authentication

    This endpoint requires API key authentication.

    hashtag
    Headers

Authorization: Bearer <YOUR_API_KEY> (your groundcover API key)

    Content-Type: text/plain (the request body should be raw YAML)

    hashtag
    Request Body

    The request body should contain raw YAML defining the workflow configuration. The YAML structure should include:

    • id: Unique identifier for the workflow

    • description: Human-readable description

    • triggers: Array of trigger conditions

    • actions: Array of actions to perform when triggered

    • name: Display name for the workflow

    • consts (optional): Constants and helper variables

    hashtag
    Example Request

    hashtag
    Response

    hashtag
    Workflow YAML Structure

    hashtag
    Basic Structure

    hashtag
    Choosing Integration Providers

    To route alerts to a specific integration (Slack, PagerDuty, webhook, etc.), use the config field in the provider section to reference your configured integration by name.

    hashtag
    Example: Slack Integration

    hashtag
    Provider Configuration

    • config: '{{ providers.integration-name }}' - References a specific integration you've configured in groundcover

    • type - Specifies the integration type (slack, webhook, pagerduty, opsgenie)

    • Replace integration-name with your actual integration name.

    The integration name must match the name of an integration you've previously configured in your groundcover workspace.

    hashtag
    References

    For workflow examples and advanced configurations, see the groundcover workflow examples documentationarrow-up-right.

    workflow:
      id: slack-channel-routing-workflow
      description: workflow for all channels with dynamic routing
      triggers:
      - type: alert
        filters:
        - key: annotations.slack-channel-routing-workflow
          value: enabled
      name: slack-channel-routing-workflow
      consts:
        channels: '{"devops":"C0111111111", "alerts":"C0222222222", "incidents":"C0333333333"}'
        channel_id: keep.dictget( '{{ consts.channels }}', '{{ alert.labels.channel_id }}', 'C09G9AFHLTB')
        env: keep.dictget({{ alert.labels }}, 'env', 'no-env')
        upper_env: "keep.uppercase({{consts.env}})"
        severity: keep.dictget({{ alert.annotations }}, '_gc_severity', 'unknown-severity')
        summary: keep.dictget({{ alert.labels }}, 'summary', 'no-summary')
        slack_message: "<https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join(keep.dict_pop({{ alert.labels }}, \"_gc_monitor_id\", \"_gc_monitor_name\", \"_gc_severity\", \"backend_id\", \"grafana_folder\", \"_gc_issue_header\"), \"&\", \"matcher_\"), \" \", \"+\")|Silence> :no_bell: | \n<https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}|Investigate> :mag: | \n<https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}|See Monitor> :chart_with_upwards_trend:\n\n*Labels:*  \n- keep.join(keep.dict_pop({{alert.labels}}, \"_gc_monitor_id\", \"_gc_monitor_name\", \"_gc_severity\", \"backend_id\", \"grafana_folder\", \"_gc_issue_header\"), \"\\n- \")\n"
        title_link: "https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}"
        red_color: "#FF0000"
        green_color: "#008000"
        footer_url: "groundcover.com"
        footer_icon: "https://app.groundcover.com/favicon.ico"
      actions:
      - if: "{{ alert.status }} == 'firing'"
        name: webhook-alert
        provider:
          type: webhook
          config: "{{ providers.slack-routing-webhook }}"
          with:
            body:
              channel: "{{ consts.channel_id }}"
              attachments:
              - color: "{{ consts.red_color }}"
                footer: "{{ consts.footer_url }}"
                footer_icon: "{{ consts.footer_icon }}"
                text: "{{ consts.slack_message }}"
                title: "\U0001F6A8 Firing: {{ alert.alertname }} [{{ consts.upper_env}}]"
                title_link: "{{ consts.title_link }}"
                type: plain_text
      - if: "{{ alert.status }} != 'firing'"
        name: webhook-alert-resolved
        provider:
          type: webhook
          config: "{{ providers.slack-routing-webhook }}"
          with:
            body:
              channel: "{{ consts.channel_id }}"
              text: "\u2705 [RESOLVED][{{ consts.upper_env}}] {{ consts.severity }} {{ alert.alertname }}"
              attachments:
              - color: "{{ consts.green_color }}"
                text: "*Summary:* {{ consts.summary }}"
                fields:
                - title: "Environment"
                  value: "{{ consts.upper_env}}"
                  short: true
                footer: "{{ consts.footer_url }}"
                footer_icon: "{{ consts.footer_icon }}"
    workflow:
      id: ms-teams-alerts-workflow
      description: Sends an API to MS Teams alerts endpoint
      name: ms-teams-alerts-workflow
      triggers:
      - type: alert
        filters:
        - key: annotations.ms-teams-alerts-workflow
          value: enabled
      consts:
        silence_link: 'https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join(keep.dict_pop({{ alert.labels }}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder"), "&", "matcher_"), " ", "+")'
        monitor_link: 'https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}'
        title_link: 'https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}'
        description: keep.dictget( {{ alert.annotations }}, "_gc_description", '')
        redacted_labels: keep.join(keep.dict_pop({{alert.labels}}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "_gc_issue_header"), "-\n")
        title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
    
      actions:
      - if: '{{ alert.status }} == "firing"'
        name: teams-webhook-firing
        provider:
          config: ' {{ providers.your-teams-integration-name }} '
          type: webhook
          with:
            body:
              type: message
              attachments:
              - contentType: application/vnd.microsoft.card.adaptive
                content:
                  $schema: http://adaptivecards.io/schemas/adaptive-card.json
                  type: AdaptiveCard
                  version: "1.2"
                  body:
                  - type: TextBlock
                    text: "\U0001F6A8 Firing: {{ consts.title }}"
                    weight: bolder
                    size: large
                  - type: TextBlock
                    text: "[Investigate Issue]({{consts.title_link}})"
                    wrap: true
                  - type: TextBlock
                    text: "{{ consts.description }}"
                    wrap: true
                  - type: TextBlock
                    text: "[Silence]({{consts.silence_link}})"
                    wrap: true
                  - type: TextBlock
                    text: "[See monitor]({{consts.monitor_link}})"
                    wrap: true
                  - type: TextBlock
                    text: "{{ consts.redacted_labels }}"
                    wrap: true
      - if: '{{ alert.status }} != "firing"'
        name: teams-webhook-resolved
        provider:
          config: ' {{ providers.your-teams-integration-name }} '
          type: webhook
          with:
            body:
              type: message
              attachments:
              - contentType: application/vnd.microsoft.card.adaptive
                content:
                  $schema: http://adaptivecards.io/schemas/adaptive-card.json
                  type: AdaptiveCard
                  version: "1.2"
                  body:
                  - type: TextBlock
                    text: "\U0001F7E2 Resolved: {{ consts.title }}"
                    weight: bolder
                    size: large
                  - type: TextBlock
                    text: "[Investigate Issue]({{consts.title_link}})"
                    wrap: true
                  - type: TextBlock
                    text: "{{ consts.description }}"
                    wrap: true
                  - type: TextBlock
                    text: "[Silence]({{consts.silence_link}})"
                    wrap: true
                  - type: TextBlock
                    text: "[See monitor]({{consts.monitor_link}})"
                    wrap: true
                  - type: TextBlock
                    text: "{{ consts.redacted_labels }}"
                    wrap: true
      
    workflow:
      id: incident-io-alerts-workflow
      name: incident-io-alerts-workflow
      description: Sends an API to incident.io alerts endpoint
      triggers:
      - type: alert
        filters:
          - key: annotations.incident-io-alerts-workflow
            value: enabled      
      consts:
        description: keep.dictget( {{ alert.annotations }}, "_gc_description", '')
        issue: https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}
        monitor: https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}
        redacted_labels: keep.dict_pop({{alert.labels}}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "_gc_issue_header")
        silence: https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join({{ consts.redacted_labels }}, "&", "matcher_"), " ", "+")
        title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
        cluster: keep.dictget( {{ alert.labels }}, "cluster", "[no-cluster]")
        namespace: keep.dictget( {{ alert.labels }}, "namespace", "[no-namespace]")
        workload: keep.dictget( {{ alert.labels }}, "workload", "[no-workload]")
      actions:
      - name: webhook
        provider:
          config: ' {{ providers.your-incident-io-integration-name }} '
          type: webhook
          with:
            body:
              title: '{{ alert.alertname }}'
              description: '{{ alert.description }}'
              deduplication_key: '{{ alert.fingerprint }}'
              status: '{{ alert.status }}'
              # To use metadata attributes that refer to alert.labels, the attributes 
              # must be used in the group by section of the monitor - the example below
              # assumes that cluster, namespace and workload were used for group by
              metadata:
                cluster: '{{ consts.cluster }}'
                namespace: '{{ consts.namespace }}'
                service: '{{ consts.workload }}'
                severity: '{{ alert.annotations._gc_severity }}'
    192.168.0.1 - - [30/Mar/2025:12:00:01 +0000] "GET /api/v1/users/123 HTTP/1.1" 200
    <IP4> - - [<TS>] "<*> HTTP/<N>.<N>" <N>
    Authorization: Bearer <YOUR_API_KEY>
    Content-Type: application/json
    {
      "name": "string"
    }
    curl -L \
      --request DELETE \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/delete' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "old-test-key"
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "old-test-key"
      }'
    message: "Pod Crashed - Pod: {{ alert.labels.pod_name }} Namespace: {{ alert.labels.namespace }}"
    npm install @groundcover/browser
    # or
    yarn add @groundcover/browser
    import groundcover from '@groundcover/browser';
    
    groundcover.init({
      apiKey: 'your-ingestion-key',
      cluster: 'your-cluster',
      environment: 'production',
      dsn: 'your-dsn',
      appId: 'your-app-id',
    });
    export interface SDKOptions {
      batchSize: number;
      batchTimeout: number;
      eventSampleRate: number;
      sessionSampleRate: number;
      environment: string;
      debug: boolean;
      tracePropagationUrls: string[];
      beforeSend: (event: Event) => boolean;
      enabledEvents: Array<"dom" | "network" | "exceptions" | "logs" | "pageload" | "navigation" | "performance">;
  excludedUrls: string[];
    }
    groundcover.init({
      apiKey: 'your-ingestion-key',
      cluster: 'your-cluster',
      environment: 'production',
      dsn: 'your-dsn',
      appId: 'your-app-id',
      options: {
        batchSize: 50,
        sessionSampleRate: 0.5, // 50% sessions sampled
    eventSampleRate: 0.5,
      },
    });
    groundcover.updateConfig({
       batchSize: 20,
    });
    groundcover.identifyUser({
      id: 'user-id',
      email: '[email protected]',
    });
    groundcover.sendCustomEvent({
      event: 'PurchaseCompleted',
      attributes: { orderId: 1234, amount: 99.99 },
    });
    try {
      performAction();
    } catch (error) {
      groundcover.captureException(error);
    }
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/workflows/create' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: text/plain' \
      --data 'id: example-workflow
    description: Example workflow for API documentation
    triggers:
    - type: alert
      filters:
      - key: annotations.example-workflow
        value: enabled
    name: example-workflow
    consts:
      severity: keep.dictget({{ alert.annotations }}, "_gc_severity", "info")
      title: keep.dictget({{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
    actions:
    - name: webhook-action
      provider:
        type: webhook
        config: "{{ providers.webhook-provider }}"
        with:
          body:
            alert_name: "{{ consts.title }}"
            severity: "{{ consts.severity }}"'
    {
      "workflow_id": "xxxxx-xxxx-xxxxx-xxxx-xxxxx",
      "status": "created",
      "revision": 1
    }
    id: workflow-id
    description: Workflow description
    triggers:
      - type: alert
        filters:
          - key: filter-key
            value: filter-value
    name: workflow-name
    consts:
      variable_name: value
    actions:
      - name: action-name
        provider:
          type: provider-type
          config: provider-config
          with:
            action-specific-parameters
    actions:
    - if: '{{ alert.status }} == "firing"'
      name: slack-action-firing
      provider:
        config: '{{ providers.integration-name }}'
        type: slack
        with:
          attachments:
          - color: '#FF0000'
            footer: 'groundcover.com'
            footer_icon: 'https://app.groundcover.com/favicon.ico'
            text: 'Alert details here'
            title: 'Firing: {{ alert.alertname }}'
            ts: keep.utcnowtimestamp()
            type: plain_text
          message: ' '

    Datadog API key and application key with read permissions

    Datadog application key scopes:

    • dashboards_readarrow-up-right - List and retrieve dashboards

    • monitors_readarrow-up-right - View monitors

    • metrics_readarrow-up-right - Query timeseries data

    • - View AWS, GCP, Azure integrations

    hashtag
    Create a migration project

    Navigate to Settings → Migrations.

    1. Click Start on the Datadog card

    2. Enter a project name (e.g., "Production Migration", "US5 Migration")

    3. Click Create

    Tip: Use descriptive names. You can run multiple migration projects for different environments or teams.

    hashtag
    Fetch assets from Datadog

    Provide your Datadog credentials:

    hashtag
    Datadog site

    The domain of your Datadog console. The options are:

    • US1 - app.datadoghq.com

    • US3 - us3.datadoghq.com

    • US5 - us5.datadoghq.com

    • EU1 - app.datadoghq.eu

    • AP1 - ap1.datadoghq.com

    You can find your Datadog site by looking at your console's URL.

    hashtag
    API key

    A regular Datadog API key. Find this under Organization Settings → API Keysarrow-up-right.

    hashtag
    Application key

    Create one under Organization Settings → Application Keysarrow-up-right with the required scopes listed above.

    circle-info

    Important: groundcover does not store these keys. Assets are fetched, then the keys are discarded.

Click Fetch Assets. Fetching typically takes around 10 seconds, depending on the number of assets.

    hashtag
    Review migration summary

    Once fetched, you see:

    • Progress overview: Total assets discovered and their support status

• Asset cards: Monitors, Dashboards, Data Sources, etc.

    • Support breakdown: How many assets are fully supported, partial, or unsupported

    The overview shows everything we found in your Datadog account and what we'll bring over.

    hashtag
    Migrate data sources

    Before migrating monitors and dashboards, we set up your data sources.

    hashtag
    What we detect

    groundcover automatically discovers:

    • AWS integrations: CloudWatch metrics, account configurations

    • GCP integrations: Cloud Monitoring metrics, project setups

    • Azure integrations: Azure Monitor metrics, subscription details

    Tip: Migrate all data sources first. This prevents missing data issues when monitors go live.

    hashtag
    Migrate monitors

    Once data sources are ready, migrate your monitors.

    hashtag
    Monitor status indicators

    • ✓ Supported: Fully compatible. Migrate as-is.

    • ⚠ Partial: Migrates with warnings. Review before installing.

    • ✗ Unsupported: Requires manual attention.

    hashtag
    Review warnings

    For monitors with warnings, click View Warnings:

    • See what adjustments were made

    • Understand query translations

    • Get recommendations for post-migration verification

    Warnings don't block migration — they inform you of changes so you can verify behavior.

    hashtag
    Migrate monitors

    Single monitor:

    1. Preview the monitor

    2. Click Migrate

    3. Monitor installs immediately

    Bulk migrate:

    1. Select multiple monitors using checkboxes

    2. Click Migrate Selected

    3. All install in parallel

    Migrated monitors appear instantly in Monitors → Monitor List.

    hashtag
    Migrate dashboards

    Dashboards preserve:

    • Layout and widget positions

    • Query logic and filters

    • Time ranges and visualization settings

    • Colors and formatting

    Check out the dashboard preview to confirm the migration worked and that all your assets came through successfully.

    hashtag
    Migrate dashboards

    Click Migrate to install. Dashboards appear under Dashboards immediately.

    Tip: Migrate critical dashboards first. Verify queries return expected data before bulk migrating.

    hashtag
    OAuth (recommended)

    OAuth is the default if your agent supports it.

    To get started, add the config below to your MCP client. Once you run it, your browser will open and prompt you to log in with your groundcover credentials.

    circle-info

    🧘 Pro tip: You can copy a ready-to-go command from the UI. Go to the sidebar → Click your profile picture → "Connect to our MCP"

    The tenant's UUID and your time zone will be auto-filled.

    circle-exclamation

    Please make sure to have nodearrow-up-right installed (to use npx)

    Configuration Example

    Cursor

    Claude Code

    hashtag
    API Key

    If your agent doesn’t support OAuth, or if you want to connect a service account, this is the way to go.

    hashtag
    Prerequisites

    1. Service‑account API key – create one or use an existing API Key. Learn more at groundcover API keysarrow-up-right.

    2. Your local time zone (IANA format, for example America/New_York or Asia/Jerusalem). See how to find it below.

    hashtag
    Configuration Example

    Parameters

    • AUTH_HEADER – your groundcover API key.

    • TIMEZONE – your time zone in IANA format.

    hashtag
    Have a Multi‑backend setup?

    If you're using a multi-backend setup (OAuth or API Key), just add the following header to the args list:

    First, grab your backend ID (it’s basically the name):

    1. Open Data Explorer in groundcover.

    2. Click the Backend picker (top‑right) and copy the backend’s name.

    hashtag
    How to find your time zone

macOS: sudo systemsetup -gettimezone

    Linux: timedatectl | grep "Time zone"

    Windows PowerShell: Get-TimeZone

    hashtag
    Client‑specific Guides

    Depending on your client, you can usually set up the MCP server through the UI - or just ask the client to add it for you. Here are quick links for common tools:

    • Instructions for Claude Desktoparrow-up-right

    • Instructions for Claude Webarrow-up-right

    • Instructions for Cursorarrow-up-right

    OAuth
    API Key
groundcover may work on many other Linux kernels; we might just not have had a chance to test them yet. Can't find yours in the list? Let us know over Slack.arrow-up-right

    hashtag
    Kernel Version

    Version v5.3 or higher (anything since 2020).
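
    You can check the kernel version on a node with:

    uname -r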

    hashtag
    Linux Distributions

Debian: 11+

    RedHat Enterprise Linux: 8.2+

    Ubuntu: 20.10+

    CentOS: 7.3+

    Fedora: 31+

    BottlerocketOS: 1.10+

    hashtag
    Permissions

    Loading eBPF code requires running privileged containers. While this might seem unusual, there's nothing to worry about - eBPF is safe by design!arrow-up-right

    hashtag
    CO:RE support

Our sensor uses eBPF’s CO:REarrow-up-right feature in order to support the vast variety of Linux kernels and distributions detailed above. This feature requires the kernel to be compiled with BTF information (enabled using the CONFIG_DEBUG_INFO_BTF=y kernel compilation flag). This is the case for most common distributionsarrow-up-right nowadays.
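
    As a quick sanity check (a sketch; the config file path varies by distribution), you can grep the kernel build config for the flag:

    grep CONFIG_DEBUG_INFO_BTF /boot/config-$(uname -r)
    # CONFIG_DEBUG_INFO_BTF=y indicates the kernel shipped with BTF information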

    You can check if your kernel has CO:RE support by manually looking for the BTF file:

If the file exists, congratulations! Your kernel supports CO:RE.

    hashtag
    What happens if my kernel is not supported?

If your system does not fit into any of the above, our eBPF sensor will unfortunately not be able to run in your environment. However, this does not mean groundcover won’t collect any data. You will still be able to inspect your k8s environment, see all collected logs, and use integrations with external data sources.

    hashtag
    Prerequisites
    • A groundcover account with permissions to create/edit Dashboards

• A Terraform environment (groundcover provider v1.1.1 or later)

    • The groundcover Terraform provider configured with your API credentials

    See also: groundcover Terraform provider reference for provider configuration and authentication details.


    hashtag
    1) Creating a Dashboard via Terraform

    hashtag
    1.1) Create a dashboard directly from the UI

To create a dashboard using Terraform, you first need to create the dashboard manually in the UI so you can export it in Terraform format.

    See Creating dashboards to learn more.

    hashtag
    1.2) Export the dashboard in Terraform format

You can export a Dashboard as a Terraform resource:

    1. Open the Dashboard.

    2. Click Actions → Export.

    3. Download or copy the Terraform tab’s content and paste it into your .tf file (see placeholder above).

    hashtag
    1.3) Add the dashboard resource to your Terraform configuration

    circle-info

The example below is a placeholder; paste your generated snippet or hand-write your own.

    After saving this file as main.tf along with the provider details, type:


    hashtag
    2) Managing existing provisioned Dashboard

    hashtag
    2.1) "Provisioned" badge for IaC‑managed Dashboards

    Dashboards added via Terraform are marked as Provisioned in the UI so you can quickly distinguish IaC‑managed Dashboards from manually created ones, both from the Dashboard List and inside the Dashboard itself.

    hashtag
    2.2) Edit behavior for Provisioned Dashboards

    Provisioned Dashboards are read‑only by default to protect the source of truth in your Terraform code.

• To make a quick change, click Unlock dashboard. This allows editing directly in the UI; all changes are automatically saved as always.

    • Important: Any changes can be overwritten the next time your provisioner runs terraform apply.

    • Safer alternative: Duplicate the Dashboard and edit the copy, then migrate those changes back into code.

    hashtag
    2.3) Editing dashboards via Terraform

Changing the resource and reapplying Terraform will update the Dashboard in groundcover.

    Deleting the resource from your code (and applying) will delete it from groundcover.

    See more examples on our Github repoarrow-up-right.


    hashtag
    3) Importing existing Dashboards into Terraform

    Already have a Dashboard in groundcover? Bring it under Terraform management without recreating it:

    After importing, run terraform plan to view the state and align your config with what exists.


    hashtag
    Reference

• Creating dashboards – how to build widgets and layouts in the UI

    • groundcover Terraform provider documentation

    • groundcover Terraform provider Github repoarrow-up-right – resource schema, arguments, and examples

    RUM

Send Real‑User‑Monitoring events using a JS snippet embedded in web pages

    Third Party

Integrate 3rd-party data sources that push data (e.g. OpenTelemetry, AWS Firehose, FluentBit, etc.)

    *Only the Sensor has limited read capability in order to support pulling remote configuration such as OTTL parsing rulesarrow-up-right applied from the UI. RUM and Third Party have write-only configurations.


    hashtag
    Creating an Ingestion Key

It is recommended to create a dedicated Ingestion Key for every data source, so each key can be managed and rotated independently, exposure and risk are minimized, and groundcover can identify the data source of all ingested data.

    1. Open Settings → Access → Ingestion Keys and click Create key.

    2. Give the key a clear, descriptive Name (for example k8s-prod‑eu‑central‑1).

    3. Select the Type that matches your integration.

    4. Click Click & Copy Key.

      1. Unlike API Keys, Ingestion Keys stay visible on the page. Treat every reveal as sensitive and follow the same secret‑handling practices.

5. Store the key securely, and continue to integrate your data source.


    hashtag
    Using an Ingestion Key

    hashtag
    Kubernetes sensor example

    hashtag
    OpenTelemetry integration (OTel/HTTP) example


    hashtag
    Viewing keys

    The Ingestion Keys table lets you:

    • Reveal the key at any time.

    • See who created the key and when.

    • Sort by Type or Creator to locate specific credentials quickly.


    hashtag
    Revoking a key

Click ⋮ → Revoke next to the key. Unlike revoking an API Key, which only disables it, revoking an Ingestion Key permanently deletes it:

    • The key will disappear from the list.

• Any service using it will receive 403 / PERMISSION_DENIED and will no longer be able to send data or pull the latest configuration.

    This operation cannot be undone — create a new key and update your deployments if you need access again.


    hashtag
    Ingestion Keys vs. API Keys

Primary purpose

    • Ingestion Key: Write data (ingest)

    • API Key: Read data / manage resources via REST

    Permissions capabilities

    • Ingestion Key: Write‑only + optional remote‑config read

    • API Key: Mirrors service‑account RBAC

    Visibility after creation

    • Ingestion Key: Always revealable

    • API Key: Shown once only

    Typical lifetime

    • Ingestion Key: Tied to integration lifecycle

    • API Key: Rotated for CI/CD automations


    hashtag
    Best Practices

    • One key per integration – simplifies rotation and blast radius.

    • Store securely – AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault, Kubernetes Secrets.

• Rotate regularly – create a new key, roll it out, then revoke the old one (see the sketch below).

    • Monitor for 403 errors – a spike usually means a revoked or expired key.
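
    A minimal rotation sketch using the create and delete endpoints documented in this guide; the key names are illustrative and the type field value is an assumption based on the documented key types:

    # 1. Create the replacement key (the "type" value is an assumption)
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/create' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{"name": "k8s-prod-sensor-new", "type": "sensor"}'

    # 2. Roll the new key out to your integration, then revoke the old one
    curl -L \
      --request DELETE \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/delete' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{"name": "k8s-prod-sensor-old"}'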


    Sensor*

    Install the eBPF sensor on Kubernetes or Hosts/VMs

We currently support running on eBPF-enabled Linux machines (see Kernel requirements for eBPF sensor).

    Supported architectures: AMD64 + ARM64

    For the following providers, we will fetch the machine metadata from the provider's API.

    Provider
    Supported

    AWS

    ✅

    GCP

    ✅

    Azure

    ✅

    Linode

    ✅

    hashtag
    Sensor capabilities

    • Infrastructure Host metrics: CPU/Memory/Disk usage

    • Logs

      • Natively from docker containers running on the machine

• JournalD

      • Static log files on the machine

    • Traces

      • Natively from docker containers running on the machine

    • APM metrics and insights from the traces

    hashtag
    How to install?

    Installation currently requires running a script on the machine.

The script will pull the latest sensor version and install it as a service: groundcover-sensor (requires root privileges)

    hashtag
    Install/Upgrade existing sensor:

    Where:

• {ingestion_Key} - A dedicated ingestion key; you can generate one or find existing ones under Settings -> Access -> Ingestion Keys

      • Ingestion Key needs to be of Type Sensor

    • {BYOC_Endpoint} - Your BYOC public ingestion endpoint

• {selected_Env} - The Environment that will group those machines in the cluster drop-down in the top right corner (we recommend setting a separate one for non-k8s deployments)

    hashtag
    Check installed sensor status:

    • Check service status: systemctl status groundcover-sensor

    • View sensor logs: journalctl -u groundcover-sensor

    circle-info

    Initial data may take a few minutes to appear in the app after installation

    hashtag
    Remove installed sensor:

    hashtag
    Customize sensor configuration:

The sensor supports overriding its default configuration by writing to the file located at /etc/opt/groundcover/overrides.yaml. After writing it, restart the sensor service using:

    systemctl restart groundcover-sensor

    Example 1 - override Docker max log line size:

    Example 2 - add static labels to your metrics:


    hashtag
    Headers

Authorization (required): Bearer token with your API key

    X-Backend-Id (required): Your backend identifier

    Content-Type (required): Must be application/json

    Accept (required): Must be application/json

    hashtag
    Request Body

sources (Array, optional): Filter by data sources (empty array for all sources)

    start (String, required): Start timestamp in ISO 8601 format (UTC)

    end (String, required): End timestamp in ISO 8601 format (UTC)

    hashtag
    Time Range Parameters

    • Format: ISO 8601 format with milliseconds: YYYY-MM-DDTHH:mm:ss.sssZ

    • Timezone: All timestamps must be in UTC (denoted by 'Z' suffix)

    hashtag
    Response

    The response contains an array of namespaces for the specified time period.

    hashtag
    Response Fields

namespaces (Array): Array of namespace names or namespace objects

    hashtag
    Examples

    hashtag
    Basic Request

    hashtag
    Response Example

    hashtag
    Time Range Usage

    hashtag
    Last 24 Hours

    Infrastructure Monitoring

    Get complete visibility into your cloud infrastructure performance at any scale, easily access all your metrics in one place and optimize infrastructure efficiency.

    hashtag
    Overview

The groundcover platform offers infrastructure monitoring capabilities that were built for cloud-native environments. It enables you to track the health and efficiency of your infrastructure instantly, with an effortless deployment process.

Troubleshoot efficiently - acting as a centralized hub for all your infrastructure, application and customer metrics, groundcover allows you to query, correlate and troubleshoot your cloud environments using real-time data and insights on your entire stack.

Store it all, without a sweat - store any volume of metrics without worrying about cardinality or retention limits, and remain unaffected by the granularity of the metrics you store or query.

    hashtag
    Collection

    groundcover's proprietary eBPF sensor leverages all its innovative powers to collect comprehensive data across your cloud environments without the burden of performance overhead. This data is sourced from various Kubernetes components, including kube-system workloads, cluster information via the Kubernetes API, and the applications' interactions with the Kubernetes infrastructure. This level of detailed collection at the kernel level enables the ability to provide actionable insights into the health of your Kubernetes clusters, which are indispensable for troubleshooting existing issues and taking proactive steps to future-proof your cloud environments.

    hashtag
    Configuration

You also have the option to configure the retention period for your metrics in the VictoriaMetrics database. By default, metrics are retained for 7 days, but you can adjust this period to your preferences.

    hashtag
    Enrichment

Beyond collecting data, groundcover's methodology involves a strategic layer of data enrichment that seeks to correlate Kubernetes metrics with application performance indicators. This correlation is crucial for creating a transparent image of the Kubernetes ecosystem. It enables a deep understanding of how Kubernetes interacts with applications, identifying issues across the interconnected environment. By monitoring Kubernetes not as an isolated platform but as an integral part of the application infrastructure, groundcover ensures that the monitoring strategy aligns with your dynamic and complex cloud operations.

    hashtag
    Infrastructure Metrics

    Monitoring a cluster involves tracking resources that are critical to the performance and stability of the entire system. Monitoring these essential metrics is crucial for maintaining a healthy Kubernetes cluster:

    • CPU consumption: It's essential to track the CPU resources being utilized against the total capacity to prevent workloads from failing due to insufficient CPU availability.

    • Memory utilization: Keeping an eye on the remaining memory resources ensures that your cluster doesn't encounter disruptions due to memory shortages.

    • Disk space allocation: For Kubernetes clusters running stateful applications or requiring persistent storage for data, such as etcd databases, tracking the available disk space is crucial to avert potential storage deficiencies.

    hashtag
    Container CPU and Memory

Available Labels: type, clusterId, region, namespace, node_name, workload_name, pod_name, container_name, container_image

    Available Metrics

    Name
    Description
    Type

    hashtag
    Node CPU, Memory and Disk

Available Labels: type, clusterId, region, node_name

    Available Metrics

    Name
    Description
    Type

    hashtag
    PVC Usage

Available Labels: type, clusterId, region, name, namespace

    Available Metrics

    Name
    Description
    Type

    hashtag
    Network Usage

Available Labels: clusterId, workload_name, namespace, container_name, remote_service_name, remote_namespace, remote_is_external, availability_zone, region, remote_availability_zone, remote_region, is_cross_az

    Notes:

• is_loopback and remote_is_external are special labels that indicate the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g. a managed service outside of the cluster (external).

      • In both cases the remote_service_name and the remote_namespace labels will be empty

    Available Metrics

    Name
    Description
    Type

    List Monitors

    Get a list of all configured monitors in the system with their identifiers, titles, and types.

    hashtag
    Endpoint

    POST /api/monitors/list

    hashtag
    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    hashtag
    Headers

    Header
    Required
    Description

    hashtag
    Request Body

    The request body supports filtering by sources:

    Parameters

    Parameter
    Type
    Required
    Description

    hashtag
    Response

    hashtag
    Response Schema

    Field Descriptions

    Field
    Type
    Description

    Monitor Types

    Type
    Description

    hashtag
    Examples

    hashtag
    Basic Request

    Get all monitors:
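
    A minimal example, assuming the same base URL and header conventions as the other API examples in these docs (an empty sources array returns monitors from all sources):

    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/monitors/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{"sources": []}'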

    hashtag
    Response Example

    Create Ingestion Key

    Create a new ingestion key.

    hashtag
    Endpoint

    POST /api/rbac/ingestion-keys/create

    hashtag
    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    hashtag
    Headers

    hashtag
    Request Body

    Required and optional fields for creating an ingestion key:

    Parameter
    Type
    Required
    Description

    hashtag
    Examples

    hashtag
    Create Basic Sensor Key
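
    A minimal sketch of the request; the body fields (name, type) are assumptions based on the documented key types and naming requirements:

    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/create' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{"name": "production-k8s-sensor", "type": "sensor"}'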

    hashtag
    Create Third-Party Integration Key

    hashtag
    Create RUM Key with Configuration

    hashtag
    Response

    hashtag
    Response Schema

    hashtag
    Response Example

    hashtag
    Key Types

    Type
    Description
    Default remoteConfig

    hashtag
    Verification

    To verify the key was created successfully, use the List Ingestion Keys endpoint:
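
    For example, filtering by the name used at creation (a sketch based on the list endpoint documented in these docs):

    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{"name": "production-k8s-sensor"}'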

    hashtag
    Naming Requirements

    • Names must be lowercase with hyphens as separators

    • No capital letters, spaces, or special characters (except hyphens)

    • Examples of valid names: production-k8s-sensor, otel-staging-api, rum-frontend

    hashtag
    Related Documentation

    For comprehensive information about ingestion keys, including usage and management, see:

    Get Monitor

    Retrieve detailed configuration for a specific monitor by its UUID, including queries, thresholds, display settings, and evaluation parameters.

    hashtag
    Endpoint

    GET /api/monitors/{uuid}

    hashtag
    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    hashtag
    Headers

    Header
    Required
    Description

    hashtag
    Path Parameters

    Parameter
    Type
    Required
    Description

    Field Descriptions

    Field
    Type
    Description

    hashtag
    Examples

    hashtag
    Basic Request

    Get monitor configuration by UUID:
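
    A minimal sketch; substitute your monitor's UUID:

    curl -L \
      --request GET \
      --url 'https://api.groundcover.com/api/monitors/<MONITOR_UUID>' \
      --header 'Authorization: Bearer <YOUR_API_KEY>'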

    hashtag
    Response Example - Metrics Monitor

    Email via Zapier

    This guide shows how to route groundcover alerts to email using Zapierarrow-up-right. Since groundcover supports webhook-based alerting, and Zapier can receive webhooks and send emails, you can easily set up this workflow without writing code.


    hashtag
    Prerequisites

    • A groundcover account with access to the Workflows tab.

    • A Zapier account (free plan is sufficient).

    • An email address where you want to receive alerts.


    hashtag
    Step 1: Create a Webhook Integration in groundcover

    1. Go to Settings → Integrations.

    2. Click Create Integration.

    3. Choose Webhook as the type.


    hashtag
    Step 2: Create a Zapier Webhook-to-Email Workflow

    hashtag
    Create a Webhook Trigger

1. Go to Zapier.

    2. Click "Create Zap".

    3. Set Trigger:


    hashtag
    Configure the Email Step

    1. Set Action:

      • App: Email by Zapier

      • Event: Send Outbound Email


    hashtag
    Step 3: Create a Workflow in groundcover

1. Go to the Workflows section in groundcover.

    2. Create a Notification Workflow with the integration we created in step 1.

    3. Edit the workflow YAML and use the following structure:


    hashtag
    Step 4: Test the Flow

    1. Trigger a test alert in groundcover.

    2. Check Zapier to ensure the webhook was received.

    3. Confirm the email arrives with the right content.

    Synthetics

    Synthetics allow you to proactively monitor the health, availability, and performance of your endpoints. By simulating requests from your infrastructure, you can verify that your critical services are reachable and returning correct data, even when no real user traffic is active.

    hashtag
    Overview

groundcover Synthetics execute checks from your installed groundcover backend, and are currently available on BYOC deployments only.

    Creating dashboards

    circle-info

    Note: Only users with Write or Admin permissions can create and edit dashboards.

    hashtag
    How to create a new Dashboard in groundcover?

    Workflow Examples

    This page provides practical examples of workflows for different use cases and integrations.

    hashtag
    Triggers Examples

    hashtag
    Filter by Monitor Name

    List Ingestion Keys

    Get a list of ingestion keys with optional filtering by name, type, and remote configuration status.

    hashtag
    Endpoint

    POST /api/rbac/ingestion-keys/list

    hashtag

    SQL Based Monitors

Sometimes there are use cases that involve complex queries and conditions for triggering a monitor, going beyond the built-in query logic provided by the groundcover logs page query language.

    An example of such a use case is the need to compare some logs to the same ones in a past period. This is not something that is regularly available in log search, but it can definitely be something to alert on: if the number of errors for a group of logs changes dramatically compared to the previous week, that could be an event to alert on and investigate.

    For such use cases you can harness the powerful ClickHouse SQL language to create an SQL based monitor within groundcover.

    hashtag

    Create a Grafana alert

Alerts in groundcover leverage a fully integrated Grafana interface. To learn how you can create alerts using Grafana Terraform, follow this guide.

    hashtag
    Setup an alert based on metrics

    circle-info

    groundcover Terraform Provider

    hashtag
    Overview

    Terraform is an infrastructure-as-code (IaC) tool for managing cloud and SaaS resources using declarative configuration. The groundcover Terraform provider enables you to manage observability resources such as policies, service accounts, API keys, and monitors as code—making them consistent, versioned, and automated.

We've partnered with HashiCorp as an official Terraform provider:

    Our provider's GitHub repository is also available:

    Saved Views

    Save the view of any groundcover page exactly the way you like it, then jump back in a click.

A Saved View captures your current page layout: filters, columns, toggles, etc., so you and your team can reopen the page with the same context every time. Each groundcover page maintains its own catalogue of views, and every user can pick their personal Favorites.


    hashtag
    Where to find them

    On the pages: Traces, Logs, API Catalog, Events.

    Look for the Views selector next to the time‑picker. Click it to open the list, create a new view, or switch between existing ones.

    Role-Based Access Control (RBAC)

    circle-info

This capability is only available to organizations subscribed to our Enterprise plan.

    Role-Based Access Control (RBAC) in groundcover gives you a flexible way to manage who can access certain features and data in the platform. By defining both default roles and policies, you ensure each team member only sees and does what their level of access permits. This approach strengthens security and simplifies onboarding, allowing administrators to confidently grant or limit access.

    Notification Routes

    Notification Routes let you automatically send notifications to Connected Apps when monitor issues change state.

    circle-info

    Creating and editing Notification Routes requires editor privileges in groundcover

    hashtag
    How It Works

    Backup & Restore Metrics

    Learn how to backup and restore metrics into groundcover metrics storage

groundcover uses VictoriaMetrics as its underlying metrics storage solution. As such, groundcover integrates seamlessly with VictoriaMetrics and its ecosystem of tools.

    hashtag
    Doing incremental backups

    Connect Kubernetes clusters

    Get up and running in minutes in Kubernetes

Before installing groundcover in Kubernetes, please make sure your cluster meets the requirements.

    After ensuring your cluster meets the requirements, complete the prerequisites, then choose your preferred installation method:

    {
      "mcpServers": {
        "groundcover": {
          "command": "npx",
          "args": [
            "-y",
            "[email protected]",
            "https://mcp.groundcover.com/api/mcp",
            "54278",
            "--header",
            "X-Timezone:<IANA_TIMEZONE>",
            "--header",
            "X-Tenant-Uuid:<TENANT_UUID>"
          ]
        }
      }
    }
    claude mcp add groundcover npx -- -y [email protected] https://mcp.groundcover.com/api/mcp 54278 --header X-Timezone:<IANA_TIMEZONE> --header X-Tenant-UUID:<TENANT_UUID>
    {
      "mcpServers": {
        "groundcover": {
          "command": "npx",
          "args": [
            "-y",
            "mcp-remote",
            "https://mcp.groundcover.com/api/mcp",
            "--header", "Authorization:${AUTH_HEADER}",
            "--header", "X-Timezone:${TIMEZONE}"
          ],
          "env": {
            "AUTH_HEADER": "Bearer <your_token>",
            "TIMEZONE": "<your_timezone>"
          }
        }
      }
    }
    "--header", "X-Backend-Id:<BACKEND_ID>"
$ ls -la /sys/kernel/btf/vmlinux
    
    -r--r--r--. 1 root root 3541561 Jun 2 18:16 /sys/kernel/btf/vmlinux
    resource "groundcover_dashboard" "llm_observability" {
      name             = "LLM Observability"
      description      = "Dashboard to monitor OpenAI and Anthropic usage"
      preset           = "{\"widgets\":[{\"id\":\"B\",\"type\":\"widget\",\"name\":\"Total LLM Calls\",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai span_type:anthropic | stats by(span_type) count() count_all_result | sort by (count_all_result desc) | limit 5\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\"}},{\"id\":\"D\",\"type\":\"widget\",\"name\":\"LLM Calls Rate\",\"queries\":[{\"id\":\"A\",\"expr\":\"sum(rate(groundcover_resource_total_counter{type=~\\\"openai|anthropic\\\",status_code=\\\"ok\\\"})) by (gen_ai_request_model)\",\"dataType\":\"metrics\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"time-series\",\"selectedChartType\":\"stackedBar\"}},{\"id\":\"E\",\"type\":\"widget\",\"name\":\"Average LLM Response Time\",\"queries\":[{\"id\":\"A\",\"expr\":\"avg(groundcover_resource_latency_seconds{type=~\\\"openai|anthropic\\\"}) by (type)\",\"dataType\":\"metrics\",\"step\":\"disabled\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\",\"step\":\"disabled\",\"selectedUnit\":\"Seconds\"}},{\"id\":\"A\",\"type\":\"widget\",\"name\":\"Total LLM Tokens Used\",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai span_type:anthropic | stats by(span_type) sum(gen_ai.response.usage.total_tokens) sum_result | sort by (sum_result desc) | limit 5\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\",\"step\":\"disabled\"}},{\"id\":\"C\",\"type\":\"widget\",\"name\":\"AVG Input Tokens Per LLM Call \",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai OR span_type:anthropic | stats by(span_type) avg(gen_ai.response.usage.input_tokens) avg_result | sort by (avg_result desc) | limit 5\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\"}},{\"id\":\"F\",\"type\":\"widget\",\"name\":\"AVG Output Tokens Per LLM Call \",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai OR span_type:anthropic | stats by(span_type) avg(gen_ai.response.usage.output_tokens) avg_result | sort by (avg_result desc) | limit 5\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\",\"step\":\"disabled\"}},{\"id\":\"G\",\"type\":\"widget\",\"name\":\"Top Used Models\",\"queries\":[{\"id\":\"A\",\"expr\":\"span_type:openai OR span_type:anthropic | stats by(gen_ai.request.model) count() count_all_result | sort by (count_all_result desc) | limit 100\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"bar\",\"step\":\"disabled\"}},{\"id\":\"H\",\"type\":\"widget\",\"name\":\"Total LLM Errors \",\"queries\":[{\"id\":\"A\",\"expr\":\"(span_type:openai OR span_type:anthropic) status:error | stats by(span_type) count() count_all_result | sort by (count_all_result desc) | limit 1\",\"dataType\":\"traces\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"stat\"}},{\"id\":\"I\",\"type\":\"widget\",\"name\":\"AVG TTFT Over Time by Model\",\"queries\":[{\"id\":\"A\",\"expr\":\"avg(groundcover_workload_latency_seconds{gen_ai_system=~\\\"openai|anthropic\\\",quantile=\\\"0.50\\\"}) by (gen_ai_request_model)\",\"dataType\":\"metrics\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"time-series\",\"selectedChartType\":\"line\",\"selectedUnit\":\"Seconds\"}},{\"id\":\"J\",\"type\":\"widget\",\"name\":\"Avg Output Tokens Per Second by Model\",\"queries\":[{\"id\":\"A\",\"expr\":\"avg(groundcover_gen_ai_response_usage_output_tokens{}) by 
(gen_ai_request_model)\",\"dataType\":\"metrics\",\"editorMode\":\"builder\"},{\"id\":\"B\",\"expr\":\"avg(groundcover_workload_latency_seconds{quantile=\\\"0.50\\\"}) by (gen_ai_request_model)\",\"dataType\":\"metrics\",\"editorMode\":\"builder\"},{\"id\":\"formula-A\",\"expr\":\"A / B\",\"dataType\":\"metrics-formula\",\"editorMode\":\"builder\"}],\"visualizationConfig\":{\"type\":\"time-series\",\"selectedUnit\":\"Number\"}}],\"layout\":[{\"id\":\"B\",\"x\":0,\"y\":0,\"w\":4,\"h\":6,\"minH\":4},{\"id\":\"D\",\"x\":0,\"y\":30,\"w\":24,\"h\":6,\"minH\":4},{\"id\":\"E\",\"x\":8,\"y\":0,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"A\",\"x\":16,\"y\":0,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"C\",\"x\":0,\"y\":12,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"F\",\"x\":8,\"y\":24,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"G\",\"x\":16,\"y\":24,\"w\":8,\"h\":6,\"minH\":4},{\"id\":\"H\",\"x\":4,\"y\":0,\"w\":4,\"h\":6,\"minH\":4},{\"id\":\"I\",\"x\":0,\"y\":18,\"w\":24,\"h\":6,\"minH\":4},{\"id\":\"J\",\"x\":0,\"y\":3,\"w\":24,\"h\":6,\"minH\":4}],\"duration\":\"Last 15 minutes\",\"variables\":{},\"spec\":{\"layoutType\":\"ordered\"},\"schemaVersion\":4}"
    }
    terraform plan
    terraform apply
    # Syntax
    terraform import groundcover_dashboard.<local_name> <dashboard_id>
    
    # Example
    terraform import groundcover_dashboard.service_overview dsh_1234567890
    helm upgrade --install groundcover groundcover/groundcover \
      --set global.groundcover_token=<INGESTION_KEY>,clusterId={cluster-name}
    exporters:
      otlphttp/groundcover:
        endpoint: https://{GROUNDCOVER_MANAGED_OPENTELEMETRY_ENDPOINT}
        headers: 
          apikey: {INGESTION_KEY}
    
    pipelines:
      traces:
        exporters:
        - otlphttp/groundcover
    curl -fsSL https://groundcover.com/install-groundcover-sensor.sh | sudo env API_KEY='{ingestion_Key}' GC_ENV_NAME='{selected_Env}' GC_DOMAIN='{BYOC_ENDPOINT}' bash -s -- install
    curl -fsSL https://groundcover.com/install-groundcover-sensor.sh | sudo bash -s -- uninstall
    echo "# Local overrides to sensor configuration
    k8sLogs:
      scraper:
        dockerMaxLogSize: 102400
    " | sudo tee /etc/opt/groundcover/overrides.yaml && sudo systemctl restart groundcover-sensor
# add custom static labels to metrics
    pipelines:
      metrics:
        additionalMetricLabels:
          label1: "label1_value"
          label2: "label2_value"
    POST /api/k8s/v2/namespaces/list
    curl 'https://api.groundcover.com/api/k8s/v2/namespaces/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"sources":[],"start":"2025-01-24T06:00:00.000Z","end":"2025-01-24T08:00:00.000Z"}'
    {
      "namespaces": [
        "groundcover",
        "monitoring",
        "kube-system",
        "default"
      ]
    }
    # Get current time and subtract 24 hours for start time
    start_time=$(date -u -v-24H '+%Y-%m-%dT%H:%M:%S.000Z')
    end_time=$(date -u '+%Y-%m-%dT%H:%M:%S.000Z')
    
    curl 'https://api.groundcover.com/api/k8s/v2/namespaces/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw "{\"sources\":[],\"start\":\"$start_time\",\"end\":\"$end_time\"}"

    OS | Supported versions
    Amazon Linux | All off-the-shelf AMIs
    Google COS | All off-the-shelf AMIs
    Azure Linux | All off-the-shelf AMIs
    Talos | 1.7.3+

    Revocation effect | Data stops flowing immediately | API calls fail

    End timestamp in ISO 8601 format (UTC)

    requires configurationarrow-up-right
    requires configurationarrow-up-right
    integrations_readarrow-up-right
    Instructions for Windsurfarrow-up-right
    Instructions for VS Codearrow-up-right
  • Network usage: Visualize traffic rates and connections being established and closed on a service-to-service level of granularity, and easily pinpoint cross availability zone communication to investigate misconfigurations and surging costs.

    Metric | Description | Type
    groundcover_container_memory_rss_bytes | current memory RSS (B) | Gauge
    groundcover_container_memory_request_bytes | K8s container memory request (B) | Gauge
    groundcover_container_memory_limit_bytes | K8s container memory limit (B) | Gauge
    groundcover_container_cpu_delay_seconds | K8s container CPU delay accounting in seconds | Counter
    groundcover_container_disk_delay_seconds | K8s container disk delay accounting in seconds | Counter
    groundcover_container_cpu_throttled_seconds_total | K8s container total CPU throttling in seconds | Counter
    groundcover_node_free_disk_space | amount of free disk space in current node (B) | Gauge
    groundcover_node_total_disk_space | amount of total disk space in current node (B) | Gauge
    groundcover_node_used_percent_disk_space | percent of used disk space in current node (0-100) | Gauge
    protocol
    role
    server_port
    encryption
    transport_protocol
    is_loopback

    is_cross_az means the traffic was sent and/or received between two different availability zones. This is a helpful flag to quickly identify this special kind of communication.

    • The actual zones are detailed in the availability_zone and remote_availability_zone labels

    Metric | Description | Type
    groundcover_network_connections_opened_failed_total | Connection attempts failed per workload (including refused connections) | Counter
    groundcover_network_connections_opened_refused_total | Connection attempts refused per workload | Counter
    groundcover_container_cpu_usage_rate_millis | CPU usage in mCPU | Gauge
    groundcover_container_cpu_request_m_cpu | K8s container CPU request (mCPU) | Gauge
    groundcover_container_cpu_limit_m_cpu | K8s container CPU limit (mCPU) | Gauge
    groundcover_container_memory_working_set_bytes | current memory working set (B) | Gauge
    groundcover_node_allocatable_cpum_cpu | amount of allocatable CPU in the current node (mCPU) | Gauge
    groundcover_node_allocatable_mem_bytes | amount of allocatable memory in the current node (B) | Gauge
    groundcover_node_mem_used_percent | percent of used memory in current node (0-100) | Gauge
    groundcover_node_used_disk_space | current used disk space in current node (B) | Gauge
    groundcover_pvc_usage_bytes | PVC used bytes (B) | Gauge
    groundcover_pvc_capacity_bytes | PVC capacity bytes (B) | Gauge
    groundcover_pvc_available_bytes | PVC available bytes (B) | Gauge
    groundcover_pvc_usage_percent | percent of used PVC storage (0-100) | Gauge
    groundcover_network_rx_bytes_total | Bytes received by the workload (B) | Counter
    groundcover_network_tx_bytes_total | Bytes sent by the workload (B) | Counter
    groundcover_network_connections_opened_total | Connections opened by the workload | Counter
    groundcover_network_connections_closed_total | Connections closed by the workload | Counter
    Your subscription costsarrow-up-right
    define the retention period
    potential points of failure

    Header | Required | Description
    Authorization | Yes | Bearer token with your API key
    Content-Type | Yes | Must be application/json
    Accept | Yes | Must be application/json

    Parameter | Type | Required | Description
    sources | array | No | Source filters (empty array returns all monitors)

    Field | Type | Description
    monitors | array | Array of monitor objects
    uuid | string | Unique identifier for the monitor
    title | string | Monitor name/description
    type | string | Monitor type (see monitor types below)

    "metrics"

    Metrics-based monitoring

    "traces"

    Distributed tracing monitoring

    "logs"

    Log-based monitoring

    "events"

    Event-based monitoring

    "infra"

    Infrastructure monitoring

    ""

    (empty string) General/unspecified monitoring

  • Examples of invalid names: Production-K8s, OTEL_API, rum frontend

    Parameter | Type | Required | Description
    name | string | Yes | Unique name for the ingestion key (must be lowercase with hyphens)
    type | string | Yes | Key type ("sensor", "thirdParty", "rum")
    tags | array | No | Array of tags to associate with the key

    Type | Description | Remote Config
    "sensor" | Keys for groundcover sensors and agents | true
    "thirdParty" | Keys for third-party integrations (OpenTelemetry, etc.) | false
    "rum" | Keys for Real User Monitoring data ingestion | false

    Ingestion Keysarrow-up-right

    Field | Type | Description
    display.description | string | Monitor description
    severity | string | Alert severity level (e.g., "S1", "S2", "S3")
    measurementType | string | Type of measurement ("state", "event")
    model.queries | array | Query configurations for data retrieval
    model.thresholds | array | Threshold configurations for alerting
    executionErrorState | string | State when execution fails ("OK", "ALERTING")
    noDataState | string | State when no data is available ("OK", "ALERTING")
    evaluationInterval.interval | string | How often to evaluate the monitor
    evaluationInterval.pendingFor | string | How long to wait before alerting
    isPaused | boolean | Whether the monitor is currently paused

    Header | Required | Description
    Authorization | Yes | Bearer token with your API key
    Content-Type | Yes | Must be application/json
    Accept | Yes | Must be application/json

    Parameter | Type | Required | Description
    uuid | string | Yes | The unique identifier of the monitor to retrieve

    Field | Type | Description
    title | string | Monitor name/title
    display.header | string | Alert header template with variable substitution
    display.resourceHeaderLabels | array | Labels shown in resource headers
    display.contextHeaderLabels | array | Labels shown in context headers

    Enter a name like zapier_email_integration.
  • Paste your Zapier Catch Hook URL (you’ll get this in Step 2 below).

  • Save the integration.

  • App: Webhooks by Zapier
  • Event: Catch Hook

  • Copy the Webhook URL (e.g. https://hooks.zapier.com/hooks/catch/...) – you'll use this in groundcover.

  • Configure the email:

    • To: your email address

    • Subject:

    • Body:

    Zapierarrow-up-right
    This example shows how to create a workflow that only triggers for a specific monitor (by its name):

    hashtag
    Filter by Environment

    Execute only on the Prod environment. The "env" attribute needs to be part of the monitor context attributes (either by using it in the group by section or by explicitly adding it as a context label):

    hashtag
    Filter by Multiple Conditions

    This example shows how to combine multiple filters. In this case it will match events from the prod environment that also come from monitors explicitly routing to the workflow named "actual-name-of-workflow":

    hashtag
    Filter by Regex

    In this case we will use a regular expression to filter on events coming from the groundcover OR monitoring namespaces. Note that any regular expression can be used:

    hashtag
    Consts Examples

    The consts section is the best place to create pre-defined attributes and apply transformations to the monitor's metadata when formatting the notification message.

    hashtag
    Map Severities

    Severities in your notified destination may not match the groundcover predefined severities. By using a dictionary, you can map any groundcover severity value to another, and extract it by using the actual monitor severity. Use the "keep.dictget" function to extract from a dictionary and apply a default in case the value is missing.

    hashtag
    Best Practice for Accessing Monitor Labels

    When accessing a context label via alert.labels, if the label is not transmitted by the monitor, the workflow might crash. The best practice is to pre-define labels in the consts section with a default value, using "keep.dictget" so the value is gracefully pulled from the labels object.

    Note: Label names that are dotted, like "cloud.region" in this example, cannot be referenced in the monitor itself and can only be retrieved using this technique of pulling the value with "keep.dictget" from the alert.labels object.

    hashtag
    Additional Useful Functions

    • keep.dict_pop({{alert.labels}}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "_gc_issue_header") - "Clean" a key-value dictionary from some irrelevant values (keys). In this case, the groundcover labels dictionary has some internal keys that you might not want to include in your notification content.

    • keep.join(["a", "b", "c"], ",") - Joins a list of elements into a string using a given delimiter. In this case the output is "a,b,c".

    hashtag
    Action Examples

    hashtag
    Conditional Statements

    Use "if" condition to apply logic on different actions.

    Create a separate block for a firing monitor (a resolved monitor can use different logic to change formatting of the notification):

    "If" statements can include and/or logic for multiple conditions:

    hashtag
    Notification by Specific Hours

    Use the function keep.is_business_hours combined with an "if" statement to trigger an action within specific hours only.

    In this example the action block will execute on Sundays (6) between 20-23 (8pm to 11pm) or on Mondays (0) between 0-1am:

    Scope: A gcQL query that filters which monitors' issues will trigger this route

  • Rules: Define what happens when an issue is in a specific state (Firing or Resolved)

  • Connected Apps: Choose where to send the notification (e.g., Slack Webhook, Pagerduty, etc)

  • hashtag
    Prerequisites

    Before creating notification routes, set up your Connected Apps in Settings → Connected-Apps.

    hashtag
    Create a Notification Route

    1. Go to Monitors → Notification Routes

    2. Click Create Notification Route

    3. Complete the wizard:

    hashtag
    Step 1: Route Name

    Give your route a descriptive name (e.g., prod-critical-alerts, infra-team-notifications).

    hashtag
    Step 2: Scope Monitors

    Define which monitors this route applies to using gcQL on:

    1. The grouping labels defined in the query

    2. The custom labels

    3. The Monitor's metadata such as the name or severity

    Examples:

    • env:prod — All monitors with grouping key 'env' and possible value of 'prod'

    • env:prod AND severity:S1 — Only critical production alerts

    • team:platform — Monitors with a custom label for the platform team

    • *:* — To match all monitors

    hashtag
    Step 3: Rules

    Rules define what happens when a scoped monitor's issue changes state.

    Each rule has:

    • Status: When to trigger — Firing (issue is active) or Resolved (issue cleared)

    • Connected Apps: Where to send the notification

    Example setup:

    • When Firing or Resolved → Send to #prod-alerts Slack channel

    • When Firing (only) → Send to Pagerduty service directory

    Click Add Rule to create multiple rules with different status/destination combinations.

    hashtag
    Re-notification Interval

    Configure how long to wait before sending another notification while an alert is still firing.

    Options: 1m, 5m, 10m, 30m, 1h, 2h, 4h, 8h, 12h, 1d, 2d

    This prevents notification fatigue from long-running alerts.

    hashtag
    Managing Notification Routes

    The Notification Routes page shows all your routes with:

    • Name: Route identifier

    • Scope: The gcQL query defining which monitors are affected

    • Connected Apps: Summary of destinations by type

    • Creator: Who created the route

    hashtag
    Edit, Duplicate, or Delete

    Hover over any row to access the action menu:

    • Edit: Modify the route configuration

    • Duplicate: Create a copy as a starting point

    • Delete: Remove the route

    hashtag
    Example Use Cases

    hashtag
    Route Critical Production Alerts to PagerDuty and Slack

    hashtag
    Separate Routes by Team

    hashtag
    Development Alerts — Firing Only

    Install vmutilsarrow-up-right
  • port-forward groundcover's VictoriaMetrics service object

    • Run the vmbackup utility, in this example we'll set the destination to an AWS S3 bucket, but more providers are supportedarrow-up-right

    circle-info

    vmbackup automatically uses an incremental backup strategy if the destination contains an existing backup

    hashtag
    Restoring from backup

    • Scale down VictoriaMetrics statefulSet (VictoriaMetrics must be offline during restorations)

    • Get the VictoriaMetrics PVC name

    • Create the following Kubernetes Job manifest vm-restore.yaml

    circle-info

    Make sure you replace {VICTORIA METRICS PVC NAME} with the fetched PVC name

    • Deploy the job and wait for completion

    • Once completed, scale up groundcover's VictoriaMetrics instance

    VictoriaMetricsarrow-up-right
    vmbackuparrow-up-right
    vmrestorearrow-up-right

    Helm

  • ArgoCD

  • circle-info

    Coverage policy covers all nodes excluding control plane and fargate. See details here.

    hashtag
    Creating helm values file

    Sensor deployment requires installation values similar to the following, stored in a values.yaml file.

    {BYOC_ENDPOINT} is your unique groundcover ingestion endpoint; you can locate it in the ingestion keys tabarrow-up-right.

    hashtag
    Installing using CLI

    Use groundcover CLI to automate the installation process. The main advantages of using this installation method are:

    • Auto-detection of cluster incompatibility issues

    • Tolerations setup automation

    • Tuning of resources according to cluster size

    • Supports passing helm overrides

    • Automated detection of new versions and upgrade suggestions

    Read more herearrow-up-right.

    circle-check

    The CLI will automatically use existing ingestion keys or provision a new one if none exist

    hashtag
    Installing groundcover CLI

    Deploying groundcover using the CLI

    To upgrade groundcover to the latest version, simply re-run the groundcover deploy command with your desired overrides (such as -f values.yaml). The CLI will automatically detect and apply the latest available version during the deployment process.

    hashtag
    Installing using Helm

    hashtag
    Step 1 - Install groundcover CLI

    hashtag
    Step 2 - Generate Installation Key

    For more details about ingestion keys, refer to our ingestion key documentation.

    hashtag
    Step 3 - Add Sensor Ingestion Key to Values File

    Add the recently created sensor key and your BYOC endpoint to the values.yaml file. Find your BYOC endpoint in the ingestion keys tabarrow-up-right.

    hashtag
    Step 4 - Add Helm Repository

    hashtag
    Step 5 - Install groundcover

    Initial installation:

    Upgrade groundcover:

    hashtag
    Installing using ArgoCD

    For CI/CD deployments using ArgoCD, refer to our ArgoCD deployment guide.

    hashtag
    What can you do next?

    Check out our 5 quick steps to get you started

    hashtag
    Uninstalling

    hashtag
    CLI

    hashtag
    Helm

    requirements
    login and workspace setup
    UI
    CLI
    {
      "sources": []
    }
    {
      "monitors": [
        {
          "uuid": "string",
          "title": "string",
          "type": "string"
        }
      ]
    }
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/monitors/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data-raw '{"sources":[]}'
    {
      "monitors": [
        {
          "uuid": "xxxx-xxxx-xxxx-xxxx-xxxx",
          "title": "PVC usage above threshold (90%)",
          "type": "metrics"
        },
        {
          "uuid": "xxxx-xxxx-xxxx-xxxx-xxxx",
          "title": "HTTP API Errors Monitor",
          "type": "traces"
        },
        {
          "uuid": "xxxxx-xxxx-xxxx-xxxx-xxxx",
          "title": "Error Logs Monitor",
          "type": "logs"
        },
        {
          "uuid": "xxxxx-xxxx-xxxx-xxxx-xxxx",
          "title": "Node CPU Usage Average is Above 85%",
          "type": "infra"
        },
        {
          "uuid": "xxxx-xxxx-xxxx-xxxx-xxxx",
          "title": "Rolling Update Triggered",
          "type": "events"
        },
        {
          "uuid": "xxxx-xxxx-xxxx-xxxx-xxxx",
          "title": "Deployment Partially Not Ready - 5m",
          "type": "events"
        }
      ]
    }
    Authorization: Bearer <YOUR_API_KEY>
    Content-Type: application/json
    {
      "name": "string",
      "type": "sensor|thirdParty|rum"
    }
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/create' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "production-k8s-sensor",
        "type": "sensor"
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/create' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "otel-collector-prod",
        "type": "thirdParty",
        "tags": ["otel", "production"]
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/create' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "frontend-rum-monitoring",
        "type": "rum",
        "tags": ["rum", "frontend", "web"]
      }'
    {
      "id": "string",
      "name": "string", 
      "createdBy": "string",
      "creationDate": "string",
      "key": "string",
      "type": "string",
      "tags": ["string"]
    }
    {
      "id": "12345678-1234-1234-1234-123456789abc",
      "name": "production-k8s-sensor",
      "createdBy": "[email protected]",
      "creationDate": "2025-08-31T14:09:15Z",
      "key": "gcik_AEBAAAE4_XXXXXXXXX_XXXXXXXXX_XXXXXXXX",
      "type": "sensor",
      "remoteConfig": true,
      "tags": []
    }
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "production-k8s-sensor"
      }'
    curl -L \
      --url 'https://api.groundcover.com/api/monitors/xxxx-xxxx-xxx-xxxx-xxxx' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Accept: application/json'
    title: 'PVC usage above threshold (90%)'
    display:
      header: PV usage above 90% threshold - {{ alert.labels.cluster }}, {{ alert.labels.name }}
      contextHeaderLabels:
      - cluster
      - namespace
      - env
    severity: S2
    measurementType: state
    model:
      queries:
      - dataType: metrics
        name: threshold_input_query
        pipeline:
          function:
            name: last_over_time
            pipelines:
            - function:
                name: avg_by
                pipelines:
                - metric: groundcover_pvc_usage_percent
                args:
                - cluster
                - env
                - name
                - namespace
            args:
            - 1m
        conditions:
        - key: name
          origin: root
          type: string
          filters:
          - op: not_match
            value: object-storage-cache-groundcover-incloud-clickhouse-shard.*
      thresholds:
      - name: threshold_1
        inputName: threshold_input_query
        operator: gt
        values:
        - 90
    noDataState: OK
    evaluationInterval:
      interval: 1m0s
      pendingFor: 1m0s
    🚨 New groundcover Alert 🚨
    🔔 Alert Title: {{alert_name}}
    💼 Severity: {{severity}}
    
    🔗 Links:
    - 🧹 Issue: {{issue_url}}
    - 📈 Monitor: {{monitor_url}}
    - 🔕 Silence: {{silence_url}}
    workflow:
      id: emails
      name: emails
      description: Sends alerts to Zapier webhook (for email)
      triggers:
      - type: alert
        filters:
        - key: annotations.emails
          value: enabled  
      consts:
        severity: keep.dictget( {{ alert.annotations }}, "_gc_severity", "info")
        title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
        issue_url: https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}
        monitor_url: https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}
        silence_url: https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join({{ consts.redacted_labels }}, "&", "matcher_"), " ", "+")
        redacted_labels: keep.dict_pop({{ alert.labels }}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "_gc_issue_header")
      actions:
      - name: <<THE_NAME_OF_YOUR_WORKFLOW>>
        provider:
          config: "{{ providers.<<THE_NAME_OF_YOUR_INTEGRATION>> }}"
          type: webhook
          with:
            body:
              alert_name: "{{ consts.title }}"
              severity: "{{ consts.severity }}"
              issue_url: "{{ consts.issue_url }}"
              monitor_url: "{{ consts.monitor_url }}"
              silence_url: "{{ consts.silence_url }}"
    workflow: 
      id: specific-monitor-workflow
      description: Workflow triggered only by Workload Pods Crashed Monitor
      triggers:
        - type: alert
          filters:
            - key: alertname
              value: Workload Pods Crashed Monitor
    workflow: 
      id: prod-only-workflow
      description: Workflow triggered only by production environment alerts
      triggers:
        - type: alert
          filters:
            - key: env
              value: prod
    workflow: 
      id: multi-filter-workflow
      description: Workflow triggered by critical alerts in production
      triggers:
        - type: alert
          filters:
            - key: env
              value: prod
            - key: annotations.multi-filter-workflow
              value: enabled
    workflow: 
      id: regex-filter-workflow
      description: Workflow triggered by alerts from groundcover or monitoring namespaces
      triggers:
        - type: alert
          filters:
            - key: namespace
              value: r"(groundcover|monitoring)"
            - key: annotations.regex-filter-workflow
              value: enabled          
    workflow:
      id: severity-mapping-example
      description: Example of mapping severities using consts
      triggers:
        - type: alert
          filters:
            - key: annotations.severity-mapping-example
              value: enabled            
      consts:
        severities: '{"S1": "P1","S2": "P2","S3": "P3","S4": "P4","critical": "P1","error": "P2","warning": "P3","info": "P4"}'
        severity: keep.dictget({{ consts.severities }}, {{ alert.annotations._gc_severity }}, "P3")
    workflow:
      id: labels-best-practice-example
      description: Example of safely accessing monitor labels
      triggers:
        - type: alert
          filters:
            - key: annotations.labels-best-practice-example
              value: enabled            
      consts:
        region: keep.dictget({{ alert.labels }}, "cloud.region", "")
    workflow:
      id: conditional-actions-example
      description: Example of conditional actions based on alert status
      triggers:
        - type: alert
          filters:
            - key: annotations.conditional-actions-example
              value: enabled                
      actions:
        - if: '{{ alert.status }} == "firing"'
          name: slack-action-firing
          provider:
            config: '{{ providers.groundcover-alerts-dev }}'
            type: slack
            with:
              attachments:
              - color: '{{ consts.red_color }}'
                footer: '{{ consts.footer_url }}'
                footer_icon: '{{ consts.footer_icon }}'
                text: '{{ consts.slack_message }}'
                title: 'Firing: {{ alert.alertname }}'
                title_link: '{{ consts.title_link }}'
                ts: keep.utcnowtimestamp()
                type: plain_text
              message: ' '
    workflow:
      id: multi-condition-actions-example
      description: Example of multiple conditions in actions
      triggers:
        - type: alert
          filters:
            - key: annotations.multi-condition-actions-example
              value: enabled                    
      actions:
        - if: '{{ alert.status }} == "firing" and {{ alert.labels.namespace }} == "namespace1"'
          name: slack-action-firing
          provider:
            config: '{{ providers.groundcover-alerts-dev }}'
            type: slack
            with:
              attachments:
              - color: '{{ consts.red_color }}'
                footer: '{{ consts.footer_url }}'
                footer_icon: '{{ consts.footer_icon }}'
                text: '{{ consts.slack_message }}'
                title: 'Firing: {{ alert.alertname }}'
                title_link: '{{ consts.title_link }}'
                ts: keep.utcnowtimestamp()
                type: plain_text
              message: ' '
    workflow:
      id: time-based-notification-example
      description: Example of time-based conditional actions
      triggers:
        - type: alert
          filters:
            - key: annotations.time-based-notification-example
              value: enabled                    
      actions:
        - if: '({{ alert.status }} == "firing" and (keep.is_business_hours(timezone="America/New_York", business_days=[6], start_hour=20, end_hour=23) or keep.is_business_hours(timezone="America/New_York", business_days=[0], start_hour=0, end_hour=1)))'
          name: time-based-notification
          provider:
            type: slack
            config: '{{ providers.slack_webhook }}'
            with:
              message: "Time-sensitive alert: {{ alert.alertname }}"
    Name: prod-critical-pagerduty
    Scope: env:prod AND severity:S1
    
    Rules:
    - When Firing → PagerDuty On-Call
    - When Firing or Resolved → #critical-alerts (Slack)
    Name: platform-team-alerts
    Scope: team:platform
    
    Rules:
    - When Firing → #platform-alerts (Slack)
    - When Resolved → #platform-alerts (Slack)
    Name: backend-team-alerts
    Scope: team:backend
    
    Rules:
    - When Firing → #backend-alerts (Slack)
    - When Resolved → #backend-alerts (Slack)
    Name: dev-notifications
    Scope: env:dev
    
    Rules:
    - When Firing → #dev-alerts (Slack)
    kubectl get svc -n groundcover | grep "victoria-metrics"
    # Identify the victoria-metrics service object name
    kubectl port-forward svc/{victoria-metrics-service-object-name} \
    -n groundcover 8428:8428
    ./vmbackup -credsFilePath={aws credentials path} \
    -storageDataPath=</path/to/victoria-metrics-data> \
    -snapshot.createURL=http://localhost:8428/snapshot/create \
    -dst=s3://<bucket>/<path/to/backup>
    kubectl scale sts {release name}-victoria-metrics --replicas=0
    kubectl get pvc -n groundcover | grep victoria-metrics
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: vm-restore
      annotations:
        eks.amazonaws.com/role-arn: XXXXX # role with permissions to read from the backup bucket
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: vm-restore
    spec:
      ttlSecondsAfterFinished: 600
      template:
        spec:
          serviceAccountName: vm-restore
          restartPolicy: OnFailure
          volumes:
          - name: vmstorage-volume
            persistentVolumeClaim:
              claimName: "{VICTORIA METRICS PVC NAME}"
          containers:
          - name: vm-restore
            image: victoriametrics/vmrestore
            imagePullPolicy: IfNotPresent
            volumeMounts:
            - mountPath: /storage
              name: vmstorage-volume
            command:
            - /bin/sh
            - -c
            - /vmrestore-prod -src=s3://<bucket>/<path/to/backup> -storageDataPath=/storage
    kubectl apply -f vm-restore.yaml -n groundcover
    kubectl scale sts {release name}-victoria-metrics --replicas=1
    global:
      backend:
        enabled: false
      ingress:
        site: {BYOC_ENDPOINT}
    
    clusterId: "your-cluster-name" # CLI will automatically detect cluster name
    env: "your-environment-name" # Add it to differnciate between different environments
    sh -c "$(curl -fsSL https://groundcover.com/install.sh)"
    groundcover deploy -f values.yaml
    sh -c "$(curl -fsSL https://groundcover.com/install.sh)"
    groundcover auth get-ingestion-key sensor
    global:
      groundcover_token: {sensor_key}
      backend:
        enabled: false
      ingress:
        site: {BYOC_ENDPOINT}
        
    clusterId: "your-cluster-name"
    env: "your-environment-name"
    # Add groundcover Helm repository and fetch latest chart
    helm repo add groundcover https://helm.groundcover.com && helm repo update groundcover
    helm upgrade \
        groundcover \
        groundcover/groundcover \
        -i \
        --create-namespace \
        -n groundcover \
        -f values.yaml
    helm repo update groundcover && helm upgrade \
        groundcover \
        groundcover/groundcover \
        -n groundcover \
        -f values.yaml
    groundcover delete
    helm uninstall groundcover -n groundcover
    # delete the namespace in order to remove the PVCs as well
    kubectl delete ns groundcover
    Source: Checks run from within your backend. When you have multiple groundcover backends, you can select the specific backend to use. Region selection, for running tests from specific locations, will be supported in the future.
  • Supported Protocols: Currently, Synthetics supports HTTP/HTTPS tests. Support for additional protocols, including gRPC, ICMP (Ping), DNS, and dedicated SSL monitors, is coming soon.

  • Alerting: Creating a Synthetic test automatically creates a corresponding Monitor (see: Monitors). Using monitors you can get alerted on failed synthetic tests; see: Notification Channels. The monitor is not editable.

  • Trace Integration: We generate traces for all synthetic tests, which appear as first-class citizens in the groundcover platform. You can query these traces using source:synthetics in the Traces page.

  • hashtag
    Creating a Synthetic Test

    Navigate to Monitors > Synthetics and click + Create Synthetic Test.

    circle-info

    Only Editors can create/edit/delete synthetic tests; see Default Policies

    hashtag
    Request Configuration

    Define the endpoint and parameters for the test.

    • Synthetic Test Name: A descriptive name for the test.

    • Target: Select the method (GET, POST, etc.) and URL. Include HTTP scheme as well, for example: https://api.groundcover.com/api/backends/list

      • Tip: Use Import from cURL to paste a command and auto-fill these fields.

    • HTTP Settings

      • Follow redirects: Whether the test should follow 3xx responses. When disabled, the test returns the 3xx response as the result set for assertions.

      • Allow insecure: Disables SSL/TLS certificate verification. Use this only for internal testing or self-signed certificates. Not recommended for production endpoints as it exposes you to Man-in-the-Middle attacks.

    • Timing

      • Interval: Frequency of the check (e.g., every 60s).

      • Timeout: Max duration to wait before marking the test as failed. Timeout must be less than interval.

    • Payload: Select the body type if your request requires data (e.g., POST/PUT).

      • Options: None, JSON, Text, Raw.

    • Headers & Auth:

      • Authentication (Bearer tokens, API keys) will be released soon.

      • Headers: You can add custom headers passing key and values.

    hashtag
    Assertions (Validation Logic)

    Assertions are the rules that determine if a test passed or failed. You can add multiple assertions to a single test. If any assertion fails, the entire check is marked as failed.

    Available Assertion Fields

    The "Field" determines which part of the response groundcover inspects.

    Field | Description
    statusCode | Checks the HTTP response code (e.g., 200, 404, 500).
    responseHeader | Checks for the presence or value of a specific response header (e.g., Content-Type).
    jsonBody | Inspects specific keys or values within a JSON response payload.
    body | Checks the raw text body of the response.
    responseTime | Checks the response time of the request.

    Assertion Operators

    The "Operator" defines the logic applied to the Field.

    Operator | Function | Example Use Case
    is equal to | Exact match. Case-sensitive. | statusCode is equal to 200
    is not equal to | Ensures a value does not appear. | statusCode is not equal to 500
    contains | Checks if a substring exists within the target. | body contains "error"
    starts with | Verifies the beginning of a string. | "status"

    hashtag
    Custom Labels

    Add custom labels; these labels will exist on traces generated by checks. You can use these labels to filter traces.

    hashtag
    Auto-Generated Monitors

    When you create a Synthetic Test, groundcover eliminates the need to manually configure separate alert rules. A Monitor is automatically generated and permanently bound to your test. See: Monitors.

    • Managed Logic: The monitor's threshold and conditions are derived directly from your Synthetic Test's assertions. If the test fails (e.g., status code != 200), the Monitor automatically enters a "Failing" state.

    • Lifecycle: This monitor handles the lifecycle of the alert, transitioning between Pending, Firing (when the test fails), and Resolved (when the test passes).

    • Zero Maintenance: You do not need to edit this monitor's query. Any changes you make to the Synthetic Test (such as changing the target URL or assertions) are automatically synced to the Monitor.

    Note: To prevent configuration drift, these auto-generated monitors are read-only. You cannot edit their query logic directly; you simply edit the Synthetic Test itself.

    Navigate to the Dashboard List and click on the Create New Dashboard button.

  • Provide an indicative name for your dashboard and, optionally, a description.

  • hashtag
    Steps to creating a Dashboard

    1. Create a new widget

    2. Choose a Widget Type

    3. Select a Widget Mode

    4. Build your query

    5. Choose a Display Type

    6. Save the widget

    Optional:

    1. Add variables

    2. Apply variable(s) to the widget

    hashtag
    Create a new Widget

    Widgets can be added by clicking on the Create New Widget button.

    hashtag
    Choose a Widget Type

    Widgets are the main building blocks of dashboards. groundcover supports the following widget types:

    • Chart Widget: Visualize your data through various display types.

    • Textual Widget: Add context to your dashboard, such as headers or instructions for issue investigations.

    circle-info

    Since selecting a Textual Widget is the last step for this type of widget, the rest of this guide is relevant only to Chart Widgets.

    hashtag
    Select a Widget Mode

    • Metrics: Work with all your available metrics for advanced use cases and custom metrics.

    • Infra Metrics: Use expert-built, predefined queries for common infrastructure scenarios. Ideal for quick starts.

    • Logs: Query and visualize log data.

    • Traces: Query and visualize trace data similar to logs.

    hashtag
    Build your query

    Once the Widget Mode is selected, build your query for the visualization.

    circle-info

    If you're unfamiliar with query building in groundcover, refer to the Query Builder section for full details on the different components.

    hashtag
    Choose a Display Type

    Type | Configuration options | Supported modes
    Time Series | Choose a Y-axis unit from the predefined list. Select a visualization type: Stacked Bar or Line Chart. | Metrics, Infra Metrics, Logs, Traces
    Table | Define columns based on data fields or metrics. Choose a Y-axis unit from the predefined list. | Metrics, Infra Metrics, Logs, Traces
    Stat | Select a Y-axis unit from the predefined list. | Metrics, Infra Metrics, Logs, Traces
    Top List | Choose a ranking metric and sort order. |

    hashtag
    Variables

    Variables dynamically filter your entire dashboard or specific widgets with just one click. They consist of a key-value pair that you define once and reuse across multiple widgets.

    Our predefined variables cover most use cases, but if you’re missing an important one, let us know. Advanced variables are also on our roadmap.

    hashtag
    Adding a Variable

    1. Click on Add Variable and configure the variable using the following fields.

    2. The data source to be used:

      1. Suggested Variables - A predefined list of popular variables which use 'All Datasources' behind the scenes.

      2. All Datasources - Will show values for the chosen key from logs, traces, all metrics, and events.

      3. Traces, Logs, Events - Only show values for the chosen key for data coming from these sources.

      4. Metrics - Show values for a chosen label key from a chosen metric name.

    3. Choose the label key to be used to fetch values from the data source selected.

    4. Choose the name of the variable to be used in the widgets with $ as explained below.

    hashtag
    Using a Variable

    Variables can be referenced in the Filter Bar of the Widget Creation Modal using their name.

    1. In the following example we selected Clusters from the predefined list, and named it 'clusters'.

    2. While creating or editing a Chart Widget, add a reference to the variable using a dollar sign in the filter bar (for example, $clusters).

    3. The data will automatically filter by the variable's key with the selected values. If all values are selected, the filter will use a wildcard (for example, cluster:.*)

    1. After configuring the Variable in the widget queries, you may select the values you wish to filter by and choose the default to be used when the dashboard loads for the first time.

    2. After selecting values in at least one Variable, all other relevant Variables will render an 'Associated Values' section in the dropdown list. This section lists the values of the variable's key which are associated with the values of the currently selected variables' keys.

      1. For example, selecting the value production in a variable called cluster (which uses the key cluster from All Datasources) and then opening the workloads variable shows, in the 'Associated Values' section, the workload values that run in the production cluster.

      2. All other values are shown below, under 'Additional Values'.

      3. The association is computed over relevant data types only; if you are getting unexpected associated results, consider narrowing down the data sources the variable uses from 'All Datasources' to a specific type or a specific metric.

      4. Limitations and tips:

        1. The list is not exhaustive: some associated values may not appear in it, but any value that does appear is necessarily associated.

        2. Start typing the value you are searching for to narrow down the list.

    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    hashtag
    Headers

    hashtag
    Request Body

    Optional filters for ingestion keys:

    Parameter | Type | Required | Description
    name | string | No | Filter by exact key name
    type | string | No | Filter by key type ("sensor", "thirdParty", "rum")
    remoteConfig | boolean | No | Filter by remote configuration status

    hashtag
    Examples

    hashtag
    Get All Ingestion Keys
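    List every key by sending an empty body (all of the request body filters above are optional):

    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{}'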

    hashtag
    Filter by Type

    Get only sensor keys:
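    For example, using the type filter from the request body above:

    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "type": "sensor"
      }'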

    hashtag
    Filter by Name and Remote Config
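    Filters can be combined in one request; for example, matching a key by exact name and remote configuration status:

    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "production-k8s-sensor",
        "remoteConfig": true
      }'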

    hashtag
    Response Example

    hashtag
    Response Schema

    Field | Type | Description
    id | string | Unique identifier for the ingestion key (UUID)
    name | string | Human-readable name for the key
    createdBy | string | Email of the user who created the key
    creationDate | string | ISO 8601 timestamp of key creation

    hashtag
    Related Documentation

    For comprehensive information about ingestion keys, including creation, usage, and best practices, see:

    • Ingestion Keysarrow-up-right

    ClickHouse within groundcover

    Log and Trace telemetry data is stored within a ClickHouse database.

    You can directly query this data using SQL statements and create powerful monitors.

    To create and test your SQL queries use the Grafana Explorearrow-up-right page within the groundcover app.

    Select the ClickHouse@groundcover datasource with the SQL Editor option to start crafting your SQL queries.

    Start with show tables; to see all the available tables for your queries: logs and traces are popular choices (table names are case sensitive).

    hashtag
    Query best practices

    While testing your queries, always use LIMIT to restrict your results to a small set of data.

    To apply the Grafana timeframe on your queries make sure to add the following conditions:

    Logs: WHERE $__timeFilter(timestamp)

    Traces: WHERE $__timeFilter(start_timestamp)

    Note: When querying logs with SQL, it's crucial to use efficient filters to prevent timeouts and enhance performance. Primary filters like cluster, workload, namespace, and env will significantly speed up queries; always include them when writing your queries.
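    Putting these practices together, a minimal sketch of a test query (the cluster and namespace filter values are illustrative):

    -- apply the Grafana timeframe, primary filters, and a LIMIT while testing
    SELECT *
    FROM logs
    WHERE $__timeFilter(timestamp)
      AND cluster = 'prod-cluster'
      AND namespace = 'payments'
    LIMIT 100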

    hashtag
    Filtering on attributes and tags

    Traces and Logs have rich context that is normally stored in dedicated columns in JSON format. Accessing the context for filtering and retrieving values is a common need when querying the data.

    To access the relevant context item, in either the attributes or the tags, use the following syntax:

    WHERE string_attributes['host_name'] = 'my.host'

    WHERE string_tags['cloud.name'] = 'aws'

    WHERE float_attributes['headline_count'] = 4

    WHERE float_tags['container.size'] = 22.4

    To use the float context, ensure that the relevant attributes or tags are indeed numeric. To do that, check the relevant log in JSON format and verify the referenced field is not wrapped in quotes (for example, headline_count in the screenshot below)

    hashtag
    SQL Query structure for a monitor

    In order to be able to use an SQL query to create a monitor you must make sure the query returns no more than a single numeric field - this is the monitored field on which the threshold is placed.

    The query can also contain any number of "group by" fields that are passed to the monitor as context labels.

    Here is an example of an SQL query that can be used for a monitor:
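    A minimal sketch of such a query; the traces table and start_timestamp column appear earlier in this guide, while the tenantID attribute, the env column, and the exact ratio logic are illustrative assumptions:

    SELECT
        string_attributes['tenantID'] AS tenantID,
        env,
        -- last 10 minutes vs. the average 10-minute window over the last week
        countIf(start_timestamp > now() - INTERVAL 10 MINUTE)
          / (countIf(start_timestamp > now() - INTERVAL 7 DAY) / (7 * 24 * 6)) AS threshold_field
    FROM traces
    WHERE start_timestamp > now() - INTERVAL 7 DAY
    GROUP BY tenantID, env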

    In this query the threshold field is a ratio between a value measured over the last week and the same value measured over the last 10 minutes.

    tenantID and env are the group by labels that are passed to the monitor as context labels.


    Here is another query example (check the percentage of errors in a set of logs):

    A single numeric value is calculated and grouped by cluster, namespace and workload
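    A minimal sketch of such a query (the level column and its 'error' value are illustrative assumptions):

    SELECT
        cluster,
        namespace,
        workload,
        -- percentage of error logs out of all logs in the timeframe
        countIf(level = 'error') * 100.0 / count() AS error_percentage
    FROM logs
    WHERE $__timeFilter(timestamp)
    GROUP BY cluster, namespace, workload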

    hashtag
    Applying the SQL query as a monitor

    Applying an SQL query can only happen in YAML mode. You can use a YAML template like the sketch after the steps below to add your query

    1. Give your monitor a name and a description

    2. Paste your SQL query in the expression field

    3. Set the threshold value and the relevant operator - in this example this is "lower than" 0.5 (< 0.5)

    4. Set your workflow name in the annotations section

    5. Set the check interval and the pending time

    6. Save the monitor
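    A minimal sketch of such a template; the overall structure mirrors the monitor examples in this guide, while the dataType value, the SQL expression, and the workflow annotation are illustrative assumptions:

    title: '[SQL] My ratio monitor'
    display:
      description: Fires when the measured ratio drops below 0.5
    severity: S2
    measurementType: state
    model:
      queries:
      - dataType: clickhouse # illustrative
        name: threshold_input_query
        expression: |
          -- paste your SQL query here
          SELECT ...
      thresholds:
      - name: threshold_1
        inputName: threshold_input_query
        operator: lt # "lower than"
        values:
        - 0.5
    annotations:
      my-workflow-name: enabled # illustrative workflow routing
    evaluationInterval:
      interval: 1m0s
      pendingFor: 1m0s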

    hashtag
    Trigger an alert when no logs are coming from a Linux host

    Use YAML mode to add the following template to your monitor.

    In this example we are creating a list of Linux hosts that were sending logs in the last 24 hours and then checking if there were any logs collected from those hosts in the last 5 minutes.

    This monitor can be used, for example, to catch when a host is down.
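    A minimal sketch of the underlying SQL (the host_name attribute is an illustrative assumption):

    SELECT
        string_attributes['host_name'] AS host,
        -- logs collected from this host in the last 5 minutes
        countIf(timestamp > now() - INTERVAL 5 MINUTE) AS recent_logs
    FROM logs
    WHERE timestamp > now() - INTERVAL 24 HOUR
    GROUP BY host
    HAVING recent_logs = 0

    A threshold such as "lower than 1" on recent_logs then fires for any previously active host that stopped sending logs.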

    circle-info

    It is helpful to add an indication in the monitor name that it is SQL based; for example, add an [SQL] prefix or suffix to the monitor name, as shown in the example

    Setting up an alert in groundcover involves defining conditions based on data collected in the platform, such as metrics, traces, logs, or Kubernetes events. This guide will walk you through the process of creating an alert based on metrics. More guides will follow to include all different types of data.

    hashtag
    Step 1: Access the Alerts section

    1. Log inarrow-up-right to groundcover and navigate to the Alerts section by clicking on it on the left navigation menu.

    2. Once in the Alerts section, click on Alerting in the inner menu on the left.

      • If you can't see the inner menu, click on the 3 bars next to "Home" in the upper left corner.

    3. Click on Alert Rules

    4. Then click on the blue "+ New alert rule" button in the upper right.

    hashtag
    Step 2: Give the alert a name and define query and conditions

    1. Type a name for your alert. It's recommended to use a name that will make it easy for you to understand its function later.

    2. Select the data source:

      1. ClickHouse: For alerts based on your traces, logs, and Kubernetes events.

      2. Prometheus: For alerts based on metrics (includes APM metrics, infrastructure metrics, and custom metrics from your environment)

    3. Click on "Select metric"

      • Note: Make sure you are in "Builder" view (see screenshot) to see this option.

    4. Click on "Metrics explorer"

    5. Start typing the name of the metric you want this alert to be based on. Note that the Metrics explorer will start displaying matches as you type, so you can find your metric even if you don't remember its exact name. You can also check out our list of .

    6. Once you see your metric in the list, click on "Select" in that row.

    Note: You can click on "Run queries" to see the results of this query.

    hashtag
    Step 3: Define expressions - Reduce & Threshold

    1. In the Reduce section, open the "Function" dropdown menu and choose the type of value you want to use.

      • Min - the lowest value

      • Max - the highest value

      • Mean - the average of the values

      • Sum - the sum of all values

      • Count - the number of values in the result

      • Last - the last value

    2. In the Threshold section, type a value and choose whether you want the alert to fire when the query result is above or below that value. You can also select a range of values.

    hashtag
    Step 4: Set evaluation behavior

    1. Click on "+ New folder" and type a name for the folder in which this rule will be stored. You can choose any name, but it's recommended to use a name that will make it easy for you to find the relevant evaluation groups, should you want to use them again in future alerts.

    2. Click on "+ New evaluation group" and type a name for this evaluation group. The same recommendation applies here too.

      In the Evaluation interval textbox, type how often the rule should be evaluated to see if it matches the conditions set in Step 3. Then, click "Create". Note: For the Evaluation interval, use the format (number)(unit), where units are:

      • s = seconds

      • m = minutes

      • h = hours

      • d = days

      • w = weeks

    3. In the Pending period box, type how long the alert should match the conditions before it fires.

    circle-info

    Evaluation interval = how often do you want to check if the alert should fire

    Pending period = how long do you want this to be true before it fires

    As an example, you can define the alert to fire only if the Mean percentage of memory used by a node is above 90% in the past 2 minutes (Pending period = 2m) and you want to check if that's true every 30 seconds (Evaluation interval = 30s).

    hashtag
    Step 5: Choose contact point

    If you already have a contact point set up, simply select it from the dropdown menu at the bottom of the "Configure labels and notifications" section. If not, click on the blue "View or create contact points" link, which will open a new tab.

    Click on the blue "Add contact point" button

    This will get you to the Contact points screen. Then:

    1. Type a name for the contact point

    2. From the dropdown menu, choose which system you want to use to push the alert to.

    3. The information required to push the alert will change based on the system you select. Follow the on-screen instructions (for example, if email is selected, you'll need to enter the email address(es) for that contact).

    4. Click "Save contact point"

    You can now close this tab to go back to the alert rule screen.

    Next to the link you clicked to create this new contact point, you'll find a dropdown menu, where you can select the contact point you just created.

    hashtag
    Step 6: Add annotations

    Under "Add annotations", you have two free text boxes that give you the option to add any information that can be useful to you and/or the recipient(s) of this alert, such as a summary that reminds you of the alert's functionality or purpose, or next step instructions when this alert fires.

    hashtag
    Step 7: Save and exit

    Once everything is ready, click the blue "Save rule and exit" button on the upper right of the screen, which will bring you back to the Alert rules screen. You will now be able to see your alert, its status - normal (green), pending (yellow), or firing (red) - and the Evaluation interval (blue).

    hashtag
    Configuring an Alert from an existing dashboard

    1. Log in to your groundcover account and navigate to the dashboard that you want to create an alert from.

    2. Locate the Grafana panel that you want to create an alert from, click on the panel's header, and select edit.

    3. Click on the alert tab as seen in the image below. Select the Manage alerts option from the dropdown menu.

    4. Click on the New Alert Rule button.

    circle-info

    Note: only time series panels support alert creation.

    1. An alert is derived from three parts that will be configured in the screen you are navigated to:

      • Expression - the query that defines the alert input itself

      • Reduction - the value that should be leveraged from the aforementioned expression

      • Threshold - the value to measure against the reduction output to decide if an alert should be triggered

    2. Verify the expression value and enter reduction and threshold values in line with your alerting expectations

    3. Select a folder - if needed you can navigate to the dashboard tab in the left nav and create a new folder

    4. Select an evaluation group, or type text in order to create a new group as shown below

    5. Click "Save and Exit" on the top right-hand side of the screen to create the alert

    6. Ensure your notification is configured to have alerts sent to end users. See "Configuring Slack Contact Point" section below if needed.

    circle-exclamation

    Note: Make sure to test the alert to ensure that it is working as expected. You can do this by triggering the conditions that you defined and verifying that the alert is sent to the specified notification channels.

    hashtag

    this guide
    hashtag
    Supported Resources
    • groundcover_policy – Defines RBAC policies (roles and optional data scope filters) Role-Based Access Control (RBAC)

    • groundcover_serviceaccount – Creates service accounts using attaches policies. Service Accounts

    • groundcover_apikey – Creates API keys for service accounts.

    • groundcover_monitor – Defines alerting rules and monitors.

• groundcover_logspipeline – Defines Logs Pipeline configurations.

    • groundcover_ingestionkey – Creates Ingestion keys.

    • groundcover_dashboard – Defines Dashboards.

    hashtag
    Installation and Setup

    hashtag
    Requirements

    • Terraformarrow-up-right ≥ 1.0 (Check required_version if specified in main.tf)

• Goarrow-up-right ≥ 1.21 (to build the provider plugin)

    • groundcover Account and API Key.

    hashtag
    Install the Provider

    Run terraform init to install the provider.

    hashtag
    Configure the Provider

    hashtag
    Arguments

    • api_key (String, Required, Sensitive): Your groundcover API key. It is strongly recommended to configure this using the TF_VAR_groundcover_api_key environment variable rather than hardcoding it.

    • base_url (String, Optional): The base URL for the groundcover API. Defaults to https://api.groundcover.com if not specified.

• backend_id (String, Required): Your Backend ID can be found in the API Keys screen in the groundcover UI (under Settings -> Access).

    hashtag
    Examples

    For full examples of all existing resources, see: https://github.com/groundcover-com/terraform-provider-groundcover/tree/main/examples/resourcesarrow-up-right

    hashtag
    Creating a Read-Only Service Account and API Key

    https://registry.terraform.io/providers/groundcover-com/groundcover/latestarrow-up-right
    https://github.com/groundcover-com/terraform-provider-groundcoverarrow-up-right

    hashtag
    Saving a view

    1. Configure the page until it looks exactly right—filters, columns, panels, etc.

    2. Click ➕Save View.

    3. Give the view a clear Name.

    4. Hit Save. The view is now listed and available to everyone in the project.

    Scope – Saved Views are per‑page. A Logs view appears only in Logs; a Traces view only in Traces.


    hashtag
    What a view stores

    hashtag
    Common to all pages

    Category
    Details

    Filters & facets

    All query filters plus facet open/closed state

    Columns

    Chosen columns, and their order, sort, width

    Filter Panel & Added Facets

    Filter panel open/closed

    Facets added/removed

    hashtag
    Page‑specific additions

    Page
    Extra properties saved

    Logs

• logs / patterns show / hide

    • textWrap

    • Insight show / hide

    Traces

    • traces / span

    • table / drilldown

    API Catalog

    • protocol

    • Kafka role: Fetcher / Producer

    Events

    • textWrap


    hashtag
    Updating a view

    The Update View button appears only when you are the creator of the view. Click it to overwrite the view with your latest changes.

    Underneath every View you can see which user created it.


    hashtag
    Managing views (row operations)

    Action
    Who can do it?

    Edit / Rename

    Creator

    Delete

    Creator

    Star / Unstar

    Any user for themselves


    hashtag
    Searching, Sorting, and Filtering the list

Searching the views matches against view names and their creators.

The default sorting pins favorite views at the top, with the rest of the views below. Each group of views is sorted from A→Z.

    In addition, 3 filtering options are available:

    1. All Views - The entire workspace's views for a specific page

    2. My Favorites – The favorite views of the user for a specific page

    3. Created By Me - The views created by the user

    hashtag
    Policies

    Policies are the foundational elements of groundcover’s RBAC. Each policy defines:

    1. A permission level – which actions the user can perform (Admin, Editor, or Viewer-like capabilities).

    2. A data scope – which clusters, environments, or namespaces the user can see.

    By assigning one or more policies to a user, you can precisely control both what they can do and where they can do it.

    hashtag
    Default Policies

    groundcover provides three default policies to simplify common use cases:

    1. Default Admin Policy

      • Permission: Admin

      • Data Scope: Full (no restrictions)

      • Behavior: Unlimited access to groundcover features and configurations.

    2. Default Editor Policy

      • Permission: Editor

• Data Scope: Full (no restrictions)

      • Behavior: Full creation/editing capabilities on observability data, but no user or system management.

    3. Default Viewer Policy

      • Permission: Viewer

• Data Scope: Full (no restrictions)

      • Behavior: Read-only access to all data in groundcover.

    These default policies allow you to quickly onboard new users with typical Admin/Editor/Viewer capabilities. However, you can also create custom policies with narrower data scopes, if needed.

    hashtag
    Policy Structure

    A policy’s data scope can be defined in two modes: Simple or Advanced.

    1. Simple Mode

      • Uses AND logic across the specified conditions.

      • Applies the same scope to all entity types (e.g., logs, traces, events, workloads).

• Example: “Cluster = Dev AND Environment = QA,” restricting all logs, traces, events, etc. to the Dev cluster and QA environment. This applies to all data types including logs, traces, events, workloads, and metrics.

    2. Advanced Mode

      • Lets you define different scopes for each data entity (logs, traces, events, workloads, etc.).

• Each scope can use OR logic among conditions, allowing more fine-grained control.

      • Example: Logs: “Cluster = Dev OR Prod,” Traces: “Namespace = abc123,” Events: “Environment = Staging OR Prod.”

    When creating or editing a policy, you select permission (Admin, Editor, or Viewer) and a data scope mode (Simple or Advanced).

    hashtag
    Multiple Policies

    A user can be associated with multiple policies. When that occurs:

    1. Permission Merging

      • The user’s final permission level is the highest among all assigned policies.

      • Example: If one policy grants Editor and another grants Viewer, the user is effectively an Editor overall.

    2. Data Scope Merging

      • Data scopes merge via OR logic, broadening the user's overall data access.

      • Example: Policy A => "Cluster = A," Policy B => "Environment = B," so final scope is "Cluster A OR Environment B."

    A user may be assigned a policy granting the Editor role with a data scope relevant to specific clusters, and simultaneously be assigned another policy granting the Viewer role with a different data scope. The user's effective access is determined by the highest role across all assigned policies and by the union (OR) of scopes.
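As an illustration only (not groundcover's implementation), the following Python sketch models these merging rules; the dictionary shape and role ranking are assumptions made for the example:

    ROLE_RANK = {"Viewer": 0, "Editor": 1, "Admin": 2}

    def effective_access(policies):
        # The final permission level is the highest among all assigned policies
        role = max((p["role"] for p in policies), key=ROLE_RANK.get)
        # Data scopes merge via OR logic, broadening overall data access
        scopes = [p["scope"] for p in policies if p.get("scope")]
        scope = " OR ".join(scopes) if scopes else "Full"
        return role, scope

    print(effective_access([
        {"role": "Editor", "scope": "Cluster = A"},
        {"role": "Viewer", "scope": "Environment = B"},
    ]))
    # ('Editor', 'Cluster = A OR Environment = B')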


    In summary:

    • Policies define both permission (Admin, Editor, or Viewer) and data scope (clusters, environments, namespaces).

    • Default Policies (Admin, Editor, Viewer) provide no data restrictions, suitable for quick onboarding.

    • Custom Policies allow more granular restrictions, specifying exactly which entities a user can see or modify.

    • Multiple Policies can co-exist, merging permission levels and data scopes via OR logic across all data types.

    This flexible system gives you robust control over observability data in groundcover, ensuring each user has precisely the access they need.

    Enterprise planarrow-up-right

    Pagination

    This document explains how to paginate through large log result sets using the groundcover logs query API.

    hashtag
    Overview

    When querying logs, you may receive more results than can be efficiently returned in a single response. Pagination allows you to retrieve results in smaller, manageable chunks by using the limit and skip parameters.

    hashtag
    Parameters

Parameter
    Type
    Description
    Example

    limit

    integer

    Maximum number of log entries to return per page. The maximum allowed value is 5000

    200

    skip

    integer

    Number of log entries to skip from the start

    0
    circle-exclamation

Note: The limit parameter accepts a maximum value of 5000

    hashtag
    Basic Pagination

    To paginate through results:

    1. First page: Set skip: 0 and limit to your desired page size

    2. Subsequent pages: Increment skip by the limit value for each page

    Example: First Page

    Example: Second Page

    Example: Third Page

    hashtag
    Pagination Formula

    To calculate the skip value for any page:

    Where:

    • page_number is the page you want to retrieve (1, 2, 3, ...)

    • limit is your page size

    Examples:

    • Page 1 with limit 200: skip = (1 - 1) × 200 = 0

    • Page 2 with limit 200: skip = (2 - 1) × 200 = 200

• Page 3 with limit 200: skip = (3 - 1) × 200 = 400

    • Page 1 with limit 50: skip = (1 - 1) × 50 = 0

    • Page 5 with limit 50: skip = (5 - 1) × 50 = 200
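The formula is simple enough to express directly in code; a tiny Python helper (the page_skip name is ours, introduced for illustration) makes the arithmetic explicit:

    def page_skip(page_number, limit):
        # skip = (page_number - 1) × limit
        return (page_number - 1) * limit

    assert page_skip(1, 200) == 0
    assert page_skip(3, 200) == 400
    assert page_skip(5, 50) == 200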

    hashtag
    Determining When to Stop

    The response includes a limitReached field that indicates whether the result limit was reached:

    • If limitReached: true, there may be more results available. Continue to the next page.

    • If limitReached: false and you received fewer results than your limit, you've reached the end of the results.

    hashtag
    Best Practices

    1. Consistent Page Size: Use the same limit value for all pages in a pagination sequence to ensure consistent results.

    2. Reasonable Limits: Choose a limit value that balances performance and usability. Common values are 50, 100, or 200.

3. Sort Order: Always use the same sortBy and sortOrder parameters across all pages to maintain consistent ordering.

    4. Time Range: Keep the same start and end time range across all pages to ensure you're paginating through the same dataset.

    5. Filtering: If using the group parameter for filtering, use the same group conditions across all pages to maintain consistent filtering.

    6. Error Handling: Implement retry logic and error handling for pagination requests, as network issues can occur between pages.

    7. Check limitReached: Use the limitReached field in the response to determine if more pages are available.

    hashtag
    Example: Pagination Loop

    Here's a pseudo-code example of how to implement pagination:

    hashtag
    Pagination with Filtering

    When using pagination with filtered queries (using the group parameter), ensure you use the same group conditions across all pages to maintain consistent filtering:

    Important: Keep the same group, start, end, sortBy, and sortOrder values across all pages to ensure consistent pagination through the same filtered dataset.

    List Workflows

    Get a list of all configured alert workflows with their complete definitions, provider integrations, execution status, and YAML configurations.

    hashtag
    Endpoint

    POST /api/workflows/list

    hashtag
    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    hashtag
    Headers

    Header
    Required
    Description

    hashtag
    Request Body

    This endpoint does not require a request body for the POST method.

    Field Descriptions

    Field
    Type
    Description

    hashtag
    Examples

    hashtag
    Basic Request

    Get all workflows:

    hashtag
    Response Example

    Integration Examples with Workflows

    This page provides examples of how to integrate workflows with different notification systems and external services.

    hashtag
    Slack Notification

    This workflow sends a simple Slack message when triggered:

    hashtag
    Slack with Rich Formatting

    This workflow sends a formatted Slack message using Block Kit:

    hashtag
    PagerDuty Integration

    This workflow creates a PagerDuty incident:

    hashtag
    Opsgenie Integration

    This workflow creates an Opsgenie alert:

    • Alias is used to group identical events together in Opsgenie (alias key in the payload)

    • Severities must be mapped to Opsgenie valid severities (priority key in the payload)

    • Tags are a list of string values (tags key in the payload)

    hashtag
    Jira Ticket Creation

    This workflow creates a Jira ticket using webhook integration:

    hashtag
    Multiple Actions

    This workflow performs multiple actions for the same alert:

    Application Metrics

    hashtag
    Our metrics philosophy

    The groundcover platform generates 100% of its metrics from the actual data. There are no sample rates or complex interpolations to make up for partial coverage. Our measurements represent the real, complete flow of data in your environment.

Stream processing allows us to construct the majority of the metrics on the very node where the raw transactions are recorded. This means the raw data is turned into numbers the moment it becomes possible - removing the need for storing or sending it elsewhere.

Metrics are stored in groundcover's victoria-metrics deployment, ensuring top-notch performance on every scale.

    Authorization: Bearer <YOUR_API_KEY>
    Content-Type: application/json
    {
      "name": "string",
      "type": "sensor|thirdParty|rum",
      "remoteConfig": boolean
    }
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{}'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "type": "sensor"
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/rbac/ingestion-keys/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "my-sensor-key",
        "remoteConfig": true
      }'
    [
      {
        "id": "12345678-1234-1234-1234-123456789abc",
        "name": "production-sensor-key",
        "createdBy": "[email protected]",
        "creationDate": "2025-08-31T11:48:18Z",
        "key": "gcik_AEBAAAD4_XXXXXXXXX_XXXXXXXXX_XXXXXXXX",
        "type": "sensor",
        "remoteConfig": true,
        "tags": []
      },
      {
        "id": "87654321-4321-4321-4321-987654321def",
        "name": "my-sensor-key",
        "createdBy": "[email protected]",
        "creationDate": "2025-08-31T11:48:18Z",
        "key": "gcik_AEBAAAC7_XXXXXXXXX_XXXXXXXXX_XXXXXXXX",
        "type": "sensor",
        "remoteConfig": true,
        "tags": []
      },
      {
        "id": "abcdefab-cdef-abcd-efab-cdefabcdefab",
        "name": "third-party-integration",
        "createdBy": "[email protected]",
        "creationDate": "2025-08-31T11:48:18Z",
        "key": "gcik_AEBAAAHP_XXXXXXXXX_XXXXXXXXX_XXXXXXXX",
        "type": "thirdParty",
        "remoteConfig": false,
        "tags": []
      }
    ]
    SELECT * FROM logs LIMIT 10;
    with engineStatusLastWeek as (
select string_attributes['tenantID'] tenantID, string_attributes['env'] env, max(float_attributes['engineStatus.numCylinders']) cylinders
      from logs
      where timestamp >= now() - interval 7 days
        and workload = 'engine-processing'
        and string_attributes['tenantID'] != ''
      group by tenantID, env
    ),
    engineStatusNow as (
      select string_attributes['tenantID'] tenantID, string_attributes['env'] env, min(float_attributes['engineStatus.numCylinders']) cylinders
      from logs
      where timestamp >= now() - interval 10 minutes
        and workload = 'engine-processing'
        and string_attributes['tenantID'] != ''
      group by tenantID, env
    )
    select n.tenantID, n.env, n.cylinders/lw.cylinders AS threshold
    from engineStatusNow n
left join engineStatusLastWeek lw using (tenantID, env)
    where n.cylinders/lw.cylinders <= 0.5
    SELECT cluster, namespace, workload, 
        round( 100.0 * countIf(level = 'error') / 
        nullIf(count(), 0), 2 ) AS error_ratio_pct 
    FROM "groundcover"."logs" 
    WHERE timestamp >= now() - interval '10 minute' AND 
    namespace IN ('refurbished', 'interface') GROUP BY cluster, namespace, workload
    title: "[SQL] Monitor name"
    display:
      header: Monitor description
    severity: S2
    measurementType: event
    model:
      queries:
        - name: threshold_input_query
          expression: "[YOUR SQL QUERY GOES HERE]"
          datasourceType: clickhouse
          queryType: instant
      thresholds:
        - name: threshold_1
          inputName: threshold_input_query
          operator: lt
          values:
            - 0.5
    annotations:
      [Workflow Name]: enabled
    executionErrorState: OK
    noDataState: OK
    evaluationInterval:
      interval: 3m
      pendingFor: 2m
    isPaused: false
    
title: Host not sending logs for more than 5 minutes
    display:
      header: Host "{{host}}" is not sending logs for more than 5 minutes
    severity: S2
    measurementType: event
    model:
      queries:
        - name: threshold_input_query
          expression: "
          WITH
              (
              SELECT groupArray(DISTINCT host)
              FROM logs
              WHERE timestamp >= now() - INTERVAL 24 HOUR
              AND env_type = 'host'
              ) AS all_hosts
          SELECT
              host,
              coalesce(log_count, 0) AS log_count
          FROM
              (
              SELECT arrayJoin(all_hosts) AS host
              ) AS h
              LEFT JOIN
                  (
                  SELECT host, count(*) AS log_count
                  FROM logs
                  WHERE timestamp >= now() - INTERVAL 5 MINUTE
                  AND env_type = 'host'
                  GROUP BY host
                  ) AS l
          USING (host)
          ORDER BY host
          "
          datasourceType: clickhouse
          queryType: instant
      thresholds:
        - name: threshold_1
          inputName: threshold_input_query
          operator: lt
          values:
            - 10
    annotations:
      {Put Your Workflow Name Here}: enabled
    executionErrorState: Error
    noDataState: NoData
    evaluationInterval:
      interval: 5m
      pendingFor: 0s
    isPaused: false
    terraform {
      required_providers {
        groundcover = {
          source  = "registry.terraform.io/groundcover-com/groundcover"
          version = ">= 0.0.0" # Replace with actual version constraint
        }
      }
    }
    provider "groundcover" {
      api_key  = "YOUR_API_KEY" # Required
      base_url = "https://api.groundcover.com" # Optional, change if using onprem/airgap deployment
      backend_id = "groundcover" # Your Backend ID can be found in the groundcover UI under Settings->Access->API Keys
    }
    resource "groundcover_policy" "read_only" {
      name        = "Read-Only Policy"
      description = "Grants read-only access"
      claim_role  = "ci-readonly-role"
      roles = {
        read = "read"
      }
    }
    
    resource "groundcover_serviceaccount" "ci_account" {
      name         = "ci-pipeline-account"
      description  = "Service account for CI"
      policy_uuids = [groundcover_policy.read_only.id]
    }
    
    resource "groundcover_apikey" "ci_key" {
      name               = "CI Key"
      description        = "Key for CI pipeline"
      service_account_id = groundcover_serviceaccount.ci_account.id
    }
    workflow: 
      id: slack-notification
      description: Send Slack notification for alerts
      triggers:
        - type: alert
          filters:
            - key: annotations.slack-notification
              value: enabled                
      actions:
        - name: slack-notification
          provider:
            type: slack
            config: '{{ providers.slack_webhook }}'
            with:
              message: "Alert: {{ alert.alertname }} - Status: {{ alert.status }}"

• HTTP Version: Select the protocol version: one of HTTP/1.0, HTTP/1.1, or HTTP/2.0. Default is HTTP/1.1

    ends with

    Verifies the end of a string.

    "success"

    matches regex

    Validates against a Regular Expression.

    jsonBody matches regex user_id: \d+

    exists

    Checks that a field or header is present, regardless of value.

    set-cookie exists in response headers

    does not exist

    Checks that a field is absent.

    jsonBody (error_message) does not exist

    is one of

    Checks against a list of acceptable values.

    statusCode is one of 200, 201, 202
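As a rough illustration of how these operators behave (this is a sketch, not groundcover's implementation), the following Python evaluator mirrors the semantics described above:

    import re

    def evaluate(op, actual, expected=None):
        # Mirrors the assertion operators documented above (illustrative only)
        if op == "ends with":
            return actual is not None and str(actual).endswith(expected)
        if op == "matches regex":
            return actual is not None and re.search(expected, str(actual)) is not None
        if op == "exists":
            return actual is not None
        if op == "does not exist":
            return actual is None
        if op == "is one of":
            return actual in expected
        raise ValueError(f"unknown operator: {op}")

    print(evaluate("is one of", 200, [200, 201, 202]))                # True
    print(evaluate("matches regex", "user_id: 42", r"user_id: \d+"))  # True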

    Filter by remote configuration status

    key

    string

    The actual ingestion key (starts with gcik_)

    type

    string

    Key type ("sensor", "thirdParty", "rum")

    remoteConfig

    boolean

    Whether remote configuration is enabled

    tags

    array

    Array of tags associated with the key

    It's possible that Additional Values will also relate to the chosen values of other variables.

    Logs

    Traces

    Pie

    Select a data source and aggregation method.

    Logs

    Traces

    API Keys
    Monitors
    Ingestion Keys
    Dashboards



    created_by

    string

    Email of the workflow creator

    creation_time

    string

    Workflow creation timestamp (ISO 8601)

    triggers

    array

    Array of trigger configurations

    triggers[].type

    string

    Trigger type (e.g., "alert")

    interval

    number

    Execution interval (typically 0 for alert-triggered workflows)

    last_execution_time

    string/null

    Last execution timestamp

    last_execution_status

    string/null

    Last execution status ("success", "error", etc.)

    providers

    array

    Array of integration provider configurations

    providers[].type

    string

    Provider type (see provider types below)

    providers[].id

    string/null

    Provider configuration ID

    providers[].name

    string

    Provider display name

    providers[].installed

    boolean

    Whether provider is installed and configured

    workflow_raw_id

    string

    Raw workflow identifier

    workflow_raw

    string

    Complete YAML workflow definition

    revision

    number

    Workflow version number

    last_updated

    string

    Last update timestamp (ISO 8601)

    invalid

    boolean

    Whether workflow configuration is invalid

    last_execution_started

    string/null

    When last execution started

    Authorization

    Yes

    Bearer token with your API key

    Content-Type

    Yes

    Must be application/json

    Accept

    Yes

    Must be application/json

    workflows

    array

    Array of workflow objects

    id

    string

    Unique workflow identifier (UUID)

    name

    string

    Workflow name

    description

    string

    Workflow description

    curl 'https://app.groundcover.com/api/logs/v2/query' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'Content-Type: application/json' \
      --data-raw '{
        "start": "2025-12-24T07:54:29.459Z",
        "end": "2025-12-24T13:54:29.459Z",
        "group": null,
        "limit": 200,
        "skip": 0,
        "sortBy": "timestamp",
        "sortOrder": "desc",
        "selectors": [],
        "optimizeSearch": true
      }'
    curl 'https://app.groundcover.com/api/logs/v2/query' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'Content-Type: application/json' \
      --data-raw '{
        "start": "2025-12-24T07:54:29.459Z",
        "end": "2025-12-24T13:54:29.459Z",
        "group": null,
        "limit": 200,
        "skip": 200,
        "sortBy": "timestamp",
        "sortOrder": "desc",
        "selectors": [],
        "optimizeSearch": true
      }'
    curl 'https://app.groundcover.com/api/logs/v2/query' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'Content-Type: application/json' \
      --data-raw '{
        "start": "2025-12-24T07:54:29.459Z",
        "end": "2025-12-24T13:54:29.459Z",
        "group": null,
        "limit": 200,
        "skip": 400,
        "sortBy": "timestamp",
        "sortOrder": "desc",
        "selectors": [],
        "optimizeSearch": true
      }'
    skip = (page_number - 1) × limit
    {
      "logs": [...],
      "limitReached": true,
      "done": true,
      ...
    }
    page_size = 200
    skip = 0
    all_logs = []
    
    while True:
        response = query_logs(limit=page_size, skip=skip)
        logs = response['logs']
        all_logs.extend(logs)
        
        # Check if we've reached the end
        if not response['limitReached'] or len(logs) < page_size:
            break
        
        # Move to next page
        skip += page_size
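The loop above leaves query_logs undefined. Here is a minimal runnable sketch of it, assuming the requests library and the payload shape from the curl examples above (the fixed time range and API key placeholder are illustrative only):

    import requests

    def query_logs(limit, skip):
        # Wraps the logs query endpoint documented above
        response = requests.post(
            "https://app.groundcover.com/api/logs/v2/query",
            headers={"Authorization": "Bearer <YOUR_API_KEY>"},
            json={
                "start": "2025-12-24T07:54:29.459Z",
                "end": "2025-12-24T13:54:29.459Z",
                "group": None,
                "limit": limit,
                "skip": skip,
                "sortBy": "timestamp",
                "sortOrder": "desc",
                "selectors": [],
                "optimizeSearch": True,
            },
        )
        response.raise_for_status()
        return response.json()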
    # First page with filtering
    curl 'https://app.groundcover.com/api/logs/v2/query' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'Content-Type: application/json' \
      --data-raw '{
        "start": "2025-12-24T12:06:54.169Z",
        "end": "2025-12-24T18:06:54.169Z",
        "group": {
          "operator": "and",
          "conditions": [{
            "filters": [{"op": "match", "value": "error"}],
            "key": "level",
            "origin": "root",
            "type": "string"
          }]
        },
        "limit": 200,
        "skip": 0,
        "sortBy": "timestamp",
        "sortOrder": "desc",
        "selectors": [],
        "optimizeSearch": true
      }'
    
    # Second page - same group conditions, different skip value
    curl 'https://app.groundcover.com/api/logs/v2/query' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'Content-Type: application/json' \
      --data-raw '{
        "start": "2025-12-24T12:06:54.169Z",
        "end": "2025-12-24T18:06:54.169Z",
        "group": {
          "operator": "and",
          "conditions": [{
            "filters": [{"op": "match", "value": "error"}],
            "key": "level",
            "origin": "root",
            "type": "string"
          }]
        },
        "limit": 200,
        "skip": 200,
        "sortBy": "timestamp",
        "sortOrder": "desc",
        "selectors": [],
        "optimizeSearch": true
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/workflows/list' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Accept: */*'
    {
      "workflows": [
        {
          "id": "12345678-1234-1234-1234-123456789abc",
          "name": "ms-teams-alerts-workflow",
          "description": "Sends an API to MS Teams alerts endpoint",
          "created_by": "[email protected]",
          "creation_time": "2025-07-02T09:42:13.334103Z",
          "triggers": [
            {
              "type": "alert"
            }
          ],
          "interval": 0,
          "last_execution_time": null,
          "last_execution_status": null,
          "providers": [
            {
              "type": "webhook",
              "id": "provider123456789abcdef",
              "name": "teams-integration",
              "installed": true
            },
            {
              "type": "webhook",
              "id": null,
              "name": "backup-teams-integration",
              "installed": false
            }
          ],
          "workflow_raw_id": "teams-webhook",
          "workflow_raw": "id: teams-webhook\ndescription: Sends an API to MS Teams alerts endpoint\ntriggers:\n- type: alert\n  filters:\n  - key: annotations.ms-teams-alerts-workflow\n    value: enabled\nname: ms-teams-alerts-workflow\n...",
          "revision": 11,
          "last_updated": "2025-07-03T08:57:09.881806Z",
          "invalid": false,
          "last_execution_started": null
        },
        {
          "id": "87654321-4321-4321-4321-987654321def",
          "name": "webhook-alerts-workflow",
          "description": "Workflow for sending alerts to custom webhook",
          "created_by": "[email protected]",
          "creation_time": "2025-06-19T12:49:37.630392Z",
          "triggers": [
            {
              "type": "alert"
            }
          ],
          "interval": 0,
          "last_execution_time": null,
          "last_execution_status": null,
          "providers": [
            {
              "type": "webhook",
              "id": "webhook987654321fedcba",
              "name": "custom-webhook",
              "installed": true
            }
          ],
          "workflow_raw_id": "webhook-alerts",
          "workflow_raw": "id: webhook-alerts\ndescription: Workflow for sending alerts to custom webhook\n...",
          "revision": 2,
          "last_updated": "2025-06-19T12:51:24.643393Z",
          "invalid": false,
          "last_execution_started": null
        }
      ]
    }
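The same request from Python, as a minimal sketch assuming the requests library (endpoint and headers as documented above):

    import requests

    resp = requests.post(
        "https://api.groundcover.com/api/workflows/list",
        headers={
            "Authorization": "Bearer <YOUR_API_KEY>",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )
    resp.raise_for_status()

    # Summarize each workflow using the documented response fields
    for wf in resp.json()["workflows"]:
        print(wf["name"], wf["revision"], wf["last_execution_status"])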
    workflow: 
      id: slack-rich-notification
      description: Send formatted Slack notification
      triggers:
        - type: alert
          filters:
            - key: annotations.slack-rich-notification
              value: enabled                
      actions:
        - name: slack-rich-notification
          provider:
            type: slack
            config: '{{ providers.slack_webhook }}'
            with:
              blocks:
              - type: header
                text:
                  type: plain_text
                  text: ':rotating_light: {{ alert.alertname }} :rotating_light:'
                  emoji: true
              - type: divider
              - type: section
                fields:
                - type: mrkdwn
                  text: |-
                    *Cluster:*
                    {{ alert.labels.cluster}}
                - type: mrkdwn
                  text: |-
                    *Namespace:*
                    {{ alert.labels.namespace}}
                - type: mrkdwn
                  text: |-
                    *Status:*
                    {{ alert.status}}
     workflow:
      id: pagerduty-incident-workflow
      description: Create PagerDuty incident for alerts
      name: pagerduty-incident-workflow
      triggers:
        - type: alert
          filters:
            - key: annotations.pagerduty-incident-workflow
              value: enabled
      consts:
        severities: '{"S1": "critical","S2": "error","S3": "warning","S4": "info","critical": "critical","error": "error","warning": "warning","info": "info"}'
        severity: keep.dictget( '{{ consts.severities }}', '{{ alert.annotations._gc_severity }}', 'info')
        description: keep.dictget( {{ alert.annotations }}, "_gc_description", "")
        title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", '{{ alert.alertname }}')
        redacted_labels: keep.dict_pop({{ alert.labels }}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder")
        env: keep.dictget( {{ alert.labels }}, "env", "- no env -")
        namespace: keep.dictget( {{ alert.labels }}, "namespace", "- no namespace -")
        workload: keep.dictget( {{ alert.labels }}, "workload", "- no workload -")
        pod: keep.dictget( {{ alert.labels }}, "podName", "- no pod -")
        issue: https://app.groundcover.com/monitors/issues?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.fingerprint }}
        monitor: https://app.groundcover.com/monitors?backendId={{ alert.labels.backend_id }}&selectedObjectId={{ alert.labels._gc_monitor_id }}
        silence: https://app.groundcover.com/monitors/create-silence?keep.replace(keep.join({{ consts.redacted_labels }}, "&", "matcher_"), " ", "+")
      actions:
      - name: pagerduty-alert
        provider:
          config: '{{ providers.pagerduty-integration-name }}'
          type: pagerduty
          with:
            title: '{{ consts.title }}'
            severity: '{{ consts.severity }}'
            dedup_key: '{{alert.fingerprint}}'
            custom_details:
              01_environment: '{{ consts.env }}'
              02_namespace: '{{ consts.namespace }}'
              03_service_name: '{{ consts.workload }}'
              04_pod: '{{ consts.pod }}'
              05_labels: '{{ consts.redacted_labels }}'
              06_monitor: '{{ consts.monitor }}'
              07_issue: '{{ consts.issue }}'
              08_silence: '{{ consts.silence }}'
    
    workflow:
      id: Opsgenie Example
      description: "Opsgenie workflow"
      triggers:
        - type: alert
          filters:
            - key: annotations.Opsgenie Example
              value: enabled
      consts:
        description: keep.dictget( {{ alert.annotations }}, "_gc_description", "")
        redacted_labels: keep.dict_pop({{ alert.labels }}, "_gc_monitor_id", "_gc_monitor_name", "_gc_severity", "backend_id", "grafana_folder", "CampaignName")
        severities: '{"S1": "P1","S2": "P2","S3": "P3","S4": "P4","critical": "P1","error": "P2","warning": "P3","info": "P4"}'
        severity: keep.dictget({{ consts.severities }}, {{ alert.annotations._gc_severity }}, "P3")
        title: keep.dictget( {{ alert.annotations }}, "_gc_issue_header", '{{ alert.alertname }}')
        region: keep.dictget( {{ alert.labels }}, "cloud.region", "")
        TenantID: keep.dictget( {{ alert.labels }}, "tenantID", "")
      name: Opsgenie Example
      actions:
      - if: '{{ alert.status }} == "firing"'
name: opsgenie-alert
        provider:
          config: "{{ providers.Opsgenie }}"
          type: opsgenie
          with:
            alias: '{{ alert.fingerprint }}'
            description: '{{ consts.description }}'
            details: '{{ consts.redacted_labels }}'
            message: '{{ consts.title }}'
            priority: '{{ consts.severity }}'
            source: groundcover
            tags:
            - '{{ alert.alertname }}'
            - '{{ consts.TenantID }}'
            - '{{ consts.region }}'
      
    workflow:
      id: jira-ticket-creation
      description: Create Jira ticket for alerts
      triggers:
        - type: alert
          filters:
            - key: annotations.jira-ticket-creation
              value: enabled    
      consts:
        description: keep.dictget({{ alert.annotations }}, "_gc_description", '')
        title: keep.dictget({{ alert.annotations }}, "_gc_issue_header", "{{ alert.alertname }}")
      actions:
        - name: jira-ticket
          provider:
            type: webhook
            config: '{{ providers.jira_webhook }}'
            with:
              body:
                fields:
                  description: '{{ consts.description }}'
                  issuetype:
                    id: 10001
                  project:
                    id: 10000
                  summary: '{{ consts.title }}'
    workflow:
      id: multi-action-workflow
      description: Perform multiple actions for critical alerts
      triggers:
        - type: alert
          filters:
            - key: annotations.multi-action-workflow
              value: enabled      
            - key: severity
              value: critical
      actions:
        - name: slack-notification
          provider:
            type: slack
            config: '{{ providers.slack_webhook }}'
            with:
              message: "Critical alert: {{ alert.alertname }}"
        - name: pagerduty-incident
          provider:
            type: pagerduty
            config: '{{ providers.pagerduty_prod }}'
            with:
              title: "Critical: {{ alert.alertname }}"
        - name: jira-ticket
          provider:
            type: webhook
            config: '{{ providers.jira_webhook }}'
            with:
              body:
                fields:
                  summary: "Critical Alert: {{ alert.alertname }}"
                  description: "Critical alert triggered in {{ alert.labels.namespace }}"
                  issuetype:
                    id: 10001
                  project:
                    id: 10000


    hashtag
    Golden signals

    In the world of excessive data, it's important to have a rule of thumb for knowing where to start looking. For application metrics, we rely on our golden signalsarrow-up-right.

    The following metrics are generated for each resource being aggregated:

    • Requests per second (RPS)

• Error rate

    • Latencies (p50 and p95)

    The golden signals are then displayed in two important ways: Workload and Resource aggregations.

    circle-info

    See below for the full list of generated workload and resource golden metrics.

Resource aggregations are highly granular metrics, providing insights into individual APIs.

    Workload aggregations are designed to show an overview of each service, enabling a higher level inspection. These are constructed using all of the resources recorded for each service.

    hashtag
    Controlling retention

    groundcover allows full control over the retention of your metrics. Learn more here.

    hashtag
    List of available metrics

    Below you will find the full list of our APM metrics, as well as the labels we export for each. These labels are designed with high granularity in mind for maximal insight depth. All of the metrics listed are available out of the box after installing groundcover, without any further setup.

    circle-info

    We fully support the ingestion of custom metrics to further expand the visibility into your environment.

    We also allow for building custom dashboards, enabling full freedom in deciding how to display your metrics - building on groundcover's metrics below plus every custom metric ingested.

    hashtag
    Our labels

    Label name
    Description
    Relevant types

    clusterId

    Name identifier of the K8s cluster

    region

    Cloud provider region name

    namespace

    K8s namespace

    workload_name

    K8s workload (or service) name

    circle-info

Summary based metrics have an additional quantile label, representing the percentile. Available values: ["0.5", "0.95", "0.99"].

    circle-exclamation

    groundcoverarrow-up-right uses a set of internal labels which are not relevant in most use-cases. Find them interesting? Let us know over Slack!arrow-up-right

issue_id, entity_id, resource_id, query_id, aggregation_id, parent_entity_id, perspective_entity_id, perspective_entity_is_external, perspective_entity_issue_id, perspective_entity_name, perspective_entity_namespace, perspective_entity_resource_id

    hashtag
    Golden Signals Metrics

    circle-info

In the lists below, we describe error and issue counters. Every issue flagged by groundcover is an error, but not every error is flagged as an issue.

    hashtag
    Resource metrics

    Name
    Description
    Type

    groundcover_resource_total_counter

    total amount of resource requests

    Counter

    groundcover_resource_error_counter

    total amount of requests with error status codes

    Counter

    groundcover_resource_issue_counter

    total amount of requests which were flagged as issues

    Counter

groundcover_resource_success_counter

    total amount of resource requests with OK status codes

    Counter

    groundcover_resource_latency_seconds

    resource latency [sec]

    Summary

    hashtag
    Workload metrics

    Name
    Description
    Type

    groundcover_workload_total_counter

    total amount of requests handled by the workload

    Counter

    groundcover_workload_error_counter

    total amount of requests handled by the workload with error status codes

    Counter

    groundcover_workload_issue_counter

    total amount of requests handled by the workload which were flagged as issues

    Counter

groundcover_workload_success_counter

    total amount of requests handled by the workload with OK status codes

    Counter

    groundcover_workload_latency_seconds

    resource latency across all of the workload APIs [sec]

    Summary

    hashtag
    Storage usage metrics

    Name
    Description
    Type

    groundcover_pvc_read_bytes_total

    total amount of bytes read by the workload from the PVC

    Counter

    groundcover_pvc_write_bytes_total

    total amount of bytes written by the workload to the PVC

    Counter

    groundcover_pvc_reads_total

    total amount of read operations done by the workload from the PVC

    Counter

groundcover_pvc_writes_total

    total amount of write operations done by the workload to the PVC

    Counter

    groundcover_pvc_read_latency

    latency of read operation by the workload from the PVC, in microseconds

    Summary

    groundcover_pvc_write_latency

    latency of write operation by the workload to the PVC, in microseconds

    Summary

    hashtag
    Kafka specific metrics

    Name
    Description
    Type

    groundcover_client_offset

    client last message offset (for producer the last offset produced, for consumer the last requested offset)

    Gauge

    groundcover_workload_client_offset

    client last message offset (for producer the last offset produced, for consumer the last requested offset), aggregated by workload

    Gauge

    groundcover_calc_lagged_messages

    current lag in messages

    Gauge

groundcover_workload_calc_lagged_messages

    current lag in messages, aggregated by workload

    Gauge

    groundcover_calc_lag_seconds

    current lag in time [sec]

    Gauge

    groundcover_workload_calc_lag_seconds

    current lag in time, aggregated by workload [sec]

    Gauge

    Advanced Log Queries

    This document provides advanced examples of querying logs using the groundcover API with complex filtering conditions.

    hashtag
    Overview

    The group parameter supports various filter operations and combinations. This page demonstrates advanced patterns including:

    • OR conditions within a single field

    • NOT conditions (excluding values)

    • Wildcard matching

    • Free text search

    hashtag
    OR Conditions on a Single Field

    To match multiple values for the same field (OR condition), include multiple filters in the same condition:

    Example: Match logs where env is either "prod" OR "dev" AND level is "error"

    Structure:

    • Multiple filters in the same conditions entry create an OR relationship

    • In this example, the env field matches if it equals "prod" OR "dev"

• The overall operator: "and" means both conditions must be satisfied (level=error AND (env=prod OR env=dev))

    hashtag
    NOT Conditions (Excluding Values)

    To exclude specific values, use the "op": "not_match" operation:

    Example: Match logs where level is NOT "error" AND env is "prod"

    Structure:

    • Use "op": "not_match" to exclude values

    • This example returns logs where level is anything except "error" AND env is "prod"

    hashtag
    Wildcard Matching

    To match values that start with, end with, or contain a pattern, use wildcard characters:

    Example: Match logs where workload starts with "groundcover" AND level is NOT "error"

    Structure:

    • Use * as a wildcard character

    • "groundcover*" matches any value starting with "groundcover" (e.g., "groundcover-backend", "groundcover-frontend")

• Wildcards can be used with both match and not_match operations

    hashtag
    Free Text Search

    To search for specific text or phrases within log content (not limited to specific fields), use free text search with phrase_search:

    Example: Search for logs containing the phrase "POST /ingest" with additional fields

    Structure:

    • Use "op": "phrase_search" for free text search

    • Set "type": "freetext" (instead of "type": "string")

• Do not include a "key" field - free text search searches across all log content

    • The value can be a single word or a phrase (e.g., "POST /ingest")

    • Free text search searches within the log line content, not specific fields

    • Use selectors to request additional fields. Format: [{"key": "request.method"}, {"key": "cluster"}, {"key": "host.id"}]

    hashtag
    Filter Operations

    The following filter operations are supported:

Operation
    Description
    Example

    match

    Matches the exact value or pattern

    {"op": "match", "value": "error"}

    not_match

    Excludes the exact value or pattern

    {"op": "not_match", "value": "error"}

    phrase_search

    Searches for a phrase in log content (free text)

    {"op": "phrase_search", "value": "POST /ingest"}

    hashtag
    Combining Conditions

    hashtag
    AND Operator

    When using "operator": "and", all conditions must be satisfied:

    This means: level condition AND env condition must both be true.

    hashtag
    OR Within a Field

    Multiple filters in the same condition create an OR relationship:

    This means: env equals "prod" OR env equals "dev".

    hashtag
    Complex Combinations

    You can combine AND and OR operations:

    This means: level equals "error" AND (env equals "prod" OR env equals "dev").

    hashtag
    Best Practices

    1. Wildcard Performance: Use wildcards judiciously as they can impact query performance on large datasets.

    2. NOT Operations: not_match operations can be slower than match operations. Consider if you can achieve the same result with positive matches.

3. Multiple Filters: When using multiple filters in a single condition (OR), ensure they all apply to the same field.

    4. Consistent Structure: Always include origin: "root" and the appropriate type for each condition.

    5. Testing: Test complex queries with small time ranges first to verify the results before querying large datasets.
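When composing these group payloads programmatically, a small helper can cut down on copy-paste errors. This is a sketch (the string_condition helper is ours, not part of the API) that produces exactly the condition shapes shown above:

    def string_condition(key, values, op="match"):
        # Multiple filters on one key form an OR; conditions combine with "and"
        return {
            "filters": [{"op": op, "value": v} for v in values],
            "key": key,
            "origin": "root",
            "type": "string",
        }

    group = {
        "operator": "and",
        "conditions": [
            string_condition("level", ["error"]),
            string_condition("env", ["prod", "dev"]),  # prod OR dev
        ],
    }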

    List Clusters

    Retrieve a list of Kubernetes clusters with their resource usage metrics, metadata, and health information.

    hashtag
Endpoint

    POST /api/k8s/v3/clusters/list

    hashtag
    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    hashtag
    Headers

    Header
    Required
    Description

    hashtag
    Request Body

    Parameter
    Type
    Required
    Description

    hashtag
    Response

    The response contains an array of clusters with detailed resource usage and metadata.

    hashtag
    Response Fields

    Field
    Type
    Description

    hashtag
    Cluster Object Fields

    Field
    Type
    Description

    CPU Metrics

    Field
    Type
    Description

    Memory Metrics

    Field
    Type
    Description

    Pod Information

    Field
    Type
    Description

    hashtag
    Examples

    hashtag
    Basic Request

    hashtag
    Response Example

    LLM Observability

    hashtag
    Overview

    LLM Observability is the practice of monitoring, analyzing, and troubleshooting interactions with Large Language Models (LLMs) across distributed systems. It focuses on capturing data regarding prompt content, response quality, performance latency, and token costs.

    groundcover provides a unified view of your GenAI traffic by combining two powerful data collection methods: zero-instrumentation eBPF tracing and native OpenTelemetry ingestion.

    List Workloads

    Retrieve a list of Kubernetes workloads with their performance metrics, resource usage, and metadata.

    hashtag
    Endpoint

    hashtag
    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    List Nodes with Resource Information

    hashtag
    Endpoint

    POST /api/k8s/v2/nodes/info-with-resources


    pod_name

    K8s pod name

    container_name

    K8s container name

    container_image

    K8s container image name

    remote_namespace

    Remote K8s namespace (other side of the communication)

    remote_service_name

    Remote K8s service name (other side of the communication)

    remote_container_name

    Remote K8s container name (other side of the communication)

    type

    The protocol in use (HTTP, gRPC, Kafka, DNS etc.)

    role

    Role in the communication (client or server)

    clustered_path

    HTTP / gRPC aggregated resource path (e.g. /metrics/*)

    http, grpc

method

    HTTP / gRPC method (e.g. GET)

    http, grpc

    response_status_code

    Return status code of an HTTP / gRPC request (e.g. 200 in HTTP)

    http, grpc

    dialect

    SQL dialect (MySQL or PostgreSQL)

    mysql, postgresql

    response_status

Return status code of a SQL query (e.g. 42P01 for undefined table)

    mysql, postgresql

    client_type

    Kafka client type (Fetcher / Producer)

    kafka

    topic

    Kafka topic name

    kafka

    partition

    Kafka partition identifier

    kafka

    error_code

    Kafka return status code

    kafka

    query_type

    type of DNS query (e.g. AAAA)

    dns

    response_return_code

    Return status code of a DNS resolution request (e.g. Name Error)

    dns

    method_name, method_class_name

    Method code for the operation

    amqp

    response_method_name, response_method_class_name

    Method code for the operation's response

    amqp

    exit_code

    K8s container termination exit code

    container_state, container_crash

    state

    K8s container current state (Running, Waiting or Terminated)

    container_state

    state_reason

K8s container state transition reason (e.g. CrashLoopBackOff or OOMKilled)

    container_state

    crash_reason

K8s container crash reason (e.g. Error, OOMKilled)

    container_crash

    pvc_name

    K8s PVC name

    storage


    kubernetesVersion

    String

    Kubernetes version

    nodesCount

    Integer

    Number of nodes in the cluster

    issueCount

    Integer

    Number of issues detected

    cpuUsageAllocatablePercent

    Float

    CPU usage as percentage of allocatable

    cpuRequestAllocatablePercent

    Float

    CPU requests as percentage of allocatable

    cpuUsageRequestPercent

    Float

    CPU usage as percentage of requests

    cpuUsageLimitPercent

    Float

    CPU usage as percentage of limits

    cpuLimitAllocatablePercent

    Float

    CPU limits as percentage of allocatable

    memoryUsageAllocatablePercent

    Float

    Memory usage as percentage of allocatable

    memoryRequestAllocatablePercent

    Float

    Memory requests as percentage of allocatable

    memoryUsageRequestPercent

    Float

    Memory usage as percentage of requests

    memoryUsageLimitPercent

    Float

    Memory usage as percentage of limits

    memoryLimitAllocatablePercent

    Float

    Memory limits as percentage of allocatable

    Authorization

    Yes

    Bearer token with your API key

    X-Backend-Id

    Yes

    Your backend identifier

    Content-Type

    Yes

    Must be application/json

    Accept

    Yes

    Must be application/json

    sources

    Array

    No

    Filter by data sources (empty array for all sources)

    clusters

    Array

    Array of cluster objects

    totalCount

    Integer

    Total number of clusters

    name

    String

    Cluster name

    env

    String

    Environment (e.g., "prod", "ga", "beta", "alpha", "latest")

    creationTimestamp

    String

    When the cluster was created (ISO 8601)

    cloudProvider

    String

    Cloud provider (e.g., "AWS", "GCP", "Azure")

    cpuUsage

    Integer

    Current CPU usage in millicores

    cpuLimit

    Integer

    CPU limits set on resources in millicores

    cpuAllocatable

    Integer

    Total allocatable CPU in millicores

    cpuRequest

    Integer

    Total CPU requests in millicores

    memoryUsage

    Integer

    Current memory usage in bytes

    memoryLimit

    Integer

    Memory limits set on resources in bytes

    memoryAllocatable

    Integer

    Total allocatable memory in bytes

    memoryRequest

    Integer

    Total memory requests in bytes

    pods

    Object

    Pod counts by status (e.g., {"Running": 157, "Succeeded": 4})

    curl 'https://app.groundcover.com/api/logs/v2/query' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'Content-Type: application/json' \
      --data-raw '{
        "start": "2025-12-24T13:36:49.143Z",
        "end": "2025-12-24T19:36:49.143Z",
        "group": {
          "operator": "and",
          "conditions": [
            {
              "filters": [{"op": "match", "value": "error"}],
              "key": "level",
              "origin": "root",
              "type": "string"
            },
            {
              "filters": [
                {"op": "match", "value": "prod"},
                {"op": "match", "value": "dev"}
              ],
              "key": "env",
              "origin": "root",
              "type": "string"
            }
          ]
        },
        "limit": 200,
        "skip": 0,
        "sortBy": "timestamp",
        "sortOrder": "desc",
        "selectors": [],
        "optimizeSearch": true
      }'
    curl 'https://app.groundcover.com/api/logs/v2/query' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'Content-Type: application/json' \
      --data-raw '{
        "start": "2025-12-24T13:39:48.388Z",
        "end": "2025-12-24T19:39:48.388Z",
        "group": {
          "operator": "and",
          "conditions": [
            {
              "filters": [{"op": "not_match", "value": "error"}],
              "key": "level",
              "origin": "root",
              "type": "string"
            },
            {
              "filters": [{"op": "match", "value": "prod"}],
              "key": "env",
              "origin": "root",
              "type": "string"
            }
          ]
        },
        "limit": 200,
        "skip": 0,
        "sortBy": "timestamp",
        "sortOrder": "desc",
        "selectors": [],
        "optimizeSearch": true
      }'
    curl 'https://app.groundcover.com/api/logs/v2/query' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'Content-Type: application/json' \
      --data-raw '{
        "start": "2025-12-24T13:41:07.744Z",
        "end": "2025-12-24T19:41:07.744Z",
        "group": {
          "operator": "and",
          "conditions": [
            {
              "filters": [{"op": "not_match", "value": "error"}],
              "key": "level",
              "origin": "root",
              "type": "string"
            },
            {
              "filters": [{"op": "match", "value": "groundcover*"}],
              "key": "workload",
              "origin": "root",
              "type": "string"
            }
          ]
        },
        "limit": 200,
        "skip": 0,
        "sortBy": "timestamp",
        "sortOrder": "desc",
        "selectors": [],
        "optimizeSearch": true
      }'
    curl 'https://app.groundcover.com/api/logs/v2/query' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'Content-Type: application/json' \
      --data-raw '{
        "start": "2025-12-24T14:10:06.183Z",
        "end": "2025-12-24T20:10:06.183Z",
        "group": {
          "conditions": [{
            "filters": [{"op": "phrase_search", "value": "POST /ingest"}],
            "origin": "root",
            "type": "freetext"
          }],
          "operator": "and"
        },
        "limit": 200,
        "skip": 0,
        "sortBy": "timestamp",
        "sortOrder": "desc",
        "selectors": [{"key": "request.method"}, {"key": "cluster"}, {"key": "host.id"}],
        "optimizeSearch": true
      }'
    {
      "operator": "and",
      "conditions": [
        {"filters": [...], "key": "level", ...},
        {"filters": [...], "key": "env", ...}
      ]
    }
    {
      "filters": [
        {"op": "match", "value": "prod"},
        {"op": "match", "value": "dev"}
      ],
      "key": "env",
      ...
    }
    {
      "operator": "and",
      "conditions": [
        {
          "filters": [{"op": "match", "value": "error"}],
          "key": "level",
          "origin": "root",
          "type": "string"
        },
        {
          "filters": [
            {"op": "match", "value": "prod"},
            {"op": "match", "value": "dev"}
          ],
          "key": "env",
          "origin": "root",
          "type": "string"
        }
      ]
    }
    curl 'https://api.groundcover.com/api/k8s/v3/clusters/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"sources":[]}'
    {
      "clusters": [
        {
          "name": "production-cluster",
          "env": "prod",
          "cpuUsage": 126640,
          "cpuLimit": 289800,
          "cpuAllocatable": 302820,
          "cpuRequest": 187975,
          "cpuUsageAllocatablePercent": 41.82,
          "cpuRequestAllocatablePercent": 62.07,
          "cpuUsageRequestPercent": 67.37,
          "cpuUsageLimitPercent": 43.70,
          "cpuLimitAllocatablePercent": 95.70,
          "memoryUsage": 242994409472,
          "memoryLimit": 604262891520,
          "memoryAllocatable": 1227361431552,
          "memoryRequest": 495549677568,
          "memoryUsageAllocatablePercent": 19.80,
          "memoryRequestAllocatablePercent": 40.38,
          "memoryUsageRequestPercent": 49.04,
          "memoryUsageLimitPercent": 40.21,
          "memoryLimitAllocatablePercent": 49.23,
          "nodesCount": 6,
          "pods": {
            "Running": 109,
            "Succeeded": 3
          },
          "issueCount": 1,
          "creationTimestamp": "2021-11-01T14:37:31Z",
          "cloudProvider": "AWS",
          "kubernetesVersion": "v1.30.14-eks-931bdca"
        }
      ],
      "totalCount": 116
    }
    hashtag
    eBPF-Based Tracing - Zero Instrumentation

    groundcover automatically detects and traces LLM API calls without requiring SDKs, wrappers, or code modification.

    The sensor captures traffic at the kernel level, extracting key data points and transforming requests into structured spans and metrics. This allows for instant visibility into third-party providers without altering application code. This method captures:

    • Payloads: Full prompt and response bodies (supports redaction).

    • Usage: Token counts (input, output, total).

    • Metadata: Model versions, temperature, and parameters.

    • Performance: Latency and completion time.

    • Status: Error messages and finish reasons.

    hashtag
    OpenTelemetry Instrumentation Support

    In addition to auto-detection, groundcover supports the ingestion of traces generated by manual OpenTelemetry instrumentation.

    If your applications are already instrumented using OpenTelemetry SDKs (e.g., using the OpenTelemetry Python or JavaScript instrumentation for OpenAI/LangChain), groundcover will seamlessly ingest, process, and visualize these spans alongside your other telemetry data.
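For example, a Python service instrumented with the standard OpenTelemetry auto-instrumentation only needs to point its exporter at an OTLP endpoint. The following is a minimal sketch assuming the OpenTelemetry Python distro; the endpoint address is a placeholder for your groundcover ingestion endpoint:

    # Minimal sketch: enable OpenTelemetry auto-instrumentation for a Python app.
    # <YOUR_OTLP_ENDPOINT> is a placeholder for your groundcover OTLP ingestion endpoint.
    pip install opentelemetry-distro opentelemetry-exporter-otlp
    export OTEL_SERVICE_NAME=my-llm-service
    export OTEL_EXPORTER_OTLP_ENDPOINT=http://<YOUR_OTLP_ENDPOINT>:4317
    opentelemetry-instrument python app.py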

    hashtag
    Generative AI Span Structure

    When groundcover captures traffic via eBPF, it automatically transforms the data into structured spans that adhere to the OpenTelemetry GenAI Semantic Conventionsarrow-up-right.

    This standardization allows LLM traces to correlate with existing application telemetry. Below are the attributes captured for each eBPF-generated LLM span:

    Attribute
    Description
    Example

    gen_ai.system

    The Generative AI provider

    openai

    gen_ai.request.model

    The model name requested by the client

    gpt-4

    gen_ai.response.model

    The name of the model that generated the response

    gpt-4-0613

gen_ai.response.usage.input_tokens

Tokens consumed by the input (prompt)

100

gen_ai.response.usage.output_tokens

Tokens generated in the response

100

gen_ai.response.usage.total_tokens

Total token usage for the interaction

200

gen_ai.response.finish_reason

Reason the model stopped generating

stop ; length

gen_ai.response.choice_count

Target number of candidate completions

3

gen_ai.response.system_fingerprint

Fingerprint to track backend environment changes

fp_44709d6fcb

gen_ai.response.tools_used

Number of tools used in API call

2

gen_ai.request.temperature

The temperature setting

0.0

gen_ai.request.max_tokens

Maximum tokens allowed for the request

100

gen_ai.request.top_p

The top_p sampling setting

1.0

gen_ai.request.stream

Boolean indicating if streaming was enabled

false

gen_ai.response.message_id

Unique ID of the message created by the server

gen_ai.error.code

The error code for the response

gen_ai.error.message

A human-readable description of the error

gen_ai.error.type

Describes a class of error the operation ended with

timeout; java.net.UnknownHostException; server_certificate_invalid; 500

gen_ai.operation.name

The name of the operation being performed

chat; generate_content; text_completion

gen_ai.request.message_count

Count of messages in API response

1

gen_ai.request.system_prompt

Boolean flag whether system prompt was used in request prompts

true

gen_ai.request.tools_used

Boolean flag whether any tools were used in requests

true

    hashtag
    Generative AI Metrics

    groundcover automatically generates rate, errors, duration and usage metrics from the LLM traces. These metrics adhere to OpenTelemetry GenAI conventionsarrow-up-right and are enriched with Kubernetes context (cluster, namespace, workload, etc).

    Metric Name
    Description

    groundcover_workload_gen_ai_response_usage_input_tokens

    Input token count, aggregated by K8s workload

    groundcover_workload_gen_ai_response_usage_output_tokens

    Output token count, aggregated by K8s workload

    groundcover_workload_gen_ai_response_usage_total_tokens

    Total token usage, aggregated by K8s workload

    groundcover_gen_ai_response_usage_input_tokens

    Global input token count (cluster-wide)

    groundcover_gen_ai_response_usage_output_tokens

    Global output token count (cluster-wide)

    groundcover_gen_ai_response_usage_total_tokens

    Global total token usage (cluster-wide)

    Available Labels:

    Metrics can be filtered by: workload, namespace, cluster, gen_ai_request_model, gen_ai_system, client, server, and status_code.
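As an illustrative sketch, these metrics can be queried through the instant query endpoint described later in this document; the cluster value below is a placeholder:

    # Sum total token usage per workload for a given cluster (placeholder value)
    curl 'https://app.groundcover.com/api/prometheus/api/v1/query?query=sum%20by%20%28workload%29%20%28groundcover_workload_gen_ai_response_usage_total_tokens%7Bcluster%3D%22my-cluster%22%7D%29' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>'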

    hashtag
    Configuration

    hashtag
    Obfuscation Configuration

    LLM payloads often contain sensitive data (PII, secrets). By default, groundcover collects full payloads to aid in debugging. You can configure the agent to obfuscate specific fields within the prompts or responses using the httphandler configuration in your values.yaml.

    See Sensitive data obfuscation for full details on obfuscation in groundcover.

    circle-info

    By default groundcover does not obfuscate LLM payloads.

    hashtag
    Obfuscating Request Prompts

This configuration will obfuscate request prompts, while keeping metadata such as the model and token counts.

    hashtag
    Obfuscating Response Prompts

This configuration will obfuscate response data, while keeping metadata such as the model and token counts.

    hashtag
    Supported Providers

    groundcover currently supports the following providers via auto-detection:

    • OpenAI (Chat Completion API)

    • Anthropic (Chat Completion API)

    • AWS Bedrock APIs

    circle-info

    For providers not listed above, manual OpenTelemetry instrumentation can be used to send data to groundcover.

    hashtag
    Headers

    Header
    Required
    Description

    Authorization

    Yes

    Bearer token with your API key

    X-Backend-Id

    Yes

    Your backend identifier

    Content-Type

    Yes

    Must be application/json

    Accept

    Yes

    Must be application/json

    hashtag
    Request Body

    Parameter
    Type
    Required
    Default
    Description

    conditions

    Array

    No

    []

    Filter conditions for workloads

    limit

    Integer

    No

    100

    Maximum number of workloads to return (1-1000)

    hashtag
    Response

    The response contains a paginated list of workloads with their metrics and metadata.

    hashtag
    Response Fields

    Field
    Type
    Description

    total

    Integer

    Total number of workloads available

    workloads

    Array

    Array of workload objects

    hashtag
    Workload Object Fields

    Field
    Type
    Description

    uid

    String

    Unique identifier for the workload

    envType

    String

    Environment type (e.g., "k8s")

    env

    String

    Environment name (e.g., "prod", "ga", "alpha")

    cluster

    String

    Kubernetes cluster name

    hashtag
    Examples

    hashtag
    Basic Request

    hashtag
    Response Example

    hashtag
    Pagination

    To retrieve all workloads, use pagination by incrementing the skip parameter:

    hashtag
    Fetching All Results

    hashtag
    Pagination Logic

To fetch all results programmatically (a shell sketch follows this list):

    1. Start with skip=0 and limit=100 (or your preferred page size)

    2. Check the total field in the response

    3. Continue making requests, incrementing skip by your limit value

    4. Stop when skip >= total

    Example calculation:

    • If total is 6314 and limit is 100

    • You need ⌈6314/100⌉ = 64 requests

    • Last request: skip=6300, limit=100 (returns 14 items)
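The loop below is a minimal shell sketch of this logic, assuming jq is available for JSON parsing:

    # Sketch: page through all workloads, 100 at a time (assumes jq is installed)
    SKIP=0
    LIMIT=100
    TOTAL=1
    while [ "$SKIP" -lt "$TOTAL" ]; do
      RESP=$(curl -s 'https://api.groundcover.com/api/k8s/v3/workloads/list' \
        -H 'accept: application/json' \
        -H 'authorization: Bearer <YOUR_API_KEY>' \
        -H 'content-type: application/json' \
        -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
        --data-raw "{\"conditions\":[],\"limit\":$LIMIT,\"order\":\"desc\",\"skip\":$SKIP,\"sortBy\":\"rps\",\"sources\":[]}")
      TOTAL=$(echo "$RESP" | jq '.total')           # total number of workloads available
      echo "$RESP" | jq -r '.workloads[].workload'  # process the current page
      SKIP=$((SKIP + LIMIT))
    done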

    Authentication

    This endpoint requires API key authentication.

    hashtag
    Headers

    Header
    Value
    Description

    Authorization

    Bearer <YOUR_API_KEY>

    Your groundcover API key

    Content-Type

    application/json

    Request body format

    X-Backend-Id

    <YOUR_BACKEND_ID>

    Your backend identifier

    hashtag
    Request Body

    Field
    Type
    Required
    Description

    start

    string

    Yes

    Start time in ISO 8601 UTC format

    end

    string

    Yes

    End time in ISO 8601 UTC format

    sources

    array

    No

    hashtag
    Sources Structure (Cluster Filter)

    hashtag
    Example Request

    hashtag
    Response

    hashtag
    Response Fields

    Field
    Type
    Description

    nodes

    array

    Array of node objects

    nodes[].uid

    string

    Unique identifier for the node

    nodes[].name

    string

    Node name

    nodes[].cluster

    string

    Cluster name

    hashtag
    Filter Operations

    Operator
    Description

    eq

    Equals

    ne

    Not equals

    gt

    Greater than

    lt

    Less than

    contains

    Contains substring
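For instance, a source filter using contains would follow the same condition structure shown elsewhere in this document (the value is a placeholder):

    {
      "key": "cluster",
      "type": "string",
      "origin": "root",
      "filters": [{"op": "contains", "value": "prod"}]
    }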

    hashtag
    Common Use Cases

    hashtag
    Get All Nodes

    hashtag
    Filter by Specific Cluster

    List Deployments

    Get a list of Kubernetes deployments with status information, replica counts, and operational conditions for a specified time range.

    hashtag
    Endpoint

    POST /api/k8s/v2/deployments/list

    hashtag
    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    hashtag
    Headers

    Header
    Required
    Description

    hashtag
    Request Body

    The request body requires a time range and supports filtering by fields:

    Parameters

    Parameter
    Type
    Required
    Description

    hashtag
    Response

    hashtag
    Response Schema

    Field Descriptions

    Field
    Type
    Description

    Common Condition Types

    Type
    Description

    Common Condition Reasons

    Reason
    Description

    hashtag
    Examples

    hashtag
    Basic Request

    Get deployments for a specific time range:

    hashtag
    Filter by Namespace

    Get deployments from specific namespaces:

    hashtag
    Response Example

    hashtag
    Time Range Guidelines

    • Use ISO 8601 UTC format for timestamps

    • Typical time ranges: 1-24 hours for operational monitoring

    • Maximum recommended range: 7 days

    The basics of querying logs

    This document provides guidance on querying logs using the groundcover API. It outlines the necessary authentication steps, request structure, and examples to help you retrieve log data effectively.

    hashtag
    Endpoint

    POST https://app.groundcover.com/api/logs/v2/query

    This endpoint allows you to execute queries to retrieve log data based on specified criteria.

    hashtag
    Authentication

    To authenticate your requests, include your API key in the Authorization header:

    • Authorization: Bearer <YOUR_API_KEY>

    Example:
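A minimal authenticated request looks like this (the query body itself is described below):

    curl 'https://app.groundcover.com/api/logs/v2/query' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'Content-Type: application/json' \
      --data-raw '{"start":"2025-12-24T07:54:29.459Z","end":"2025-12-24T13:54:29.459Z","group":null,"limit":200,"skip":0,"sortBy":"timestamp","sortOrder":"desc","selectors":[],"optimizeSearch":true}'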

    hashtag
    Request Body

    The request body should be a JSON object containing the following parameters:

Parameter
Type
Description
Example

start

string

Start time for the query in ISO 8601 format

"2025-12-24T07:54:29.459Z"

end

string

End time for the query in ISO 8601 format

"2025-12-24T13:54:29.459Z"

group

object | null

Filtering/grouping conditions; set to null if not filtering

See examples below

limit

integer

Maximum number of log entries to return (maximum allowed value: 5000)

200

skip

integer

Number of log entries to skip from the start

0

sortBy

string

Field to sort by

"timestamp"

sortOrder

string

Sort order; either "asc" for ascending or "desc" for descending

"desc"

selectors

array

List of additional fields/attributes to return with logs; empty array returns default fields. Format: [{"key": "field.name"}]

[]

optimizeSearch

boolean

Whether to optimize the search query (should always be true)

true
    circle-exclamation

Note: The limit parameter accepts a maximum value of 5000

    Example Request Body:

    hashtag
    Example Queries

    hashtag
    Query Error Logs from Last 6 Hours

    To retrieve logs with the level "error" from the last 6 hours, use the group parameter with filtering conditions:

    Note: Ensure that the start and end times are adjusted to reflect the desired time range. The group parameter structure filters logs where the level field matches "error".

    hashtag
    Group Parameter Structure

    The group parameter uses the following structure for filtering:

• conditions: Array of filter conditions

  • filters: Array of filter operations

    • op: Operation type (e.g., "match")

    • value: Value to match

  • key: Field name to filter on (e.g., "level")

  • origin: Origin of the field (typically "root")

  • type: Data type of the field (e.g., "string")

• operator: Logical operator to combine conditions (e.g., "and", "or")

To query all logs without filtering, set group to null.

    hashtag
    Response

    The response is a JSON object containing log entries and metadata. Each log entry includes details such as timestamp, log level, line content, workload information, and associated metadata.

    Example Response:

    hashtag
    Using Selectors

    The selectors parameter allows you to specify additional fields and attributes to return with the logs. By default, logs include standard fields like timestamp, level, line, workload, namespace, cluster, etc. Use selectors to request additional fields.

    Example: Query with selectors to get additional fields

    hashtag
    Filtering with Group Parameter

    The group parameter allows you to filter logs based on field values. Here are some examples:

    Single condition (level = error):

    Multiple conditions (using AND operator):

    No filtering (query all logs):

    hashtag
    Best Practices

1. Time Range: Always specify appropriate start and end times to limit the scope of your query and improve performance.

2. Result Limits: Use the limit parameter to control the number of results returned. For large datasets, use pagination by incrementing the skip parameter (an example follows below).

3. Optimization: Always set optimizeSearch to true for better query performance on large datasets.

4. Filtering: Use the group parameter with conditions to filter logs by field values. Set group to null to query all logs without filtering.

5. Selectors: Use the selectors array to specify additional fields you want returned with the logs. Leave it empty to get default fields.
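As a sketch, a second page of results can be fetched by re-issuing the same query with skip advanced by the page size:

    # Second page: same query as before, skip advanced by the limit (200)
    curl 'https://app.groundcover.com/api/logs/v2/query' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'Content-Type: application/json' \
      --data-raw '{"start":"2025-12-24T07:54:29.459Z","end":"2025-12-24T13:54:29.459Z","group":null,"limit":200,"skip":200,"sortBy":"timestamp","sortOrder":"desc","selectors":[],"optimizeSearch":true}'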

    hashtag

    Create a new Monitor

    Learn how to create and configure monitors using the Wizard, Monitor Catalog, or Import options. The following guide will help you set up queries, thresholds, and alert routing for effective monitoring.

You can create monitors using our web application by following this guide, via our API, or via the groundcover Terraform Provider.

In the Monitors section (left navigation bar), navigate to the Issues page or the Monitor List page to create a new Monitor. Click on the “Create Monitor” button at the top right and select one of the following options from the dropdown:

• Monitor Wizard

• Monitor Catalog

• Import

    values.yaml
    httphandler:
      obfuscationConfig:
        keyValueConfig:
          enabled: true
          mode: "ObfuscateSpecificValues"
          specificKeys:
            - "messages"
            - "inputText"
            - "prompt"
    values.yaml
    httphandler:
      obfuscationConfig:
        keyValueConfig:
          enabled: true
          mode: "ObfuscateSpecificValues"
          specificKeys:
            - "choices"
            - "output"
            - "content"
            - "outputs"
            - "results"
            - "generation"
    POST /api/k8s/v3/workloads/list
    curl 'https://api.groundcover.com/api/k8s/v3/workloads/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"conditions":[],"limit":100,"order":"desc","skip":0,"sortBy":"rps","sources":[]}'
    {
      "total": 6314,
      "workloads": [
        {
          "uid": "824b00bf-db68-47b5-8a53-9abd98bf7c0a",
          "envType": "k8s",
          "env": "ga",
          "cluster": "akamai-lk41ok",
          "namespace": "groundcover-incloud",
          "workload": "groundcover-incloud-vector",
          "kind": "ReplicaSet",
          "resourceVersion": 651723275,
          "ready": true,
          "podsCount": 5,
          "p50": 0.0005824280087836087,
          "p95": 0.005730729550123215,
          "p99": 0.0327172689139843,
          "rps": 5526.0027359781125,
          "errorRate": 0,
          "cpuLimit": 0,
          "cpuUsage": 50510.15252730218,
          "memoryLimit": 214748364800,
          "memoryUsage": 46527352832,
          "issueCount": 0
        }
      ]
    }
    # First batch (0-99)
    curl 'https://api.groundcover.com/api/k8s/v3/workloads/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"conditions":[],"limit":100,"order":"desc","skip":0,"sortBy":"rps","sources":[]}'
    
    # Second batch (100-199)
    curl 'https://api.groundcover.com/api/k8s/v3/workloads/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"conditions":[],"limit":100,"order":"desc","skip":100,"sortBy":"rps","sources":[]}'
    
    # Continue incrementing skip by 100 until you reach the total count
    {
      "key": "cluster",
      "type": "string",
      "origin": "root", 
      "filters": [
        {
          "op": "eq",
          "value": "cluster-name"
        }
      ]
    }
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/k8s/v2/nodes/info-with-resources' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --header 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data '{
        "start": "2025-01-27T12:00:00.000Z",
        "end": "2025-01-27T14:00:00.000Z",
        "sources": [
          {
            "key": "cluster",
            "type": "string",
            "origin": "root",
            "filters": [
              {
                "op": "eq",
                "value": "my-cluster"
              }
            ]
          }
        ],
        "limit": 100,
        "nameFilter": ""
      }'
    {
      "nodes": [
        {
          "uid": "node-uid",
          "name": "node-name",
          "cluster": "cluster-name",
          "env": "environment-name",
          "creationTimestamp": "2025-01-01T10:00:00Z",
          "labels": {
            "kubernetes.io/arch": "amd64",
            "kubernetes.io/os": "linux",
            "node.kubernetes.io/instance-type": "t3.medium"
          },
          "addresses": [
            {
              "type": "InternalIP",
              "address": "10.0.1.100"
            },
            {
              "type": "ExternalIP", 
              "address": "203.0.113.100"
            }
          ],
          "nodeInfo": {
            "kubeletVersion": "v1.24.0",
            "kubeProxyVersion": "v1.24.0",
            "operatingSystem": "linux",
            "architecture": "amd64",
            "containerRuntimeVersion": "containerd://1.6.0",
            "kernelVersion": "5.4.0-91-generic",
            "osImage": "Ubuntu 20.04.3 LTS"
          },
          "capacity": {
            "cpu": "2",
            "memory": "8Gi",
            "pods": "110"
          },
          "allocatable": {
            "cpu": "1940m",
            "memory": "7Gi", 
            "pods": "110"
          },
          "usage": {
            "cpu": "500m",
            "memory": "3Gi"
          },
          "ready": true,
          "conditions": [
            {
              "type": "Ready",
              "status": "True",
              "lastTransitionTime": "2025-01-01T10:05:00Z",
              "reason": "KubeletReady",
              "message": "kubelet is posting ready status"
            }
          ]
        }
      ]
    }
    {
      "start": "2025-01-27T12:00:00.000Z",
      "end": "2025-01-27T14:00:00.000Z",
      "limit": 100
    }
    {
      "start": "2025-01-27T12:00:00.000Z",
      "end": "2025-01-27T14:00:00.000Z",
      "sources": [
        {
          "key": "cluster",
          "type": "string", 
          "origin": "root",
          "filters": [{"op": "eq", "value": "production-cluster"}]
        }
      ],
      "limit": 100
    }


    skip

    Integer

    No

    0

    Number of workloads to skip for pagination

    order

    String

    No

    "desc"

    Sort order: "asc" or "desc"

    sortBy

    String

    No

    "rps"

    Field to sort by (e.g., "rps", "cpuUsage", "memoryUsage")

    sources

    Array

    No

    []

    Filter by data sources

    namespace

    String

    Kubernetes namespace

    workload

    String

    Workload name

    kind

    String

    Kubernetes resource kind (e.g., "ReplicaSet", "StatefulSet", "DaemonSet")

    resourceVersion

    Integer

    Kubernetes resource version

    ready

    Boolean

    Whether the workload is ready

    podsCount

    Integer

    Number of pods in the workload

    p50

    Float

    50th percentile response time in seconds

    p95

    Float

    95th percentile response time in seconds

    p99

    Float

    99th percentile response time in seconds

    rps

    Float

    Requests per second

    errorRate

    Float

    Error rate as a decimal (e.g., 0.004 = 0.4%)

    cpuLimit

    Integer

    CPU limit in millicores (0 = no limit)

    cpuUsage

    Float

    Current CPU usage in millicores

    memoryLimit

    Integer

    Memory limit in bytes (0 = no limit)

    memoryUsage

    Integer

    Current memory usage in bytes

    issueCount

    Integer

    Number of issues detected

    Source filters (e.g., cluster filters)

    limit

    integer

    No

    Maximum number of nodes to return (default: 100)

    nodes[].env

    string

    Environment name

    nodes[].creationTimestamp

    string

    Node creation time in ISO 8601 format

    nodes[].labels

    object

    Node labels key-value pairs

    nodes[].addresses

    array

    Node IP addresses (internal/external)

    nodes[].nodeInfo

    object

    Node system information

    nodes[].capacity

    object

    Total node resource capacity

    nodes[].allocatable

    object

    Allocatable resources (capacity minus system reserved)

    nodes[].usage

    object

    Current resource usage

    nodes[].ready

    boolean

    Node readiness status

    nodes[].conditions

    array

    Node condition details

    sources

    array

    No

    Source filters

    creationTime

    string

    Deployment creation timestamp in ISO 8601 format

    cluster

    string

    Kubernetes cluster name

    env

    string

    Environment name (e.g., "prod", "staging")

    available

    integer

    Number of available replicas

    desired

    integer

    Number of desired replicas

    ready

    integer

    Number of ready replicas

    conditions

    array

    Array of deployment condition objects

    conditions[].type

    string

    Condition type (e.g., "Available", "Progressing")

    conditions[].status

    string

    Condition status ("True", "False", "Unknown")

    conditions[].lastProbeTime

    string/null

    Last time the condition was probed

    conditions[].lastHeartbeatTime

    string/null

    Last time the condition was updated

    conditions[].lastTransitionTime

    string

    Last time the condition transitioned

    conditions[].reason

    string

    Machine-readable reason for the condition

    conditions[].message

    string

    Human-readable message explaining the condition

    warnings

    array

    Array of warning messages (usually empty)

    id

    string

    Unique identifier for the deployment

    resourceVersion

    integer

    Kubernetes resource version

    Format: YYYY-MM-DDTHH:MM:SS.sssZ

    Authorization

    Yes

    Bearer token with your API key

    X-Backend-Id

    Yes

    Your backend identifier

    Content-Type

    Yes

    Must be application/json

    Accept

    Yes

    Must be application/json

    start

    string

    Yes

    Start time in ISO 8601 UTC format (e.g., "2025-08-24T07:21:36.944Z")

    end

    string

    Yes

    End time in ISO 8601 UTC format (e.g., "2025-08-24T08:51:36.944Z")

    namespaces

    array

    No

    deployments

    array

    Array of deployment objects

    name

    string

    Deployment name

    namespace

    string

    Kubernetes namespace

    workloadName

    string

    Associated workload name

    Available

    Deployment has minimum availability

    Progressing

    Deployment is making progress towards desired state

    MinimumReplicasAvailable

    Deployment has minimum number of replicas available

    NewReplicaSetAvailable

    New ReplicaSet has successfully progressed

    Array of namespace names to filter by (e.g., ["groundcover", "default"])


    {
      "start": "2025-08-24T07:21:36.944Z",
      "end": "2025-08-24T08:51:36.944Z", 
      "namespaces": ["groundcover"],
      "sources": []
    }
    {
      "deployments": [
        {
          "name": "string",
          "namespace": "string", 
          "workloadName": "string",
          "creationTime": "2023-08-30T18:27:01Z",
          "cluster": "string",
          "env": "string",
          "available": 1,
          "desired": 1,
          "ready": 1,
          "conditions": [
            {
              "type": "string",
              "status": "string",
              "lastProbeTime": null,
              "lastHeartbeatTime": null,
              "lastTransitionTime": "string",
              "reason": "string",
              "message": "string"
            }
          ],
          "warnings": [],
          "id": "string",
          "resourceVersion": 0
        }
      ]
    }
    curl 'https://api.groundcover.com/api/k8s/v2/deployments/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"start":"2025-08-24T07:21:36.944Z","end":"2025-08-24T08:51:36.944Z","namespaces":[],"sources":[]}'
    curl 'https://api.groundcover.com/api/k8s/v2/deployments/list' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      -H 'X-Backend-Id: <YOUR_BACKEND_ID>' \
      --data-raw '{"start":"2025-08-24T07:21:36.944Z","end":"2025-08-24T08:51:36.944Z","namespaces":["groundcover","monitoring"],"sources":[]}'
    {
      "deployments": [
        {
          "name": "db-manager",
          "namespace": "groundcover",
          "workloadName": "db-manager",
          "creationTime": "2023-08-30T18:27:01Z",
          "cluster": "karma-cluster",
          "env": "prod",
          "available": 1,
          "desired": 1,
          "ready": 1,
          "conditions": [
            {
              "type": "Available",
              "status": "True",
              "lastProbeTime": null,
              "lastHeartbeatTime": null,
              "lastTransitionTime": "2025-08-22T06:18:27Z",
              "reason": "MinimumReplicasAvailable",
              "message": "Deployment has minimum availability."
            },
            {
              "type": "Progressing",
              "status": "True",
              "lastProbeTime": null,
              "lastHeartbeatTime": null,
              "lastTransitionTime": "2023-08-30T18:27:01Z",
              "reason": "NewReplicaSetAvailable",
              "message": "ReplicaSet \"db-manager-867bc8f5b8\" has successfully progressed."
            }
          ],
          "warnings": [],
          "id": "f3b1f4a5-f38a-4c63-a7c0-9333fcbf1906",
          "resourceVersion": 747039184
        }
      ]
    }
    curl 'https://app.groundcover.com/api/logs/v2/query' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'Content-Type: application/json' \
      --data-raw '{"start":"2025-12-24T07:54:29.459Z","end":"2025-12-24T13:54:29.459Z","group":null,"limit":200,"skip":0,"sortBy":"timestamp","sortOrder":"desc","selectors":[],"optimizeSearch":true}'
    {
      "start": "2025-12-24T07:54:29.459Z",
      "end": "2025-12-24T13:54:29.459Z",
      "group": null,
      "limit": 200,
      "skip": 0,
      "sortBy": "timestamp",
      "sortOrder": "desc",
      "selectors": [],
      "optimizeSearch": true
    }
    curl 'https://app.groundcover.com/api/logs/v2/query' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'Content-Type: application/json' \
      --data-raw '{
        "start": "2025-12-24T12:06:54.169Z",
        "end": "2025-12-24T18:06:54.169Z",
        "group": {
          "operator": "and",
          "conditions": [{
            "filters": [{
              "op": "match",
              "value": "error"
            }],
            "key": "level",
            "origin": "root",
            "type": "string"
          }]
        },
        "limit": 200,
        "skip": 0,
        "sortBy": "timestamp",
        "sortOrder": "desc",
        "selectors": [],
        "optimizeSearch": true
      }'
    {
      "operator": "and",
      "conditions": [{
        "filters": [{
          "op": "match",
          "value": "error"
        }],
        "key": "level",
        "origin": "root",
        "type": "string"
      }]
    }
    {
      "logs": [
        {
          "timestamp": "2025-12-24T13:54:29.458482612Z",
          "guid": "17666060341330778405",
          "line": "Loading installed providers to workflow",
          "level": "info",
          "format": "json",
          "workload": "groundcover-incloud-keep-backend",
          "pod_name": "groundcover-incloud-keep-backend-546f4c89f9-bmgt5",
          "namespace": "groundcover-incloud",
          "container": "keep",
          "podUid": "9e919ced-2f7c-442a-a31f-5f80790d735d",
          "envType": "K8s",
          "node": "ip-10-1-7-219.eu-west-3.compute.internal",
          "host": "ip-10-1-7-219.eu-west-3.compute.internal",
          "instance": "groundcover-incloud-keep-backend-546f4c89f9-bmgt5",
          "instanceId": "9e919ced-2f7c-442a-a31f-5f80790d735d",
          "instanceUid": "460424509714310639",
          "service": "groundcover-incloud-keep-backend",
          "functionName": "",
          "lambdaArn": "",
          "cluster": "groundcover-experiments",
          "env": "prod",
          "source": "k8s",
          "traceId": "cbbc0f1b72c3b9db69f3858edea9d970",
          "spanId": "fb2f0af20aa42300",
          "additionalColumns": {},
          "metadata": {}
        },
        {
          "timestamp": "2025-12-24T13:54:29.456626838Z",
          "guid": "17368171834420881877",
          "line": "Error getting workflow",
          "level": "error",
          "format": "json",
          "workload": "gc-keep-backend",
          "pod_name": "gc-keep-backend-6c9b99b67f-rhwm8",
          "namespace": "groundcover-main",
          "container": "keep",
          "podUid": "098e59f5-be01-4a4e-84b9-4c616961848a",
          "envType": "K8s",
          "node": "ip-172-31-47-39.eu-west-3.compute.internal",
          "host": "ip-172-31-47-39.eu-west-3.compute.internal",
          "instance": "gc-keep-backend-6c9b99b67f-rhwm8",
          "instanceId": "098e59f5-be01-4a4e-84b9-4c616961848a",
          "instanceUid": "7534023078789880662",
          "service": "gc-keep-backend",
          "functionName": "",
          "lambdaArn": "",
          "cluster": "noam-test",
          "env": "",
          "source": "k8s",
          "traceId": "69620eeecb241c1e56b077b7240f522e",
          "spanId": "53f160665fa2d452",
          "additionalColumns": {},
          "metadata": {}
        }
      ],
      "levels": ["info", "error"],
      "raiseAlert": true,
      "limitReached": false,
      "done": true,
      "optimizedKeys": null
    }
    curl 'https://app.groundcover.com/api/logs/v2/query' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'Content-Type: application/json' \
      --data-raw '{
        "start": "2025-12-24T14:10:06.183Z",
        "end": "2025-12-24T20:10:06.183Z",
        "group": {
          "conditions": [{
            "filters": [{"op": "phrase_search", "value": "POST /ingest"}],
            "origin": "root",
            "type": "freetext"
          }],
          "operator": "and"
        },
        "limit": 200,
        "skip": 0,
        "sortBy": "timestamp",
        "sortOrder": "desc",
        "selectors": [{"key": "request.method"}, {"key": "cluster"}, {"key": "host.id"}],
        "optimizeSearch": true
      }'
    {
      "group": {
        "operator": "and",
        "conditions": [{
          "filters": [{"op": "match", "value": "error"}],
          "key": "level",
          "origin": "root",
          "type": "string"
        }]
      }
    }
    {
      "group": {
        "operator": "and",
        "conditions": [
          {
            "filters": [{"op": "match", "value": "error"}],
            "key": "level",
            "origin": "root",
            "type": "string"
          },
          {
            "filters": [{"op": "match", "value": "prod"}],
            "key": "env",
            "origin": "root",
            "type": "string"
          }
        ]
      }
    }
    {
      "group": null
    }
    hashtag
    Using the Monitor Wizard

    hashtag
    Overview

    The Monitor Wizard is a guided, user-friendly approach to creating and configuring monitors tailored to your observability needs. By breaking down the process into simple steps, it ensures consistency and accuracy.

    hashtag
    Section 1: Query

    Select the data source, build the query and define thresholds for the monitor.

    circle-info

    If you're unfamiliar with query building in groundcover, refer to the Query Builder section for full details on the different components.

    • Data Source (Required):

      • Select the type of data (Metrics, Logs, Traces, or Events).

    • Query Functions:

      • Choose how to process the data (e.g., average, count).

  • Add aggregation (group by) clauses if applicable; you MUST use aggregations if you want to add labels to your issues.

  • Important: The labels used for aggregation (group by) may also be used for notification routes and in the issue summary & description.

      • Examples: cluster, node, container_name

    • Time Window (Required):

      • Specify the period over which data is aggregated (the look-behind window).

      • Example: “Over the last 5 minutes.”

    • Window Aggregation (Required):

      • Specify the aggregation function to be used on the selected time window.

      • Example: "avg over the last 5m"

    • Threshold Conditions (Required):

      • Define when a monitor should trigger an Issue. You can use:

    • Greater Than - Trigger when the value exceeds X.

    • Lower Than - Trigger when the value falls below X.

    • Within Range - Trigger when the value is between X and Y.

    • Outside Range - Trigger when the value is not between X and Y.

  • Important: The units in which the threshold is measured must be the same as the units the query uses.

    • For metrics queries the threshold should match the unit the metric is measured in.

    • For logs, traces, and events, it's just a number.

  • Example: “Trigger if disk space usage is greater than 10%.”

    • Preview Settings (Optional):

      • Preview data using Stacked Bar or Line Chart for better clarity while building the monitor.

  • Choose the Y axis units.

  • Choose the rollup to present.

  • Important: These configurations only affect the preview graph, not the monitor's evaluation.

    • Advanced (Optional):

      • Evaluation Interval:

    • Specify how often the monitor evaluates the query.

    • Example: “Evaluate every 1 minute.”

  • Pending Period:

    • Specify how many consecutive evaluations need to pass the threshold in order to trigger an Issue.

    • Monitors that have entered the pending period (i.e., the first evaluation passed the threshold) will be in 'Pending' state; only after all consecutive evaluations pass the threshold will the monitor become 'Firing' and an Issue be created. If even one of the evaluations did not pass the threshold, the Monitor is set right back to 'Normal'.

    • Example: “With an Evaluation Interval of 5m, setting this to 2 (10m) ensures the condition must be evaluated 3 times before the monitor will fire.”

      • Evaluation #1 at 0m

      • Evaluation #2 at 5m

      • Evaluation #3 at 10m -> If all 3 passed the threshold, the monitor will 'Fire'

    • Note: This ensures that transient conditions do not trigger alerts, reducing false positives and smoothing sudden unwanted spikes.

    • Important: The default configuration is 0, which means the monitor will trigger an Issue immediately when an evaluation passes the threshold.

  • Treat No Data As:

    • Whether the monitor should treat no data as an Issue or Normal behavior.

    • Example: "I want to be notified if the metric has a gap for the entire look-behind window of the query, so I will set it to 'Firing'."

    hashtag
    Section 2: Monitor Details

    Set up the basic information for the monitor.

    • Monitor Name (Required):

      • Add a title for the monitor. The title will appear in notifications and in the Monitor List page.

  • Give the Monitor a clear, short name that describes its function at a high level.

      • Examples:

        • “Workload High API Error Rate”

        • “Node High Memory”

    circle-info

    The title will appear in the monitors page table and be accessible in notification routes.

    • Severity (required):

      • Use severity to categorize alerts by importance.

      • Select a severity level (S1-S4).

  • Important: For connected apps (OpsGenie, PagerDuty) that require specific severities like P1-P4 or Critical-Info, we automatically translate to the respective severity.

• Custom Labels (formerly called 'metadata labels'):

      • Add custom labels (key:value) that will be added to all Issues generated by this monitor

      • Example: To create a notification route for my team's issues, add "Team:Infra" and use it in the notification route's scope

    hashtag
    Section 3: Issue Details

Customize how the Monitor’s Issues will appear and what content will be sent in its notifications. This section also includes a live preview of the way it will appear in the notification.

    circle-info

    Ensure that the labels you wish to use dynamically (e.g., cluster, workload) or statically (e.g. team:infra) are defined in the query and monitor details.

    • Issue Summary (required):

      • Define a short title for issues that this Monitor will raise. It's useful to include variables that can be informative at first glance.

  • Example: Adding {{ labels.statusCode }} to the header will inject the status code into the name of the issue - this becomes especially useful when one Monitor raises multiple issues and you want to quickly understand their content without having to open each one.

        • “HTTP API Error {{ labels.status_code }}” -> HTTP API Error 500

    • “Workload {{ labels.workload }} Pod Restart” -> Workload frontend Pod Restart

    • “{{ labels.team }} APIs High Latency” -> Infra APIs High Latency

      • Note: Autocomplete is supported to view what is usable in the Issue and will help ensure you put in the variables correctly.

    circle-exclamation

The new format for templating variables is {{ variable }} or {{ labels.<label> }}, but the previous format {{ alert.labels.statusCode }} used for Keep is still supported.

    • Description:

      • Used as the body of the message for the Issue.

      • The templating format is Jinja2, and can be used with variables similarly to the Issue Summary and various more advanced functions.

      • Example: Adding all the labels to be shown in the Slack message's body should be inserted into here using {{labels.<label>}} , you can add the severity {{severity}}, the monitor's name {{monitor_name}} and many more.

    circle-info

Any pongo2 functionality, like {% if %} evaluations, is supported and can be used to render different descriptions for different conditions.
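For illustration only, a description template along these lines (the label names depend on your query's group-by clauses) could be:

    {{ monitor_name }} fired with severity {{ severity }}
    Cluster: {{ labels.cluster }}
    {% if labels.namespace %}Namespace: {{ labels.namespace }}{% endif %}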

    • Advanced (Optional)

  • Display Labels (formerly called 'context labels'):

    • These Labels will be displayed and filterable in the Monitors > Issues page.

    • This list gets automatically populated based on the labels used in the aggregation function in the Query.

    • Note: You can remove labels from this list if you do not wish to see them in the Issues page.

    hashtag
    Section 4: Notifications

Set up notification behavior for issues from this monitor.

    circle-info

    Workflows (used for Keep) and Notification Routes may work in parallel and do not affect each other.

    • Based on matching notification routes (Required)

      • The issues generated by this monitor will be evaluated by the Notification Route's scopes and rules and notifications will be sent accordingly

  • Note: The 'Preview' can be used to check which notification routes may match this monitor's future issues.

    • Routing (Workflows) (Optional)

      • Select Workflow:

      • Route alerts to selected existing workflows only; other workflows will not process them. Use this to send alerts for a critical application to a destination such as Slack or PagerDuty.

    • No Routing:

      • This means that any workflow (without filters) will process the issue.

    circle-exclamation

    Configure Routing for Keep Workflows only

    • Advanced (Optional)

      • Override Renotify Interval

    • Used to override the interval configured on the Notification Route, for cases where a certain monitor's issues should send repeat notifications at a different interval.

    • Example: If it's set to 1m while the evaluation interval is 1m, a notification will be sent with every firing evaluation. If it's set to 2d, even if the monitor evaluates every 1m, a notification will be sent once every 2 days.

    • Note: If the Issue stops firing and starts firing again, a new notification will be sent; this is not considered 'renotification'.

        • Important: Minimum interval is the evaluation interval

    hashtag
    Using the Import option

    circle-exclamation

    This is an advanced feature, please use it with caution.

    In the "Import Bulk Monitors" you can add multiple monitors using an array of Monitors that follows the Monitor YAML structure.

    Example of importing multiple monitors

    Click on "Create Monitors" to create them.


    Query Metrics

    Execute PromQL queries against groundcover metrics data. Two endpoints are available: instant queries for point-in-time values and range queries for time-series data over specific periods.

    hashtag
    Endpoints

    hashtag
    Instant Query

    GET /api/prometheus/api/v1/query

    Execute an instant PromQL query to get metric values at a single point in time.

    hashtag
    Range Query

    POST /api/metrics/query-range

    Execute a PromQL query over a time range to get time-series data.

    hashtag
    Authentication

    Both endpoints require API Key authentication via the Authorization header.

    hashtag
    Instant Query Endpoint

    hashtag
    Request

    GET /api/prometheus/api/v1/query

Headers

Authorization: Bearer <YOUR_API_KEY>
Accept: application/json

    Query Parameters

    Parameter
    Type
    Required
    Description

    query

    string

    Yes

    PromQL query string (URL encoded)

    time

    string

    No

    Single point in time to evaluate the query (RFC3339 format). Default: current time

    Understanding the Time Parameter

    The time parameter specifies exactly one timestamp at which to evaluate your PromQL expression. This is NOT a time range:

    • With time: "What was the disk usage at 2025-10-21T09:21:44.398Z?"

    • Without time: "What is the disk usage right now?"

    Important: This is different from range queries which return time-series data over a period.

    Instant vs Range Queries - Key Differences

    Aspect
    Instant Query
    Range Query

    Purpose

    Get value at one specific moment

    Get time-series data over a period

    Time Parameter

    Single timestamp (time)

    Start and end timestamps (start, end)

    Result

    One data point

    Multiple data points over time

Use Case

"What is the current CPU usage?"

"Show me CPU usage over the last hour"

Response

Single value with timestamp

Array of timestamp-value pairs

    Example Comparison:

    • Instant: time=2025-10-21T09:00:00Z → Returns disk usage at exactly 9:00 AM

    • Range: start=2025-10-21T08:00:00Z&end=2025-10-21T09:00:00Z → Returns hourly disk usage trend

Practical Example:

# Get current value (no time parameter)
# Returns: {"value":[1761040224,"18.45"]} - timestamp is "right now"
curl '...query=avg(groundcover_node_rt_disk_space_used_percent{})'

# Get historical value (with time parameter)
# Returns: {"value":[1761038504,"18.44"]} - timestamp is exactly what you specified
curl '...query=avg(groundcover_node_rt_disk_space_used_percent{})&time=2025-10-21T09:21:44.398Z'

    hashtag
    Response

The endpoint returns a Prometheus-compatible response format:

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [1761038504.398, "18.442597642017"]
      }
    ]
  },
  "stats": {
    "seriesFetched": "12",
    "executionTimeMsec": 0
  }
}

    Response Fields

    Field
    Type
    Description

    status

    string

    Query execution status ("success" or "error")

    data.resultType

    string

    Type of result data ("vector", "matrix", "scalar", "string")

    data.result

    array

    Array of metric results

    data.result[].metric

object

Metric labels as key-value pairs

data.result[].value

array

[timestamp, value] tuple with Unix timestamp and string value

stats.seriesFetched

string

Number of time series processed

stats.executionTimeMsec

number

Query execution time in milliseconds

    hashtag
    Range Query Endpoint

    hashtag
    Request

    POST /api/metrics/query-range

Headers

Authorization: Bearer <YOUR_API_KEY>
Accept: application/json
Content-Type: application/json

Request Body

{
  "promql": "string",
  "start": "string",
  "end": "string",
  "step": "string"
}

    Request Parameters

    Parameter
    Type
    Required
    Description

    promql

    string

    Yes

    PromQL query expression

    start

    string

    Yes

    Range start time in RFC3339 format

    end

string

Yes

Range end time in RFC3339 format

step

string

Yes

Query resolution step (e.g., "30s", "1m", "5m", "1h")

    hashtag
    Response

The range query returns a custom format optimized for time-series data:

{
  "velocities": [
    {
      "velocity": [
        [1760950800, "21.534558665381155"],
        [1760952600, "21.532404350483848"],
        [1760954400, "21.57135294176692"]
      ],
      "metric": {}
    }
  ],
  "promql": "avg(groundcover_node_rt_disk_space_used_percent{})"
}

    Response Fields

    Field
    Type
    Description

    velocities

    array

    Array of time-series data objects

    velocities[].velocity

    array

    Array of [timestamp, value] data points

    velocities[].metric

    object

    Metric labels as key-value pairs

    promql

    string

    Echo of the executed PromQL query

    Each data point in the velocity array contains:

    • Timestamp: Unix timestamp as integer

    • Value: Metric value as string
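For instance, a quick way to inspect these pairs from the shell (assuming jq is installed) is:

    # Print the [timestamp, value] pairs of the first returned series (assumes jq)
    curl -s 'https://app.groundcover.com/api/metrics/query-range' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      --data-raw '{"promql":"avg(groundcover_node_rt_disk_space_used_percent{})","start":"2025-10-20T09:19:22.475Z","end":"2025-10-21T09:19:22.475Z","step":"1800s"}' \
      | jq '.velocities[0].velocity[]'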

    hashtag
    Examples

    hashtag
    Instant Query Examples

    Current Values (No Time Parameter)

Get current average disk space usage (evaluated at request time):

curl 'https://app.groundcover.com/api/prometheus/api/v1/query?query=avg%28groundcover_node_rt_disk_space_used_percent%7B%7D%29' \
  -H 'accept: application/json' \
  -H 'authorization: Bearer <YOUR_API_KEY>'

    Historical Point-in-Time Values

Get disk usage at a specific moment in the past:

curl 'https://app.groundcover.com/api/prometheus/api/v1/query?query=avg%28groundcover_node_rt_disk_space_used_percent%7B%7D%29&time=2025-10-21T09%3A21%3A44.398Z' \
  -H 'accept: application/json' \
  -H 'authorization: Bearer <YOUR_API_KEY>'

    Note: This returns the disk usage value at exactly 2025-10-21T09:21:44.398Z, not a range from that time until now.

    hashtag
    Range Query Examples

    24-Hour Disk Usage Trend

Get disk usage over the last 24 hours with 30-minute resolution:

curl 'https://app.groundcover.com/api/metrics/query-range' \
  -H 'accept: application/json' \
  -H 'authorization: Bearer <YOUR_API_KEY>' \
  -H 'content-type: application/json' \
  --data-raw '{
    "promql": "avg(groundcover_node_rt_disk_space_used_percent{})",
    "start": "2025-10-20T09:19:22.475Z",
    "end": "2025-10-21T09:19:22.475Z",
    "step": "1800s"
  }'

    High-Resolution CPU Monitoring

Monitor CPU usage over 1 hour with 1-minute resolution:

curl 'https://app.groundcover.com/api/metrics/query-range' \
  -H 'accept: application/json' \
  -H 'authorization: Bearer <YOUR_API_KEY>' \
  -H 'content-type: application/json' \
  --data-raw '{
    "promql": "avg(groundcover_cpu_usage_percent) by (cluster)",
    "start": "2025-10-21T08:00:00.000Z",
    "end": "2025-10-21T09:00:00.000Z",
    "step": "1m"
  }'

    hashtag
    Query Optimization

    1. Use appropriate time ranges: Avoid querying excessively large time ranges

    2. Choose optimal step sizes: Balance resolution with performance

    3. Filter early: Use label filters to reduce data volume

4. Aggregate wisely: Use grouping to reduce cardinality (see the sketch below)
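As a sketch of points 3 and 4 combined, the label filter and grouping below reduce both data volume and cardinality (the cluster name is a placeholder):

    curl 'https://app.groundcover.com/api/metrics/query-range' \
      -H 'accept: application/json' \
      -H 'authorization: Bearer <YOUR_API_KEY>' \
      -H 'content-type: application/json' \
      --data-raw '{
        "promql": "avg(groundcover_node_rt_disk_space_used_percent{cluster=\"production-cluster\"}) by (cluster)",
        "start": "2025-10-21T08:00:00.000Z",
        "end": "2025-10-21T09:00:00.000Z",
        "step": "5m"
      }'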

    hashtag
    Time Handling

    1. Use RFC3339 format: Always use ISO 8601 timestamps

    2. Account for timezones: Timestamps are in UTC

    3. Align step boundaries: Choose steps that align with data collection intervals

    4. Handle clock skew: Allow for small time differences in distributed systems

    hashtag
    Rate Limiting

    • Concurrent queries: Limit concurrent requests to avoid overwhelming the API

    • Query complexity: Complex queries may take longer and consume more resources

    • Data retention: Historical data availability depends on your retention policy


    Monitor YAML structure

While we strongly suggest building monitors using our Wizard or Catalog, groundcover supports building and editing your Monitors using YAML. If you choose to do so, the following provides the necessary definitions.

    hashtag
    Monitor fields explained

    In this section, you'll find a breakdown of the key fields used to define and configure Monitors within the groundcover platform. Each field plays a critical role in how a Monitor behaves, what data it tracks, and how it responds to specific conditions. Understanding these fields will help you set up effective Monitors to track performance, detect issues, and provide timely alerts.

    Below is a detailed explanation of each field, along with examples to illustrate their usage, ensuring your team can manage and respond to incidents efficiently.

Field
Explanation
Example

ResourceHeaderLabels

A list of labels that help you identify the resources that are related to the Monitor. This appears as a secondary header in all Issues tables across the platform.

["span_name", "kind"] for monitors on protocol issues.

ContextHeaderLabels

A list of contextual labels that help you identify the location of the issue. This appears as a subset of the Issue’s labels, and is displayed on all Issues tables across the platform.

["cluster", "namespace", "pod_name"]

Labels

A set of pre-defined labels that are set on Issues related to the selected Monitor. Labels can be static, or dynamic using a Monitor's query results.

team: sre_team

ExecutionErrorState

Defines the actions that take place when a Monitor encounters query execution errors.

Valid options are Alerting, OK and Error.

• When Alerting is set, query execution errors will result in a firing issue.

NoDataState

This defines what happens when queries in the Monitor return empty datasets.

Valid options are: NoData, Alerting, OK

• When NoData is set, the monitor instance's state will be No Data.

Interval

Defines how frequently the Monitor evaluates the conditions. Common intervals could be 1m, 5m, etc.

PendingFor

Defines the period of consecutive intervals where the threshold condition must be met to trigger the alert.

Trigger

Defines the condition under which the Monitor fires. This is the definition of the threshold for the Monitor, with op - operator and value.

op: gt, value: 5

Model

Describes the queries, thresholds and data processing of the Monitor. It can have the following fields:

• Queries: List of one or more queries to run; this can be either SQL over ClickHouse, PromQL over VictoriaMetrics, or SqlPipeline. Each query will have a name for reference in the monitor.

• Thresholds: This is the threshold of your Monitor; a threshold has a name, inputName for data input, and an operator such as gt or lt.

measurementType

Describes how issues of this Monitor will be presented. Some Monitors count events and some track a state, and they are displayed differently in our dashboards.

• state - Will present issues in a line chart.

• event - Will present issues in a bar chart, counting events.

Title

A string that defines the human-readable name of the Monitor. The title is what you will see in the list of all existing Monitors in the Monitors section.

Description

Additional information about the Monitor.

Severity

When triggered, this will show the severity level of the Monitor's issue. You can set any severity you want here.

s1 for Critical

s2 for High

s3 for Medium

s4 for Low

Header

This is the header of the generated issues from the Monitor.

A short string describing the condition that is being monitored. You can also use this as a pattern using labels from your query.

“HTTP API Error {{ alert.labels.return_code}}”

    hashtag
    Monitor YAML Examples

    hashtag
    Traces Based Monitors

    hashtag
    MySQL Query Errors Monitor

    hashtag
    gRPC API Errors Monitor

    hashtag
    Log Based Monitors

    hashtag
    High Error Log Rate Monitor

    Query Monitors Summary

Get a comprehensive list of monitor configurations with detailed execution status, alert states, performance metrics, and complete query definitions. This endpoint provides real-time monitoring data.

    hashtag
    Endpoint

    POST /api/monitors/summary/query


    hashtag
    Authentication

    This endpoint requires API Key authentication via the Authorization header.

    hashtag
    Headers

    hashtag
    Request Body

    The request body supports filtering, pagination, sorting, and time range parameters:

    hashtag
    Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| conditions | array | No | Array of filter conditions for monitors (empty array returns all) |
| limit | integer | No | Maximum number of monitors to return (default: 200) |
| skip | integer | No | Number of monitors to skip for pagination (default: 0) |
| maxInstances | integer | No | Maximum instances per monitor result (default: 10) |
| order | string | No | Sort order: "asc" or "desc" (default: "desc") |
| sortBy | string | No | Field to sort by (see sorting options below) |
| start | string | No | Start time for filtering (ISO 8601 format) |
| end | string | No | End time for filtering (ISO 8601 format) |

    hashtag
    Sorting Options

| Sort Field | Description |
| --- | --- |
| "lastFiringStart" | Last time monitor started firing alerts |
| "title" | Monitor title alphabetically |
| "severity" | Monitor severity level |
| "createdAt" | Monitor creation date |
| "updatedAt" | Last modification date |
| "state" | Current monitor state |

    hashtag
    Filter Conditions

    The conditions array accepts filter objects for targeted monitor queries:

    hashtag
    Response

    The endpoint returns a JSON object containing an array of detailed monitor configurations:

    hashtag
    Response Fields

    Top-Level Fields

| Field | Type | Description |
| --- | --- | --- |
| hasMonitors | boolean | Whether any monitors exist in the system |
| monitors | array | Array of monitor configuration objects |

    Monitor Object Fields

| Field | Type | Description |
| --- | --- | --- |
| uuid | string | Unique monitor identifier |
| title | string | Monitor display name |
| description | string | Monitor description |
| severity | string | Alert severity level ("S1", "S2", "S3", "S4") |
| measurementType | string | Monitor type ("state", "event") |
| state | string | Current monitor state ("Normal", "Alerting", "Paused") |
| alertingCount | integer | Number of active alerts |
| model | object | Monitor configuration with queries and thresholds |
| interval | object | Evaluation timing configuration |
| executionErrorState | string | State when execution fails |
| noDataState | string | State when no data is available |
| isPaused | boolean | Whether monitor is currently paused |
| createdBy | integer | Creator user ID |
| createdByEmail | string | Creator email address |
| createdAt | string | Creation timestamp (ISO 8601) |
| updatedAt | string | Last update timestamp (ISO 8601) |
| lastStateStart | string | When current state began |
| lastFiringStart | string | When monitor last started alerting |
| firstFiringStart | string | When monitor first started alerting |
| lastResolved | string | When monitor was last resolved |
| minEvaluationDurationSeconds | float | Fastest query execution time |
| avgEvaluationDurationSeconds | float | Average query execution time |
| maxEvaluationDurationSeconds | float | Slowest query execution time |
| lastEvaluationError | string | Last execution error message |
| lastEvaluationTimestamp | string | Last evaluation timestamp |
| silenced | boolean | Whether monitor is silenced |
| fullySilenced | boolean | Whether monitor is completely silenced |
| silence_uuids | array | Array of silence rule identifiers |

    hashtag
    Examples

    hashtag
    Basic Request

    Get all monitors with default pagination:

    hashtag
    Filter by Time Range

    Get monitors within a specific time window:

    hashtag
    Pagination Example

    Get the second page of results:

    hashtag
    Sort by Creation Date

    Get monitors sorted by newest first:

    hashtag
    Response Example

    title: MySQL Query Errors Monitor
    display:
      header: MySQL Error {{ alert.labels.statusCode }}
      description: This monitor detects MySQL Query errors.
      resourceHeaderLabels:
        - span_name
        - role
      contextHeaderLabels:
        - cluster
        - namespace
        - workload
    severity: S3
    measurementType: event
    model:
      queries:
        - name: threshold_input_query
          dataType: traces
          sqlPipeline:
            selectors:
              - key: _time
                origin: root
                type: string
                processors:
                  - op: toStartOfInterval
                    args:
                      - 1 minutes
                alias: bucket_timestamp
              - key: statusCode
                origin: root
                type: string
                alias: statusCode
              - key: span_name
                origin: root
                type: string
                alias: span_name
              - key: cluster
                origin: root
                type: string
                alias: cluster
              - key: namespace
                origin: root
                type: string
                alias: namespace
              - key: role
                origin: root
                type: string
                alias: role
              - key: workload
                origin: root
                type: string
                alias: workload
              - key: "*"
                origin: root
                type: string
                processors:
                  - op: count
                alias: logs_total
            groupBy:
              - key: _time
                origin: root
                type: string
                processors:
                  - op: toStartOfInterval
                    args:
                      - 1 minutes
              - key: statusCode
                origin: root
                type: string
                alias: statusCode
              - key: span_name
                origin: root
                type: string
                alias: span_name
              - key: cluster
                origin: root
                type: string
                alias: cluster
              - key: namespace
                origin: root
                type: string
                alias: namespace
              - key: role
                origin: root
                type: string
                alias: role
              - key: workload
                origin: root
                type: string
                alias: workload
            orderBy:
              - selector:
                  key: bucket_timestamp
                  origin: root
                  type: string
                direction: ASC
            limit:
            filters:
              operator: and
              conditions:
                - filters:
                    - op: match
                      value: mysql
                  key: eventType
                  origin: root
                  type: string
                - filters:
                    - op: match
                      value: error
                  key: status
                  origin: root
                  type: string
                - filters:
                    - op: match
                      value: eBPF
                  key: source
                  origin: root
                  type: string
          instantRollup: 1 minutes
      thresholds:
        - name: threshold_1
          inputName: threshold_input_query
          operator: gt
          values:
            - 0
    executionErrorState: OK
    noDataState: OK
    evaluationInterval:
      interval: 1m
      pendingFor: 0s
    labels:
      team: infra
    title: gRPC API Errors Monitor
    display:
      header: gRPC API Error {{ alert.labels.statusCode }}
      description: This monitor detects gRPC API errors by identifying responses with a non-zero status code.
      resourceHeaderLabels:
        - span_name
        - role
      contextHeaderLabels:
        - cluster
        - namespace
        - workload
    severity: S3
    measurementType: event
    model:
      queries:
        - name: threshold_input_query
          dataType: traces
          sqlPipeline:
            selectors:
              - key: _time
                origin: root
                type: string
                processors:
                  - op: toStartOfInterval
                    args:
                      - 1 minutes
                alias: bucket_timestamp
              - key: statusCode
                origin: root
                type: string
                alias: statusCode
              - key: span_name
                origin: root
                type: string
                alias: span_name
              - key: cluster
                origin: root
                type: string
                alias: cluster
              - key: namespace
                origin: root
                type: string
                alias: namespace
              - key: role
                origin: root
                type: string
                alias: role
              - key: workload
                origin: root
                type: string
                alias: workload
              - key: "*"
                origin: root
                type: string
                processors:
                  - op: count
                alias: logs_total
            groupBy:
              - key: _time
                origin: root
                type: string
                processors:
                  - op: toStartOfInterval
                    args:
                      - 1 minutes
              - key: statusCode
                origin: root
                type: string
                alias: statusCode
              - key: span_name
                origin: root
                type: string
                alias: span_name
              - key: cluster
                origin: root
                type: string
                alias: cluster
              - key: namespace
                origin: root
                type: string
                alias: namespace
              - key: role
                origin: root
                type: string
                alias: role
              - key: workload
                origin: root
                type: string
                alias: workload
            orderBy:
              - selector:
                  key: bucket_timestamp
                  origin: root
                  type: string
                direction: ASC
            limit:
            filters:
              operator: and
              conditions:
                - filters:
                    - op: match
                      value: grpc
                  key: eventType
                  origin: root
                  type: string
                - filters:
                    - op: ne
                      value: "0"
                  key: statusCode
                  origin: root
                  type: string
                - filters:
                    - op: match
                      value: error
                  key: status
                  origin: root
                  type: string
                - filters:
                    - op: match
                      value: eBPF
                  key: source
                  origin: root
                  type: string
          instantRollup: 1 minutes
      thresholds:
        - name: threshold_1
          inputName: threshold_input_query
          operator: gt
          values:
            - 0
    executionErrorState: OK
    noDataState: OK
    evaluationInterval:
      interval: 1m
      pendingFor: 0s
    title: High Error Log Rate Monitor
    severity: S4
    display:
      header: High Log Error Rate
      description: This monitor will trigger an alert when we have a rate of error logs.
      resourceHeaderLabels:
        - workload
      contextHeaderLabels:
        - cluster
        - namespace
    evaluationInterval:
      interval: 1m
      pendingFor: 0s
    model:
      queries:
        - name: threshold_input_query
          dataType: logs
          sqlPipeline:
            selectors:
              - key: _time
                origin: root
                type: string
                processors:
                  - op: toStartOfInterval
                    args:
                      - 1 minutes
                alias: bucket_timestamp
              - key: workload
                origin: root
                type: string
                alias: workload
              - key: namespace
                origin: root
                type: string
                alias: namespace
              - key: cluster
                origin: root
                type: string
                alias: cluster
              - key: "*"
                origin: root
                type: string
                processors:
                  - op: count
                alias: logs_total
            groupBy:
              - key: _time
                origin: root
                type: string
                processors:
                  - op: toStartOfInterval
                    args:
                      - 1 minutes
              - key: workload
                origin: root
                type: string
                alias: workload
              - key: namespace
                origin: root
                type: string
                alias: namespace
              - key: cluster
                origin: root
                type: string
                alias: cluster
            orderBy:
              - selector:
                  key: bucket_timestamp
                  origin: root
                  type: string
                direction: ASC
            limit:
            filters:
              conditions:
                - filters:
                    - op: match
                      value: error
                  key: level
                  origin: root
                  type: string
              operator: and
          instantRollup: 1 minutes
      thresholds:
        - name: threshold_1
          inputName: threshold_input_query
          operator: gt
          values:
            - 150
    noDataState: OK
    measurementType: event
    Authorization: Bearer <YOUR_API_KEY>
    Content-Type: application/json
    Accept: text/event-stream
    {
      "conditions": [],
      "limit": 200,
      "skip": 0,
      "maxInstances": 10,
      "order": "desc",
      "sortBy": "lastFiringStart",
      "start": "2025-10-12T08:19:18.582Z",
      "end": "2025-10-12T09:19:18.582Z"
    }
    {
      "conditions": [
        {
          "field": "severity",
          "operator": "equals",
          "value": "S1"
        },
        {
          "field": "state",
          "operator": "in",
          "values": ["Alerting", "Normal"]
        }
      ]
    }
    {
      "hasMonitors": true,
      "monitors": [
        {
          "uuid": "string",
          "title": "string",
          "description": "string",
          "severity": "string",
          "measurementType": "string",
          "state": "string",
          "alertingCount": 0,
          "model": {
            "queries": [],
            "thresholds": []
          },
          "interval": {
            "interval": "string",
            "for": "string"
          },
          "executionErrorState": "string",
          "noDataState": "string",
          "isPaused": false,
          "createdBy": 0,
          "createdByEmail": "string",
          "createdAt": "string",
          "updatedAt": "string",
          "lastStateStart": "string",
          "lastFiringStart": "string",
          "firstFiringStart": "string",
          "lastResolved": "string",
          "minEvaluationDurationSeconds": 0.0,
          "avgEvaluationDurationSeconds": 0.0,
          "maxEvaluationDurationSeconds": 0.0,
          "lastEvaluationError": "string",
          "lastEvaluationTimestamp": "string",
          "silenced": false,
          "fullySilenced": false,
          "silence_uuids": []
        }
      ]
    }
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/monitors/summary/query' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --header 'Accept: text/event-stream' \
      --data-raw '{
        "conditions": [],
        "limit": 200,
        "skip": 0,
        "maxInstances": 10,
        "order": "desc",
        "sortBy": "lastFiringStart"
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/monitors/summary/query' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --header 'Accept: text/event-stream' \
      --data-raw '{
        "conditions": [],
        "limit": 100,
        "skip": 0,
        "maxInstances": 10,
        "order": "desc",
        "sortBy": "lastFiringStart",
        "start": "2025-10-12T08:00:00.000Z",
        "end": "2025-10-12T10:00:00.000Z"
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/monitors/summary/query' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --header 'Accept: text/event-stream' \
      --data-raw '{
        "conditions": [],
        "limit": 50,
        "skip": 50,
        "maxInstances": 10,
        "order": "desc",
        "sortBy": "title"
      }'
    curl -L \
      --request POST \
      --url 'https://api.groundcover.com/api/monitors/summary/query' \
      --header 'Authorization: Bearer <YOUR_API_KEY>' \
      --header 'Content-Type: application/json' \
      --header 'Accept: text/event-stream' \
      --data-raw '{
        "conditions": [],
        "limit": 100,
        "skip": 0,
        "maxInstances": 10,
        "order": "desc",
        "sortBy": "createdAt"
      }'
    {
      "hasMonitors": true,
      "monitors": [
        {
          "uuid": "12345678-1234-1234-1234-123456789abc",
          "title": "Example_Latency_Monitor",
          "description": "",
          "template": "Example_Latency_Monitor",
          "severity": "S2",
          "measurementType": "event",
          "header": "Example_Latency_Monitor",
          "resourceLabels": ["workload"],
          "contextLabels": ["namespace", "cluster"],
          "category": "",
          "interval": {
            "interval": "5m0s",
            "for": "1m0s"
          },
          "model": {
            "queries": [
              {
                "dataType": "traces",
                "name": "threshold_input_query",
                "sqlPipeline": {
                  "selectors": [
                    {
                      "key": "workload",
                      "origin": "root",
                      "type": "string",
                      "alias": "workload"
                    },
                    {
                      "key": "namespace",
                      "origin": "root",
                      "type": "string",
                      "alias": "namespace"
                    },
                    {
                      "key": "cluster",
                      "origin": "root",
                      "type": "string",
                      "alias": "cluster"
                    }
                  ]
                },
                "instantRollup": "5 minutes"
              }
            ],
            "reducers": null,
            "thresholds": [
              {
                "name": "threshold_1",
                "inputName": "threshold_input_query",
                "operator": "gt",
                "values": [502]
              }
            ],
            "query": "SELECT workload, namespace, cluster, count(*) AS logs_total FROM traces WHERE (start_timestamp < toStartOfInterval(NOW(), INTERVAL '5 MINUTE') AND start_timestamp >= (toStartOfInterval(NOW(), INTERVAL '5 MINUTE') - INTERVAL '5 minutes')) GROUP BY workload, namespace, cluster",
            "type": "traces"
          },
          "reducer": "",
          "trigger": {
            "op": "gt",
            "value": 502
          },
          "labelsMapping": {
            "owner": "example-user"
          },
          "executionErrorState": "",
          "noDataState": "OK",
          "isPaused": false,
          "createdBy": 12345,
          "createdByEmail": "[email protected]",
          "createdAt": "2025-03-14T20:42:36.949847Z",
          "updatedAt": "2025-09-21T12:17:00.130801Z",
          "relativeTimerange": {},
          "silences": [],
          "monitorId": "12345678-1234-1234-1234-123456789abc",
          "state": "Alerting",
          "lastStateStart": "0001-01-01T00:00:00Z",
          "lastFiringStart": null,
          "firstFiringStart": null,
          "lastResolved": null,
          "alertingCount": 11,
          "silenced": false,
          "fullySilenced": false,
          "silence_uuids": [],
          "minEvaluationDurationSeconds": 7.107210216,
          "avgEvaluationDurationSeconds": 7.1096896183047775,
          "maxEvaluationDurationSeconds": 7.119120884,
          "lastEvaluationError": "",
          "lastEvaluationTimestamp": "2025-10-12T09:15:50Z"
        }
      ]
    }
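When more monitors exist than one page returns, the skip and limit parameters above can drive a simple pagination loop. The following Python sketch is an assumption-level example, not an official client: the requests package and the short-page stop condition are assumptions, while the endpoint, headers and field names mirror the curl examples and tables above.

```python
# Sketch: page through /api/monitors/summary/query with skip/limit.
# Assumes the third-party `requests` package and a valid API key.
import requests

URL = "https://api.groundcover.com/api/monitors/summary/query"
HEADERS = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
}
PAGE_SIZE = 50

monitors, skip = [], 0
while True:
    resp = requests.post(URL, headers=HEADERS, timeout=30, json={
        "conditions": [],
        "limit": PAGE_SIZE,
        "skip": skip,
        "maxInstances": 10,
        "order": "desc",
        "sortBy": "lastFiringStart",
    })
    resp.raise_for_status()
    page = resp.json().get("monitors", [])
    monitors.extend(page)
    if len(page) < PAGE_SIZE:  # short page: assume no further results
        break
    skip += PAGE_SIZE

for monitor in monitors:
    print(monitor["title"], monitor["state"], monitor["alertingCount"])
```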



    Metrics & Labels

    hashtag
    Kubernetes Infrastructure Metrics & Labels

    hashtag
    Node CPU, Memory and Disk

    hashtag
    Labels

    type clusterId region node_name

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Storage Usage

    hashtag
    Labels

    type clusterId region name namespace

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Network Usage

    hashtag
    Labels

clusterId workload_name namespace container_name remote_service_name remote_namespace remote_is_external availability_zone region remote_availability_zone remote_region is_cross_az protocol role server_port encryption transport_protocol is_loopback

Notes:

• is_loopback and remote_is_external are special labels that indicate the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g. a managed service outside of the cluster (external).

  • In both cases the remote_service_name and remote_namespace labels will be empty.

• is_cross_az means the traffic was sent and/or received between two different availability zones. This is a helpful flag to quickly identify this special kind of communication (illustrated in the sketch after these notes).

  • The actual zones are detailed in the availability_zone and remote_availability_zone labels.
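As an illustration of these labels in practice, the PromQL below isolates cross-zone traffic. It is a sketch, not taken from the official docs: it assumes is_cross_az carries the string value "true", and it uses the groundcover_network_rx_ops_total counter listed further down this page.

```python
# Sketch: per-workload rate of cross-AZ read operations over 5 minutes.
# Assumes is_cross_az is labeled with the string value "true".
promql = (
    "sum by (workload_name, availability_zone, remote_availability_zone) "
    '(rate(groundcover_network_rx_ops_total{is_cross_az="true"}[5m]))'
)
print(promql)  # submit via the /api/metrics/query-range endpoint shown earlier
```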

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Kubernetes Resources

    hashtag
    Labels

    type resource condition status clusterId region namespace workload_name deployment unit

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Container Metrics & Labels

    hashtag
    Container CPU

    hashtag
    Labels

    type clusterId region namespace node_name workload_name pod_name container_name container_image

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Container Memory

    hashtag
    Labels

    type clusterId region namespace node_name workload_name pod_name container_name container_image

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Container I/O

    hashtag
    Labels

    type clusterId region namespace node_name workload_name pod_name container_name container_image

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Container Network

    hashtag
    Labels

    type clusterId region namespace node_name workload_name pod_name container_name container_image

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Container Status

    hashtag
    Labels

    type clusterId region namespace node_name workload_name pod_name container_name container_image

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Host Resources Metrics & Labels

    hashtag
    Host CPU

    hashtag
    Labels

    clusterId env region host_name cloud_provider env_type

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Host Memory

    hashtag
    Labels

    clusterId env region host_name cloud_provider env_type

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Host Disk

    hashtag
    Labels

    clusterId env region host_name cloud_provider env_type Optional: device_name

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Host I/O

    hashtag
    Labels

    clusterId env region host_name cloud_provider env_type Optional: device_name

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Host Filesystem

    hashtag
    Labels

    clusterId env region host_name cloud_provider env_type device_name file_system mountpoint

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Host File Handles

    hashtag
    Labels

    clusterId env region host_name cloud_provider env_type

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Host Network

    hashtag
    Labels

    clusterId env region host_name cloud_provider env_type device

    hashtag
    Metrics

    Name
    Description
    Unit
    Type

    hashtag
    Application Metrics & Labels

    Label name
    Description
    Relevant types
    circle-info

Summary based metrics have an additional quantile label, representing the percentile. Available values: ["0.5", "0.95", "0.99"].
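For instance, pinning a Summary metric to its 95th percentile is just a label matcher. A hedged Python sketch (the requests package is assumed; the instant-query endpoint is the one shown in the API examples earlier, and groundcover_pvc_read_latency is taken from the metric tables below):

```python
# Sketch: fetch the p95 PVC read latency via the instant-query endpoint.
# Assumes the third-party `requests` package and a valid API key.
import requests

resp = requests.get(
    "https://app.groundcover.com/api/prometheus/api/v1/query",
    params={"query": 'avg(groundcover_pvc_read_latency{quantile="0.95"})'},
    headers={
        "accept": "application/json",
        "authorization": "Bearer <YOUR_API_KEY>",
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```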

    circle-exclamation

We also use a set of internal labels which are not relevant in most use cases:

    issue_id entity_id resource_id query_id aggregation_id

    hashtag
    Golden Signals (Errors & Issues)

In the lists below, we describe error and issue counters. Every issue flagged by the platform is an error, but not every error is flagged as an issue.

    hashtag
    Resource metrics

    Name
    Description
    Unit
    Type

    hashtag
    Workload metrics

    Name
    Description
    Unit
    Type

    hashtag
    Kafka specific metrics

    Name
    Description
    Unit
    Type

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_node_used_disk_space | Current used disk space in the current node | Bytes | Gauge |
| groundcover_node_free_disk_space | Free disk space in the current node | Bytes | Gauge |
| groundcover_node_total_disk_space | Total disk space in the current node | Bytes | Gauge |
| groundcover_node_used_percent_disk_space | Percentage of used disk space in the current node | Percentage | Gauge |
| groundcover_pvc_usage_percent | Percentage of used Persistent Volume Claim (PVC) storage | Percentage | Gauge |
| groundcover_pvc_read_bytes_total | Total bytes read by the workload from the Persistent Volume Claim (PVC) | Bytes | Counter |
| groundcover_pvc_write_bytes_total | Total bytes written by the workload to the Persistent Volume Claim (PVC) | Bytes | Counter |
| groundcover_pvc_reads_total | Total read operations performed by the workload from the Persistent Volume Claim (PVC) | Number | Counter |
| groundcover_pvc_writes_total | Total write operations performed by the workload to the Persistent Volume Claim (PVC) | Number | Counter |
| groundcover_pvc_read_latency | Latency of read operations from the Persistent Volume Claim (PVC) by the workload | Seconds | Summary |
| groundcover_pvc_write_latency | Latency of write operations to the Persistent Volume Claim (PVC) by the workload | Seconds | Summary |
| groundcover_pvc_read_latency_count | Count of read operations latency for the Persistent Volume Claim (PVC) | Number | Counter |
| groundcover_pvc_read_latency_sum | Sum of read operation latencies for the Persistent Volume Claim (PVC) | Seconds | Counter |
| groundcover_pvc_read_latency_summary | Summary of read operations latency for the Persistent Volume Claim (PVC) | Milliseconds | Counter |
| groundcover_pvc_write_latency_count | Count of write operations sampled for latency on the Persistent Volume Claim (PVC) | Number | Counter |
| groundcover_pvc_write_latency_sum | Sum of write operation latencies for the Persistent Volume Claim (PVC) | Seconds | Counter |
| groundcover_pvc_write_latency_summary | Summary of write operations latency for the Persistent Volume Claim (PVC) | Milliseconds | Counter |
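These PVC gauges map directly onto storage alerting. As an illustrative sketch (the 80% threshold is arbitrary, not a recommendation; namespace and name are Storage Usage labels listed above), a query surfacing claims running low on space:

```python
# Sketch: PVCs above 80% usage, grouped by namespace and claim name.
promql = "max by (namespace, name) (groundcover_pvc_usage_percent) > 80"
print(promql)  # usable as a monitor threshold query or an ad-hoc API query
```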

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_network_connections_closed_total | Total connections closed by the workload | Number | Counter |
| groundcover_network_connections_opened_failed_total | Total number of failed network connection attempts by the workload | Number | Counter |
| groundcover_network_connections_refused_failed_total | Connection attempts refused per workload | Number | Counter |
| groundcover_network_connections_opened_refused_total | Total number of network connections refused by the workload | Number | Counter |
| groundcover_network_rx_ops_total | Total number of read operations issued by the workload | Number | Counter |
| groundcover_network_tx_ops_total | Total number of write operations issued by the workload | Number | Counter |

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_kube_daemonset_status_number_available | Number of available Pods for the DaemonSet | Number | Gauge |
| groundcover_kube_daemonset_status_number_misscheduled | Number of Pods running on nodes they should not be scheduled on | Number | Gauge |
| groundcover_kube_daemonset_status_number_ready | Number of ready Pods for the DaemonSet | Number | Gauge |
| groundcover_kube_daemonset_status_number_unavailable | Number of unavailable Pods for the DaemonSet | Number | Gauge |
| groundcover_kube_daemonset_status_observed_generation | Most recent generation observed for the DaemonSet | Number | Gauge |
| groundcover_kube_daemonset_status_updated_number_scheduled | Number of Pods updated and scheduled by the DaemonSet | Number | Gauge |
| groundcover_kube_deployment_created | Creation timestamp of the Deployment | Seconds | Gauge |
| groundcover_kube_deployment_metadata_generation | Sequence number representing a specific generation of the Deployment | Number | Gauge |
| groundcover_kube_deployment_spec_paused | Whether the Deployment is paused | Number | Gauge |
| groundcover_kube_deployment_spec_replicas | Desired number of replicas for the Deployment | Number | Gauge |
| groundcover_kube_deployment_spec_strategy_rollingupdate_max_unavailable | Maximum number of unavailable Pods during a rolling update for the Deployment | Number | Gauge |
| groundcover_kube_deployment_status_condition | Current condition of the Deployment (labeled by type and status) | Number | Gauge |
| groundcover_kube_deployment_status_observed_generation | Most recent generation observed for the Deployment | Number | Gauge |
| groundcover_kube_deployment_status_replicas | Number of replicas for the Deployment | Number | Gauge |
| groundcover_kube_deployment_status_replicas_available | Number of available replicas for the Deployment | Number | Gauge |
| groundcover_kube_deployment_status_replicas_ready | Number of ready replicas for the Deployment | Number | Gauge |
| groundcover_kube_deployment_status_replicas_unavailable | Number of unavailable replicas for the Deployment | Number | Gauge |
| groundcover_kube_deployment_status_replicas_updated | Number of updated replicas for the Deployment | Number | Gauge |
| groundcover_kube_horizontalpodautoscaler_spec_max_replicas | Maximum number of replicas configured for the HPA | Number | Gauge |
| groundcover_kube_horizontalpodautoscaler_spec_min_replicas | Minimum number of replicas configured for the HPA | Number | Gauge |
| groundcover_kube_horizontalpodautoscaler_spec_target_metric | Configured HPA target metric value | Number | Gauge |
| groundcover_kube_horizontalpodautoscaler_status_condition | Current condition of the Horizontal Pod Autoscaler (labeled by type and status) | Number | Gauge |
| groundcover_kube_horizontalpodautoscaler_status_current_replicas | Current number of replicas managed by the HPA | Number | Gauge |
| groundcover_kube_horizontalpodautoscaler_status_desired_replicas | Desired number of replicas as calculated by the HPA | Number | Gauge |
| groundcover_kube_horizontalpodautoscaler_status_target_metric | Current observed value of the HPA target metric | Number | Gauge |
| groundcover_kube_job_complete | Whether the Job has completed successfully | Number | Gauge |
| groundcover_kube_job_failed | Whether the Job has failed | Number | Gauge |
| groundcover_kube_job_spec_completions | Desired number of successfully finished Pods for the Job | Number | Gauge |
| groundcover_kube_job_spec_parallelism | Desired number of Pods running in parallel for the Job | Number | Gauge |
| groundcover_kube_job_status_active | Number of actively running Pods for the Job | Number | Gauge |
| groundcover_kube_job_status_completion_time | Completion time of the Job as Unix timestamp | Seconds | Gauge |
| groundcover_kube_job_status_failed | Number of failed Pods for the Job | Number | Gauge |
| groundcover_kube_job_status_start_time | Start time of the Job as Unix timestamp | Seconds | Gauge |
| groundcover_kube_job_status_succeeded | Number of succeeded Pods for the Job | Number | Gauge |
| groundcover_kube_node_created | Creation timestamp of the Node | Seconds | Gauge |
| groundcover_kube_node_spec_taint | Node taint information (labeled by key, value and effect) | Number | Gauge |
| groundcover_kube_node_spec_unschedulable | Whether a node can schedule new pods | Number | Gauge |
| groundcover_kube_node_status_allocatable | The amount of resources allocatable for pods (after reserving some for system daemons) | Number | Gauge |
| groundcover_kube_node_status_capacity | The total amount of resources available for a node | Number | Gauge |
| groundcover_kube_node_status_condition | The condition of a cluster node | Number | Gauge |
| groundcover_kube_persistentvolume_capacity_bytes | Capacity of the PersistentVolume | Bytes | Gauge |
| groundcover_kube_persistentvolume_status_phase | Current phase of the PersistentVolume | Number | Gauge |
| groundcover_kube_persistentvolumeclaim_access_mode | Access mode of the PersistentVolumeClaim | Number | Gauge |
| groundcover_kube_persistentvolumeclaim_status_phase | Current phase of the PersistentVolumeClaim | Number | Gauge |
| groundcover_kube_pod_container_resource_limits | The number of requested limit resource by a container. It is recommended to use the `kube_pod_resource_limits` metric exposed by kube-scheduler instead, as it is more precise. | Number | Gauge |
| groundcover_kube_pod_container_resource_requests | The number of requested request resource by a container. It is recommended to use the `kube_pod_resource_requests` metric exposed by kube-scheduler instead, as it is more precise. | Number | Gauge |
| groundcover_kube_pod_container_status_last_terminated_exitcode | The last termination exit code for the container | Number | Gauge |
| groundcover_kube_pod_container_status_last_terminated_reason | The last termination reason for the container | Number | Gauge |
| groundcover_kube_pod_container_status_ready | Describes whether the container's readiness check succeeded | Number | Gauge |
| groundcover_kube_pod_container_status_restarts_total | The number of container restarts per container | Number | Counter |
| groundcover_kube_pod_container_status_running | Describes whether the container is currently in running state | Number | Gauge |
| groundcover_kube_pod_container_status_terminated | Describes whether the container is currently in terminated state | Number | Gauge |
| groundcover_kube_pod_container_status_terminated_reason | Describes the reason the container is currently in terminated state | Number | Gauge |
| groundcover_kube_pod_container_status_waiting | Describes whether the container is currently in waiting state | Number | Gauge |
| groundcover_kube_pod_container_status_waiting_reason | Describes the reason the container is currently in waiting state | Number | Gauge |
| groundcover_kube_pod_created | Creation timestamp of the Pod | Seconds | Gauge |
| groundcover_kube_pod_init_container_resource_limits | The number of CPU cores requested as limit by an init container | Bytes | Gauge |
| groundcover_kube_pod_init_container_resource_requests | Requested resources by init container (labeled by resource and unit) | Number | Gauge |
| groundcover_kube_pod_init_container_resource_requests_memory_bytes | Requested memory by init containers | Bytes | Gauge |
| groundcover_kube_pod_init_container_status_last_terminated_reason | The last termination reason for the init container | Number | Gauge |
| groundcover_kube_pod_init_container_status_ready | Describes whether the init container's readiness check succeeded | Number | Gauge |
| groundcover_kube_pod_init_container_status_restarts_total | The number of restarts for the init container | Number | Gauge |
| groundcover_kube_pod_init_container_status_running | Describes whether the init container is currently in running state | Number | Gauge |
| groundcover_kube_pod_init_container_status_terminated | Describes whether the init container is currently in terminated state | Number | Gauge |
| groundcover_kube_pod_init_container_status_terminated_reason | Describes the reason the init container is currently in terminated state | Number | Gauge |
| groundcover_kube_pod_init_container_status_waiting | Describes whether the init container is currently in waiting state | Number | Gauge |
| groundcover_kube_pod_init_container_status_waiting_reason | Describes the reason the init container is currently in waiting state | Number | Gauge |
| groundcover_kube_pod_spec_volumes_persistentvolumeclaims_readonly | Whether the PersistentVolumeClaim is mounted as read-only in the Pod | Number | Gauge |
| groundcover_kube_pod_status_phase | The pod's current phase | Number | Gauge |
| groundcover_kube_pod_status_ready | Describes whether the pod is ready to serve requests | Number | Gauge |
| groundcover_kube_pod_status_scheduled | Describes the status of the scheduling process for the pod | Number | Gauge |
| groundcover_kube_pod_status_unschedulable | Whether the Pod is unschedulable | Number | Gauge |
| groundcover_kube_pod_tolerations | Pod tolerations configuration | Number | Gauge |
| groundcover_kube_replicaset_spec_replicas | Desired number of replicas for the ReplicaSet | Number | Gauge |
| groundcover_kube_replicaset_status_fully_labeled_replicas | Number of fully labeled replicas for the ReplicaSet | Number | Gauge |
| groundcover_kube_replicaset_status_observed_generation | Most recent generation observed for the ReplicaSet | Number | Gauge |
| groundcover_kube_replicaset_status_ready_replicas | Number of ready replicas for the ReplicaSet | Number | Gauge |
| groundcover_kube_replicaset_status_replicas | Number of replicas for the ReplicaSet | Number | Gauge |
| groundcover_kube_resourcequota | Resource quota information (labeled by resource and type: hard/used) | Number | Gauge |
| groundcover_kube_resourcequota_created | Creation timestamp of the ResourceQuota as Unix seconds | Seconds | Gauge |
| groundcover_kube_statefulset_metadata_generation | Sequence number representing a specific generation of the StatefulSet | Number | Gauge |
| groundcover_kube_statefulset_replicas | Desired number of replicas for the StatefulSet | Number | Gauge |
| groundcover_kube_statefulset_status_current_revision | Current revision of the StatefulSet | Number | Gauge |
| groundcover_kube_statefulset_status_observed_generation | Most recent generation observed for the StatefulSet | Number | Gauge |
| groundcover_kube_statefulset_status_replicas | Number of replicas for the StatefulSet | Number | Gauge |
| groundcover_kube_statefulset_status_replicas_available | Number of available replicas for the StatefulSet | Number | Gauge |
| groundcover_kube_statefulset_status_replicas_current | Number of current replicas for the StatefulSet | Number | Gauge |
| groundcover_kube_statefulset_status_replicas_ready | Number of ready replicas for the StatefulSet | Number | Gauge |
| groundcover_kube_statefulset_status_replicas_updated | Number of updated replicas for the StatefulSet | Number | Gauge |
| groundcover_kube_statefulset_status_update_revision | Update revision of the StatefulSet | Number | Gauge |
| groundcover_kube_job_duration | Time elapsed between the start and completion time of the Job, or current time if the Job is still running | Seconds | Gauge |
| groundcover_kube_pod_uptime | Time elapsed since the Pod was created | Seconds | Gauge |
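Because these are plain gauges, rollout health can be expressed as simple ratios. The sketch below is illustrative PromQL, not from the official docs; it assumes both gauges carry matching label sets so that PromQL vector matching lines up.

```python
# Sketch: fraction of desired Deployment replicas currently available.
promql = (
    "groundcover_kube_deployment_status_replicas_available"
    " / groundcover_kube_deployment_spec_replicas"
)
print(promql)  # values below 1 indicate Deployments running degraded
```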

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_container_cpu_request_usage_percent | CPU usage rate out of request (usage/request) | Percentage | Gauge |
| groundcover_container_cpu_throttled_percent | Percentage of CPU throttling for the container | Percentage | Gauge |
| groundcover_container_cpu_throttled_periods | Total number of throttled CPU periods for the container | Number | Counter |
| groundcover_container_cpu_throttled_rate_millis | Rate of CPU throttling for the container | mCPU | Gauge |
| groundcover_container_cpu_throttled_seconds_total | Total CPU throttling time for K8s container | Seconds | Counter |
| groundcover_container_cpu_usage_percent | CPU usage rate (usage/limit) | Percentage | Gauge |
| groundcover_container_m_cpu_usage_seconds_total | Total CPU usage time in milli-CPUs for the container | mCPU | Counter |
| groundcover_container_m_cpu_usage_system_seconds_total | Total CPU time spent in system mode for the container | Seconds | Counter |
| groundcover_container_m_cpu_usage_user_seconds_total | Total CPU time spent in user mode for the container | Seconds | Counter |
| groundcover_container_cpu_limit_m_cpu | K8s container CPU limit | mCPU | Gauge |
| groundcover_container_cpu_request_m_cpu | K8s container requested CPU allocation | mCPU | Gauge |
| groundcover_container_cpu_pressure_full_avg10 | Average percentage of time all non-idle tasks were stalled on CPU over 10 seconds | Percentage | Gauge |
| groundcover_container_cpu_pressure_full_avg300 | Average percentage of time all non-idle tasks were stalled on CPU over 300 seconds | Percentage | Gauge |
| groundcover_container_cpu_pressure_full_avg60 | Average percentage of time all non-idle tasks were stalled on CPU over 60 seconds | Percentage | Gauge |
| groundcover_container_cpu_pressure_full_total | Total time all non-idle tasks were stalled waiting for CPU | Microseconds | Counter |
| groundcover_container_cpu_pressure_some_avg10 | Average percentage of time at least some tasks were stalled on CPU over 10 seconds | Percentage | Gauge |
| groundcover_container_cpu_pressure_some_avg300 | Average percentage of time at least some tasks were stalled on CPU over 300 seconds | Percentage | Gauge |
| groundcover_container_cpu_pressure_some_avg60 | Average percentage of time at least some tasks were stalled on CPU over 60 seconds | Percentage | Gauge |
| groundcover_container_cpu_pressure_some_total | Total time at least some tasks were stalled waiting for CPU | Microseconds | Counter |
| groundcover_container_memory_kernel_usage_bytes | Kernel memory usage for the container | Bytes | Gauge |
| groundcover_container_memory_limit_bytes | K8s container memory limit | Bytes | Gauge |
| groundcover_container_memory_major_page_faults | Total number of major page faults for the container | Number | Counter |
| groundcover_container_memory_oom_events | Total number of out-of-memory (OOM) events for the container | Number | Counter |
| groundcover_container_memory_page_faults | Total number of page faults for the container | Number | Counter |
| groundcover_container_memory_request_bytes | K8s container requested memory allocation | Bytes | Gauge |
| groundcover_container_memory_request_used_percent | Memory usage rate out of request (usage/request) | Percentage | Gauge |
| groundcover_container_memory_rss_bytes | Current memory resident set size (RSS) | Bytes | Gauge |
| groundcover_container_memory_swap_usage_bytes | Swap memory usage for the container | Bytes | Gauge |
| groundcover_container_memory_usage_bytes | Current memory usage for the container | Bytes | Gauge |
| groundcover_container_memory_usage_peak_bytes | Peak memory usage for the container | Bytes | Gauge |
| groundcover_container_memory_used_percent | Memory usage rate (usage/limit) | Percentage | Gauge |
| groundcover_container_memory_pressure_full_avg10 | Average percentage of time all non-idle tasks were stalled on memory over 10 seconds | Percentage | Gauge |
| groundcover_container_memory_pressure_full_avg300 | Average percentage of time all non-idle tasks were stalled on memory over 300 seconds | Percentage | Gauge |
| groundcover_container_memory_pressure_full_avg60 | Average percentage of time all non-idle tasks were stalled on memory over 60 seconds | Percentage | Gauge |
| groundcover_container_memory_pressure_full_total | Total time all non-idle tasks were stalled waiting for memory | Microseconds | Counter |
| groundcover_container_memory_pressure_some_avg10 | Average percentage of time at least some tasks were stalled on memory over 10 seconds | Percentage | Gauge |
| groundcover_container_memory_pressure_some_avg300 | Average percentage of time at least some tasks were stalled on memory over 300 seconds | Percentage | Gauge |
| groundcover_container_memory_pressure_some_avg60 | Average percentage of time at least some tasks were stalled on memory over 60 seconds | Percentage | Gauge |
| groundcover_container_memory_pressure_some_total | Total time at least some tasks were stalled waiting for memory | Microseconds | Counter |
| groundcover_container_io_write_ops_total | Total number of write operations by the container | Number | Counter |
| groundcover_container_disk_delay_seconds | K8s container disk I/O delay | Seconds | Counter |
| groundcover_container_io_pressure_full_avg10 | Average percentage of time all non-idle tasks were stalled on I/O over 10 seconds | Percentage | Gauge |
| groundcover_container_io_pressure_full_avg300 | Average percentage of time all non-idle tasks were stalled on I/O over 300 seconds | Percentage | Gauge |
| groundcover_container_io_pressure_full_avg60 | Average percentage of time all non-idle tasks were stalled on I/O over 60 seconds | Percentage | Gauge |
| groundcover_container_io_pressure_full_total | Total time all non-idle tasks were stalled waiting for I/O | Microseconds | Counter |
| groundcover_container_io_pressure_some_avg10 | Average percentage of time at least some tasks were stalled on I/O over 10 seconds | Percentage | Gauge |
| groundcover_container_io_pressure_some_avg300 | Average percentage of time at least some tasks were stalled on I/O over 300 seconds | Percentage | Gauge |
| groundcover_container_io_pressure_some_avg60 | Average percentage of time at least some tasks were stalled on I/O over 60 seconds | Percentage | Gauge |
| groundcover_container_io_pressure_some_total | Total time at least some tasks were stalled waiting for I/O | Microseconds | Counter |
| groundcover_container_network_tx_bytes_total | Total bytes transmitted by the container | Bytes | Counter |
| groundcover_container_network_tx_dropped_total | Total number of transmitted packets dropped by the container | Number | Counter |
| groundcover_container_network_tx_errors_total | Total number of errors encountered while transmitting packets | Number | Counter |
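Throttling and pressure gauges combine naturally into saturation queries. As an illustrative sketch (metric and label names are taken from the tables above; the 25% threshold is arbitrary):

```python
# Sketch: workloads whose containers spend noticeable time CPU-throttled.
promql = (
    "max by (workload_name, namespace) "
    "(groundcover_container_cpu_throttled_percent) > 25"
)
print(promql)
```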

| Name | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_host_cpu_usage_percent | Percentage of used CPU in the current host | Percentage | Gauge |
| groundcover_host_cpu_num_cores | Number of CPU cores on the host | Number | Gauge |
| groundcover_host_cpu_user_spent_seconds_total | Total time spent in user mode | Seconds | Counter |
| groundcover_host_cpu_user_spent_percent | Percentage of CPU time spent in user mode | Percentage | Gauge |
| groundcover_host_cpu_system_spent_seconds_total | Total time spent in system mode | Seconds | Counter |
| groundcover_host_cpu_system_spent_percent | Percentage of CPU time spent in system mode | Percentage | Gauge |
| groundcover_host_cpu_idle_spent_seconds_total | Total time spent idle | Seconds | Counter |
| groundcover_host_cpu_idle_spent_percent | Percentage of CPU time spent idle | Percentage | Gauge |
| groundcover_host_cpu_iowait_spent_seconds_total | Total time spent waiting for I/O to complete | Seconds | Counter |
| groundcover_host_cpu_iowait_spent_percent | Percentage of CPU time spent waiting for I/O | Percentage | Gauge |
| groundcover_host_cpu_nice_spent_seconds_total | Total time spent on niced processes | Seconds | Counter |
| groundcover_host_cpu_steal_spent_seconds_total | Total time spent in involuntary wait (stolen by hypervisor) | Seconds | Counter |
| groundcover_host_cpu_stolen_spent_percent | Percentage of CPU time stolen by the hypervisor | Percentage | Gauge |
| groundcover_host_cpu_irq_spent_seconds_total | Total time spent handling hardware interrupts | Seconds | Counter |
| groundcover_host_cpu_softirq_spent_seconds_total | Total time spent handling software interrupts | Seconds | Counter |
| groundcover_host_cpu_interrupt_spent_percent | Percentage of CPU time spent handling interrupts | Percentage | Gauge |
| groundcover_host_cpu_guest_spent_seconds_total | Total time spent running guest processes | Seconds | Counter |
| groundcover_host_cpu_guest_spent_percent | Percentage of CPU time spent running guest processes | Percentage | Gauge |
| groundcover_host_cpu_guest_nice_spent_seconds_total | Total time spent running niced guest processes | Seconds | Counter |
| groundcover_host_cpu_context_switches_total | Total number of context switches in the current host | Number | Counter |
| groundcover_host_cpu_load_avg1 | CPU load average over 1 minute | Number | Gauge |
| groundcover_host_cpu_load_avg5 | CPU load average over 5 minutes | Number | Gauge |
| groundcover_host_cpu_load_avg15 | CPU load average over 15 minutes | Number | Gauge |
| groundcover_host_cpu_load_norm1 | Normalized CPU load over 1 minute | Number | Gauge |
| groundcover_host_cpu_load_norm5 | Normalized CPU load over 5 minutes | Number | Gauge |
| groundcover_host_cpu_load_norm15 | Normalized CPU load over 15 minutes | Number | Gauge |
| groundcover_host_mem_free_bytes | Free memory in the current host | Bytes | Gauge |
| groundcover_host_mem_available_bytes | Available memory in the current host | Bytes | Gauge |
| groundcover_host_mem_cached_bytes | Cached memory in the current host | Bytes | Gauge |
| groundcover_host_mem_buffers_bytes | Buffer memory in the current host | Bytes | Gauge |
| groundcover_host_mem_shared_bytes | Shared memory in the current host | Bytes | Gauge |
| groundcover_host_mem_slab_bytes | Slab memory in the current host | Bytes | Gauge |
| groundcover_host_mem_sreclaimable_bytes | Reclaimable slab memory in the current host | Bytes | Gauge |
| groundcover_host_mem_page_tables_bytes | Page tables memory in the current host | Bytes | Gauge |
| groundcover_host_mem_commit_limit_bytes | Memory commit limit in the current host | Bytes | Gauge |
| groundcover_host_mem_committed_as_bytes | Committed address space memory in the current host | Bytes | Gauge |
| groundcover_host_mem_swap_cached_bytes | Cached swap memory in the current host | Bytes | Gauge |
| groundcover_host_mem_swap_total_bytes | Total swap memory in the current host | Bytes | Gauge |
| groundcover_host_mem_swap_free_bytes | Free swap memory in the current host | Bytes | Gauge |
| groundcover_host_mem_swap_used_bytes | Used swap memory in the current host | Bytes | Gauge |
| groundcover_host_mem_swap_in_bytes_total | Swap in bytes in the current host | Bytes | Counter |
| groundcover_host_mem_swap_out_bytes_total | Swap out bytes in the current host | Bytes | Counter |
| groundcover_host_mem_swap_free_percent | Percentage of free swap memory in the current host | Percentage | Gauge |
| groundcover_host_mem_usable_percent | Percentage of usable (available) memory in the current host | Percentage | Gauge |
| groundcover_host_disk_space_used_percent | Percentage of used disk space in the current host | Percentage | Gauge |
| groundcover_host_disk_read_time_ms_total | Total time spent reading from disk per device in the current host | Milliseconds | Counter |
| groundcover_host_disk_write_time_ms_total | Total time spent writing to disk per device in the current host | Milliseconds | Counter |
| groundcover_host_disk_read_count_total | Total number of disk reads per device in the current host | Number | Counter |
| groundcover_host_disk_write_count_total | Total number of disk writes per device in the current host | Number | Counter |
| groundcover_host_disk_merged_read_count_total | Total number of merged disk reads per device in the current host | Number | Counter |
| groundcover_host_disk_merged_write_count_total | Total number of merged disk writes per device in the current host | Number | Counter |
| groundcover_host_io_write_await_ms | Average time for write requests to be served per device in the current host | Milliseconds | Gauge |
| groundcover_host_io_await_ms | Average time for I/O requests to be served per device in the current host | Milliseconds | Gauge |
| groundcover_host_io_avg_request_size | Average I/O request size per device in the current host | Kilobytes | Gauge |
| groundcover_host_io_service_time_ms | Average service time for I/O requests per device in the current host | Milliseconds | Gauge |
| groundcover_host_io_avg_queue_size_kb | Average I/O queue size per device in the current host | Kilobytes | Gauge |
| groundcover_host_io_utilization_percent | Percentage of time the device was busy serving I/O requests in the current host | Percentage | Gauge |
| groundcover_host_io_block_in_total | Total number of block-in operations in the current host | Number | Counter |
| groundcover_host_io_block_out_total | Total number of block-out operations in the current host | Number | Counter |
| groundcover_host_fs_used_percent | Percentage of used filesystem space in the current host | Percentage | Gauge |
| groundcover_host_fs_inodes_total | Total inodes in the filesystem | Number | Gauge |
| groundcover_host_fs_inodes_used | Used inodes in the filesystem | Number | Gauge |
| groundcover_host_fs_inodes_free | Free inodes in the filesystem | Number | Gauge |
| groundcover_host_fs_inodes_used_percent | Percentage of used inodes in the filesystem | Percentage | Gauge |
| groundcover_host_fs_file_handles_allocated | Total number of file handles allocated in the current host | Number | Gauge |

    groundcover_host_fs_file_handles_allocated_unused

    Number of allocated but unused file handles in the current host

    Number

    Gauge

    groundcover_host_fs_file_handles_in_use

    Number of file handles currently in use in the current host

    Number

    Gauge

    groundcover_host_fs_file_handles_max

    Maximum number of file handles available in the current host

    Number

    Gauge

    groundcover_host_fs_file_handles_used_percent

    Percentage of file handles in use in the current host

    Percentage

    Gauge

    groundcover_host_fs_file_handles_used_percent

    Percentage of file handles in use in the current host

    Percentage

    Gauge

    groundcover_host_fs_file_handles_max

    Maximum number of file handles available in the current host

    Number

    Gauge

    groundcover_host_net_transmit_packets_total

    Total packets transmitted on network interface

    Number

    Counter

    groundcover_host_net_receive_dropped_total

    Total number of received packets dropped on network interface

    Number

    Counter

    groundcover_host_net_receive_errors_total

    Total number of receive errors on network interface

    Number

    Counter

    groundcover_host_net_transmit_dropped_total

    Total number of transmitted packets dropped on network interface

    Number

    Counter

    groundcover_host_net_transmit_errors_total

    Total number of transmit errors on network interface

    Number

    Counter
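The `*_total` metrics above are monotonic counters, so their raw values are rarely useful on their own; the usual pattern is to convert them into per-second rates over a recent window. Below is a minimal sketch of that pattern, assuming your groundcover metrics are reachable through a PromQL-compatible query API (the endpoint URL is a placeholder to replace with your own):

```python
import requests

# Placeholder: point this at the PromQL-compatible query endpoint
# that serves your groundcover metrics.
QUERY_URL = "http://metrics.example.internal/api/v1/query"

# rate() turns the monotonic iowait counter into seconds-of-iowait per
# second over the last 5 minutes, and is robust to counter resets.
promql = "rate(groundcover_host_cpu_iowait_spent_seconds_total[5m])"

resp = requests.get(QUERY_URL, params={"query": promql}, timeout=10)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    # Each series carries its label set and a (timestamp, value) pair.
    print(series["metric"], series["value"][1])
```

The same `rate()` idiom applies to any counter in the table, such as the disk, swap, or network error totals.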

| Label | Description | Applies to |
| --- | --- | --- |
| pod_name | K8s pod name | All |
| container_name | K8s container name | All |
| container_image | K8s container image name | All |
| remote_namespace | Remote K8s namespace (other side of the communication) | All |
| remote_service_name | Remote K8s service name (other side of the communication) | All |
| remote_container_name | Remote K8s container name (other side of the communication) | All |
| type | The protocol in use (HTTP, gRPC, Kafka, DNS, etc.) | All |
| sub_type | The sub-type of the protocol (GET, POST, etc.) | All |
| role | Role in the communication (client or server) | All |
| clustered_resource_name | The clustered name of the resource; depends on the protocol | All |
| status_code | "ok", "error" or "unset" | All |
| server | The server workload/name | All |
| client | The client workload/name | All |
| server_namespace | The server namespace | All |
| client_namespace | The client namespace | All |
| server_is_external | Indicates whether the server is external | All |
| client_is_external | Indicates whether the client is external | All |
| is_encrypted | Indicates whether the communication is encrypted | All |
| is_cross_az | Indicates whether the communication crosses availability zones | All |
| clustered_path | HTTP / gRPC aggregated resource path (e.g. /metrics/*) | http, grpc |
| method | HTTP / gRPC method (e.g. GET) | http, grpc |
| response_status_code | Return status code of an HTTP / gRPC request (e.g. 200 in HTTP) | http, grpc |
| dialect | SQL dialect (MySQL or PostgreSQL) | mysql, postgresql |
| response_status | Return status code of a SQL query (e.g. 42P01 for undefined table) | mysql, postgresql |
| client_type | Kafka client type (Fetcher / Producer) | kafka |
| topic | Kafka topic name | kafka |
| partition | Kafka partition identifier | kafka |
| error_code | Kafka return status code | kafka |
| query_type | Type of DNS query (e.g. AAAA) | dns |
| response_return_code | Return status code of a DNS resolution request (e.g. Name Error) | dns |
| exit_code | K8s container termination exit code | container_state, container_crash |
| state | K8s container current state (Running, Waiting or Terminated) | container_state |
| state_reason | K8s container state transition reason (e.g. CrashLoopBackOff or OOMKilled) | container_state |
| crash_reason | K8s container crash reason (e.g. Error, OOMKilled) | container_crash |
| pvc_name | K8s PVC name | storage |

Additional labels:

- parent_entity_id
- perspective_entity_id
- perspective_entity_is_external
- perspective_entity_issue_id
- perspective_entity_name
- perspective_entity_namespace
- perspective_entity_resource_id
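These labels are what make the metrics sliceable: any PromQL aggregation can filter or group by them. As a sketch (again assuming a PromQL-compatible endpoint; the URL is a placeholder), here is an HTTP error ratio per workload, built from the golden-signal counters listed below together with the `type` and `workload_name` labels:

```python
import requests

QUERY_URL = "http://metrics.example.internal/api/v1/query"  # placeholder

# Error ratio per workload for HTTP traffic only: the `type` label filters
# the protocol, and `workload_name` groups the result per workload.
promql = """
sum by (workload_name) (rate(groundcover_resource_error_counter{type="http"}[5m]))
/
sum by (workload_name) (rate(groundcover_resource_total_counter{type="http"}[5m]))
"""

resp = requests.get(QUERY_URL, params={"query": promql}, timeout=10)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    workload = series["metric"].get("workload_name", "<unknown>")
    print(f"{workload}: {float(series['value'][1]):.2%} errors")
```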

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_resource_success_counter | Total number of resource requests with OK status codes | Number | Counter |
| groundcover_resource_latency_seconds | Resource latency | Seconds | Summary |
| groundcover_workload_success_counter | Total number of requests handled by the workload with OK status codes | Number | Counter |
| groundcover_workload_latency_seconds | Resource latency across all of the workload's APIs | Seconds | Summary |
| groundcover_node_allocatable_cpum_cpu | Allocatable CPU in the current node | mCPU | Gauge |
| groundcover_node_allocatable_mem_bytes | Allocatable memory in the current node | Bytes | Gauge |
| groundcover_node_mem_used_percent | Percentage of used memory in the current node | Percentage | Gauge |
| groundcover_pvc_usage_bytes | Persistent Volume Claim (PVC) usage | Bytes | Gauge |
| groundcover_pvc_capacity_bytes | Persistent Volume Claim (PVC) capacity | Bytes | Gauge |
| groundcover_pvc_available_bytes | Available Persistent Volume Claim (PVC) space | Bytes | Gauge |
| groundcover_network_rx_bytes_total | Total bytes received by the workload | Bytes | Counter |
| groundcover_network_tx_bytes_total | Total bytes sent by the workload | Bytes | Counter |
| groundcover_network_connections_opened_total | Total connections opened by the workload | Number | Counter |
| groundcover_kube_cronjob_status_active | Number of active CronJob executions | Number | Gauge |
| groundcover_kube_daemonset_status_current_number_scheduled | Number of Pods currently scheduled by the DaemonSet | Number | Gauge |
| groundcover_kube_daemonset_status_desired_number_scheduled | Desired number of Pods scheduled by the DaemonSet | Number | Gauge |
| groundcover_container_cpu_usage_rate_millis | CPU usage rate | mCPU | Gauge |
| groundcover_container_cpu_cfs_periods_total | Total number of elapsed CPU CFS scheduler enforcement periods for the container | Number | Counter |
| groundcover_container_cpu_delay_seconds | K8s container CPU delay | Seconds | Counter |
| groundcover_container_memory_working_set_bytes | Current memory working set | Bytes | Gauge |
| groundcover_container_mem_working_set_bytes | Working set memory usage for the container | Bytes | Gauge |
| groundcover_container_memory_cache_usage_bytes | Memory cache usage for the container | Bytes | Gauge |
| groundcover_container_io_read_bytes_total | Total bytes read by the container | Bytes | Counter |
| groundcover_container_io_read_ops_total | Total number of read operations by the container | Number | Counter |
| groundcover_container_io_write_bytes_total | Total bytes written by the container | Bytes | Counter |
| groundcover_container_network_rx_bytes_total | Total bytes received by the container | Bytes | Counter |
| groundcover_container_network_rx_dropped_total | Total number of received packets dropped by the container | Number | Counter |
| groundcover_container_network_rx_errors_total | Total number of errors encountered while receiving packets | Number | Counter |
| groundcover_container_uptime_seconds | Uptime of the container | Seconds | Gauge |
| groundcover_container_crash_count | Total count of container crashes | Number | Counter |
| groundcover_host_uptime_seconds | Uptime of the current host | Seconds | Gauge |
| groundcover_host_cpu_capacity_m_cpu | CPU capacity in the current host | mCPU | Gauge |
| groundcover_host_cpu_usage_m_cpu | CPU usage in the current host | mCPU | Gauge |
| groundcover_host_mem_capacity_bytes | Memory capacity in the current host | Bytes | Gauge |
| groundcover_host_mem_used_bytes | Memory used in the current host | Bytes | Gauge |
| groundcover_host_mem_used_percent | Percentage of used memory in the current host | Percentage | Gauge |
| groundcover_host_disk_space_used_bytes | Used disk space in the current host | Bytes | Gauge |
| groundcover_host_disk_space_free_bytes | Free disk space in the current host | Bytes | Gauge |
| groundcover_host_disk_space_total_bytes | Total disk space in the current host | Bytes | Gauge |
| groundcover_host_io_read_kb_per_sec | Disk read throughput per device in the current host | Kilobytes per second | Gauge |
| groundcover_host_io_write_kb_per_sec | Disk write throughput per device in the current host | Kilobytes per second | Gauge |
| groundcover_host_io_read_await_ms | Average time for read requests to be served per device in the current host | Milliseconds | Gauge |
| groundcover_host_fs_used_bytes | Used filesystem space in the current host | Bytes | Gauge |
| groundcover_host_fs_free_bytes | Free filesystem space in the current host | Bytes | Gauge |
| groundcover_host_fs_total_bytes | Total filesystem space in the current host | Bytes | Gauge |
| groundcover_host_fs_file_handles_allocated | Total number of file handles allocated in the current host | Number | Gauge |
| groundcover_host_fs_file_handles_allocated_unused | Number of allocated but unused file handles in the current host | Number | Gauge |
| groundcover_host_fs_file_handles_in_use | Number of file handles currently in use in the current host | Number | Gauge |
| groundcover_host_net_receive_bytes_total | Total bytes received on network interface | Bytes | Counter |
| groundcover_host_net_transmit_bytes_total | Total bytes transmitted on network interface | Bytes | Counter |
| groundcover_host_net_receive_packets_total | Total packets received on network interface | Number | Counter |
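Since `groundcover_pvc_usage_bytes` and `groundcover_pvc_capacity_bytes` describe the same PVCs, dividing one by the other yields a fill ratio per volume, which makes a natural early-warning check. A minimal sketch, under the same placeholder-endpoint assumption as above (the 80% threshold is illustrative):

```python
import requests

QUERY_URL = "http://metrics.example.internal/api/v1/query"  # placeholder

# PVCs above 80% of capacity. The division matches series label-for-label,
# so each result is one PVC's current fill ratio.
promql = "groundcover_pvc_usage_bytes / groundcover_pvc_capacity_bytes > 0.8"

resp = requests.get(QUERY_URL, params={"query": promql}, timeout=10)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    pvc = series["metric"].get("pvc_name", "<unknown>")
    print(f"{pvc}: {float(series['value'][1]):.1%} full")
```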

| Label | Description | Applies to |
| --- | --- | --- |
| clusterId | Name identifier of the K8s cluster | All |
| region | Cloud provider region name | All |
| namespace | K8s namespace | All |
| workload_name | K8s workload (or service) name | All |

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| groundcover_resource_total_counter | Total number of resource requests | Number | Counter |
| groundcover_resource_error_counter | Total number of requests with error status codes | Number | Counter |
| groundcover_resource_issue_counter | Total number of requests which were flagged as issues | Number | Counter |
| groundcover_workload_total_counter | Total number of requests handled by the workload | Number | Counter |
| groundcover_workload_error_counter | Total number of requests handled by the workload with error status codes | Number | Counter |
| groundcover_workload_issue_counter | Total number of requests handled by the workload which were flagged as issues | Number | Counter |
| groundcover_workload_client_offset | Client's last message offset (for a producer, the last offset produced; for a consumer, the last requested offset), aggregated by workload | | Gauge |
| groundcover_workload_calc_lagged_messages | Current lag in messages, aggregated by workload | Number | Gauge |
| groundcover_workload_calc_lag_seconds | Current lag in time, aggregated by workload | Seconds | Gauge |
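Because `groundcover_workload_calc_lag_seconds` is a gauge, it can be compared against a threshold directly, with no `rate()` needed. A minimal consumer-lag check along those lines (same placeholder endpoint as the earlier sketches; the 60-second threshold is illustrative):

```python
import requests

QUERY_URL = "http://metrics.example.internal/api/v1/query"  # placeholder

# Workloads whose Kafka consumer lag currently exceeds 60 seconds.
promql = "groundcover_workload_calc_lag_seconds > 60"

resp = requests.get(QUERY_URL, params={"query": promql}, timeout=10)
resp.raise_for_status()

lagging = resp.json()["data"]["result"]
if not lagging:
    print("no workload is lagging more than 60s")
for series in lagging:
    workload = series["metric"].get("workload_name", "<unknown>")
    print(f"{workload} is {float(series['value'][1]):.0f}s behind")
```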

Let us know over Slack!