groundcover's unique pricing model is the first to decouple data volumes from the cost of owning and operating the solution. For example, subscribing to our Enterprise plan costs $30 per node/host per month.
Overall, the cost of owning and operating groundcover is based on two factors:
The number of nodes (hosts) you are running in the environment you are monitoring
The costs of hosting groundcover's backend in your environment
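As a rough worked example based on the numbers above: monitoring 20 nodes on the Enterprise plan comes to 20 × $30 = $600 per month, plus whatever it costs to host groundcover's backend in your environment.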
Check out our TCO calculator to simulate your total cost of ownership for groundcover.
Definitely. As you deploy groundcover, each cluster is automatically assigned the unique name it holds inside your cloud environment. You can browse and select all your clusters in one place through our UI.
groundcover has been tested and validated on the most common K8s distributions. See full list in the Requirements section.
groundcover supports the most common protocols in most K8s production environments out-of-the-box. See full list here.
groundcover's kernel-level eBPF sensor automatically collects your logs, application metrics (such as latency, throughput, error rate and much more), infrastructure metrics (such as deployment updates, container crashes etc.), traces, and Kubernetes events. You can control which data is left out of the automatic collection using data obfuscation.
groundcover stores all the data it collects inside your environment, using the state-of-the-art storage services of ClickHouse and Victoria Metrics, with the option to offload data to object storage such as S3 for long-term retention. See our Architecture section for more details.
groundcover stores the data it collects in-cluster, inside your environment without ever leaving the cluster to be stored anywhere else.
Our SaaS UI experience stores only information related to the account, user access and general K8s metadata used for governance (like the number of nodes per cluster, the name given to the cluster etc.).
All the information served to the UI experience is encrypted all the way to the in-cluster data sources. groundcover has no access to your collected data, which is accessible only to an authenticated user from your organization. groundcover does collect telemetry information (opt-out is of course possible) which includes metrics about the performance of the deployment (e.g. resource consumption metrics) and logs reported from the groundcover components running in the cluster.
All telemetry information is anonymized, and contains no data related to your environment.
In any case, groundcover is SOC 2 and ISO 27001 compliant and follows security best practices.
If you used your business email to create your groundcover account, you can invite your team to your workspace by clicking on the purple "Invite" button on the upper menu. This will open a pop-up where you can enter the emails of the people you want to invite. You also have an option to copy and share your private link.
Note: The Admin of the account (i.e. the person that created it) can also invite users outside of your email domain. Non-admin users can only invite users that share the same email domain. If you used a private email, you can only share the link to your workspace by clicking the "Share" button on the top bar.
Read more about invites in our installation guide.
groundcover's CLI tool is currently open source, alongside other projects like Murre and Caretta. We're working on releasing more parts of our solution as open source very soon. Stay tuned on our GitHub page!
groundcover's sensor uses eBPF, which means it can only be deployed on a Kubernetes cluster that is running on a Linux system.
Installing using the CLI command is currently only supported on Linux and Mac.
You can install using the Helm command from any operating system.
Once installed, accessing the groundcover platform is possible from any web browser, on any operating system.
The following architectures are fully supported for all groundcover workloads:
x86
ARM
groundcover is a full stack, cloud-native observability platform, developed to break all industry paradigms - from making instrumentation a thing of the past, to decoupling cost from data volumes.
The groundcover platform consolidates all your traces, metrics, logs, and Kubernetes events into a single pane of glass, allowing you to identify issues faster than ever before and conduct granular investigations for quick remediation and long-term prevention.
Our pricing is not impacted by the volume of data generated by the environments you monitor, so you can dare to start monitoring environments that had been blind spots until now - such as your Dev and Staging clusters. This, in turn, gives you visibility into all your environments, making it much more likely that you'll identify issues in the early stages of development rather than in your live product.
groundcover introduces game-changing concepts to observability:
eBPF (extended Berkeley Packet Filter) is a groundbreaking technology that has significantly impacted the Linux kernel, offering a new way to safely and efficiently extend its capabilities.
By powering our sensor with eBPF, groundcover unlocks unprecedented granularity on your cloud environment, while also practically eliminating the need for human involvement in the installation and deployment process. Our unique sensor collects data directly from the Linux kernel with near-zero impact on CPU and memory.
Advantages of our eBPF sensor:
Zero instrumentation: groundcover's eBPF sensor gathers granular observability data without the need for integrating an SDK or changing your applications' code in any way. This enables all your logs, metrics, traces, and other observability data to flow automatically into the platform. In minutes, you gain full visibility into application and infrastructure health, performance, resource usage, and more.
Minimal resources footprint: groundcover's sensor is installed on a dedicated node in each monitored cluster, operating separately from the applications it is monitoring. Without interfering with the application's primary functions, the groundcover platform operates with near-zero impact on your resources, maintaining the applications' performance and avoiding unexpected overhead on the infrastructure.
A new level of insight granularity: With direct access to the Linux kernel, our eBPF sensor enables the collection of data straight from the source. This guarantees that the data is clean, unaltered, and precise. It also offers access to unique insights into your application and infrastructure, such as the ability to view the full payloads of traces or analyze network performance over time.
The one-of-a-kind architecture on which groundcover was built eliminates all requirements to stream your logs, metrics, traces, and other monitoring data outside of your environment and into a third-party's cloud. By leveraging integrations with best-of-breed technologies, including ClickHouse and VictoriaMetrics, all your observability data is stored locally, with the option of having it fully managed by groundcover.
Advantages of our BYOC architecture:
By separating the data plane from the control plane, you get the advantages of a SaaS solution, without its security and privacy challenges.
With multiple deployment models available, you also get to choose the level of security and privacy your organization needs, up to the highest standards (FedRAMP-level).
Automated deployment, maintenance & resource optimization with our inCloud Managed deployment option.
This concept is unique to groundcover, and takes a while to grasp. Read about our BYOC architecture in more detail in this dedicated section.
Learn about groundcover inCloud Managed (currently available only on a paid plan), which enables you to deploy groundcover's control plane inside your own environment and delegate the entire setup and management of the groundcover platform.
Enabled by our unique BYOC architecture, groundcover's vision is to revolutionize the industry by offering a pricing model that is unheard of anywhere else. Our fully transparent pricing model is based only on the number of nodes being monitored and the costs of hosting the groundcover backend in your environment. The volume of logs, metrics, traces, and all other observability data doesn't affect your cost. This results in savings of 60-90% compared to SaaS platforms.
In addition, none of our subscription tiers ever limit your access to features and capabilities.
Advantages of our nodes-based pricing model:
Cost is predictable and transparent, becoming an enabler of growth and expansion.
The ability to deploy groundcover in data-intensive environments enables the monitoring of Dev and Staging clusters, which promotes early identification of issues.
No cardinality or retention limits
Read our latest customer stories to learn how organizations of varying sizes dramatically reduce their observability costs by migrating to groundcover:
groundcover applies a stream processing approach to collect and control the continuous flow of data to gain immediate insights, detect anomalies, and respond to changing conditions. Unlike batch processing, where data is collected over a period and then analyzed, stream processing analyzes the data as it flows through the system.
Our platform uses a distributed stream processing engine that enables it to ingest huge amounts of data (such as logs, traces and Kubernetes events) in real time. It also processes all that data and instantly generates complex insights (such as metrics and context) based on it.
As a result, the volume of raw data stored dramatically decreases which, in turn, further reduces the overall cost of observability.
Our log management is designed for high scalability and rapid query performance, enabling quick and efficient log analysis from all your environments. Each log is enriched with actionable context and correlated with relevant metrics and traces, providing a comprehensive view for fast troubleshooting.
The groundcover platform provides cloud-native infrastructure monitoring, enabling automatic collection and real-time monitoring of infrastructure health and efficiency.
Gain end-to-end observability into your applications' performance, identify and resolve issues instantly, all with zero code changes.
Real User Monitoring (RUM) extends groundcover’s observability platform to the client side, providing visibility into actual user interactions and front-end performance. It tracks key aspects of your web application as experienced by real users, then correlates them with backend metrics, logs, and traces for a full-stack view of your system.
Traces are a powerful observability pillar, providing granular insights into microservice interactions. Traditionally, they were hard to implement, requiring coordination of multiple teams and constant code changes, making this critical aspect very challenging to maintain.
groundcover's eBPF sensor disrupts the famous tradeoff, empowering developers to gain full visibility into their applications, effortlessly and without any code changes.
The platform supports two kinds of traces:
These traces are automatically generated for every supported service in your stack. They are available out-of-the-box and within seconds of installation. These traces always include critical information such as:
All services that took part in the interaction (both client and server)
Accessed resource
Full payloads, including:
All headers
All query parameters
All bodies - for both the request and response
These can be ingested into the platform, allowing you to leverage existing instrumentation to create a single pane of glass for all of your traces.
Traces are stored in groundcover's ClickHouse deployment, ensuring top-notch performance on every scale.
For more details about ingesting 3rd party traces, see the integrations page.
groundcover further disrupts the customary traces experience by reinventing the concept of sampling. This innovation differs between the different types of traces:
These are generated by using 100% of the data, always processing every request being made, on every scale. However, the groundcover platform utilizes smart sampling to only store a fraction of the traces, while still generating an accurate picture. In general, sampling is performed according to these rules:
Requests with unusually high or low latencies, measured per resource
Requests which returned an error response (e.g 500 status code for HTTP)
"Normal" requests which form the baseline for each resource
Lastly, stream processing is utilized to make the sampling decisions on the node itself, without having to send or save any redundant traces.
Certain aspects of our sampling algorithm are configurable - read more here.
These traces are always ingested fully - meaning, no sampling is applied to traces which have been generated elsewhere and ingested by the platform.
When integrating 3rd-party traces, it is often wise to configure some sampling mechanism according to the specific use case.
Each trace is enriched with additional information to give as much context as possible for the service which generated the trace. This includes:
Container information - image, environment variables, pod name
Logs generated by the service around the time of the trace
Golden Signals of the resource around the time of the trace
Kubernetes events relevant to the service
CPU and Memory utilization of the service and the node it is scheduled on
One of the advantages of ingesting 3rd-party traces is the ability to leverage their distributed tracing feature. groundcover natively displays the full trace for ingested traces in the Traces page.
Trace Attributes enable advanced filtering and search capabilities. groundcover supports attributes across all trace types. This encompasses a diverse range of protocols such as HTTP, MongoDB, PostgreSQL, and others, as well as varied sources including eBPF or manual instrumentations (for example, OpenTelemetry).
groundcover enriches your original traces and generates meaningful metadata as key-value pairs. This metadata includes critical information, such as protocol type, http.path, db.statement, and similar attributes, aligning with OTel conventions. Furthermore, groundcover seamlessly incorporates this metadata from spans received through supported manual instrumentations. For an in-depth understanding of attributes in OTel, please refer to the OTel Attributes Documentation (external link to the OpenTelemetry website).
Each attribute can be effortlessly integrated into your filters and search queries. You can add them directly from the trace side-panel with a simple click or input them manually into the search bar.
Example: To filter all HTTP traces that contain the path "/products", the query would be formatted as: @http.path:"/products". For a comprehensive guide on the query syntax, see the Syntax table below.
Trace Tags enable advanced filtering and search capabilities. groundcover supports tags across all trace types. This encompasses a diverse range of protocols such as HTTP, MongoDB, PostgreSQL, and others, as well as varied sources including eBPF or manual instrumentations (for example, OpenTelemetry).
Tags are powerful metadata components, structured as key-value pairs. They offer insightful information about the resource generating the span, like container.image.name, host.name and more.
Tags include metadata enriched by our sensor, plus additional metadata if provided by manual instrumentations (such as OpenTelemetry traces). Utilizing these Tags enhances the understanding and context of your traces, allowing for more comprehensive analysis and easier filtering by the relevant information.
Each tag can be effortlessly integrated into your filters and search queries. You can add them directly from the trace side-panel with a simple click or input them manually into the search bar.
Example: To filter all traces from mysql containers, the query would be formatted as: container.image.name:mysql. For a comprehensive guide on the query syntax, see the Syntax table below.
The Trace Explorer integrates dynamic filters and a versatile search functionality, to enhance your trace data analysis. You can filter out traces using specific criteria, including trace-status, workload, namespace and more, as well as limit your search to a specific time range.
Learn more about how to use our search syntaxes
groundcover natively supports setting up traces pipelines using Vector transforms. This allows for full flexibility in the processing and manipulation of traces being collected - parsing additional patterns by regex, renaming attributes, and more.
Learn more about how to configure traces pipelines
groundcover allows full control over the retention of your traces. Read here to learn more.
Tracing can be customized in several ways:
Gain end-to-end observability into your applications' performance, identify and resolve issues instantly - all with zero code changes.
The groundcover platform collects data all across your stack using the power of eBPF instrumentation. Our proprietary eBPF sensor is installed in seconds and provides 100% coverage into application metrics and traces with zero code changes or configurations.
Resolve faster - By seamlessly correlating traces with application metrics, logs, and infrastructure events, groundcover’s APM enables you to detect and resolve root issues faster.
Improve user experience - Optimize your application performance and resource utilization faster than ever before, avoid downtimes and make poor end-user experience a thing of the past.
Our revolutionary eBPF sensor, Flora, is deployed as a DaemonSet in your Kubernetes cluster. This approach allows us to inspect every packet that each service is sending or receiving, achieving 100% coverage. No sampling rates or relying on statistical luck - all requests and responses are observed.
This approach would not be feasible without a resource-efficient eBPF-powered sensor. eBPF not only extends the ability to pinpoint issues - it does so with much less overhead than any other method. eBPF can be used to analyze traffic originating from every programming language and SDK - even for encrypted connections!
Click here for a full list of supported technologies
After being collected by our eBPF code, the traffic is then classified according to its protocol - which is identified directly from the underlying traffic, or the library from which it originated. Connections are reconstructed, and we can generate transactions - HTTP requests and responses, SQL queries and responses etc.
To provide as much context as possible, each transaction is enriched with extensive metadata. Examples include the pods that took part in the transaction (both client and server), the nodes on which these pods are scheduled, and the state of the container at the time of the request.
It is important to emphasize the impressive granularity level with which this process takes place - every single transaction observed is fully enriched. This allows us to perform more advanced aggregations.
After being enriched with as much context as possible, the transactions are grouped together into meaningful aggregations. These could be defined by the workloads involved, the protocols detected and the resources that were accessed in the operations. These aggregations will mostly come into play when displaying golden signals.
After collecting the data, contextualizing it and putting it together in meaningful aggregations - we can now create metrics and traces to provide meaningful insights into the services' behaviors.
Learn how groundcover's application metrics work:
Learn how groundcover's application traces work:
Stream, store, and query your logs at any scale, for a fixed cost.
Our Log Management solution is built for high scale and fast query performance so you can analyze logs quickly and effectively from all your cloud environments.
Gain context - Each log is enriched with actionable context and correlated with relevant metrics and traces in one single view, so you can find what you're looking for and troubleshoot faster.
Centralize to maximize - The groundcover platform can act as a limitless, centralized log management hub. Your subscription costs are completely unaffected by the amount of logs you choose to store or query. It's entirely up to you to decide.
groundcover ensures a seamless log collection experience with our proprietary eBPF sensor, which automatically collects and aggregates all logs in all formats - including JSON, plain text, NGINX logs, and more. All this without any configuration needed.
This sensor is deployed as a DaemonSet, running a single pod on each node within your Kubernetes cluster. This configuration enables the groundcover platform to automatically collect logs from all of your pods, across all namespaces in your cluster. This means that once you've installed groundcover, no further action is needed on your part for log collection. The logs collected by each sensor instance are then channeled to the OTel Collector.
Acting as the central processing hub, the OTel Collector is a vendor-agnostic tool that receives logs from the various sensor pods. It processes, enriches, and forwards the data into groundcover's ClickHouse database, where all log data from your cluster is securely stored.
Logs Attributes enable advanced filtering capabilities and are currently supported for the following formats:
JSON
Common Log Format (CLF) - like those from NGINX and Kong
logfmt
groundcover automatically detects the format of these logs, extracting key:value pairs from the original log records as Attributes.
Each attribute can be added to your filters and search queries.
Example: filtering a log in a supported format with a field of a request path "/status" will look as follows: @request.path:"/status". Syntax can be found here.
groundcover offers the flexibility to craft tailored collection filtering rules. You can choose to set up filters and collect only the logs that are essential for your analysis, avoiding unnecessary data noise. For guidance on configuring your filters, explore our Customize Logs Collection section.
You also have the option to define the retention period for your logs in the ClickHouse database. By default, logs are retained for 3 days. To adjust this period to your preferences, visit our Customize Retention section for instructions.
Once logs are collected and ingested, they are available within the groundcover platform in the Log Explorer, which is designed for quick searches and seamless exploration of your logs data. Using the Log Explorer you can troubleshoot and explore your logs with advanced search capabilities and filters, all within a clear and fast interface.
The Log Explorer integrates dynamic filters and a versatile search functionality that enables you to quickly and easily identify the right data. You can filter out logs by selecting one or multiple criteria, including log-level, workload, namespace and more, and can limit your search to a specific time range.
Learn more about how to use our search syntaxes
groundcover natively supports setting up log pipelines using Vector transforms. This allows for full flexibility in the processing and manipulation of logs being collected - parsing additional patterns by regex, renaming attributes, and more.
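As a rough illustration (not taken from the groundcover docs), a standard Vector remap transform that redacts email addresses from a log field could look like the sketch below; the field name, input name, and the way this file is wired into your groundcover deployment are assumptions, so consult the pipelines documentation for the exact configuration keys.

```bash
# Hypothetical sketch: a generic Vector "remap" transform (VRL) that redacts
# email addresses from a log field. Field and input names are placeholders.
cat > log-pipeline-transform.yaml <<'EOF'
transforms:
  redact_emails:
    type: remap
    inputs: ["logs"]            # placeholder input name
    source: |
      .content = replace(string!(.content), r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+', "[REDACTED]")
EOF
```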
groundcover’s eBPF sensor uses state-of-the-art kernel features to provide full coverage at low overhead. In order to do so it requires certain kernel features which are listed below.
Version v5.3 or higher (anything since 2020).
You can check if your kernel has CO:RE support by manually looking for the BTF file:
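On most distributions with CO:RE support, the kernel exposes its BTF type information at /sys/kernel/btf/vmlinux, so a quick check looks like this:

```bash
# Check for the kernel's BTF type information (present when CO:RE is supported)
ls -l /sys/kernel/btf/vmlinux
```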
If the file exists, congratulations! Your kernel supports CO:RE.
groundcover supports any K8s version from v1.21.
For the installation to complete successfully, permissions to deploy the following objects are required:
StatefulSet
Deployment
ConfigMap
Secret
PVC
groundcover's portal pod sends HTTP requests to the cloud platform app.groundcover.com on port 443.
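A quick way to verify this outbound connectivity (assuming curl is available wherever you run the check, e.g. a node or a debug pod):

```bash
# Verify that app.groundcover.com is reachable over HTTPS (port 443)
curl -sI --max-time 10 https://app.groundcover.com
```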
To ensure a seamless experience with groundcover, it's important to confirm that your environment meets the necessary requirements. Please review the detailed requirements for Kubernetes, our eBPF sensor, and the necessary hardware and resources to guarantee optimal performance.
groundcover supports a wide range of Kubernetes versions and distributions, including popular platforms like EKS, AKS, and GKE.
Our state-of-the-art eBPF sensor leverages advanced kernel features to deliver comprehensive monitoring with minimal overhead, requiring specific Linux kernel versions, permissions, and CO:RE support.
groundcover fully supports both x86 and ARM processors, ensuring compatibility across diverse environments.
groundcover operates ClickHouse to support many of its core features. This requires suitable resources given to the deployment, which needs to be considered when deploying the platform.
Monitor front-end applications and connect them to your backend — all inside your cloud.
Capture real end-user experiences directly from their browsers and unify these insights with your backend observability data.
Real User Monitoring (RUM) extends groundcover’s observability platform to the client side, providing visibility into actual user interactions and front-end performance. It tracks key aspects of your web application as experienced by real users, then correlates them with backend metrics, logs, and traces for a full-stack view of your system.
Understand user experience - capture every interaction, page load, and performance metric from the end-user perspective to pinpoint front-end issues in real time.
Resolve issues faster - seamlessly tie front-end events to backend traces and logs in one platform, enabling end-to-end troubleshooting of user journeys.
Privacy first - groundcover’s Bring Your Own Cloud (BYOC) model ensures all RUM data stays in your own cloud environment. Sensitive user data never leaves your infrastructure, ensuring privacy and compliance without sacrificing insight.
groundcover RUM collects a wide range of data from users’ browsers through a lightweight JavaScript SDK. Once integrated into your web application, the SDK automatically gathers and sends the following telemetry from each user session to the groundcover platform:
Network requests: Every HTTP request initiated by the browser (such as API calls) is captured as a trace. Each client-side request can be linked with its corresponding server-side trace, giving you a complete picture of the request from the user’s click to the backend response.
Front-end logs: Client-side log messages (e.g., console.log outputs, warnings, and errors) are collected and forwarded to groundcover's log management. This ensures that browser logs are stored alongside your application's server logs for unified analysis.
Exceptions: Uncaught JavaScript exceptions and errors are automatically captured with full stack traces and contextual data (browser type, URL, etc.). These front-end errors become part of the data groundcover monitors, letting you quickly identify and debug issues in the user's environment.
Performance metrics (Core Web Vitals): Key performance indicators such as page load time, along with Core Web Vitals like Largest Contentful Paint, Interaction to Next Paint, and Cumulative Layout Shift, are measured for each page view. groundcover RUM records these metrics to help you track real-world performance and detect slowdowns affecting users.
User interactions: RUM tracks user interactions such as clicks, keydown, and navigation events. By recording which elements users interact with and when, groundcover helps you reconstruct user flows and understand the sequence of actions leading up to any issue or performance problem.
Custom events: You can instrument your application to send custom events via the RUM SDK. This allows you to capture domain-specific actions or business events (for example, a checkout completion or a specific UI gesture) with associated metadata, providing deeper insight into user behavior beyond automatic captures.
All collected data is streamed securely to your groundcover deployment. Because groundcover runs in your environment, RUM data (including potentially sensitive details from user sessions) is stored in the observability backend within your own cloud. From there, it is aggregated and indexed just like other telemetry, ready to be searched and analyzed in the groundcover UI.
One of the core advantages of groundcover RUM is its native integration with backend observability data. Every front-end trace, log, or event captured via RUM is contextualized alongside server-side data:
Trace correlation: Client-side traces (from browser network requests) are automatically correlated with server-side traces captured by groundcover’s eBPF-based instrumentation. This means when a user triggers an API call, you can see the complete distributed trace that spans the browser and the backend services, all in one view.
Unified logging: Front-end log entries and error reports are ingested into the same backend as your server-side logs. In the groundcover Log Explorer, you can filter and search across logs from both client and server, using common fields (like timestamp, session ID, or trace ID) to connect events.
End-to-end troubleshooting: With full-stack data in one platform, you can pivot easily between a user’s session replay, the front-end events, and the backend metrics/traces involved. This end-to-end context significantly reduces the time to isolate whether an issue originated in the frontend (browser/UI) or the backend (services/infrastructure), helping teams pinpoint problems faster across the entire stack.
By bridging the gap between the user’s browser and your cloud infrastructure, groundcover’s RUM capability ensures that no part of the user journey is invisible to your monitoring. This holistic view is critical for optimizing user experience and rapidly resolving issues that span multiple layers of your application.
Once RUM data is collected, it becomes available in the groundcover platform via the Sessions Explorer — a dedicated view for inspecting and troubleshooting user sessions. The Sessions Explorer allows you to navigate through user journeys and understand how your users experience your application.
Clicking on any session opens the Session View, where you can inspect a full timeline of the user’s experience. This view shows every key event captured during the session - including clicks, navigations, network requests, logs, custom events, and errors.
Each event is displayed in sequence with full context like timestamps, URLs, and stack traces. The Session View helps you understand exactly what the user did and what the system reported at each step, making it easier to trace issues and user flows.
Get complete visibility into your cloud infrastructure performance at any scale, easily access all your metrics in one place and optimize infrastructure efficiency.
The groundcover platform offers infrastructure monitoring capabilities that were built for cloud-native environments. It enables you to track the health and efficiency of your infrastructure instantly, with an effortless deployment process.
Troubleshoot efficiently - acting as a centralized hub for all your infrastructure, application and customer metrics allows you to query, correlate and troubleshoot your cloud environments using real time data and insight on your entire stack.
groundcover's proprietary eBPF sensor leverages all its innovative powers to collect comprehensive data across your cloud environments without the burden of performance overhead. This data is sourced from various Kubernetes components, including kube-system workloads, cluster information via the Kubernetes API, and the applications' interactions with the Kubernetes infrastructure. This level of detailed collection at the kernel level enables the ability to provide actionable insights into the health of your Kubernetes clusters, which are indispensable for troubleshooting existing issues and taking proactive steps to future-proof your cloud environments.
Monitoring a cluster involves tracking resources that are critical to the performance and stability of the entire system. Monitoring these essential metrics is crucial for maintaining a healthy Kubernetes cluster:
CPU consumption: It's essential to track the CPU resources being utilized against the total capacity to prevent workloads from failing due to insufficient CPU availability.
Memory utilization: Keeping an eye on the remaining memory resources ensures that your cluster doesn't encounter disruptions due to memory shortages.
Disk space allocation: For Kubernetes clusters running stateful applications or requiring persistent storage for data, such as etcd databases, tracking the available disk space is crucial to avert potential storage deficiencies.
Network usage: Visualize traffic rates and connections being established and closed on a service-to-service level of granularity, and easily pinpoint cross availability zone communication to investigate misconfigurations and surging costs.
Available Labels
type
clusterId
region
namespace
node_name
workload_name
pod_name
container_name
container_image
Available Metrics
Available Labels
type
clusterId
region
node_name
Available Metrics
Available Labels
type
clusterId
region
name
namespace
Available Metrics
Available Labels
clusterId
workload_name
namespace
container_name
remote_service_name
remote_namespace
remote_is_external
availability_zone
region
remote_availability_zone
remote_region
is_cross_az
protocol
role
server_port
encryption
transport_protocol
is_loopback
Notes:
is_loopback and remote_is_external are special labels that indicate the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g. a managed service outside of the cluster (external). In both cases the remote_service_name and remote_namespace labels will be empty.
is_cross_az means the traffic was sent and/or received between two different availability zones. This is a helpful flag to quickly identify this special kind of communication. The actual zones are detailed in the availability_zone and remote_availability_zone labels.
Available Metrics
groundcover may work on many other Linux kernels, but we may simply not have had a chance to test them yet. Can't find yours in the list?
Loading eBPF code requires running privileged containers. While this might seem unusual, there's nothing to worry about - eBPF programs are verified by the kernel before they are loaded, making them safe to run.
Our sensor uses eBPF's CO:RE (Compile Once, Run Everywhere) feature in order to support the vast variety of Linux kernels and distributions detailed above. This feature requires the kernel to be compiled with BTF information (enabled using the CONFIG_DEBUG_INFO_BTF=y kernel compilation flag), which is the case for most common distributions nowadays.
If your system does not fit into any of the above, our eBPF sensor will unfortunately not be able to run in your environment. However, this does not mean groundcover won't collect any data - you will still be able to use much of the platform together with other data sources.
groundcover may work on many other K8s flavors, but we may simply not have had a chance to test them yet. Can't find yours in the list?
DaemonSet (with privileged containers for loading our eBPF code)
To learn more about groundcover's architecture and components, visit our Architecture section.
This unique architecture keeps the data inside the cluster and fetches it on demand, keeping the data encrypted all the way, without the need to open the cluster to incoming traffic via ingresses.
This capability is only available to organizations subscribed to one of our paid plans.
➡️ Check out our to your platform.
groundcover will work out-of-the-box on all protocols, encryption libraries and runtimes below - generating metrics and traces with zero code changes.
We're growing our coverage all the time. Can't find what you're looking for?
Store it all, without breaking a sweat - store any metrics volume without worrying about cardinality or retention limits. Your costs remain unaffected by the granularity of metrics you store or query.
You also have the option to define the retention period for your metrics in the VictoriaMetrics database. By default, metrics are retained for 7 days, but you can adjust this period to your preferences.
Beyond collecting data, groundcover's methodology involves a strategic layer of data enrichment that seeks to correlate Kubernetes metrics with application performance indicators. This correlation is crucial for creating a transparent image of the Kubernetes ecosystem. It enables a deep understanding of how Kubernetes interacts with applications, identifying issues across the interconnected environment. By monitoring Kubernetes not as an isolated platform but as an integral part of the application infrastructure, groundcover ensures that the monitoring strategy aligns with your dynamic and complex cloud operations.
Debian: 11+
RedHat Enterprise Linux: 8.2+
Ubuntu: 20.10+
CentOS: 7.3+
Fedora: 31+
BottlerocketOS: 1.10+
Amazon Linux: All off the shelf AMIs
Google COS: All off the shelf AMIs
Azure Linux: All off the shelf AMIs
Talos: 1.7.3+
EKS: supported
AKS: supported
GKE: supported
OKE: supported
OpenShift: supported
Rancher: supported
Self-managed: supported
minikube: supported
kind: supported
Rancher Desktop: supported
k0s: supported
k3s: supported
k3d: supported
microk8s: supported
AWS Fargate: not supported
Docker-desktop: not supported
HTTP: supported
gRPC: supported
MySQL: supported
PostgreSQL: supported
Redis: supported
DNS: supported
Kafka: supported
MongoDB: supported (v3.6+)
AMQP: supported (AMQP 0-9-1)
GraphQL: supported
AWS S3: supported
AWS SQS: supported
crypto/tls (golang): supported
OpenSSL (c, c++, Python): supported
NodeJS: supported
JavaSSL: supported (Java 11+; requires enabling the groundcover Java agent)
groundcover_container_cpu_usage_rate_millis: CPU usage in mCPU
groundcover_container_cpu_request_m_cpu: K8s container CPU request (mCPU)
groundcover_container_cpu_limit_m_cpu: K8s container CPU limit (mCPU)
groundcover_container_memory_working_set_bytes: current memory working set (B)
groundcover_container_memory_rss_bytes: current memory RSS (B)
groundcover_container_memory_request_bytes: K8s container memory request (B)
groundcover_container_memory_limit_bytes: K8s container memory limit (B)
groundcover_container_cpu_delay_seconds: K8s container CPU delay accounting in seconds
groundcover_container_disk_delay_seconds: K8s container disk delay accounting in seconds
groundcover_container_cpu_throttled_seconds_total: K8s container total CPU throttling in seconds
groundcover_node_allocatable_cpum_cpu: amount of allocatable CPU in the current node (mCPU)
groundcover_node_allocatable_mem_bytes: amount of allocatable memory in the current node (B)
groundcover_node_mem_used_percent: percent of used memory in the current node (0-100)
groundcover_node_used_disk_space: current used disk space in the current node (B)
groundcover_node_free_disk_space: amount of free disk space in the current node (B)
groundcover_node_total_disk_space: amount of total disk space in the current node (B)
groundcover_node_used_percent_disk_space: percent of used disk space in the current node (0-100)
groundcover_pvc_usage_bytes: PVC used bytes (B)
groundcover_pvc_capacity_bytes: PVC capacity bytes (B)
groundcover_pvc_available_bytes: PVC available bytes (B)
groundcover_pvc_usage_percent: percent of used PVC storage (0-100)
groundcover_network_rx_bytes_total: bytes received by the workload (B)
groundcover_network_tx_bytes_total: bytes sent by the workload (B)
groundcover_network_connections_opened_total: connections opened by the workload
groundcover_network_connections_closed_total: connections closed by the workload
groundcover_network_connections_opened_failed_total: connection attempts failed per workload (including refused connections)
groundcover_network_connections_opened_refused_total: connection attempts refused per workload
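As an illustration only, the network metrics and labels above can be queried through a Prometheus-compatible API such as the one exposed by the in-cluster VictoriaMetrics deployment; the service address below is a placeholder (8428 is the VictoriaMetrics single-node default), so adjust it to your own deployment.

```bash
# Illustrative only: per-workload cross-AZ receive rates over the last 5 minutes
curl -s 'http://<victoria-metrics-service>:8428/api/v1/query' \
  --data-urlencode 'query=sum by (workload_name, availability_zone, remote_availability_zone) (rate(groundcover_network_rx_bytes_total{is_cross_az="true"}[5m]))'
```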
The following section is only relevant for inCluster deployments, where ClickHouse is deployed inside each monitored cluster. For inCloud Managed, groundcover takes care of database deployment and management, scaling a central deployment of ClickHouse according to your data usage.
ClickHouse is one of the two core databases used by groundcover. It is responsible for logs, traces, events, monitors, dashboards and many more features available in the platform.
As such, it requires suitable resources to operate without issues. The default resources in the groundcover chart were chosen according to the ClickHouse docs (here and here) as well as our extensive experience running it across all types of plans and deployments.
We don't recommend decreasing those resources as it might cause ClickHouse to become unstable and degrade system performance.
The default values for ClickHouse resources are defined in the groundcover Helm chart.
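One way to review those defaults locally is to dump the chart's values; this is a minimal sketch that assumes the Helm repository and chart names from the installation example, which may differ in your setup.

```bash
# Hypothetical sketch: inspect the chart's default values and review the
# ClickHouse resource requests/limits before changing anything
helm show values groundcover/groundcover | less
```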
Get up and running in minutes
Before installing groundcover, please make sure your cluster meets the requirements.
The first thing you need to do to start using groundcover is sign up using your email address (no credit card required for the free tier account). Signing up is only possible using a computer and will not be possible using a mobile phone or tablet. It is highly recommended you use your corporate email address, as it will make it easier to use other features such as inviting your colleagues to your workspace. However, signing up using Gmail, Outlook or any other public domains is also possible.
After signing up, you can install groundcover using any one of these methods:
When signing in to groundcover for the first time, the platform automatically detects your organization based on the domain you used to sign in. If your organization already has existing workspaces available, the workspace selection screen will be displayed, where you can choose which of the existing workspaces you would like to join, or if you want to create a new workspace.
Available workspaces will be displayed only if either of the following applies:
You have been invited to join existing workspaces and haven't joined them yet
Someone has previously created a workspace that has auto-join enabled for the email domain that you used to sign in (applicable for corporate email domains only)
Click the Join button next to the desired workspace
You will be added as a user to that workspace with the user privileges that were assigned by default or those that were assigned to you specifically when the invite was sent.
You will automatically be redirected to that workspace.
Click the Create a new workspace button
Specify a workspace name
Choose whether to enable auto-join (those settings can be changed later)
Click continue
Copy the command line displayed on the screen using the Copy Command button (Note: This is not your API key)
Open your CLI
Paste the command line in your CLI (Note: Make sure your kubectl context is pointing to the desired cluster)
Coverage policy covers all nodes excluding control plane and Fargate nodes. See details here.
The installation screen will open automatically and will let you know on-screen when the installation has completed.
Within 10 minutes after installation completes, all of your cluster's data will appear in your workspace.
Workspace owners and admins can allow teammates that log in with the same email domain as them to join the Workspace they created automatically, without an admin approval. This capability is called "Auto-join". It is disabled by default, but can be switched on during the workspace set up process, or any time in the workspace settings.
If you logged in with a public email domain (Gmail, Yahoo, Proton, etc.) and are creating a new Workspace, you will not be able to switch on Auto-join for that Workspace.
Use groundcover CLI to automate the installation process. The main advantages of using this installation method are:
Auto-detection of cluster incompatibility issues
Tolerations setup automation
Tuning of resources according to cluster size
Support for passing Helm overrides
Automated detection of new versions and upgrade suggestions
Read more here.
Coverage policy covers all nodes excluding control plane and Fargate nodes. See details here.
Deploying groundcover using the CLI
groundcover can be installed using the official helm chart.
If you’re interested in installing the helm chart using a CI/CD solution, such as ArgoCD, make sure you read our CI/CD installation section as well.
Coverage policy covers all nodes excluding control plane and Fargate nodes. See details here.
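For orientation, a Helm-based install typically looks like the sketch below; the repository URL and values file contents are placeholders, so copy the exact command and values from the in-app installation screen or the Helm installation docs.

```bash
# Hypothetical sketch - replace the placeholders with the values shown in-app
helm repo add groundcover <GROUNDCOVER_HELM_REPO_URL>
helm repo update
helm upgrade --install groundcover groundcover/groundcover \
  --namespace groundcover --create-namespace \
  -f values.yaml   # values (API key, cluster name, etc.) provided during onboarding
```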
Set up your first alert
Set up your first dashboard
Invite your colleagues
Installing groundcover on additional clusters
groundcover can monitor multiple clusters, and new clusters can be added at any given time. You can add new clusters using the UI.
Click on the cluster picker in the top right corner and then on the + Add Cluster option.
Note: Our free plan limits the use of groundcover to only one cluster. Check out our Team and Enterprise plans to install on an unlimited number of clusters.
To update the groundcover agent to the latest version:
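For Helm-based installs, the update usually follows the standard Helm upgrade flow sketched below (release and chart names are assumptions based on the install example above); CLI-based installs can rely on the CLI's automated detection of new versions and upgrade suggestions.

```bash
# Hypothetical sketch for Helm-based installs
helm repo update
helm upgrade groundcover groundcover/groundcover \
  --namespace groundcover --reuse-values
```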
Learn how to create and configure monitors using the Wizard, Monitor Catalog, or Import options. The following guide will help you set up queries, thresholds, and alert routing for effective monitoring.
Creating new monitors is currently supported using our web interface only.
In the Monitors section (left navigation bar), navigate to the Issues page or the Monitor List page to create a new Monitor. Click on the blue “Create Monitor” button and select one of the following options from the dropdown:
The Monitor Wizard is a guided, user-friendly approach to creating and configuring monitors tailored to your observability needs. By breaking down the process into simple steps, it ensures consistency and accuracy.
Set up the basic information for the monitor.
Monitor Title (Required):
Add a title for the monitor. The title will appear in notifications and in the Monitor List page.
Description (Optional):
Add a description for your monitor. The description will appear when viewing monitor details; you can also use it in your alerts.
Select the data source, build the query and define thresholds for the monitor.
If you're unfamiliar with query building in groundcover, refer to the Query Builder section for full details on the different components.
Data Source (Required):
Select the type of data (Metrics, Infra Metrics, Logs, or Traces).
Query Functionality:
Choose how to process the data (e.g., average, count).
Add group-by clauses if applicable.
Time Window (Required):
Specify the period over which data is aggregated.
Example: “Over the last 5 minutes.”
Threshold Conditions (Required):
Define when the monitor triggers. You can use:
Greater Than - Trigger when the value exceeds X.
Lower Than - Trigger when the value falls below X.
Within Range - Trigger when the value is between X and Y.
Outside Range - Trigger when the value is not between X and Y.
Example: “Trigger if disk space usage is greater than 10%.”
Visualization Type (Optional):
Preview data using a Stacked Bar or Line Chart for better clarity. This is only meant to help you visualize the data while building the monitor.
Customize how the Monitor’s Issues will appear. This section also includes a live preview of the way it will appear in the Issues page.
Ensure that the labels you wish to use dynamically (e.g., span_name, workload) are defined in the query configuration step (Section 2: Query).
Issue Header (required):
Define a name for issues that this Monitor will raise. It's useful to use labels that can include information from the query.
For example, adding {{ alert.labels.statusCode }} to the header will inject the status code into the name of the issue - this becomes especially useful when one Monitor raises multiple issues and you want to quickly understand their content without having to open each one.
Severity (required):
Use severity to categorize alerts by importance.
Select a severity level (S1-S4).
Resource Labels (optional):
The labels here should give your team context on what the subject of the issue is.
Examples:
span_name for an API-based monitor
pod_name for a Pod Crash monitor
Context Labels (optional):
The labels here should give your team context on where the issue happened.
Examples:
cluster
namespace
Organize and categorize monitors; you can use these labels to route issues using advanced workflows.
Labels (optional):
Add key-value pairs for metadata.
Define how often the monitor evaluates its conditions.
Evaluation Interval (Required):
Specify how often the monitor evaluates the query.
Example: “Evaluate every 1 minute.”
Pending Period (Required):
This ensures that transient conditions do not trigger alerts, reducing false positives. For example, setting this to 10 minutes ensures the condition must persist for at least 10 minutes before firing.
Example: “Wait for 10 minutes before alerting.”
Set up how issues from this monitor will be routed.
Select Workflow (Optional):
Route alerts to existing workflows only; this means that other workflows will not process them. Use this to send alerts for a critical application to Slack or PagerDuty.
No Routing (Optional):
This means that any workflow (without filters) will process the issue.
Whenever possible, use our carefully crafted monitors from the Monitor Catalog. This will save you time, ensure the Monitors are built effectively, and help you align your alerting strategy with best practices. If you can't find one that perfectly matches your needs, use them as your starting point and edit their properties to customize them to your needs.
Give the Monitor a clear, short name, that describes its function at a high level.
“Workload High API Error Rate”
“Workload Pods High Memory”
The title will appear in the monitors page table and be accessible in workflows and alerts.
Choose a clear name for the Issue header, offering a bit more details and going into a more specific description of the monitor name. A Header is a specific property of an issue, so you can add templated dynamic values here. For example, you can use dynamic label values in the header name.
“HTTP API Error {{ alert.labels.status_code }}”
“Workload {{ alert.labels.workload }} Pod Restart”
“{{ alert.labels.customer }} APIs High Latency”
If you do choose to use templated dynamic values, make sure they exist as monitor query labels.
We recommend using up to 3 ResourceHeaderLabels. The labels here should give your team the context of what is the subject of the issue.
span_name, pod_name
ResourceHeaderLabels appear as a secondary header in Issues tables across the platform.
We recommend using up to 3 ContextHeaderLabels. The labels here should give your team the context of where the issue happened.
cluster, namespace, workload
ContextHeaderLabels appear on Issues tables across the platform, next to your issues.
This is an advanced feature, please use it with caution.
Here you can add multiple Monitors using an array of Monitors that follows the Monitor YAML structure.
Click on "Create Monitors" to create them.
The groundcover platform generates 100% of its metrics from the actual data. There are no sample rates or complex interpolations to make up for partial coverage. Our measurements represent the real, complete flow of data in your environment.
Stream processing allows us to construct the majority of the metrics on the very node where the raw transactions are recorded. This means the raw data is turned into numbers the moment it becomes possible - removing the need for storing or sending it elsewhere.
Metrics are stored in groundcover's victoria-metrics deployment, ensuring top-notch performance on every scale.
In the world of excessive data, it's important to have a rule of thumb for knowing where to start looking. For application metrics, we rely on our golden signals.
The following metrics are generated for each resource being aggregated:
Requests per second (RPS)
Errors rate
Latencies (p50 and p95)
The golden signals are then displayed in two important ways: Workload and Resource aggregations.
See below for the full list of generated workload and resource golden metrics.
Resource aggregations are highly granular metrics, providing insights into individual APIs.
Workload aggregations are designed to show an overview of each service, enabling a higher level inspection. These are constructed using all of the resources recorded for each service.
groundcover allows full control over the retention of your metrics. Learn more here.
Below you will find the full list of our APM metrics, as well as the labels we export for each. These labels are designed with high granularity in mind for maximal insight depth. All of the metrics listed are available out of the box after installing groundcover, without any further setup.
We fully support the ingestion of custom metrics to further expand the visibility into your environment.
We also allow for building custom dashboards, enabling full freedom in deciding how to display your metrics - building on groundcover's metrics below plus every custom metric ingested.
clusterId: Name identifier of the K8s cluster
region: Cloud provider region name
namespace: K8s namespace
workload_name: K8s workload (or service) name
pod_name: K8s pod name
container_name: K8s container name
container_image: K8s container image name
remote_namespace: Remote K8s namespace (other side of the communication)
remote_service_name: Remote K8s service name (other side of the communication)
remote_container_name: Remote K8s container name (other side of the communication)
type: The protocol in use (HTTP, gRPC, Kafka, DNS etc.)
role: Role in the communication (client or server)
clustered_path: HTTP / gRPC aggregated resource path (e.g. /metrics/*) (applies to: http, grpc)
method: HTTP / gRPC method (e.g. GET) (applies to: http, grpc)
response_status_code: Return status code of an HTTP / gRPC request (e.g. 200 in HTTP) (applies to: http, grpc)
dialect: SQL dialect (MySQL or PostgreSQL) (applies to: mysql, postgresql)
response_status: Return status code of a SQL query (e.g. 42P01 for undefined table) (applies to: mysql, postgresql)
client_type: Kafka client type (Fetcher / Producer) (applies to: kafka)
topic: Kafka topic name (applies to: kafka)
partition: Kafka partition identifier (applies to: kafka)
error_code: Kafka return status code (applies to: kafka)
query_type: type of DNS query (e.g. AAAA) (applies to: dns)
response_return_code: Return status code of a DNS resolution request (e.g. Name Error) (applies to: dns)
method_name, method_class_name: Method code for the operation (applies to: amqp)
response_method_name, response_method_class_name: Method code for the operation's response (applies to: amqp)
exit_code: K8s container termination exit code (applies to: container_state, container_crash)
state: K8s container current state (Running, Waiting or Terminated) (applies to: container_state)
state_reason: K8s container state transition reason (e.g. CrashLoopBackOff or OOMKilled) (applies to: container_state)
crash_reason: K8s container crash reason (e.g. Error, OOMKilled) (applies to: container_crash)
pvc_name: K8s PVC name (applies to: storage)
Summary-based metrics have an additional quantile label, representing the percentile. Available values: ["0.5", "0.95", "0.99"].
groundcover uses a set of internal labels which are not relevant in most use-cases. Find them interesting? Let us know over Slack!
issue_id
entity_id
resource_id
query_id
aggregation_id
parent_entity_id
perspective_entity_id
perspective_entity_is_external
perspective_entity_issue_id
perspective_entity_name
perspective_entity_namespace
perspective_entity_resource_id
In the lists below, we describe error and issue counters. Every issue flagged by groundcover is an error; but not every error is flagged as an issue.
groundcover_resource_total_counter
total number of resource requests
groundcover_resource_error_counter
total number of requests with error status codes
groundcover_resource_issue_counter
total number of requests flagged as issues
groundcover_resource_success_counter
total number of resource requests with OK status codes
groundcover_resource_latency_seconds
resource latency [sec]
groundcover_workload_total_counter
total number of requests handled by the workload
groundcover_workload_error_counter
total number of requests handled by the workload with error status codes
groundcover_workload_issue_counter
total number of requests handled by the workload flagged as issues
groundcover_workload_success_counter
total number of requests handled by the workload with OK status codes
groundcover_workload_latency_seconds
resource latency across all of the workload APIs [sec]
groundcover_pvc_read_bytes_total
total bytes read by the workload from the PVC
groundcover_pvc_write_bytes_total
total bytes written by the workload to the PVC
groundcover_pvc_reads_total
total number of read operations performed by the workload on the PVC
groundcover_pvc_writes_total
total number of write operations performed by the workload on the PVC
groundcover_pvc_read_latency
latency of read operation by the workload from the PVC, in microseconds
groundcover_pvc_write_latency
latency of write operation by the workload to the PVC, in microseconds
groundcover_client_offset
client last message offset (for producer the last offset produced, for consumer the last requested offset)
groundcover_workload_client_offset
client last message offset (for producer the last offset produced, for consumer the last requested offset), aggregated by workload
groundcover_calc_lagged_messages
current lag in messages
groundcover_workload_calc_lagged_messages
current lag in messages, aggregated by workload
groundcover_calc_lag_seconds
current lag in time [sec]
groundcover_workload_calc_lag_seconds
current lag in time, aggregated by workload [sec]
Linux hosts sensor
Supported architectures: AMD64 and ARM64
For the following providers, we will fetch the machine metadata from the provider's API.
Infrastructure Host metrics: CPU/Memory/Disk usage
Logs
Natively from docker containers running on the machine
JournalD (requires configuration)
Static log files on the machine (requires configuration)
Traces
Natively from docker containers running on the machine
APM metrics and insights from the traces
Installation currently requires running a script on the machine.
The script will pull the latest sensor version and install it as a service named groundcover-sensor (requires elevated privileges)
Where:
{apiKey} - Your unique backend token; you can retrieve it with groundcover auth print-api-key
{inCloud_Site} - Your backend ingress address (your inCloud public ingestion endpoint)
{selected_Env} - The environment that will group these machines in the cluster dropdown at the top right of the UI (we recommend using a separate environment for non-Kubernetes deployments)
The sensor supports overriding its default configuration (similarly to the Kubernetes sensor); in this case, the overrides must be written to a file on disk.
The file is located at /etc/opt/groundcover/overrides.yaml. After writing it, restart the sensor service with systemctl restart groundcover-sensor.
Example - override Docker max log line size:
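For illustration, here is a hedged sketch of what such an override file might contain; the key names below are assumptions, so confirm the exact schema with the sensor documentation before using them.

```yaml
# /etc/opt/groundcover/overrides.yaml - key names are assumptions.
logsCollector:
  docker:
    maxLogLineSize: 65536   # bytes; raise to keep longer log lines intact
```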
Once installed, we recommend following these steps to help you quickly gain the most out of groundcover's unique observability platform.
The "Home page" of the groundcover app is our Workloads page. From here, you can get a- service-centric view,
Invites lets you share your workspaces with your colleagues in just a couple of clicks. You can find the "Invite Members" option at the bottom of the left navigation bar. Type in the email addresses of the teammates you want to invite, and set their user permissions (Admin, Editor, Read Only), then click "Send Invites".
groundcover’s Real User Monitoring (RUM) SDK allows you to capture front-end performance data, user events, and errors from your web applications.
This guide will walk you through installing the SDK, initializing it, identifying users, sending custom events, capturing exceptions, and configuring optional settings.
At your app’s entry point:
Link RUM data to specific users:
Instrument key user interactions:
Manually track caught errors:
You can customize SDK behavior (event sampling, data masking, enabled events). The following properties are customizable:
You can pass the values by calling the init function:
Or via the updateConfig function:
Explore and select pre-built Monitors from the catalog to quickly set up observability for your environment. Customize and deploy Monitors in just a few clicks.
The Monitor Catalog is a library of pre-built templates for efficiently creating new Monitors. Browse and select one or more Monitors to quickly add them to your environment with a single click. The Catalog groups Monitors into "Packs", based on different use cases.
You can select as many monitors as you wish, and add them all in one click. Select a complete pack or multiple Monitors from different packs, then click "Create Monitor". All Monitors will be automatically created. You can always edit them later.
View and analyze monitor issues with detailed timelines, metadata, and context to quickly identify and resolve problems in your environment.
The Issues page provides a detailed view of active and resolved issues triggered by Monitors. This page helps users investigate, analyze, and resolve problems in their environment by visualizing issue trends and providing in-depth context through an issue drawer.
Clicking on an issue in the Issues List opens the Issue drawer, which provides an in-depth view of the Monitor and its triggered issue. You can also navigate if possible to related entities like workload, node, pod, etc.
Displays metadata about the issue, including:
Monitor Name: Name of the Monitor responsible for the issue, including a link to it.
Description: Explains what the Monitor tracks and why it triggered.
Severity: Shows the assigned severity level (e.g., S3).
Labels: Lists contextual labels like cluster
, namespace
, and workload
.
Creation Time: Shows when the issue started firing.
Displays the Kubernetes events related to the selected issue within the timeframe selected in the Time Picker dropdown (upper right of the issue drawer).
When creating a Monitor using a traces query, the Traces tab will display the matching traces generated within the timeframe selected in the Time Picker dropdown (upper right of the issue drawer). Click on "View in Traces" to navigate to the Traces section with all relevant filters automatically applied.
When creating a Monitor using a log query, the Logs tab will display the matching logs generated within the timeframe selected in the Time Picker dropdown (upper right of the issue drawer). Click on "View in Logs" to navigate to the Logs section with all relevant filters automatically applied.
A focused visualization of the interactions between workloads related to the selected issue.
Manage and create silences to suppress Monitor notifications during maintenance or specific periods, reducing noise and focusing on critical issues.
The Silences page lists all Silences you and your team created for your Monitors. In this section, you can also create and manage your Silence rules, to suppress notifications and Issues noise, for a specified period of time. Silences are a great way to reduce alert fatigue, which can lead to missing important issues, and help focus on the most critical issues during specific operational scenarios such as scheduled maintenances.
Follow these simple steps to create a new Silence.
Specify the timeframe for the silence rule. Note that the starting point doesn't have to be now, and can also be any time in the future.
Below the From / Until boxes, you'll see a Silence summary, showing its approximate length (rounded down to full days) and starting date.
Define the criteria for Monitors or Issues to be silenced.
Click Add Matcher to specify match conditions (e.g., cluster, namespace, span_name).
Combine multiple matchers for more granular control.
Example: Silence all Monitors in the "demo" namespace.
Preview the issues currently affected by the Silence rule, based on any defined Matchers. This list contains only actively firing Issues.
Tip: Use this preview to see the list of impacted issues and adjust your Matchers before finalizing the Silence.
Add notes or context for the Silence rule. These comments help you and other users understand the purpose of the rule.
The following guides will help you setup and import your alerts from Grafana:
In this section, you'll find a breakdown of the key fields used to define and configure Monitors within the groundcover platform. Each field plays a critical role in how a Monitor behaves, what data it tracks, and how it responds to specific conditions. Understanding these fields will help you set up effective Monitors to track performance, detect issues, and provide timely alerts.
Below is a detailed explanation of each field, along with examples to illustrate their usage, ensuring your team can manage and respond to incidents efficiently.
View, filter, and manage all monitors in one place, and quickly identify issues or create new monitors.
The Monitor List is the central hub for managing and monitoring all active and configured Monitors. It provides a clear, filterable table view of your Monitors, with their current status and key details, such as creation date, severity, and live issues. Use this page to review Monitor performance, identify issues, and take appropriate action.
Displays the following columns:
Name: Title of the monitor.
Creation Date: When the monitor was created.
Live Issues: Number of live issues currently firing.
Status: Is the Monitor "Firing" (alerts active) or "Normal" (no alerts).
Tip: Click on a Monitor name to view its detailed configuration and performance metrics.
Use filters to narrow down monitors by:
Severity: S1, S2, S3, or custom severity levels.
Status: Alerting or Normal.
Silenced: Exclude silenced monitors.
Tip: Toggle multiple filters to refine your view.
Quickly locate monitors by typing a name, status, category, or other keywords.
Located at the top-right corner, use these to focus on monitors for specific clusters or environments.
Note: The Linux host sensor is currently available exclusively to Enterprise users. Check out our subscription plans for more information.
We currently support running on eBPF-enabled Linux machines (see the requirements for more details).
A highly impactful advantage of leveraging eBPF in our proprietary sensor is that it enables visibility into the full request and response payloads - including headers! This allows you to quickly understand issues and provides rich context.
groundcover makes it easy to visualize your data, using our intuitive Query Builder as a guide or using your own queries.
You can set up alerts using our native Monitors, which you can configure using groundcover data and custom metrics. You can also choose from our Monitor Catalog, which contains multiple pre-built Monitors that cover the most common use cases and needs.
This capability is only available to organizations subscribed to our Enterprise plan.
Start capturing RUM data by installing the RUM SDK in your web app.
You can also create a single Monitor from the Catalog. When hovering over a Monitor, a "Wizard" button will appear. Clicking on it will direct you to the Monitor Wizard, where you can review and edit the Monitor before creation.
groundcover enables access to an embedded Grafana within the groundcover platform's interface. This enables you to easily import and continue using your existing Grafana dashboards and alerts.
While we strongly suggest building Monitors using our Monitor Wizard or Monitor Catalog, groundcover also supports building and editing Monitors in YAML. If you choose to do so, the following provides the necessary definitions.
You can create a new Monitor by clicking on Create Monitor, then choosing between the different options: Monitor Wizard, Monitor Catalog, or Import. For further guidance, see the sections below.
Install using our UI
Install using a CLI
Install using Helm
Use Argo CD to deploy
AWS
✅
GCP
✅
Azure
✅
Linode
✅
Title
A string that defines the human-readable name of the Monitor. The title is what you will see in the list of all existing Monitors in the Monitors section.
Description
Additional information about the Monitor.
Severity
When triggered, this will show the severity level of the Monitor's issue. You can set any severity you want here.
s1
for Critical
s2
for High
s3
for Medium
s4
for Low
Header
This is the header of the generated issues from the Monitor.
A short string describing the condition that is being monitored. You can also use this as a pattern using labels from your query.
“HTTP API Error {{ alert.labels.return_code}}”
ResourceHeaderLabels
A list of labels that help you identify the resources related to the Monitor. These appear as a secondary header in all Issues tables across the platform.
["span_name", "kind"]
for monitors on protocol issues.
ContextHeaderLabels
A list of contextual labels that help you identify the location of the issue. This appears as a subset of the Issue’s labels, and is displayed on all Issues tables across the platform.
["cluster", "namespace", "pod_name"]
Labels
A set of pre-defined labels attached to Issues generated by the selected Monitor. Labels can be static, or dynamic using the Monitor's query results.
team: sre_team
ExecutionErrorState
Defines the actions that take place when a Monitor encounters query execution errors.
Valid options are Alerting, OK and Error.
When Alerting is set, query execution errors will result in a firing issue.
When Error is set, query execution errors will result in an error state.
When OK is set, query execution errors will do neither of the above. This is the default setting.
NoDataState
This defines what happens when queries in the Monitor return empty datasets.
Valid options are: NoData, Alerting, and OK.
When NoData is set, the issue instances' state will be No Data.
When OK is set, the issue instances' state will be Pending. The state will change to Alerting once the pending period of the Monitor ends. This is the default setting.
Interval
Defines how frequently the Monitor evaluates its conditions. Common intervals are 1m, 5m, etc.
PendingFor
Defines the number of consecutive intervals during which the threshold condition must be met before the alert is triggered.
Trigger
Defines the condition under which the Monitor fires. This is the threshold definition for the Monitor, with op (the operator) and value.
op: gt, value: 5
Model
Describes the queries, thresholds and data processing of the Monitor. It can have the following fields:
Queries: a list of one or more queries to run. Each query can be SQL over ClickHouse, PromQL over VictoriaMetrics, or a SqlPipeline, and has a name for reference within the Monitor.
Thresholds: the thresholds of your Monitor. Each threshold has a name, an inputName for its data input, an operator (one of gt, lt, within_range, outside_range), and an array of values which are the threshold values.
measurementType
Describes how issues of this Monitor are presented. Some Monitors count events while others track a state, and they are displayed differently in dashboards.
state - presents issues as a line chart.
event - presents issues as a bar chart, counting events.
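To make the fields above concrete, here is a hedged sketch of how they might come together in a single Monitor definition. The field names follow the definitions in this section, but the exact casing, nesting, and query shape are assumptions; use a Monitor exported from the platform as the authoritative reference.

```yaml
# Illustrative Monitor definition - field casing, nesting and the query are assumptions.
title: HTTP API Error Rate
description: Fires when a workload returns too many HTTP error responses
severity: s2
header: "HTTP API Error {{ alert.labels.return_code }}"
resourceHeaderLabels: ["span_name", "kind"]
contextHeaderLabels: ["cluster", "namespace", "pod_name"]
labels:
  team: sre_team
executionErrorState: OK
noDataState: OK
interval: 1m
pendingFor: 5m
trigger:
  op: gt
  value: 5
model:
  queries:
    # PromQL over VictoriaMetrics; the expression is an example only.
    - name: error_rate
      expr: sum(rate(groundcover_workload_error_counter[5m])) by (workload_name)
  thresholds:
    - name: too_many_errors
      inputName: error_rate
      operator: gt
      values: [5]
measurementType: event
```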
groundcover enables access to an embedded Grafana, within the groundcover platform's interface. This enables you to easily import and continue using your existing Grafana dashboards and alerts.
The following guides will help you setup and import your visualizations from Grafana:
This workflow is triggered by an issue, with a filter of alertname: Workload Pods Crashed Monitor. This means only issues created by the Monitor named "Workload Pods Crashed Monitor" will trigger the workflow. In this example we use a Slack message action that uses labels from the issue, as sketched below.
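A hedged sketch of such a workflow, assuming the trigger/filter/action layout described in the Workflows section; the field names and the Slack provider configuration shown here are illustrative, so consult Workflow Examples for the precise syntax.

```yaml
# Illustrative workflow - field names and provider config are assumptions.
workflow:
  id: workload-pods-crashed-slack
  triggers:
    - type: issue
      filters:
        - key: alertname
          value: Workload Pods Crashed Monitor
  actions:
    - name: notify-slack
      provider:
        type: slack
        config: "{{ providers.slack_webhook }}"
      with:
        message: >-
          Pods crashed for workload {{ alert.labels.workload }}
          in namespace {{ alert.labels.namespace }} (status: {{ alert.status }})
```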
In some cases, you may want to avoid sending resolved alerts to your integrations—this prevents incidents from being automatically marked as “resolved” in tools like PagerDuty.
To achieve this, you can add a condition to your action that ensures only firing alerts are sent. Here’s an example of how to configure it in your workflow:
This configuration uses an if condition to check that the alert's status is firing before executing the PagerDuty action, as in the hedged sketch below.
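A hedged sketch of an action guarded by such a condition; the if expression syntax and the PagerDuty provider fields are assumptions, so adapt them to the schema shown in Workflow Examples.

```yaml
# Illustrative action - only runs while the alert is firing.
actions:
  - name: page-oncall
    if: "{{ alert.status }} == 'firing'"   # skip resolved/suppressed alerts
    provider:
      type: pagerduty
      config: "{{ providers.pagerduty }}"
    with:
      title: "{{ alert.alertname }}"
      description: "{{ alert.labels.workload }} in {{ alert.labels.namespace }}"
```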
This workflow is triggered by an issue and uses the slack_webhook
integration to send a Slack message formatted with Block Kit. For more details, see Slack Block Kit.
See Jira Webhook Integration for setting up the integration
Get your Issue Type ID from your Jira, see: https://confluence.atlassian.com/jirasoftwarecloud/finding-the-issue-type-id-in-jira-cloud-1333825937.html
Get your project id from you Jira, see: https://confluence.atlassian.com/jirakb/how-to-get-project-id-from-the-jira-user-interface-827341414.html
Replace <issue_id> and <project_id> in this workflow with your own values, and replace <integration_name> with the name of the integration you created.
Exposing Data Sources for Managed inCloud Setup
groundcover inCloud Managed supports integration with customer owned Grafana by exposing Prometheus and ClickHouse data sources.
Different steps are required for On-Prem deployments, contact us for additional info.
The groundcover tenant API KEY is required for configuring the data source connection.
The key can be obtained by running: groundcover auth get-datasources-api-key
For this example we will use the key
API-KEY-VALUE
Configure the Grafana Prometheus data source by following these steps while logged in as a Grafana Admin.
Connections > Data Sources > + Add new data source
Pick Prometheus
Name: groundcover-prometheus
Prometheus server URL: https://ds.groundcover.com/datasources/prometheus
Custom HTTP Headers > Add Header
Header: apikey
Value: API-KEY-VALUE
Performance
Prometheus type: Prometheus
Prometheus version: > 2.50.x
Click "Save & test"
"Successfully queried the Prometheus API" means the integration was configured correctly.
Configure the Grafana ClickHouse data source by following these steps while logged in as a Grafana Admin.
Connections > Data Sources > + Add new data source
Click "Save & test"
"Data source is working" means the integration was configured correctly.
Workflows are YAML-based configurations designed to automate the response and add context to your issues. Each workflow consists of triggers, steps, and actions, which define when and how a workflow is executed and what tasks are performed.
After uploading your workflow, it can be executed based on the trigger type. Currently groundcover supports triggering workflows based on issues only.
Automatically activates a workflow when an issue is identified by a monitoring source. You can filter issues based on their source or other attributes.
Define a series of data-fetching or computation tasks that help enrich the workflow. Steps are optional but can be useful for adding context to your issues.
Actions specify what happens when a workflow is triggered. Actions can include notifications, data enrichment, or automation tasks. In groundcover, actions typically interface with external systems (like sending a Slack message).
The following guide explains how to build dashboards within the groundcover platform using our fully integrated Grafana interface. To learn how you can create dashboards using Grafana Terraform, follow this guide.
A dashboard is a great tool for visually tracking, analyzing, and displaying key performance metrics, which enable you to monitor the health of your infrastructure and applications.
1️⃣ Go to the Dashboards tab in the groundcover app, and click New and then New Dashboard.
2️⃣ Create your first panel by clicking Add a new panel.
3️⃣ In the New panel view, go to the Query tab.
4️⃣ Choose your data source by clicking -- Grafana -- in the data source selector. You will see the metrics collected from each of your clusters as a Prometheus data source called Prometheus@<cluster-name>.
5️⃣ Create your first Query in the PromQL query interface.
Learn more about Grafana panels and PromQL queries to improve your skills. For any help in creating your custom dashboard don't hesitate to join our Slack support channel.
Tips:
Learn more about the supported metrics you can use to build dashboards in the Infrastructure Metrics section under Infrastructure Monitoring and Application Metrics page.
Learn how to build custom dashboards using groundcover
groundcover’s dashboards are designed to personalize your data visualization and maximize the value of your existing data. Dashboards are perfect for creating investigation flows for critical monitors, displaying the data you care about in a way that suits you and your team, and crafting insights from the data on groundcover.
Easily create a new Dashboard using our guide.
Multi-Mode Query Bar: The Query Bar is central to dashboards and supports multiple modes fully integrated with native pages and Monitors. Currently, the modes include Metrics, Infra Metrics, Logs, and Traces. Learn more in the Query Builder section.
Variables: Built-in variables allow you to filter data quickly based on a predefined list crafted by groundcover.
Widget Types: Two widget types are currently supported:
Chart Widget: Displays data visually.
Textual Widget: Adds context to your dashboards.
Display Types: Five display types are supported for data visualization:
Time Series, Table, Stat, Top List, and Pie. Read more in the Widget Types section.
Alerts in groundcover leverage a fully integrated Grafana interface. To learn how you can create alerts using Grafana Terraform, follow this guide.
Setting up an alert in groundcover involves defining conditions based on data collected in the platform, such as metrics, traces, logs, or Kubernetes events. This guide will walk you through the process of creating an alert based on metrics. More guides will follow to include all different types of data.
Log in to groundcover and navigate to the Alerts section by clicking on it on the left navigation menu.
Once in the Alerts section, click on Alerting in the inner menu on the left.
If you can't see the inner menu, click on the 3 bars next to "Home" in the upper left corner.
Click on Alert Rules
Then click on the blue "+ New alert rule" button in the upper right.
Type a name for your alert. It's recommended to use a name that will make it easy for you to understand its function later.
Select the data source:
ClickHouse: For alerts based on your traces, logs, and Kubernetes events.
Prometheus: For alerts based on metrics (includes APM metrics, infrastructure metrics, and custom metrics from your environment)
Click on "Select metric"
Note: Make sure you are in "Builder" view (see screenshot) to see this option.
Click on "Metrics explorer"
Start typing the name of the metric you want this alert to be based on. Note that the Metrics explorer will start displaying matches as you type, so you can find your metric even if you don't remember its exact name. You can also check out our list of Metrics & Labels.
Once you see your metric in the list, click on "Select" in that row.
Note: You can click on "Run queries" to see the results of this query.
In the Reduce section, open the "Function" dropdown menu and choose the type of value you want to use.
Min - the lowest value
Max - the highest value
Mean - the average of the values
Sum - the sum of all values
Count - the number of values in the result
Last - the last value
In the Threshold section, type a value and choose whether you want the alert to fire when the query result is above or below that value. You can also select a range of values.
Click on "+ New folder" and type a name for the folder in which this rule will be stored. You can choose any name, but it's recommended to use a name that will make it easy for you to find the relevant evaluation groups, should you want to use them again in future alerts.
Click on "+ New evaluation group" and type a name for this evaluation group. The same recommendation applies here too.
In the Evaluation interval textbox, type how often the rule should be evaluated to see if it matches the conditions set in Step 3. Then, click "Create". Note: For the Evaluation interval, use the format (number)(unit), where units are:
s = seconds
m = minutes
h = hours
d = days
w = weeks
In the Pending period box, type how often you want the alert to match the conditions before it fires.
Evaluation interval = how often do you want to check if the alert should fire
Pending period = how long do you want this to be true before it fires
As an example, you can define the alert to fire only if the Mean percentage of memory used by a node is above 90% in the past 2 minutes (Pending period = 2m) and you want to check if that's true every 30 seconds (Evaluation interval = 30s).
If you already have a contact point set up, simply select it from the dropdown menu at the bottom of the "Configure labels and notifications" section. If not, click on the blue "View or create contact points" link, which will open a new tab.
Click on the blue "Add contact point" button
This will get you to the Contact points screen. Then:
Type a name for the contact point
From the dropdown menu, choose which system you want to use to push the alert to.
The information required to push the alert will change based on the system you select. Follow the on-screen instructions (for example, if email is selected, you'll need to enter the email address(es) for that contact).
Click "Save contact point"
You can now close this tab to go back to the alert rule screen.
Next to the link you clicked to create this new contact point, you'll find a dropdown menu, where you can select the contact point you just created.
Under "Add annotations", you have two free text boxes that give you the option to add any information that can be useful to you and/or the recipient(s) of this alert, such as a summary that reminds you of the alert's functionality or purpose, or next step instructions when this alert fires.
Once everything is ready, click the blue "Save rule and exit" button at the upper right of the screen, which will bring you back to the Alert rules screen. You will now be able to see your alert, its status - Normal (green), Pending (yellow), or Firing (red) - and its Evaluation interval (blue).
Log in to your groundcover account and navigate to the dashboard that you want to create an alert from.
Locate the Grafana panel that you want to create an alert from, click on the panel's header, and select Edit.
Click on the alert tab as seen in the image below. Select the Manage alerts option from the dropdown menu.
Click on the New Alert Rule button.
Note:
only time series panels support alert creation.
An alert is derived from three parts, configured in the screen you are navigated to:
Expression - the query that defines the alert input itself
Reduction - the value to be derived from the aforementioned expression
Threshold - the value to measure the reduction output against to decide whether an alert should be triggered
Verify the expression value and enter reduction and threshold values in line with your alerting expectations
Select a folder - if needed, you can navigate to the Dashboards tab in the left nav and create a new folder
Select an evaluation group, or type text to create a new group as shown below
Click "Save and Exit" at the top right of the screen to create the alert
Ensure your notification is configured to have alerts sent to end users. See "Configuring Slack Contact Point" section below if needed.
Note:
Make sure to test the alert to ensure that it is working as expected. You can do this by triggering the conditions that you defined and verifying that the alert is sent to the specified notification channels.
The Query Builder in the platform's Explore and Monitors sections helps you craft and visualize queries on top of your data - Metrics, Infra Metrics, Logs, and Traces.
Metrics – Work with all your available metrics. Great for advanced use cases and custom metrics.
Infra Metrics – Use expert-built, predefined queries for common infrastructure scenarios. Ideal if you’re not sure which metric to pick or just want a quick start.
Logs – Query and visualize Logs data.
Traces – Query and visualize Traces, similar to logs.
RUM - Query and visualize RUM events.
When you select the Metrics or Infra Metrics modes, you’ll work with something akin to Prometheus queries - but simplified.
groundcover supports a wide variety of metrics - Application and Infrastructure metrics are automatically generated using our eBPF sensor, and custom metrics can be ingested natively.
Metric Selector:
Search and choose a metric.
View associated labels and metadata (for groundcover’s built-in metrics).
If the chosen built-in metric’s type is known, the Query Builder automatically applies the best-suited function to streamline your workflow.
Infra Metrics Mode:
Select from ready-made queries grouped into categories (e.g., Container CPU, Node Disk).
Perfect if you’re unsure which metric to choose. Just pick a category, and you’re set.
Filters Bar (Metrics/Infra Metrics):
Filter by label key/value pairs.
Use - to exclude values.
All filters are ANDed together, but multiple values for the same key form an OR condition.
Type a key followed by : (e.g. cluster:) to list its values.
Use patterns (wildcards, partial matches) to refine results.
Aggregation Function Selector:
sum: Adds up all values.
avg: Calculates the average value.
max: Finds the maximum value.
min: Finds the minimum value.
count: Counts how many data points there are.
no aggregation: Leaves data un-aggregated.
Aggregation Labels Selector:
Select one or more labels to group your results by.
Limit Selector:
Show top or bottom results based on:
Max: Highest values.
Min: Lowest values.
Mean: Highest/lowest average values.
Median: Highest/lowest median values.
Last: Highest/lowest most recent values.
Visualization Type:
Time-series: View data over time (time range set by the time-picker).
Table: See instant snapshot data.
Time & Rollup Notes:
The time-picker defines the time range for your query.
Advanced Query
Switching to Advanced Query mode allows you to view and modify the PromQL query generated by the Query Builder. This mode provides full flexibility for advanced users. However, changes made in the editor are not reflected back in the Query Builder. The editor is ideal for making manual adjustments that are beyond the capabilities offered in Query Builder mode.
Selecting or deselecting Clusters and Environments in the Backend Picker won't affect the metrics displayed.
To enhance your data analysis in Metrics mode, groundcover supports the use of Formulas. Formulas allow you to perform arithmetic operations and apply functions to your metrics queries.
Using Formulas:
Assign Query Symbols: each metric query is automatically assigned a letter (A, B, C, etc.).
Construct Formulas: Combine these letters using operators and functions to create expressions.
Supported Operators:
Addition: +
Subtraction: -
Multiplication: *
Division: /
Modulo: %
Exponentiation: **
Parentheses: ()
for grouping
Example:
To calculate the CPU usage percentage where A is the used CPU metric and B is the total CPU capacity, the formula would be (A / B) * 100.
Filters Bar (Logs/Traces):
Same label-based filtering as Metrics.
Free Text Search (Logs only): Search for any substring.
Exclude terms by prefixing them with -. Use * as a wildcard.
Measurement Selection (Logs/Traces):
Count: Count total logs/traces.
Count (unique): Count distinct values of a chosen field.
Avg/Sum/Max/Min: For numeric fields, perform calculations.
Percentiles (P99/P95/P50/P10): Show the value at a specific percentile.
Group By:
Group results by fields (e.g., k8s.namespace, service.name) to break down the data by categories.
Rollup (Logs/Traces):
Choose time buckets (like 1m, 5m) for aggregation. This helps smooth out spikes or show trends over chosen intervals.
Limit:
Limit and sort table data to display only the most relevant rows.
Visualization Type (Logs/Traces):
Time-series
Table
Stat
Top List
Pie
Here are a few examples to help you understand how to build and visualize queries using the Query Builder:
Query the top 5 workloads with the highest average container CPU usage in the namespaces demo-ng and opentelemetry-demo.
Quickly find the average memory usage of workloads in the namespace demo-ng. Instead of crafting a complex query, we simply selected Container Memory > Usage Amount.
Query the P99 duration of all HTTP traces across the platform. This query is broad, with no filters applied for clusters, namespaces, or specific services.
Narrow it down: Query the P99 duration of HTTP traces, but this time only for outbound traces from a specific workload.
Visualize the distribution of log counts per log level in the cluster demo. This provides a quick snapshot of the log severity levels.
The RUM Query Builder allows you to query RUM data collected by the groundcover SDK.
It includes the same components as the Logs/Traces builder (Measurement Selection, Group By, Rollup, Limit, Visualization Type), plus an additional selector for Event Type.
The following event types and their properties are available:
Page Loads
page_load_time
Page Load Time
Numeric
page_url
Page URL
String
Errors
error_fingerprint
Error Fingerprint
String
error_message
Error Message
String
error_type
Error Type
String
Custom Events
custom_event_name
Event Name
String
Interactions
dom_event_target.text
Target Text
String
dom_event_selector
Target Selector
String
dom_event_target.id
Target ID
String
dom_event_target.className
Target Class Name
String
Performance
performance_metric_name
Metric Name
String
performance_metric_value
Metric Value
Numeric
Navigations
page_url
URL
String
session_id
service.name
location.path
location.title
user.email
user.organization
user.name
browser.version
browser.name
browser.type
browser.platform
Create a visualization of Errors over time grouped by error_message in the demo service:
Quickly understand your data with groundcover
groundcover insights give you a clear snapshot of notable events in your data. Currently, the platform supports Error Anomalies, with more insight types on the way.
Error Anomalies instantly highlight workloads, containers, or environments experiencing unusual spikes in Error or Critical logs. These anomalies are detected using statistical algorithms, continuously refined through user feedback for accuracy.
Each insight displays the error/critical log trends of the specific entity (e.g., workload). Clicking the insight automatically applies relevant filters, letting you quickly investigate and resolve the root cause.
Creating new workflows is currently supported through the app only. Browse to the Settings page, then to the Workflows screen, and click the "Create Workflow" button:
Clicking the button will open a text editor where you can add your workflow definition in YAML format. See Workflows for definitions and Workflow Examples for examples.
When creating a new workflow that uses an action requiring an integration (for example, Slack), make sure you've added the integration prior to creating the workflow. See: Workflow Integrations
Upon successful workflow creation it will be active immediately, and a new workflow record will appear in the underlying table.
For each existing workflow, the following fields are shown:
Name: your defined workflow name
Description: if you've added a description of the workflow.
Creator: the workflow creator's email
Creation Date: the workflow creation date
Last Execution Time: timestamp of the last workflow execution; this field depends on the workflow trigger type.
Last Execution Status: the last execution status, failure or success.
A guide on how to enable CRD-based scraping targets
By default, the VM operator will identify the Prometheus CRDs (ServiceMonitor, PodMonitor, PrometheusRule and Probe) that are already deployed and will scrape them automatically.
In case you want to deploy a test monitor object, here is an example using PodMonitor
Create the following my-test-podmonitor.yaml
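For reference, a minimal PodMonitor manifest could look like the following; the label selector and port name are placeholders you should adapt to your own pods.

```yaml
# my-test-podmonitor.yaml - selector and port name are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-test-podmonitor
spec:
  selector:
    matchLabels:
      app: my-app            # must match the labels on your pods
  podMetricsEndpoints:
    - port: metrics          # name of the container port exposing metrics
      path: /metrics
```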
Deploy it, for example with kubectl apply -f my-test-podmonitor.yaml
The vmagent will reload its configuration and start scraping the target; metrics should appear in groundcover's Grafana shortly after.
Quickly understand what requires your attention and drive your investigations
The issues page is a useful place to start a troubleshooting or investigation flow from. It gathers together all active issues found in your Kubernetes environment.
HTTP / gRPC Failures
Capturing failed HTTP / gRPC calls with Response Status Codes of:
5XX
— Internal Server Error
429
— Too Many Requests
MySQL / PostgreSQL Failures
Capturing failed SQL statement executions with Response Errors Codes such as:
1146
— No Such Table
1040
— Too Many Connections
1064
— Syntax Error
Redis Failures
Capturing any Error reported by the Redis serialization protocol (RESP), such as:
ERR unknown command
Container Restarts
Capturing all container restart events across the cluster, with Exit Codes such as:
0
— Completed
137
— OOMKilled
Deployment Failures
Capturing events such as:
MinimumReplicasUnavailable
— Deployment does not have minimum availability
Issues are auto-detected and aggregated - each issue representing many identical repeating incidents. Aggregation helps cut through the noise quickly and reach insights such as when a new type of issue first appeared and when it was last seen.
Issues are grouped by:
Type (HTTP, gRPC, Container Restart, etc..)
Status Code / Error Code (e.g. HTTP 500, gRPC 13)
Workload name
Namespace
The smart aggregation mechanism will also identify query parameters, remove them, and group the stripped queries / API URIs into patterns. This allows users to easily identify and isolate the root cause of a problem.
Each issue is assigned a velocity graph showing its behavior over time (like when it was first seen) and a live counter of its number of incidents.
By clicking on an issue, users can access the specific traces captured around the relevant issue. Each trace is related to the exact resource that was used (e.g. raw API URI, or SQL query), its latency and Status Code / Error Code.
Further clicking on a selected captured trace allows the user to investigate the root cause of the issue with the entire payload (body and headers) of the request and response, information about the participating container, the application logs around the incident's time, and the full context of the metrics around the incident.
Description of the alert fields you can use in your workflows
Role-Based Access Control (RBAC) in groundcover gives you a flexible way to manage who can access certain features and data in the platform. By defining both default roles and policies, you ensure each team member only sees and does what their level of access permits. This approach strengthens security and simplifies onboarding, allowing administrators to confidently grant or limit access.
Policies are the foundational elements of groundcover’s RBAC. Each policy defines:
A permission level – which actions the user can perform (Admin, Editor, or Viewer-like capabilities).
A data scope – which clusters, environments, or namespaces the user can see.
By assigning one or more policies to a user, you can precisely control both what they can do and where they can do it.
groundcover provides three default policies to simplify common use cases:
Default Admin Policy
Permission: Admin
Data Scope: Full (no restrictions)
Behavior: Unlimited access to groundcover features and configurations.
Default Editor Policy
Permission: Editor
Data Scope: Full (no restrictions)
Behavior: Full creative/editing capabilities on observability data, but no user or system management.
Default Viewer Policy
Permission: Viewer
Data Scope: Full (no restrictions)
Behavior: Read-only access to all data in groundcover.
These default policies allow you to quickly onboard new users with typical Admin/Editor/Viewer capabilities. However, you can also create custom policies with narrower data scopes, if needed.
A policy’s data scope can be defined in two modes: Simple or Advanced.
Simple Mode
Uses AND logic across the specified conditions.
Applies the same scope to all entity types (e.g., logs, traces, events, workloads).
Example: “Cluster = Dev AND Environment = QA”, restricting all logs, traces, events, etc. to the Dev cluster and QA environment.
Advanced Mode
Lets you define different scopes for each data entity (logs, traces, events, workloads, etc.).
Each scope can use OR logic among conditions, allowing more fine-grained control.
Example:
Logs: “Cluster = Dev OR Prod”
Traces: “Namespace = abc123”
Events: “Environment = Staging OR Prod”
When creating or editing a policy, you select permission (Admin, Editor, or Viewer) and a data scope mode (Simple or Advanced).
A user can be associated with multiple policies. When that occurs:
Permission Merging
The user’s final permission level is the highest among all assigned policies.
Example: If one policy grants Editor and another grants Viewer, the user is effectively an Editor overall.
Data Scope Merging
Data scopes merge via OR logic, broadening the user’s overall data access.
Example: Policy A => “Cluster = A”, Policy B => “Environment = B”, so the final scope is “Cluster A OR Environment B”.
Metrics Exception
For metrics data only, groundcover uses a single policy’s scope (not a combination). This prevents creating an overly broad metrics view when multiple policies are assigned.
By combining multiple policies, you can support sophisticated permission setups—for example, granting Editor capabilities in certain clusters while restricting a user to Viewer in others. The user’s final access reflects the highest permission among their assigned policies and the union (OR) of scopes for all data types except metrics.
In summary:
Policies define both permission (Admin, Editor, or Viewer) and data scope (clusters, environments, namespaces).
Default Policies (Admin, Editor, Viewer) provide no data restrictions, suitable for quick onboarding.
Custom Policies allow more granular restrictions, specifying exactly which entities a user can see or modify.
Multiple Policies can co-exist, merging permission levels and data scopes (with a special rule for metrics).
This flexible system gives you robust control over observability data in groundcover, ensuring each user has precisely the access they need.
Note: Only users with Write or Admin permissions can create and edit dashboards.
Navigate to the Dashboard List and click on the Create New Dashboard button.
Provide an indicative name for your dashboard and, optionally, a description.
Create a new widget
Choose a Widget Type
Select a Widget Mode
Build your query
Choose a Display Type
Save the widget
Optional:
Add variables
Apply variable(s) to the widget
Widgets can be added by clicking on the Create New Widget button.
Widgets are the main building blocks of dashboards. groundcover supports the following widget types:
Chart Widget: Visualize your data through various display types.
Textual Widget: Add context to your dashboard, such as headers or instructions for issue investigations.
Since selecting a Textual Widget is the last step for this type of widget, the rest of this guide is relevant only to Chart Widgets.
Metrics: Work with all your available metrics for advanced use cases and custom metrics.
Infra Metrics: Use expert-built, predefined queries for common infrastructure scenarios. Ideal for quick starts.
Logs: Query and visualize log data.
Traces: Query and visualize trace data similar to logs.
Once the Widget Mode is selected, build your query for the visualization.
Variables dynamically filter your entire dashboard or specific widgets with just one click. They consist of a key-value pair that you define once and reuse across multiple widgets.
Our predefined variables cover most use cases, but if you’re missing an important one, let us know. Advanced variables are also on our roadmap.
Click on Add Variable.
Select the variable key and values from the predefined list.
Optionally, rename the variable or use the default name, then click Create.
Once created, select the values to apply to this variable.
Variables can be referenced in the Filter Bar of the Widget Creation Modal using their name.
Create a variable (for example, select Clusters from the predefined list, and name it 'clusters').
While creating or editing a Chart Widget, add a reference to the variable in the filter bar using a dollar sign (for example, $clusters).
The data will automatically be filtered by the variable's key with the selected values. If all values are selected, the filter will be followed by an asterisk (for example, cluster:*).
To help you slice and dice your data, you can use our dynamic filters (left panel) and/or our powerful querying capabilities:
Query Builder - Supports key:value pairs, as well as free text search. The Query Builder works in tandem with our filters.
Advanced Query - Currently available only for our Logs section, it enables more complex queries, including nested condition support and explicit use of a variety of operators.
To further focus your results, you can also restrict the results to specific time windows using the time picker on the upper right of the screen.
The Query Builder is the default search option wherever search is available. It supports advanced autocomplete of keys and values, and a discovery mode across the values in your data that helps users learn the data model.
The following syntaxes are available for you to use in Query Builder:
Filters are very easy to add and remove, using the filters menu on the left bar. You can combine filters with the Query Builder, and filters applied using the left menu will also be added to the Query Builder in text format.
Select / deselect a single filter - click on the checkbox on the left of the filter. (You can also deselect a filter by clicking the 'x' next to the text format of the filter on the search bar).
Deselect all but one filter (within a filter category, such as 'Level' or 'Format') - hover over the filter you want to leave on, then click on "ONLY".
You can switch between filters you want to leave on by hovering on another filter and clicking "ONLY" again.
To turn all other filters in that filter category back on, hover over the filter again and click "ALL".
Clear all filters within a filters category - click on the funnel icon next to the category name.
Clear all filters currently applied - click on the funnel icon next to the number of results.
Advanced Query is currently available only in the Logs section.
Filters are not available in Advanced Query mode.
The following syntaxes are available for you to use in Advanced Query:
Find all logs with level 'error' or 'warning', in 'json' or 'logfmt' format, where the status code is 500 or 503, the request path contains '/api/v1/', and exclude logs where the user agent is 'vmagent' or 'curl':
Find logs where the bytes transferred are greater than 10000, the request method is POST, the host is not '10.1.11.65', and the namespace is 'production' or 'staging':
Find logs from pods starting with 'backend-' in 'cluster-prod', where the level is 'error', the status code is not 200 or 204, and the request protocol is 'HTTP/2.0':
Find logs where the 'user_agent' field is empty or does not exist, the request path starts with '/admin', and the status code is greater than 400:
Find logs in 'json' format from hosts starting with 'ip-10-1-', where the level is 'unknown', the container name contains 'redis', excluding logs with bytes transferred equal to 0:
Find logs where the time is '18/Sep/2024:07:25:46 +0000', the request method is GET, the status code is less than 200 or greater than 299, and the host is '10.1.11.65':
Find logs where the level is 'info', the format is 'clf', the namespace is 'production', the pod name contains 'web', and exclude logs where the user agent is 'vmagent':
Find logs where the container name does not exist, the cluster is 'cluster-prod', the request path starts with '/internal', and the request protocol is 'HTTP/1.1':
Find logs where the bytes transferred are greater than 5000, the request method is PUT or DELETE, the status code is 403 or 404, and the host is not '10.1.11.65':
Find logs where the format is 'unknown', the level is not 'error', the user agent is 'curl', and the pod name starts with 'test-':
By default, the search bar will be displayed in Query Builder mode. Use the button on the right of the search bar to switch back and forth between the Query Builder and Advanced Query.
Let groundcover automatically scrape your custom metrics
The following helm override enables custom metrics scraping
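A minimal sketch of such an override, assuming the chart exposes the scraper under a custom-metrics key with an enabled flag; the key names are assumptions, so check your chart's values for the exact path.

```yaml
# custom-values.yaml - key names are assumptions, verify against your chart.
custom-metrics:
  enabled: true
```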
Scrape your custom metrics using groundcover CLI (using default scrape jobs):
Either create a new custom-values.yaml or edit your existing groundcover values.yaml
Ensure that the Kubernetes resources that contain your Prometheus exporters have been deployed with the following annotations to enable scraping
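For example, the widely used prometheus.io annotations on the pod template usually look like the following; the port and path values are placeholders for your own exporter.

```yaml
# Pod template metadata - port and path are placeholders for your exporter.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9100"
    prometheus.io/path: "/metrics"
```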
By default, the following scrape jobs are deployed when enabling custom-metrics:
In case you're interested in disabling autodiscovery scrape jobs, provide the below override
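A hedged sketch of what such an override could look like; the exact flag name is an assumption, so confirm it against your chart's values.

```yaml
# Key names are assumptions - confirm against your chart's values.
custom-metrics:
  autodiscovery:
    enabled: false
```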
Disabling custom-metrics scrape jobs allows you to scale the custom-metrics deployment horizontally.
In case you're interested in deploying custom scrape jobs, create or add the following override
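Custom jobs use the standard Prometheus scrape-job syntax (vmagent-compatible) under an extraScrapeConfigs section; the job name and target below are placeholders, and the exact override path is an assumption.

```yaml
# Illustrative custom scrape job - replace the job name and target with your own.
custom-metrics:
  extraScrapeConfigs:
    - job_name: my-exporter
      scrape_interval: 30s
      static_configs:
        - targets: ["my-exporter.my-namespace.svc.cluster.local:9100"]
```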
In order to safeguard groundcover's performance, there are default limitations on metrics ingestion in place.
In order to increase metrics resolution, you can implement the following overrides
Increasing cardinality parameters will increase memory/CPU consumption and might cause OOMKills or CPU throttling.
Please use with caution and increase the custom metrics agent / metrics server resources accordingly.
In case you wish to increase metrics server / custom metrics resources, use the following overrides:
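A hedged sketch of what such resource overrides might look like; the component value paths and the sizes themselves are assumptions to adapt to your chart version and load.

```yaml
# Value paths and sizes are assumptions - tune to your chart version and load.
custom-metrics:
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      memory: 2Gi
victoria-metrics-single:
  server:
    resources:
      requests:
        cpu: "1"
        memory: 4Gi
      limits:
        memory: 8Gi
```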
groundcover has a set of example dashboards in the Dashboards by groundcover folder which can get you started. These dashboards are read-only, but you can see the PromQL query behind each panel by right-clicking the panel and selecting Explore.
Enable scraping and victoria-metrics-operator
Other than supporting the standard Prometheus CRDs, the VictoriaMetrics operator has its own proprietary CRDs that can be used; more about them can be found in the VictoriaMetrics operator documentation.
This capability is only available to organizations subscribed to our Enterprise plan.
If you're unfamiliar with query building in groundcover, refer to the Query Builder section for full details on the different components.
groundcover can scrape your custom metrics by deploying a metrics scraper (vmagent by VictoriaMetrics) that will automatically scrape Prometheus targets.
vmagent is fully compatible with the Prometheus scrape job syntax; more details can be found in the vmagent documentation.
Starting November 25th, 2024, the kubernetes-pods scrape job is the only scrape job enabled out of the box when activating custom metrics scraping. You can add back the legacy scrape jobs under the extraScrapeConfigs section, as described above.
labels
Map of key:value pairs derived from the Monitor definition.
{ "workload": "frontend",
"namespace": "prod" }
status
Current status of the alert
firing - Active alert indicating an ongoing issue.
resolved - The issue has been resolved, and the alert is no longer active.
suppressed - Alert is suppressed.
pending - No Data or insufficient data to determine the alert state.
lastReceived
Timestamp when the alert was last received
This alert timestamp
firingStartTime
Start time of the firing alert
First timestamp of the current firing state.
source
Sources generating the alert
grafana
fingerprint
Unique fingerprint of the alert, this is a hash of the labels
02f5568d4c4b5b7f
alertname
Name of the monitor
Workload Pods Crashed Monitor
trigger
Trigger condition of the workflow
alert / manual / interval
Choose a Y-axis unit from the predefined list.
Select a visualization type: Stacked Bar or Line Chart.
Metrics
Infra Metrics
Logs
Traces
Define columns based on data fields or metrics.
Choose a Y-axis unit from the predefined list.
Metrics
Infra Metrics
Logs
Traces
Select a Y-axis unit from the predefined list.
Metrics
Infra Metrics
Logs
Traces
Choose a ranking metric and sort order.
Logs
Traces
Select a data source and aggregation method.
Logs
Traces
key:value
Filters: Use golden filters to narrow down your search. Note: Multiple filters for the same key act as 'OR' conditions, whereas multiple filters for different keys act as 'AND' conditions.
level:error
Logs Traces K8s Events API Catalog
@key:value
Attributes: Search within the content of attributes. Note: Multiple filters for the same key act as 'OR' conditions, whereas multiple filters for different keys act as 'AND' conditions.
@transaction.id:123
Logs Traces
term
Free text (exact match): Search for single-word terms.
Tip: Expand your search results by using wildcards.
term
Logs K8s Events
" "
Phrase Search (case-insensitive): Enclose terms within double quotes to find results containing the exact phrase.
"search term"
Logs K8s Events
*
Wildcard: Search for partial matches. Note: Wildcards are enabled in all searches except phrase search, where they will be treated as an asterisk character.
key:val*
@key:val*
te*
Logs Traces K8s Events API Catalog
-
Exclude: Specify terms or filters to omit from your search; applies to each distinct search.
-key:value
-@key:value
-term
-"search term"
Logs Traces K8s Events API Catalog
*:""
Holistic Attribute Search: Search for a particular value across all attributes
*:"error"
Logs Traces
key:value
Filters: Use golden filters to narrow down your search. Note: Multiple filters for the same key act as 'OR' conditions, whereas multiple filters for different keys act as 'AND' conditions.
level:error
Logs
@key:value
Attributes: Search within the content of attributes. Note: Multiple filters for the same key act as 'OR' conditions, whereas multiple filters for different keys act as 'AND' conditions.
@transaction.id:123
Logs
term
Free text (exact match): Search for single-word terms.
Tip: Expand your search results by using wildcards.
term
Logs
" "
Phrase Search (case-insensitive): Enclose terms within double quotes to find results containing the exact phrase.
"search term"
Logs
~
Wildcard: Search for partial matches. Note: Wildcards must be added before the search term or value, and will always be treated as a partial match search.
key:~val
@key:~val
~term
~"search phrase"
Logs
NOT
!
Exclude: Specify terms or filters to omit from your search; applies to each distinct search.
!key:value
NOT @key:value
NOT term
!"search term"
Logs
key:""
Identify cases where key does not exist or is empty
pid:""
Logs
key:=#
key:>#
key:<#
Search for key:value pairs where the value is equal to, greater than, or smaller than a specified number.
threadPriority:>5
Logs
key:(val1 or val2)
Search for key:value pairs using a list of values.
level:(error or info)
Logs
query1 or query2
Use OR operator to display matches on either queries
level:error or format:json
Logs
query1 and query2
Use AND operator to display matches on both queries
level:error and format:json
Logs
"Search term prefix"*
Exact phrase prefix search
"Error 1064 (42"*
Logs
Enjoy a richer custom dashboards experience
kube-state-metrics is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects. groundcover comes with a built-in deployment of kube-state-metrics, and exposes its metrics within our embedded Grafana in two potential ways described below.
groundcover's built-in deployment of kube-state-metrics comes with a minimal configuration, keeping only a pre-selected set of the metrics which are used in our native screens. This approach has the benefit of limiting the potentially huge cardinality of KSM metrics, while also allowing us to enrich the selected metrics with extra groundcover labels.
KSM metrics generated in this method will be prefixed with groundcover_
in order to separate them from other KSM deployments.
For example, the metric kube_pod_status_phase will appear in groundcover as groundcover_kube_pod_status_phase.
If you are interested in having the full scope of available KSM metrics, it's possible to configure groundcover's KSM deployment for the task.
This is achieved following these steps:
Configuring the KSM deployment for full collection
Turning on custom metrics scraping
Setting up a scrape job to fetch the KSM metrics
groundcover uses a minimal set of KSM collectors. In order to configure our KSM deployment for full collection, use the following configuration:
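A hedged sketch of such a configuration, assuming the kube-state-metrics sub-chart options are exposed under a kube-state-metrics key; the nesting and the collector list are assumptions, so extend the list to the resources you need and confirm against the sub-chart values.

```yaml
# Nesting and option names are assumptions - confirm against the sub-chart values.
kube-state-metrics:
  collectors:
    - pods
    - deployments
    - daemonsets
    - statefulsets
    - replicasets
    - jobs
    - cronjobs
    - nodes
    - namespaces
    - services
    - persistentvolumeclaims
    - horizontalpodautoscalers
```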
groundcover supports a flag to turn on automatic collection of custom metrics. If you haven't enabled custom metrics yet, do so using the steps in the link above.
Add the following override to add a scrape job which will scrape the KSM deployment, making the metrics available in groundcover's embedded Grafana.
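A hedged sketch of such a scrape job; the KSM service name, namespace, and port are assumptions you should adjust to your installation.

```yaml
# Service name, namespace and port below are assumptions - adjust to your install.
custom-metrics:
  extraScrapeConfigs:
    - job_name: kube-state-metrics-full
      static_configs:
        - targets: ["groundcover-kube-state-metrics.groundcover.svc.cluster.local:8080"]
```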
Enhance your ability to easily monitor a group of clusters
This capability is available only to Enterprise users. Learn more about our paid plans.
Labeling clusters in your cloud-native environments is very helpful for efficient resource management and observability. By assigning an environment label in groundcover, you can categorize and identify clusters based on any specific criteria that you find helpful. For example, you can choose to label your clusters by environment type (development, staging, production, etc.), or by region (EU, US, etc.).
To add an environment label to your cluster, edit your cluster's existing values.yaml and add the following line:
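For example (the exact key name may differ in your chart version; production is only an illustrative value):

```yaml
global:
  env: production   # assumed key; use any label value that fits your grouping, e.g. staging, eu, us
```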
Once defined and added, these labels will be available for you to select in the cluster and environment drop down menu ("Cluster Picker").
Learn how to backup and restore metrics into groundcover metrics storage
groundcover uses VictoriaMetrics as its underlying metrics storage solution. As such, groundcover integrates seamlessly with VictoriaMetrics vmbackup and vmrestore tools.
port-forward groundcover's VictoriaMetrics service object
Run the vmbackup
utility, in this example we'll set the destination to an AWS S3 bucket, but more providers are supported
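A sketch of these two steps, assuming the default VictoriaMetrics port (8428) and a service name that may differ in your deployment; vmbackup must run with access to the VictoriaMetrics data volume:

```bash
# Expose the VictoriaMetrics snapshot API locally (service name is an assumption)
kubectl port-forward -n groundcover svc/groundcover-victoria-metrics-single-server 8428:8428

# Create a snapshot and upload it to an S3 bucket (bucket and path are placeholders)
vmbackup \
  -snapshot.createURL=http://localhost:8428/snapshot/create \
  -storageDataPath=/victoria-metrics-data \
  -dst=s3://<bucket>/<backup-path>
```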
vmbackup automatically uses incremental backup strategy if the destination contains an existing backup
Scale down VictoriaMetrics statefulSet (VictoriaMetrics must be offline during restorations)
Get the VictoriaMetrics PVC name
Create the following Kubernetes Job manifest vm-restore.yaml
Make sure you replace {VICTORIA METRICS PVC NAME} with the fetched pvc name
Deploy the job and wait for completion
Once completed, scale up groundcover's VictoriaMetrics instance
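A sketch of the restore flow described above; the statefulSet name, image tag and bucket path are placeholders you should adjust to your deployment:

```bash
# Take VictoriaMetrics offline and find its PVC
kubectl scale statefulset -n groundcover groundcover-victoria-metrics-single-server --replicas=0
kubectl get pvc -n groundcover
```

```yaml
# vm-restore.yaml - restores the backup into the VictoriaMetrics PVC
apiVersion: batch/v1
kind: Job
metadata:
  name: vm-restore
  namespace: groundcover
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: vmrestore
          image: victoriametrics/vmrestore:latest
          args:
            - -src=s3://<bucket>/<backup-path>
            - -storageDataPath=/victoria-metrics-data
          volumeMounts:
            - name: vmstorage
              mountPath: /victoria-metrics-data
      volumes:
        - name: vmstorage
          persistentVolumeClaim:
            claimName: "{VICTORIA METRICS PVC NAME}"
```

```bash
# Deploy the job, wait for it to finish, then bring VictoriaMetrics back up
kubectl apply -f vm-restore.yaml
kubectl wait --for=condition=complete job/vm-restore -n groundcover --timeout=10m
kubectl scale statefulset -n groundcover groundcover-victoria-metrics-single-server --replicas=1
```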
groundcover generates Infrastructure metrics out of the box, covering common use cases for monitoring the health of your services. This is done without relying on any existing services such as cadvisor or node-exporter.
However, certain use cases such as importing existing dashboards and alerts require additional metrics, which can be scraped using our custom metrics integration.
cadvisor metrics can be automatically scraped into groundcover using the following configuration.
In order to limit cardinality, the configuration below only scrapes the container_cpu_usage_seconds_total
and container_memory_working_set_bytes
metrics.
Editing the regex
part will control which metrics are being scraped into groundcover.
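A sketch of such a configuration; the surrounding values keys are assumptions, while the scrape job itself is a standard Prometheus kubelet/cadvisor scrape_config:

```yaml
custom-metrics:                          # assumed values key
  extraScrapeConfigs:                    # assumed key for additional scrape jobs
    - job_name: cadvisor
      scheme: https
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
        - role: node
      relabel_configs:
        - target_label: __metrics_path__
          replacement: /metrics/cadvisor
      metric_relabel_configs:
        - source_labels: [__name__]
          regex: container_cpu_usage_seconds_total|container_memory_working_set_bytes
          action: keep
```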
type
clusterId
region
namespace
node_name
workload_name
pod_name
container_name
container_image
groundcover_container_cpu_usage_rate_millis
CPU usage in mCPU
mCPU
groundcover_container_cpu_request_m_cpu
K8s container CPU request
mCPU
groundcover_container_cpu_limit_m_cpu
K8s container CPU limit
mCPU
groundcover_container_memory_working_set_bytes
current memory working set
Bytes
groundcover_container_memory_rss_bytes
current memory RSS
Bytes
groundcover_container_memory_request_bytes
K8s container memory request
Bytes
groundcover_container_memory_limit_bytes
K8s container memory limit
Bytes
groundcover_container_cpu_delay_seconds
K8s container CPU delay
Seconds
groundcover_container_disk_delay_seconds
K8s container disk delay
Seconds
groundcover_container_cpu_throttled_seconds_total
K8s container total CPU throttling
Seconds
type
clusterId
region
node_name
groundcover_node_allocatable_cpum_cpu
amount of allocatable CPU in the current node
mCPU
groundcover_node_allocatable_mem_bytes
amount of allocatable memory in the current node
Bytes
groundcover_node_mem_used_percent
percent of used memory in current node
0-100
groundcover_node_used_disk_space
current used disk space in current node
Bytes
groundcover_node_free_disk_space
amount of free disk space in current node
Bytes
groundcover_node_total_disk_space
amount of total disk space in current node
Bytes
groundcover_node_used_percent_disk_space
percent of used disk space in current node
0-100
type
clusterId
region
name
namespace
groundcover_pvc_usage_bytes
PVC usage
Bytes
groundcover_pvc_capacity_bytes
PVC capacity
Bytes
groundcover_pvc_available_bytes
PVC available
Bytes
groundcover_pvc_usage_percent
percent of used PVC storage
0-100
groundcover_pvc_read_bytes_total
total amount of bytes read by the workload from the PVC
Bytes
groundcover_pvc_write_bytes_total
total amount of bytes written by the workload to the PVC
Bytes
groundcover_pvc_reads_total
total amount of read operations done by the workload from the PVC
Number
groundcover_pvc_writes_total
total amount of write operations done by the workload to the PVC
Number
groundcover_pvc_read_latency
latency of read operation by the workload from the PVC
Seconds
groundcover_pvc_write_latency
latency of write operation by the workload to the PVC
Seconds
clusterId workload_name
namespace
container_name
remote_service_name
remote_namespace
remote_is_external
availability_zone
region
remote_availability_zone
remote_region
is_cross_az
protocol
role
server_port
encryption
transport_protocol
is_loopback
Notes:
is_loopback
and remote_is_external
are special labels that indicate the remote service is either the same service as the recording side (loopback) or resides in an external network, e.g. a managed service outside of the cluster (external).
In both cases the remote_service_name
and the remote_namespace
labels will be empty
is_cross_az
means the traffic was sent and/or received between two different availability zones. This is a helpful flag to quickly identify this special kind of communication.
The actual zones are detailed in the availability_zone
and remote_availability_zone
labels
groundcover_network_rx_bytes_total
Bytes received by the workload
Bytes
groundcover_network_tx_bytes_total
Bytes sent by the workload
Bytes
groundcover_network_connections_opened_total
Connections opened by the workload
Number
groundcover_network_connections_closed_total
Connections closed by the workload
Number
groundcover_network_connections_opened_failed_total
Connections attempts failed per workload (including refused connections)
Number
groundcover_network_connections_refused_failed_total
Connections attempts refused per workload
Number
clusterId
Name identifier of the K8s cluster
All
region
Cloud provider region name
All
namespace
K8s namespace
All
workload_name
K8s workload (or service) name
All
pod_name
K8s pod name
All
container_name
K8s container name
All
container_image
K8s container image name
All
remote_namespace
Remote K8s namespace (other side of the communication)
All
remote_service_name
Remote K8s service name (other side of the communication)
All
remote_container_name
Remote K8s container name (other side of the communication)
All
type
The protocol in use (HTTP, gRPC, Kafka, DNS etc.)
All
sub_type
The sub type of the protocol (GET, POST, etc)
All
role
Role in the communication (client or server)
All
clustered_resource_name
The clustered name of the resource, depends on the protocol
All
status_code
"ok", "error" or "unset"
All
server
The server workload/name
All
client
The client workload/name
All
server_namespace
The server namespace
All
client_namespace
The client namespace
All
server_is_external
Indicate whether the server is external
All
client_is_external
Indicate whether the client is external
All
is_encrypted
Indicate whether the communication is encrypted
All
is_cross_az
Indicate whether the communication is cross availability zone
All
clustered_path
HTTP / gRPC aggregated resource path (e.g. /metrics/*)
http, grpc
method
HTTP / gRPC method (e.g GET)
http, grpc
response_status_code
Return status code of an HTTP / gRPC request (e.g. 200 in HTTP)
http, grpc
dialect
SQL dialect (MySQL or PostgreSQL)
mysql, postgresql
response_status
Return status code of a SQL query (e.g 42P01 for undefined table)
mysql, postgresql
client_type
Kafka client type (Fetcher / Producer)
kafka
topic
Kafka topic name
kafka
partition
Kafka partition identifier
kafka
error_code
Kafka return status code
kafka
query_type
type of DNS query (e.g. AAAA)
dns
response_return_code
Return status code of a DNS resolution request (e.g. Name Error)
dns
exit_code
K8s container termination exit code
container_state, container_crash
state
K8s container current state (Running, Waiting or Terminated)
container_state
state_reason
K8s container state transition reason (e.g CrashLoopBackOff or OOMKilled)
container_state
crash_reason
K8s container crash reason (e.g Error, OOMKilled)
container_crash
pvc_name
K8s PVC name
storage
Summary based metrics have an additional quantile label, representing the percentile. Available values: ["0.5", "0.95", "0.99"].
We also use a set of internal labels which are not relevant in most use-cases. Find them interesting? Let us know over Slack!
issue_id
entity_id
resource_id
query_id
aggregation_id
parent_entity_id
perspective_entity_id
perspective_entity_is_external
perspective_entity_issue_id
perspective_entity_name
perspective_entity_namespace
perspective_entity_resource_id
In the lists below, we describe error and issue counters. Every issue flagged by the platform is an error; but not every error is flagged as an issue.
groundcover_resource_total_counter
total amount of resource requests
Number
groundcover_resource_error_counter
total amount of requests with error status codes
Number
groundcover_resource_issue_counter
total amount of requests which were flagged as issues
Number
groundcover_resource_success_counter
total amount of resource requests with OK status codes
Number
groundcover_resource_latency_seconds
resource latency
Seconds
groundcover_workload_total_counter
total amount of requests handled by the workload
Number
groundcover_workload_error_counter
total amount of requests handled by the workload with error status codes
Number
groundcover_workload_issue_counter
total amount of requests handled by the workload which were flagged as issues
Number
groundcover_workload_success_counter
total amount of requests handled by the workload with OK status codes
Number
groundcover_workload_latency_seconds
resource latency across all of the workload APIs
Seconds
groundcover_client_offset
client last message offset (for producer the last offset produced, for consumer the last requested offset)
groundcover_workload_client_offset
client last message offset (for producer the last offset produced, for consumer the last requested offset), aggregated by workload
groundcover_calc_lagged_messages
current lag in messages
Number
groundcover_workload_calc_lagged_messages
current lag in messages, aggregated by workload
Number
groundcover_calc_lag_seconds
current lag in time
Seconds
groundcover_workload_calc_lag_seconds
current lag in time, aggregated by workload
Seconds
groundcover supports the configuration of logs and traces pipelines, to further process and customize the data being collected, using Vector transforms. This enables full flexibility to manipulate the data as it flows into the platform.
See this page for more information about how Vector is being used in the groundcover platform's architecture.
groundcover uses Vector as an aggregator and transformer deployed into each monitored environment. It is an open-source, highly performant service, capable of supporting many manipulations on the data flowing into groundcover's backend.
Pipelines are configured using Vector transforms, where each transform defines one step in the pipeline. There are many types of transforms, and all of them can be natively used within the groundcover deployment to achieve full flexibility.
The most common transform is the remap
transform - allowing to write arbitrary logic using Vector's VRL syntax. There are many pre-defined functions to parse, filter and enrich data, and we recommend experimenting with it to fit your needs.
For testing out VRL before deployment we recommend the VRL playground.
groundcover's deployment supports adding a list of transforms for logs and traces independently. These steps will be automatically appended to the default pipeline, eliminating the need to understand the inner workings of groundcover's setup. Instead, you only need to configure the steps you wish to execute, and after redeploying groundcover you will see them take effect immediately.
Each step requires two attributes:
name: must be unique across all pipelines
transform: the transform itself, passed as-is to Vector.
The following is a template for a logs pipeline with two remap stages:
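A minimal sketch, assuming a logsPipeline values key (the actual key in your chart may differ); each step carries a unique name and a Vector transform passed as-is:

```yaml
logsPipeline:
  - name: parse-json-content
    transform:
      type: remap
      source: |
        # Try to parse the log body as JSON and keep the "app" field as an attribute
        parsed, err = parse_json(.content)
        if err == null {
          .string_attributes.app = to_string(parsed.app) ?? "unknown"
        }
  - name: tag-environment
    transform:
      type: remap
      source: |
        # Add a static attribute to every log (illustrative)
        .string_attributes.env = "dev"
```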
The following is a template for a traces pipeline with one filter stage:
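A minimal sketch, assuming a tracesPipeline values key and the .protocol_type field name; adjust both to your schema:

```yaml
tracesPipeline:
  - name: drop-dns-traces
    transform:
      type: filter
      condition: |
        # Keep everything except DNS traces
        string!(.protocol_type) != "dns"
```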
Logs to Events pipelines allow creating custom events from incoming logs. Unlike the logs and traces pipelines, they do not affect the original logs, and are meant to create parallel, distinguished events for future analytics.
The following is a template for a custom event pipeline with a filter stage and an extraction step.
The inputs
fields below will connect the events pipeline with the default incoming logs pipelines.
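A hedged sketch of such a pipeline; the values key, the inputs name and the event attribute are assumptions meant only to illustrate the structure:

```yaml
eventsPipeline:                      # assumed values key
  - name: filter-checkout-logs
    inputs:
      - logs                         # assumed name of the default incoming logs pipeline
    transform:
      type: filter
      condition: |
        string!(.workload) == "checkout"
  - name: extract-checkout-event
    transform:
      type: remap
      source: |
        .string_attributes.event_type = "checkout_log"   # illustrative attribute
```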
Below are the essentials relevant to writing remap transforms in groundcover. Extended information can be found in Vector's documentation.
We support using all types of Vector transforms as pipeline steps.
For testing VRL before deployment we recommend the VRL playground.
When processing Vector events, field names need to be prefixed by .
, a single period. For example, the content
field in a log, representing the body of the log, is accessible using .content
.
Specifically in groundcover, attributes parsed from logs or associated with traces will be stored under the string_attributes
for string values, and under float_attributes
for numerical values. Accessing attributes is possible by adding additional .
as needed. For example, a JSON log that looks like this:
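For illustration (the field names are hypothetical):

```json
{"level": "info", "user": "alice", "duration_ms": 128, "msg": "request served"}
```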
Will be translated into an event with the following attributes:
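A sketch of the resulting event shape, following the attribute placement described above:

```json
{
  "content": "request served",
  "string_attributes": { "level": "info", "user": "alice" },
  "float_attributes": { "duration_ms": 128 }
}
```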
Each of Vector's built-in functions can be either fallible or infallible. Fallible functions can throw an error when called, and require error handling, whereas infallible functions will never throw an error.
When writing Vector transforms in VRL it's important to use error handling where needed. Below are the two ways error handling in Vector is possible - see more on these docs.
VRL code without proper error handling will throw an error during compilation, resulting in error logs in the Vector deployment.
Let's take a look at the following code.
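A minimal VRL sketch using assignment-style error handling:

```
parsed, err = parse_json(.content)
if err != null {
  log("failed to parse log content as JSON", level: "debug")
}
```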
The code above can either succeed in parsing the json, or fail in parsing it. The err
variable will contain indication of the result status, and we can proceed accordingly.
Let's take a look at this slightly different version of the code above:
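A sketch of the abort-on-error variant:

```
parsed = parse_json!(.content)
```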
This time there's no error handling around, but !
was added after the function call.
This method of error handling is called abort on error
- it will fail the transform entirely if the function returns an error, and proceed normally otherwise.
Both methods above are valid VRL for handling errors, and you must choose one or the other when handling fallible functions. However, they carry one big difference in terms of pipelines in groundcover:
Transforms which use option #1 (error handling) will not stop the pipeline in case of error - the following steps will continue to execute normally. This is useful when writing optional enrichment steps that could potentially fail with no issue.
Transforms which use option #2 (abort) will stop the pipeline in case of error - the event will not proceed to the other steps. This is mostly useful for mandatory steps which can't fail no matter what.
The default behavior above can be changed using the drop_on_error flag. When this flag is set to false
, errors encountered will never stop the pipeline - both for method #1 and for method #2.
This is useful for writing simpler code with less explicit error handling, as can be seen in this log pipeline example.
groundcover recommends using as many "strong" filters as possible, like time filters, workload and namespaces filters, log level filters, etc.
These will help make free text and attribute searches much faster and more efficient.
The $__timeFilter condition enforces time range limits on the query, based on the time window selected for the query.
When querying logs in the platform it's important to distinguish between two types of queries:
Instant queries - will return a single value for each group. For example, counting the number of error logs per workload.
When to use: Threshold-based alerting or when you only need the most recent value
Range queries - will return a series of values over time. For example, counting the number of logs per workload in 5-minute buckets.
When to use: Plotting trends over time
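For example, a minimal instant query counting error logs; the table and column names follow the Logs table fields documented in this guide:

```sql
SELECT count() AS error_logs
FROM logs
WHERE $__timeFilter(timestamp)
  AND level = 'error'
```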
The query uses the count()
operator to get the number of error logs in the defined time window.
groundcover always saves log levels as lower-cased values, e.g: 'error'
, 'info'
.
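A sketch of such a query (the logs table name follows the other examples in this guide):

```sql
SELECT count()
FROM logs
WHERE $__timeFilter(timestamp)
  AND workload = 'kafkajs-events-consumer'
  AND content LIKE '%Connection timeout%'
```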
The query uses the count()
operator to get the number of logs generated by the kafkajs-events-consumer
workload, which contain the phrase Connection timeout.
Using formatted logs allows groundcover to automatically extract attributes from the log, which can then be used in alerts and dashboards.
For example, let's look at the following json-formatted log:
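For illustration, assume a log like the following (attribute names are hypothetical):

```json
{"time": "2024-05-15T10:04:11Z", "level": "info", "http.req.method": "GET", "http.req.path": "/api/users"}
```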
The following query uses the string_attributes
column to query the "http.req.method"
attribute and filter for GET
requests:
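A sketch of the query (logs table name assumed, as in the other examples):

```sql
SELECT count() AS get_requests
FROM logs
WHERE $__timeFilter(timestamp)
  AND string_attributes['http.req.method'] = 'GET'
```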
Make sure to select the Time Series
query type when using range queries
The following query will plot the count of logs grouped by a specific attribute extracted from the logs. It will arrange the counts into 5-minute buckets, showing trend over time.
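A sketch of such a range query, grouping by the illustrative http.req.method attribute:

```sql
SELECT
  toStartOfInterval(timestamp, INTERVAL 5 MINUTE) AS time,
  string_attributes['http.req.method'] AS method,
  count() AS logs_count
FROM logs
WHERE $__timeFilter(timestamp)
GROUP BY time, method
ORDER BY time
```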
We strongly advise reading the intro guide to working with remap transforms in order to fully understand the functionalities of writing pipelines steps.
The following example will filter out all HTTP traces that include the /health
URI.
Note that the filter
transform works by setting up an allow condition - meaning, events which fail the condition will be dropped.
The filter below implements this logic:
If the event isn't an HTTP event, allow it
If the event is an HTTP event, and the resource name doesn't contain "/health", allow it
If the event is an HTTP event AND it has "/health" in the resource name, drop it
We are using the abort
error handling below when calling the string
function. If the protocol type or resource name aren't valid strings, we drop the event.
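A sketch of the filter step; the trace field names (.protocol_type, .resource_name) are assumptions about the trace schema:

```yaml
- name: drop-health-checks
  transform:
    type: filter
    condition: |
      string!(.protocol_type) != "http" || !contains(string!(.resource_name), "/health")
```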
The following example will obfuscate response payloads from a specific server. This can be useful when you want to completely redact responses that contain sensitive data, such as secrets managed by an external server.
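A hedged sketch; the server name and the payload field (.response_body) are assumptions about the trace schema:

```yaml
- name: redact-vault-responses
  transform:
    type: remap
    source: |
      server, err = string(.server_name)
      if err == null && server == "secrets-vault" {
        .response_body = "[REDACTED]"
      }
```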
We strongly advise reading the intro guide to working with remap transforms in order to fully understand the functionalities of writing pipelines steps.
Attributes parsed from logs or traces can be accessed under the .string_attributes
or .float_attributes
maps - see here for more information.
The following example attempts to match the contents of a log line with a given regex pattern, extracting named groups if successful. We recommend using named groups in the regex pattern for best experience, automatically creating named attributes which will appear in the system.
For example, this transform will create new timestamp
and pid
fields if they are successfully extracted from the content.
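A sketch of such a transform; the regex and the "<timestamp> [<pid>] <message>" layout are illustrative:

```yaml
- name: parse-custom-format
  transform:
    type: remap
    drop_on_abort: false
    source: |
      if .format == "unknown" {
        parsed = parse_regex!(.content, r'^(?P<timestamp>\S+) \[(?P<pid>\d+)\] (?P<message>.*)$')
        .string_attributes.timestamp = parsed.timestamp
        .float_attributes.pid = to_float!(parsed.pid)
        .format = "custom-format"
      }
```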
Note that we are only performing the parsing if the format
attribute equals "unknown" - otherwise it means groundcover has already parsed the log format and extracted the fields beforehand.
The custom-format
value is up to you, and will appear in the UI under the format
filter.
For more regex documentation see this page. Vector natively supports parsing many known formats - it's always worth checking if the format is already natively supported!
We are using the drop_on_abort
attribute to instruct vector to keep forwarding the event down the pipeline when encountering errors. For more information see this section.
The following example attempts to rename an attribute called oldName
to newName
.
If it does not exist, no changes are made.
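A minimal sketch of the rename:

```yaml
- name: rename-old-attribute
  transform:
    type: remap
    source: |
      if exists(.string_attributes.oldName) {
        .string_attributes.newName = del(.string_attributes.oldName)
      }
```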
groundcover’s pipelines can be used to protect sensitive data in your Logs and Traces using Vector's redact function. Mask or remove sensitive information while preserving the usefulness of your data.
groundcover’s pipelines can be used to protect sensitive data in your Logs and Traces. With Vector's redact
function, you can mask or remove sensitive information while preserving the usefulness of your observability data.
We highly recommend using Vector's built-in function redact
for Logs/Traces obfuscation. This powerful function allows you to configure simple yet effective redaction rules to protect sensitive information in your logs and traces.
With redact
, you can:
Mask or remove sensitive data from strings, arrays, or objects
Replace text matching specified patterns (like regex) with a placeholder, custom text, or a hash (SHA-2 or SHA-3)
Please refer to the redact
function's documentation for more details.
On this page, we'll explore how to leverage the redact
function and VRL's capabilities to obfuscate PII in Logs and Traces. At the end of this page, you'll find a handy list of regex patterns to save you time and effort.
Trace obfuscation can also be configured directly in the sensor. Can't find what you're looking for? Let us know over Slack.
In the examples below, we redact both the log contents (.content
) and any attributes derived from the structured logs (.string_attributes
).
Obfuscate credit card numbers from Logs
In this example, we'll obfuscate Visa credit card numbers from logs using the Visa credit card regex pattern from the library. By not specifying a redactor type, the redact
function will default to full redaction, replacing detected numbers with the string “[REDACTED].”
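A sketch of the step, reusing the Visa pattern from the library below and redacting both the log content and the string attributes:

```yaml
- name: redact-visa-cards
  transform:
    type: remap
    source: |
      visa = r'(?:4\d{3}){4}|4\d{7}\d{8}|4\d{12}(?:\d{3})?'
      .content = redact(.content, filters: [visa])
      .string_attributes = redact(.string_attributes, filters: [visa])
```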
Here's an example of how Logs appear before and after obfuscation:
Hash US SSNs in Logs
In this example we'll hash all US Social Security Numbers hidden in logs. We'll pass the sha2
parameter to the redactor
to hash the sensitive values.
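A sketch of the step, using the SSN pattern from the library below:

```yaml
- name: hash-us-ssn
  transform:
    type: remap
    source: |
      ssn = r'\b\d{3}-\d{2}-\d{4}\b'
      .content = redact(.content, filters: [ssn], redactor: "sha2")
      .string_attributes = redact(.string_attributes, filters: [ssn], redactor: "sha2")
```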
Here's how logs appear before and after obfuscation:
Obfuscate IPs with Two Stages from Logs
This example demonstrates how to obfuscate IP addresses in logs using a two-stage approach:
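The original split into stages is not reproduced here; as one possible layout, the sketch below uses a first step for IPv4 addresses and a second step for IPv6 addresses:

```yaml
- name: redact-ipv4
  transform:
    type: remap
    source: |
      ipv4 = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'
      .content = redact(.content, filters: [ipv4])
      .string_attributes = redact(.string_attributes, filters: [ipv4])
- name: redact-ipv6
  transform:
    type: remap
    source: |
      ipv6 = r'([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}'
      .content = redact(.content, filters: [ipv6])
      .string_attributes = redact(.string_attributes, filters: [ipv6])
```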
Here's an example of how logs appear before and after this obfuscation:
Credit Card Scanners
Maestro Card (16 digits)
5[0678]\d{14}|5[0678]\d{7}\d{8}
Discover Card (16 digits)
(?:6(?:011|5\d{2}))\d{12}|6(?:011|5\d{2})\d{7}\d{8}
Diners Club (14 digits)
3(?:0[0-5]|[68]\d)\d{11}|3(?:0[0-5]|[68]\d)\d{4}\d{4}\d{2}|3(?:0[0-5]|[68]\d)\d{7}\d{6}
American Express (15 digits)
3[47]\d{2}\d{6}\d{5}|3[47]\d{2}\d{4}\d{4}\d{3}|3[47]\d{7}\d{6}|3[47]\d{13}
JCB Card (16 digits)
(?:2131|1800|35\d{3})\d{11}|(?:2131|1800|35\d{3})\d{7}\d{8}
MasterCard (16 digits)
(?:5[1-5]\d{2}){4}|5[1-5]\d{7}\d{8}|5[1-5]\d{14}
Visa Card (16 or 19 digits)
(?:4\d{3}){4}|4\d{7}\d{8}|4\d{12}(?:\d{3})?
API Key and Token Scanners
AWS Access Key ID and Secret Access Key
AKIA[0-9A-Z]{16}|(?:[A-Za-z0-9/+=]{40})
Google API Key and OAuth Access Token
AIza[0-9A-Za-z-]{35}|ya29.[0-9A-Za-z-]{35,}
Mailchimp API Key
(?i)(?:[a-f0-9]{32}-us\d{1,2})
Social Media Tokens (Facebook, Slack, Twitter, Instagram, LinkedIn)
EAACEdEose0cBA[0-9A-Za-z]{0,}|xox[baprs]-[0-9]{12}-[0-9]{12}-[a-zA-Z0-9]{24}|T[a-zA-Z0-9]{8}/B[a-zA-Z0-9]{8}/[a-zA-Z0-9]{24}|AAAAA[0-9A-Za-z]{35}|IGQVJW[0-9A-Za-z]{16,}|[A-Z0-9]{16}
Azure Personal Access Token
eyJ[0-9A-Za-z-_=]+
Azure SQL Connection String
Server=tcp:[A-Za-z0-9-.]+,1433;Database=[A-Za-z0-9-]+;User Id=[A-Za-z0-9-]+;Password=[A-Za-z0-9-]+;Encrypt=true;
Azure Subscription Key
Ocp-Apim-Subscription-Key: [A-Za-z0-9]{32}
GitHub Access Token and Refresh Token
ghp_[0-9A-Za-z]{36}
Shopify Access Token and Shared Secret
shpat_[A-Za-z0-9]{32}
Okta API Token
00[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}
JSON Web Token (JWT)
eyJ[0-9A-Za-z-*=]+\.[0-9A-Za-z-*=]+\.[A-Za-z0-9-_=]+
RSA Private Key
-----BEGIN RSA PRIVATE KEY-----[\s\S]+-----END RSA PRIVATE KEY-----
PGP Private Key
-----BEGIN PGP PRIVATE KEY BLOCK-----[\s\S]+-----END PGP PRIVATE KEY BLOCK-----
GitLab Token
glpat-[A-Za-z0-9-]{20}
Amazon Marketplace Web Services Auth Token
amzn\.mws\.[a-zA-Z0-9]{64}
Bearer Token
Bearer [A-Za-z0-9-*=]+\.[A-Za-z0-9-*=]+\.[A-Za-z0-9-_=]+
JIRA API Token
jira\.api\.token\.[A-Za-z0-9-_]+
Other Scanners
Standard Email Address
[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+
Standard IBAN Code
[A-Z]{2}\d{2}[A-Z0-9]{1,30}
Standard MAC Address
([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})
IPv4 Address
\b(?:\d{1,3}\.){3}\d{1,3}\b
IPv6 Address
([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}
HTTP(S) URL
https?:\/\/(?:www\.)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,6}(\/[a-zA-Z0-9\&%_\.\/~\-]*)?
HTTP Basic Authentication Header
Basic\s[a-zA-Z0-9=:_-]+
HTTP Cookie
Set-Cookie:\s[a-zA-Z0-9_\-]+=\S+
US Passport Number
[a-zA-Z]\d{9}
US Vehicle Identification Number (VIN)
[A-HJ-NPR-Z0-9]{17}
UK National Insurance Number
[A-CEGHJ-PR-TW-Za-ceghj-pr-tw-z]\d{2}\d{6}[A-Za-z]{1}
Canadian Social Insurance Number (SIN)
\d{3}-\d{3}-\d{3}
US Social Security Number (SSN)
\b\d{3}-\d{2}-\d{4}\b
We strongly advise reading the intro guide to working with remap transforms in order to fully understand the functionalities of writing pipelines steps.
The generated events will currently only be available by querying the ClickHouse database directly. Contact us over Slack for additional information.
Attributes parsed from logs or traces can be accessed under the .string_attributes
or .float_attributes
maps - see here for more information.
The following example demonstrates transformation of a log in a specific format to an event, while applying additional filtering and extraction logic.
In this example, we want to create events for when a user consistently fails to log in to a system. We base it on logs with this specific format:
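An illustrative log line in that format (the exact layout is hypothetical):

```
2024-05-15 10:04:11 WARN login failed for user=alice attempt=5
```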
This pipeline will create events with the type multiple_login_failures
for each time a user fails to log in for the 5th time or more. It will store the username in .string_attributes
and the attempt number in .float_attributes
.
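A hedged sketch of such a pipeline, matching the illustrative log layout above; the values key, the inputs name and the event_type attribute are assumptions:

```yaml
eventsPipeline:                          # assumed values key
  - name: filter-login-failures
    inputs:
      - logs                             # assumed name of the default incoming logs pipeline
    transform:
      type: filter
      condition: |
        contains(string!(.content), "login failed")
  - name: extract-login-failure-event
    transform:
      type: remap
      source: |
        parsed = parse_regex!(.content, r'user=(?P<user>\S+) attempt=(?P<attempt>\d+)')
        attempt = to_float!(parsed.attempt)
        if attempt >= 5.0 {
          .string_attributes.event_type = "multiple_login_failures"
          .string_attributes.user = parsed.user
          .float_attributes.attempt = attempt
        } else {
          abort
        }
```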
Quickly see how real users experience your app and catch front-end issues early.
The RUM page in groundcover lets you monitor real user activity on your web application. It highlights key performance metrics (like web-vitals and page load times), error rates, and user behavior.
This helps you understand what real users are facing – for example, identifying slow page load times, spotting frequent errors, or seeing which pages are most popular – so you know where to focus your attention.
The Real User Monitoring summary page has three main sections:
Summary cards,
Trend charts (Exceptions over Time, Sessions over Time) and a list of visited pages,
A detailed sessions table
At the top of the RUM page, you’ll see five summary cards showing key user experience stats. These cards give a quick health check of your front-end performance:
Avg INP (input delay): The average delay after a user interaction (like clicking a button) before the page responds. A lower number means the app feels snappy; a high number indicates users might be waiting too long for feedback.
Avg CLS (layout shift): The average Cumulative Layout Shift score, measuring how much the page content jumps around during load.
Page Load Time: The average time it takes for pages to fully load for users, measured in milliseconds. This reflects your site’s loading speed.
Error Rate: The percentage of user sessions that encountered an error (for example, JavaScript exceptions).
User Count: The number of unique user sessions in the selected time frame. This shows how many distinct users (or sessions) have been recorded.
At the bottom of the RUM page, you’ll find a detailed sessions table listing individual user sessions. Each row in this table represents one user’s session on your app, with key details to help you trace their experience. Important columns include:
Date/Time: When the session occurred (start time or timestamp of the session).
User (Email/ID): The user’s identifier, which could be an email, username, or “Anonymous” if the user isn’t logged in. This helps identify the session’s user if needed.
Errors: How many errors occurred during that session (e.g. count of exceptions). If this number is greater than 0, it means the user encountered problems.
Pages: The number of pages the user visited in that session. A higher page count might indicate a longer session or a user navigating through many parts of the app.
Duration: How long the session lasted (for example, “07:36” means 7 minutes 36 seconds). This shows if the user spent a long time (possibly struggling) or left quickly.
Browser: The browser used in the session (often shown by an icon or name like Chrome, Safari, etc.). This can reveal if an issue is browser-specific (e.g. all errors happening on one browser).
Device: The type of device (e.g. desktop, mobile) indicated by an icon. This helps you see if mobile users vs. desktop users have different experiences.
groundcover supports a rich set of features for log management, from collection to analysis. In addition, it fully supports defining alerts and dashboards based on a variety of attributes in your logs. This guide will explore how to get started querying your logs in our embedded Grafana.
groundcover uses ClickHouse as its database for storing logs. When building log based alerts or dashboards in our embedded Grafana, the ClickHouse
datasource needs to be selected in order to query the logs stored.
See this page for examples you can get started with!
ClickHouse supports standard SQL syntax, which can be used to query the table storing your logs.
For example, the following query will return the count of logs in the selected time range:
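For instance (the logs table name and the $__timeFilter macro usage follow the examples in this guide):

```sql
SELECT count()
FROM logs
WHERE $__timeFilter(timestamp)
```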
Below is a list of the most commonly used fields in the Logs table, which should serve the majority of the use cases for alerting.
Can't find what you're looking for? Let us know over Slack!
timestamp
DateTime64
content
String
content attribute of the log if it exists, entire log body otherwise
cluster
String
workload
String
namespace
String
k8s only
pod_name
String
k8s only
node_name
String
k8s only
level
String
lower-cased, e.g: 'info', 'error', 'fatal'...
format
String
'json', 'logfmt'...
env
String
string_attributes
Map(String,String)
String attributes extracted from formatted logs; empty for unformatted logs
float_attributes
Map(String, Float64)
Numeric attributes extracted from formatted logs; empty for unformatted logs
Our helm chart provides access to many values passed into the chart via the standard values.yaml interface. The deployment can be configured based on most constraints and controls Kubernetes allows around pod assignments to nodes.
groundcover supports most K8s primitives out there, among which:
nodeAffinity
- allows you to constrain which nodes your Pod can be scheduled on based on node labels. It is more expressive than nodeSelector and lets you specify soft rules.
Address potential bandwidth limitations with Amazon ECR Public when using clusters outside of the AWS network.
Argo CD’s multi-environment support ensures that groundcover can be deployed consistently across various Kubernetes clusters.
Learn how to install groundcover on multiple clusters while using a single, centralized instance of each of these databases.
Deploying groundcover using an API Key Secret ensures that only authorized entities can access the API's functionalities.
Allows you to use groundcover in secured environments without relying on outbound connections except for authentication purposes.
Allows the pod to use the host's networking stack for all communication, which means that the pod will use the same IP address as the host.
Configure custom log transformations in groundcover using OpenTelemetry Transformation Language (OTTL). Tailor your logs with structured pipelines for parsing, filtering, and enriching data before ingestion.
groundcover supports the configuration of log pipelines using OpenTelemetry Transformation Language (OTTL) to process and customize your logs. With OTTL, you gain full flexibility to transform data as it flows into the platform.
groundcover uses OTTL to enrich and shape log data inside your monitored environments. OTTL pipelines give you a structured way to parse, filter, and modify logs before ingestion.
Each pipeline is made up of transformation steps—each step defines a specific operation (like parsing JSON, extracting key-value pairs, or modifying attributes). You can configure these transformations directly in your groundcover deployment.
To test your logic before going live, we recommend using our Parsing Playground (click the top right corner when viewing a specific log).
To define an OTTL pipeline, make sure to include the following fields:
statements
– List of transformations to apply.
conditions
– Logic for when the rule should trigger.
errorMode
– How to handle errors (e.g., skip, fail).
logicOperator
– Used when you define multiple conditions.
Transformations are automatically appended to the default pipeline, so there’s no need to replace anything. Just define your rules, add them to your values file, redeploy, and you’re done.
Each rule must have a unique ruleName
.
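A hedged sketch of a rule combining these fields; the condition paths and the errorMode values may differ in your setup, and the statements use common OTTL functions:

```yaml
- ruleName: parse-checkout-json
  conditions:
    - attributes["workload"] == "checkout"
    - attributes["format"] == "json"
  logicOperator: AND
  errorMode: ignore
  statements:
    - merge_maps(attributes, ParseJSON(body), "upsert")
    - set(attributes["team"], "payments")
```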
Use conditions
to apply transformations only when specific attributes match. This ensures your pipeline runs efficiently and only on relevant logs.
Common fields you can use:
workload
– Name of the service or app.
container_name
– Container where the log originated.
level
– Log severity (e.g., info, error).
format
– Log format (e.g., JSON, CLF, unknown).
Some commonly used functions in groundcover:
ExtractGrokPatterns
ParseJSON
Replace_pattern
Delete_key
ToLowerCase
Concat
ParseKeyValue
In the following example we will be controlling the replica count of a Kubernetes deployment. This common example can be extended and used to query any type of metrics, and we suggest experimenting with it to fit your own use cases.
KEDA requires access to the ds-api-key so it can add it to the Prometheus requests. We will set up two Kubernetes objects for this purpose - a Secret object and a ClusterTriggerAuthentication object.
We will be using a simple static Secret object, but any type can be used, including external secrets. Make sure to replace the <ds-api-key>
placeholder with the api key previously created.
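A sketch of such a Secret; it stores both the header name used by groundcover's datasources API (apikey) and the key value, and is created in the namespace where KEDA is installed (assumed to be keda), since ClusterTriggerAuthentication reads secrets from KEDA's own namespace:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: groundcover-ds-api-key
  namespace: keda          # assumed KEDA installation namespace
type: Opaque
stringData:
  header-name: apikey
  header-value: <ds-api-key>
```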
Now that we have the secret object defined, we need to tell KEDA how to use it. groundcover requires the ds-api-key to appear as an additional header in the requests to our datasources, and that's exactly what we will define below.
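A sketch of the ClusterTriggerAuthentication, mapping the secret into KEDA's custom header authentication parameters:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ClusterTriggerAuthentication
metadata:
  name: groundcover-prometheus-auth
spec:
  secretTargetRef:
    - parameter: customAuthHeader
      name: groundcover-ds-api-key
      key: header-name
    - parameter: customAuthValue
      name: groundcover-ds-api-key
      key: header-value
```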
Now that we have set up KEDA to properly authenticate with groundcover, we can define our first auto-scaling behavior.
We will be defining a Prometheus
trigger, querying the groundcover_kube_deployment_status_replicas_ready
which indicates the number of ready replicas for our deployment. Note that we apply the deployment="example"
filter, to query the metric specifically for our example deployment.
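A sketch of the ScaledObject; the threshold and replica bounds are illustrative:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: example
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: https://ds.groundcover.com/datasources/prometheus
        query: groundcover_kube_deployment_status_replicas_ready{deployment="example"}
        threshold: "5"
        authModes: "custom"
      authenticationRef:
        name: groundcover-prometheus-auth
        kind: ClusterTriggerAuthentication
```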
groundcover provides a robust user interface that allows you to view and analyze all your observability data from inside the platform. However, there may be cases in which you need to query the data from outside our platform using API communication.
Our proprietary eBPF sensor automatically captures granular observability data, which is stored via our integrations with two best-of-breed technologies. VictoriaMetrics for metrics storage, and ClickHouse for storage of logs, traces, and Kubernetes events.
Run the following command in your CLI, and select tenant:
groundcover auth get-datasources-api-key
Example for querying ClickHouse database using POST HTTP Request:
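A sketch of such a request; the ClickHouse endpoint path is assumed to mirror the Prometheus endpoint shown later on this page:

```bash
curl -X POST "https://ds.groundcover.com/datasources/clickhouse" \
  -H "X-ClickHouse-Key: ${API_KEY}" \
  -d "SELECT count() FROM traces WHERE start_timestamp > now() - interval '15 minutes'"
```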
X-ClickHouse-Key
(header): API Key you retrieved from the groundcover CLI. Replace ${API_KEY}
with your actual API key, or set API_KEY
as env parameter.
SELECT count() FROM traces WHERE start_timestamp > now() - interval '15 minutes'
(data): The SQL query to execute. This query counts the number of traces where the start_timestamp
is within the last 15 minutes.
apikey
(header): API Key you retrieved from the groundcover CLI. Replace ${API_KEY}
with your actual API key, or set API_KEY
as env parameter.
query
(data): The promql query to execute. In this case, it calculates the sum of the rate of groundcover_resource_total_counter
with the type
set to http
.
start
(data): The start timestamp for the query range in Unix time (seconds since epoch). Example: 1715760000
.
end
(data): The end timestamp for the query range in Unix time (seconds since epoch). Example: 1715763600
.
This guide will explore how to get started querying your metrics in our embedded Grafana.
All types of metrics stored in groundcover are available using a standard Prometheus
datasource, exposed by Victoria Metrics. Querying metrics is possible using standard PromQL
syntax.
For example, let's query the groundcover_container_cpu_usage_rate_millis
metric, which is one of our core infrastructure metrics:
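For example, averaging the metric per workload (the namespace filter is illustrative):

```
avg by (workload_name) (groundcover_container_cpu_usage_rate_millis{namespace="my-namespace"})
```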
groundcover supports many custom configurations so you can fit it to the way your deployment works and to your exact needs.
nodeSelector
- a simple form of node selection constraint. You can add the nodeSelector
field to your Pod specification and specify the node labels you want the target node to have.
groundcover supports most K8s primitives out there. Can't find what you need?
groundcover can be used as a Prometheus datasource for KEDA. Any metric stored in groundcover can be queried and used to automatically make decisions about scaling your infrastructure or deployments.
We will use the datasources API exposed by groundcover to query data stored in the platform from outside the UI. In our case we will be using the Prometheus
endpoint:
https://ds.groundcover.com/datasources/prometheus
Querying the groundcover datasources APIs requires setting up a dedicated apikey for the authentication. If you haven't done this already, please do so before proceeding.
We will be referring to this apikey below as ds-api-key
.
Read more about our architecture.
Learn more about the ClickHouse query language.
Example for querying the VictoriaMetrics database using the API:
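A sketch of a query_range request using the parameters described above; the /api/v1/query_range suffix follows the standard Prometheus HTTP API and is an assumption here, as is the step parameter:

```bash
curl "https://ds.groundcover.com/datasources/prometheus/api/v1/query_range" \
  -H "apikey: ${API_KEY}" \
  --data-urlencode 'query=sum(rate(groundcover_resource_total_counter{type="http"}[5m]))' \
  --data-urlencode 'start=1715760000' \
  --data-urlencode 'end=1715763600' \
  --data-urlencode 'step=60'
```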
Learn more about the PromQL syntax.
Learn more about the VictoriaMetrics HTTP API.
groundcover supports a wide variety of metrics - infrastructure and application metrics are automatically generated using our eBPF magic, and custom metrics can be ingested natively.
groundcover uses VictoriaMetrics as its database for storing metrics. When building metric-based alerts or dashboards in our embedded Grafana, the Prometheus
datasource needs to be selected in order to query the metrics.
The full lists of available out-of-the-box metrics can be found in our infrastructure and application metrics pages.
Learn more about PromQL and Grafana dashboards to improve your skills. For any help in creating your custom dashboard don't hesitate to reach out over Slack.
groundcover has a set of example dashboards in the Dashboards by groundcover folder which can get you started. These dashboards are read-only, but you can see the PromQL query behind each panel by right-clicking the panel and selecting Explore.
The Amazon Elastic Container Registry (ECR) Public Registry pull bandwidth limit for clusters outside of AWS is set to 500 GB per day. This means that, collectively, all container image pulls from ECR Public by all clusters located outside the AWS network cannot exceed 500 GB of data transfer per day.
It is crucial for users who operate Kubernetes clusters or other container orchestrators outside of AWS and use ECR Public as a container image source to be aware of this bandwidth limitation. If the cumulative data transfer of container image pulls from ECR Public exceeds the daily limit, further pulls may be denied until the bandwidth usage falls back within the allowed threshold.
To address potential bandwidth limitations with Amazon ECR Public when using clusters outside of the AWS network, one viable solution is to override the container registry to utilize an alternative registry, such as Quay.io. By redirecting the container image pulls to Quay.io, users can leverage the bandwidth allowance and performance capabilities of Quay.io to complement or replace ECR Public for image retrieval.
By default, groundcover agents are running in K8S default network model in which every Pod
in a cluster gets its own unique cluster-wide IP address. This means you do not need to explicitly create links between Pods
and you almost never need to deal with mapping container ports to host ports.
In cases where the CNI has a limited number of IP addresses, we can set our agents to host network mode, which ensures that the IP addresses of the pods do not get allocated from the cluster's IP address range. This mode allows the pod to use the host's networking stack for all communication, which means that the pod will use the same IP address as the host.
Either create a new custom-values.yaml or edit your existing groundcover values.yaml
Deploying groundcover using an API Key Secret ensures that only authorized entities can access the API's functionalities. It is relatively straightforward compared to other authentication mechanisms, which can be beneficial for rapid deployment and integration.
You can inject the API Key using a custom secret by following these steps:
Either manually, or using a secret manager, create a secret in the following structure
Create/Update helm overrides file, with the following override
Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes. Argo CD aligns with the GitOps principles, ensuring that the deployment of groundcover is always in sync with the predefined configurations in your Git repository. This means that any changes made to the deployment configurations are automatically applied to the cluster, streamlining updates and ensuring that all instances of groundcover are consistent across different environments.
Argo CD’s multi-environment support ensures that groundcover can be deployed consistently across various Kubernetes clusters, whether they are designated for development, testing, or production.
To deploy groundcover through Argo CD, use the following steps.
The steps below require a user with admin permissions
groundcover requires setting up several secrets in the installation namespace prior to creating the ArgoCD application. For that reason we will start by creating the groundcover namespace:
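For example:

```bash
kubectl create namespace groundcover
```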
In the following steps you will create the following Kubernetes secret objects:
API Key secret
ClickHouse password secret
Start by fetching the API key associated with your workspace using the following CLI command:
Create a secret in the groundcover
namespace using the following snippet:
Create the spec file using the following snippet:
Make sure to replace the <apikey>
value below with the value fetched in the previous step
Apply the spec file from above:
Start by generating a random password for ClickHouse. For example using openssl rand
:
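For example, generating a random 32-character hex string:

```bash
openssl rand -hex 16
```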
openssl
is just one way to do it - you can use any random string you wish
Create the spec file using the following snippet:
Make sure to replace the <password>
value below with the result of the previous step
Apply the spec file from above:
Make sure to set the following values in the manifest:
<project-name> - to match your environment
<targetRevision> - set the deployment version. Either a specific groundcover chart version, or use ">= 1.0.0"
for auto upgrades.
You can use the following commands to fetch the latest chart version:
helm repo update
helm search repo groundcover/groundcover
The <cluster-name>
value in the values
part in the manifest can be any name you wish to assign to the cluster where the platform is installed. In multi-cluster installations, make sure to change it according to the cluster being installed.
After creating the manifest above the groundcover deployments will start spinning up in the namespace. When all pods are running you can access the platform at app.groundcover.com
to explore the new data from the environment.
If you encounter any issues in the installation let us know over Slack.
Customize data collection by filtering K8s entities
By default, groundcover traces all namespaces and workloads in your cluster, but sometimes you want to block ones that are irrelevant to your needs, or alternatively only allow something very specific.
groundcover allows you to add traces filtering rules on specific workloads and namespaces by creating a custom values.yaml
file.
The rules are a list of regex patterns, each with a matching type (allow / block), that represent the entities you want groundcover to filter.
The following overrides need to be added to your values.yaml
configuration file. After updating the file use the installing & updating instructions to apply the changes.
There are two ways to define retention in groundcover:
Simple - each type of data has a global retention period
Advanced - data is retained based on various criteria such as cluster, log level, namespace, etc.
For metrics, only the simple strategy is supported.
For managed inCloud deployments, the values will need to be set by the groundcover team. Please contact us to perform any retention changes.
To customize the retention on the groundcover platform, either create a new custom-values.yaml
or edit your existing values.yaml
with the overrides defined below and redeploy groundcover.
Retention value format is: {amount}[h(ours), d(ays), w(eeks), y(ears)]
.
For example: 4h
, 30d
, 6w
, 1y
The most common and simple way to configure retention in groundcover. Based solely on data type, without exceptions.
Below is an example configuration for setting data retention values:
Traces - 24 hours
Metrics - 7 days
Logs - 3 days
Events - 7 days
Traces - 48 hours
Metrics - 30 days
Logs - 30 days
Events - 7 days
groundcover allows you to customize retention policies for your data to better manage storage and compliance requirements. You can define specific retention periods for logs, traces, and events based on various criteria such as cluster, log level, namespace and more.
The custom_retention_overrides
list allows you to define specific retention periods for data based on conditions. Each override has a retention
field and a conditions
field.
Retention: Specifies the duration for which the data should be retained.
Conditions: Specify the criteria that the retention policy applies to. When multiple conditions are set, they are connected by an AND
condition, meaning all conditions must be met.
The retention
field under each data type (traces, logs, events) specifies the default retention period for that data type, meaning, anything that doesn't match the custom conditions set.
If a default retention value is not set for a certain datatype, groundcover will apply its own default retention from the default list described above.
In instances of overlapping overrides, the override with the shorter retention interval will be used.
The configuration below implements the following logic:
Traces
Traces with labels cluster: prod
, namespace: app
will be retained for 7d
Traces with label env: staging
will be retained for 14d
Other traces will be retained for 24h
Logs
Logs with labels cluster: prod
, level: info
will be retained for 20d
Logs with labels cluster: prod
, level: error
will be retained for 30d
Other logs will be retained for 3d
Events
Events with labels cluster: dev
, type: Warning
will be retained for 15d
Other events will be retained for 15d
Logs
cluster
source
env
env_type
workload
namespace
level
Traces
cluster
source
env
env_type
protocol_type
Events
cluster
source
env_name
entity_workload
entity_namespace
type
As any application monitoring system, the data collected by groundcover is by nature sensitive and contains payloads of full requests and queries. Raw traces can go a long way in a troubleshooting process, but you can choose to obfuscate their payload.
By default groundcover does not obfuscate payloads. However, it will obfuscate sensitive HTTP and gRPC headers - see below for more information.
Obfuscation is granularly defined separately for each protocol, using the following names:
httphandler
grpchandler
redishandler
sqlhandler
This applies both for MySQL and PostgreSQL
mongodbhandler
amqphandler
Data obfuscation can be configured in two ways: key-value and unstructured.
This method will automatically identify key-value structures such as JSON and query params, and for those it will perform obfuscation based on a defined set of keys.
The configuration consists of the following fields:
enabled
- turns this obfuscator on and off. Default: false
mode
- What should be done with values matching the specified keys. Possible modes are:
KeepSpecificValues
- Obfuscate all values except for keys specified in specificKeys
ObfuscateSpecificValues
- Keep all values and obfuscate only values for keys specified in specificKeys
caseSensitive
- are the keys case sensitive. Default: False
specificKeys
- a list of comma separated strings. Example:
If mode is not specified, the default behavior of this obfuscator is to obfuscate all keys, equivalent to:
mode: KeepSpecificValues
specificKeys: []
Obfuscation for nested JSON structures is based on the inner keys within the nested JSON objects. An example can be found at Obfuscation Examples
Below is an example of using the key-value configuration with different settings:
This method will obfuscate "free text" without any predefined rules. It is meant as a way to make sure all data is obfuscated regardless of its contents.
The configuration consists of the following fields:
Enabled
- Turns this obfuscator on and off. Default: false
Below is an example of turning on the unstructured obfuscator:
It's perfectly fine to use both the key-value and unstructured obfuscators together! When this is set, the key-value method will be executed first, and only if the structure isn't key-value, it will proceed to the unstructured method.
For example, let's look at a configuration for turning both obfuscators on:
JSON, {"key": "value"}
{"key": "?"}
JSON with array, {"key": [1,2,3]}
{"key": ["?", "?", "?"]}
JSON with nested keys, {"root": {"sub": {"key": "value"}}}
{"root": {"sub": {"key": "?"}}}
key=value map, key=value
key=?
Plain text, plain text
p**** ****
Truncated data: if data has been truncated, it will not be obfuscated and will show scrubbed
as the data. You can change the truncation size limits if you need to.
Want to change your data truncation size limits? Contact us on slack.
After you prepared your desired values.yaml
, apply them using:
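A hedged sketch of the command; the chart repository URL and the api-key value key are assumptions, so follow the Using Helm page for the exact flags:

```bash
helm repo add groundcover https://helm.groundcover.com   # repository URL is an assumption
helm repo update
helm upgrade --install groundcover groundcover/groundcover \
  --namespace groundcover \
  -f values.yaml \
  --set global.groundcover_token=<api-key>               # hypothetical key for the api-key
```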
more on getting api-key, see: Using Helm
groundcover will obfuscate sensitive HTTP and gRPC headers by default so that they are not shown in traces. This behavior is customizable using the same key value config as above.
The default values for the headers obfuscation are:
According to the HTTP RFC, headers are case insensitive by nature. Because of that, the headers obfuscation will always be case insensitive and can't be configured otherwise.
This feature is turned-off by default, and can be turned on by following the steps below.
groundcover offers eBPF-based tracing for all frameworks and protocols, including encrypted traffic. This is done by attaching probes to key points in popular encryption libraries, such as LibSSL.
There is one exception to the out-of-the-box coverage with eBPF - Java. The Java runtime uses encryption libraries which are written in pure Java, providing a challenge to trace with the machine-level primitives that eBPF offers.
groundcover's approach to bridge this gap is by using a more traditional approach to Java observability - using a Java agent. This agent runs alongside your applications, tracing key functions in the Java encryption libraries to provide visibility into the APIs being handled.
Non-SSL traffic in Java works out of the box, similar to other frameworks
groundcover's sensor comes pre-packed with the Java agent binaries. When detecting Java processes running alongside it, the sensor will use an injection method to dynamically execute the groundcover Java agent in the detected process.
No configuration changes are needed, and new processes will be monitored automatically.
groundcover's core approach to tracing relies on eBPF, which has built-in safety guarantees that make sure our tracing can not affect the services being monitored in any significant way.
Java agents are somewhat different in that regard - they run alongside your code, and have more potential to interfere with the standard operation of the process. For this reason, our development cycle for the Java agent is extremely strict, and includes testing with many common Java use cases. The agent has and continues to run safely in a large number of customer environments, providing the high standard of frictionless coverage groundcover provides.
We recommend testing the deployment on lower environments (e.g dev, staging) before moving on to production environments.
Use the following configuration values to turn on the Java agent deployment:
Note: On-premise deployment is available only for users subscribed to our Enterprise plan.
groundcover on-premise installation allows you to use groundcover in secured environments without relying on outbound connections except for authentication purposes (Auth0).
In this mode, groundcover installation includes 3 additional components:
router
- the frontend microservice
grafana
postgresql
- db backend for the router and grafana microservices
Upon subscribing to Enterprise plan, you should receive a groundcover-enterprise-key
Kubernetes Secret Object.
Create a groundcover namespace (if not created by now)
kubectl create ns groundcover
Load the image pull secret into the namespace
kubectl create -f groundcover-enterprise-key.yml --namespace=groundcover
Create a database on an existing PostgreSQL
Either manually or using a secret manager, create a secret in the following structure:
Create/Update helm overrides file, with the following override:
By default, when you install groundcover on several clusters, each cluster will contain its own independent set of traces, metrics and logs databases (the groundcover backend). The following guide will walk you through how to install groundcover on multiple clusters while using a single, centralized instance of each of these databases.
An ingress controller - as this setup requires cross-cluster communication
Accessible hostnames from outside the cluster that will be used by the deployed ingresses for the following components (see groundcover backend):
metrics-ingester
opentelemetry-collector
In this installation mode, we will deploy the following components separately:
groundcover backend - containing the databases, installation connector and the ingresses.
groundcover agent - containing groundcover's data collection and aggregation agent.
Create the following backend-values.yaml
file and fill the required values accordingly
Run the following installation command:
Obtain the addresses that are used by the created ingresses, using:
Now, make sure the ingresses you've just created are accessible from the clusters you intend to deploy the groundcover agent on.
Create the following agent-values.yaml
, and fill the required values accordingly
Run the following installation command:
In case you're interested in using groundcover's ClickHouse datasource in your own Grafana, follow these steps:
Add the following override to the backend values
Fetch the ClickHouse password for the config (either in the env or from the injected secret)
Install the ClickHouse plugin for Grafana
Create a new Grafana datasource and fill the required fields, make sure Server port, Skip TLS Verify
are matching your ingress configuration
groundcover allows you to control the way data is being collected and treated across your cluster in many ways.
By default, groundcover traces all namespaces and workloads in your cluster, but sometimes you want to block ones that are irrelevant to your needs.
Strike the right balance between covering sufficient data for analysis and data storage.
Raw traces can go a long way in a troubleshooting process, but you can choose to obfuscate their payload.
Customize groundcover storage volumes for logs, metrics and traces.
There are different methods that can allow you to filter the collection of logs based on your needs and/or only when issues related to these logs are identified.
In certain situations it could be useful to disable the tracing of specific protocols. This modularity is natively supported by our sensor.
In environments where the volume of transactions or operations per second is very high, resource tuning is essential to ensure that the cluster performs optimally.
Customize groundcover storage volumes for logs, metrics and traces
Either create a new custom-values.yaml or edit your existing groundcover values.yaml agent:
Warning! this will require to re-install groundcover, existing groundcover information will be lost.
In high-throughput clusters, it is common practice to optimize the allocation and usage of computing resources. In such environments, where the volume of transactions or operations per second is very high, resource tuning is essential to ensure that the cluster performs optimally.
The following is an example of how to use that practice in a large cluster. Further tweaks may be required for different clusters.
Custom k8s logs filtering / storing
By default, groundcover stores logs from all namespaces and workloads in your cluster. However there are multiple ways to modify this behavior.
groundcover allows you to add logs filtering rules using LogQL syntax by creating a custom values.yaml
file.
The available labels to filter are: namespace, workload, pod, level, container.
Example of filtering out all logs coming from namespace demo
with level info
:
{namespace="demo",level="info"}
In addition, we enable the use of the optional log stream pipeline in order to filter the log lines.
Example of filtering out all logs coming from container my-container which contain the word fifo or handler:
{container="my-container"} |~ "fifo|handler"
Rules are applied sequentially and independently. Therefore, rules which are meant to specify multiple values of the same label should be written as one rule with multiple options, and not many rules with one option each.
For example, a rule to drop logs from all namespaces except prod and dev should be written as:
{namespace!="prod", namespace!="dev"}
groundcover collects kubelet logs on Kubernetes clusters and docker logs on host machines. You can customize this behavior through additional configuration options.
groundcover can collect logs from specific files on your host machine. You can define paths to monitor and add custom labels to the collected logs.
This enables merging multiple log lines into a single log block. A new block is defined using a pre-defined firstLineRegex, which should match the line prefix.
A block is terminated when one of the following conditions is met:
A new block is matched
Timeout has occurred (optional config, default is 3 seconds)
Max number of lines-per-block is reached (optional config, default is 1024 lines)
The configuration also holds workload and namespace fields, which can be set to .* in order to use wildcard logic. An additional optional container field can be added.
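A minimal sketch of such a rule is shown below. The wrapper key (multilineParsers) is an illustrative assumption; firstLineRegex, namespace, workload and container are the fields described above:

```yaml
# Hypothetical wrapper key -- consult your values.yaml for the exact location.
multilineParsers:
  - firstLineRegex: '^\d{4}-\d{2}-\d{2}'   # a new block starts at lines beginning with a date
    namespace: ".*"                        # wildcard: apply in every namespace
    workload: ".*"                         # wildcard: apply to every workload
    container: my-container                # optional, hypothetical container name
```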
A rule like this will merge multi-line exception logs into a single log block.
To add a new grok format, you need to specify a pattern and a ruleName which categorizes the parsed logs as a specific sub-format. Additionally, namespaces, workloads, and containers can be used as filters to determine where the patterns should be applied.
Each attribute parsed will be automatically appended to the attributes of the log, making it searchable and filterable in the platform.
We strongly advise applying namespaces, workloads and containers filters to make the matching as tight as possible, reducing unneeded CPU overhead during parsing.
This example adds a custom Grok rule for parsing postgresql logs:
PostgreSQL error log:
2023-12-25 19:31:10.042 GMT [130] FATAL: terminating connection due to unexpected postmaster exit
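A hedged sketch of what such a rule could look like is shown below. The wrapper key (grokPatterns) and the filter values are illustrative assumptions; pattern, ruleName, namespaces and workloads are the fields described above, and the pattern uses standard Grok classes to match the log line shown:

```yaml
# Hypothetical wrapper key -- consult your values.yaml for the exact location.
grokPatterns:
  - ruleName: postgresql                  # categorizes matching logs as the "postgresql" sub-format
    pattern: '%{TIMESTAMP_ISO8601:timestamp} %{WORD:timezone} \[%{NUMBER:pid}\] %{WORD:level}:\s+%{GREEDYDATA:message}'
    namespaces: ["db"]                    # hypothetical namespace filter
    workloads: ["postgresql"]             # hypothetical workload filter
```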
This feature enables removing ANSI color codes from the log body.
Will be stripped into:
Use this customization carefully, as it might heavily affect the performance of the groundcover sensors.
During log parsing, groundcover generates two attributes named content and body:
body - contains the full log line
content - contains the message field of structured logs (from the msg/message attribute), or the full log line for unstructured logs
In the platform UI the attribute displayed is content, while body is available in the DB.
Formatted log with message:
{"time": "Jun 09 2023 15:28:14", "severity": "info", "msg": "Hello World"}
Unformatted log:
[Jun 09 2023 15:28:14], Hello World
The following values contain the default truncation size for body and content respectively:
More info on LogQL syntax can be found in the official LogQL documentation.
For more on obtaining an API key, see the API key instructions.
groundcover supports providing custom patterns to parse logs with unique formats that don’t conform to standard types.
This page addresses the sampling of eBPF traces generated by the groundcover sensors. For sampling of traces generated by external libraries, see OpenTelemetry & DataDog. Can't find what you're looking for? Let us know over Slack!
groundcover utilizes smart sampling in order to only store the traces which we believe are the most important for monitoring your environment. However, some more advanced use cases might require adjusting the sampling strategy, making sure you get the exact coverage you need.
This page details the way in which sampling can be configured.
Some cases might require 100% visibility into traces of specific services or APIs. groundcover allows forcing sampling of transactions by these methods:
HTTP/gRPC requests which include the header below will be force sampled.
Pods which have the following value as either a label and/or annotation will have all traces sampled:
The sampling mechanism can be rate limited to reduce the total amount of traces produced. Note that traces marked as issues will not be affected by this configuration.
Rate is specified by number of traces allowed per 100 milliseconds, using the following values override:
Ingest and visualize OTEL data with groundcover
groundcover supports ingesting different data types in the OTEL format, displaying them natively in the platform. For more information see the following subpages:
For more customization options see Traces & Logs, Metrics
The configuration below is meant to set up OTLP ingestion from a single service in an environment with a standard installation of groundcover.
Apply the following environment variables to the instrumented service to redirect all signals to groundcover's ingestion endpoint inside your cluster.
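A hedged sketch using the standard OpenTelemetry SDK environment variables is shown below; the endpoint address and port are assumptions (the conventional OTLP/HTTP port is used), so substitute your actual Sensor service endpoint:

```yaml
# Kubernetes Deployment env snippet -- endpoint value is an assumption, verify against your setup.
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://{GROUNDCOVER_SENSOR_ENDPOINT}:4318"   # assumed OTLP/HTTP port
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "http/protobuf"
  # Note: metrics may need to target the custom-metrics endpoint -- see the version note below.
```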
For versions before 1.8.245, it is required to turn on custom-metrics to enable the OTLP metrics endpoint. See docs here.
To generate your API key, go to https://id.atlassian.com/manage-profile/security/api-tokens, create a new API key, and copy it.
Get your host and projectKey from your board's URL. For example, for https://example.atlassian.net/jira/software/projects/rnd/list:
host: https://example.atlassian.net
projectKey: rnd
In groundcover, Go to Settings → Integrations.
Click on Webhook Integration
Add webhook integration:
Select your name
Add the following URL, {host}/rest/api/2/issue?projectKey={projectKey}, based on your host from step 2. For example: https://example.atlassian.net/rest/api/2/issue?projectKey=rnd
Using Basic Authentication
User: enter your Jira email
Password: API key from step 1.
Save
groundcover allows you to add custom labels and annotations from Kubernetes pods and Docker containers to the traces and logs generated by those pods or containers.
This feature is available in groundcover sensor versions ≥ 1.9.127
To apply custom labels and annotations collection, you need to update your groundcover deployment values.yaml file.
To collect a pod label named app.kubernetes.io/name and an annotation named app.kubernetes.io/other-name, you will need to add the following configuration:
To collect all pod labels and/or annotations, use the following configuration:
We prefix the labels/annotations with the following (according to pod/docker):
k8s.pod.label
k8s.pod.annotation
docker.container.label
docker.container.annotation
Once you've set up the sensor configuration, you will be able to view and search for specific labels and annotations like any other attribute.
For example, to search for all traces that have the label app.kubernetes.io/part-of = groundcover, you can enter the following in the search bar:
k8s.pod.label.app.kubernetes.io/part-of:groundcover
In PagerDuty, go to your desired service, then click on the Integrations tab.
Click on "Add an integration" or "+ Add another integration"
Select "Events API V2" Integration, and click on Add.
Copy the Integration Key.
In groundcover, Go to Settings → Integrations.
Click on PagerDuty Integration
Select a name and paste your routing key
Save.
In groundcover, Go to Settings → Integrations.
Click on Webhook Integration
Fill in a Webhook name
Fill Webhook details
Select an HTTP method: GET / POST / PUT / DELETE
Enter your URL
Optional: add authentication headers. You can add either basic auth credentials (user and password) or an API key, but not both.
Optional: Add custom headers, by adding key and value pairs
Save.
groundcover Workflows support using integrations with external systems to enable notifications.
Set up Integrations in Settings → Integrations Page.
Only admins in groundcover can add integrations.
In Opsgenie, Go to Settings → Integrations.
Run a search and select “API”.
On the next screen, enter a name for the integration. (We will use it later when adding the integration to groundcover)
Optional: Select a team in Assignee team if you want a specific team to receive alerts from the integration.
Select Continue. The integration is saved at this point.
Select Turn on integration.
If you're using Opsgenie's Free or Essentials plan, you can add this integration from your team dashboard only. The Integrations page under Settings is not available in your plan.
Make sure you have enabled "Allow Create and Update Access"
In Opsgenie, Go to Settings → Integrations.
Click on your selected integration from the list.
Copy API Key from the Integration settings panel.
Make sure Status is ON
In groundcover, Go to Settings → Integrations.
Click on Opsgenie integration
Fill the form:
Integration name in groundcover
Original name of the integration in Opsgenie
API Key
Save
The groundcover platform was built to be an all-in-one observability solution for cloud-native environments. It was built to ingest any data source directly into groundcover's in-cloud backend using any protocol supported by OpenTelemetry or Prometheus.
It supports any log stream from sources like Fluentd, Fluentbit, Logstash, or CloudWatch logs. Metrics can be ingested via Prometheus remote-write or directly from agents like StatsD and Telegraf. Traces can be ingested from any application instrumented with OpenTelemetry or Datadog’s SDK.
Learn how to ingest OTEL traces & logs with groundcover
groundcover fully supports the ingestion of traces and logs in the OpenTelemetry format, displaying them natively in our UI.
OTLP traces and logs generated from Kubernetes pods can be ingested directly by our DaemonSet Sensor. Ingestion is supported by changing the exporter endpoint to the Sensor Service Endpoint, which will also enrich the received spans and logs with Kubernetes metadata.
Ingestion is supported for both OTLP/HTTP and OTLP/gRPC
Apply the environment variables below to your services in order to make them ship data to groundcover's ingestion endpoint.
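As a hedged illustration, the standard OpenTelemetry SDK environment variables below would redirect a service's OTLP export to the sensor; the ports shown are the conventional OTLP ones and are an assumption, so verify them against your Sensor service definition:

```yaml
# OTLP/HTTP variant -- switch to port 4317 and protocol "grpc" for OTLP/gRPC (ports are assumptions).
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://{GROUNDCOVER_SENSOR_ENDPOINT}:4318"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "http/protobuf"
```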
groundcover will automatically enrich traces and logs ingested with Kubernetes metadata, in order to provide as much context as possible.
groundcover will replace the service.name attribute, indicating the name of the service, with the name of the Kubernetes Deployment that owns the pod. Keep this in mind when looking for your traces and logs in the system!
Pod Level attributes:
k8s.namespace.name - the namespace of the pod
k8s.node.name - the node the pod is scheduled on
k8s.pod.name - the name of the pod
k8s.pod.uid - the UID of the pod
k8s.pod.ip - the IP address of the pod at the time of the trace
k8s.cluster.name - the Kubernetes cluster name
Container level attributes:
If the container.id tag is provided with the container ID reported by the Container Runtime, the following tags will also be enriched:
container.name - the name of the container
container.image.name - the name of the container image
container.image.tag - the tag of the container image
Starting from version 1.8.216, groundcover will enrich container level attributes for pods with a single container, without the need for providing the container.id tag.
Starting from version 1.8.216, the recommended method to ship traces & logs from an OpenTelemetry Collector is the same as for other deployments - directly to the groundcover sensor endpoint.
Ingestion is supported for both OTLP/HTTP and OTLP/gRPC
groundcover exposes an OpenTelemetry interface as part of our inCloud Managed endpoints, which can be used to ingest data in all standard OTLP protocols for workloads which are not running alongside sensors.
These endpoints require authentication using an {apikey}, which can be fetched with the groundcover CLI using the following command:
groundcover auth print-api-key
Both gRPC and HTTPS are supported.
Ingestion is supported for both OTLP/HTTP and OTLP/gRPC
While some instrumentation libraries allow sampling of traces, it can be convenient to sample a ratio of the incoming traces directly in groundcover.
groundcover sampling does not take into account sampling done in earlier stages (e.g. SDKs or collectors). It's recommended to choose a single point for sampling.
To configure sampling, the relevant values can be used:
The samplingRatio field is a fraction in the range 0-1. For example, 0.1 means 10% of the incoming traces will be sampled and stored in groundcover.
As of December 1st, 2024, the default sampling rate is 5%.
Use the values below to disable sampling and ingest 100% of the incoming traces.
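As a rough sketch (only the samplingRatio field name comes from this page; its exact location within the values file is an assumption), disabling sampling would look like:

```yaml
# samplingRatio is a fraction between 0 and 1; 1.0 ingests 100% of incoming traces.
# Its parent key is not shown here -- place it where your chart expects it.
samplingRatio: 1.0
```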
Use the instructions here to locate the endpoint for the Sensor service, referenced below as {GROUNDCOVER_SENSOR_ENDPOINT}.
groundcover follows the OpenTelemetry semantic conventions when naming the attributes.
Using an earlier version? Upgrade your installation, or let us know over Slack.
Use the instructions here to locate the endpoint for the Sensor service, referenced below as {GROUNDCOVER_SENSOR_ENDPOINT}.
This feature is only supported for inCloud Managed installations as part of our Enterprise offering. See our plans for more details.
Use the instructions here to locate the endpoint, referenced below as {GROUNDCOVER_MANAGED_OPENTELEMETRY_ENDPOINT}.
The list of supported authentication methods can be found here.
Learn how to ingest OTEL metrics with groundcover
groundcover fully supports the ingestion of metrics in the OpenTelemetry format, displaying them as custom metrics in our platform and allowing you to build custom dashboards and set up alerts.
OTLP metrics generated from Kubernetes pods can be ingested directly by our Custom Metrics deployment. Ingestion is supported by changing the exporter endpoint to the Custom Metrics Service Endpoint, sending data directly to our Victoria Metrics backend.
Use the instructions here to locate the endpoint for the Custom Metrics service, referenced below as {GROUNDCOVER_CUSTOM_METRICS_ENDPOINT}.
Change your deployment's environment variables to point to the endpoint found above.
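A hedged sketch using the standard OpenTelemetry SDK metrics variables; whether the endpoint expects an additional path suffix is an assumption, so adjust as needed:

```yaml
env:
  - name: OTEL_EXPORTER_OTLP_METRICS_ENDPOINT
    value: "http://{GROUNDCOVER_CUSTOM_METRICS_ENDPOINT}"   # may require a /v1/metrics suffix depending on the SDK
  - name: OTEL_EXPORTER_OTLP_METRICS_PROTOCOL
    value: "http/protobuf"                                  # only HTTP with protobuf is supported (see note below)
```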
For versions before 1.8.245, it is required to turn on custom-metrics to enable the OTLP metrics endpoint. See docs here.
The method above only supports the HTTP protocol with protobuf payloads. If your SDK is sending data in other protocols or formats, it will not be ingested.
Traces
All your traces are sourced out-of-the-box
Logs
All your logs are sourced out-of-the-box
Metrics
All your metrics are built out-of-the-box
Ingest your data from fluentd
Import your logs from logstash
Import all your metrics from StatsD
Ingest your metrics directly from telegraf
Integrate with OpenTelemetry
Scrape your Prometheus custom metrics
Ingest your data from Amazon CloudWatch
Automatically ingest Datadog traces & metrics
Integrate Istio distributed tracing and custom metrics
Ingest data from Google Cloud Monitoring
Import all your logs from fluentbit
A Monitor defines a set of rules and conditions that track the state of your system. When a monitor's conditions are met, it triggers an issue that is displayed in the platform and can be used for alerting through your workflows and integrations.
Creating a new Monitor is easy; see the linked guide.
Make sure you have the groundcover CLI installed, version 0.10.13 or later.
Generate service account token
Service Account Tokens are only shown once, so make sure you keep them somewhere safe; running the command again will generate a new service account token.
Only groundcover tenant admins can generate Service Account Tokens
Make sure you have Terraform installed.
Use the official Grafana Terraform provider with the following attributes
Create a directory for the terraform assets
Create a main.tf file within the directory that contains the terraform provider configuration mentioned in step 2.
Create the following dashboards.tf file. This example declares a new Golden Signals folder, and within it a Workload Golden Signals dashboard that will be created.
Add the workloadgoldensignals.json file to the directory as well (a sketch of both files follows below).
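A hedged sketch of what the two files might contain; the Grafana URL and token are placeholders, the resource names are illustrative, and the provider and resource types are those of the official Grafana Terraform provider:

```hcl
# main.tf -- provider configuration
terraform {
  required_providers {
    grafana = {
      source = "grafana/grafana"
    }
  }
}

provider "grafana" {
  url  = "https://your-grafana.example.com"   # placeholder: your groundcover Grafana URL
  auth = "<service-account-token>"            # placeholder: the service account token generated earlier
}

# dashboards.tf -- a folder and a dashboard sourced from the local JSON file
resource "grafana_folder" "golden_signals" {
  title = "Golden Signals"
}

resource "grafana_dashboard" "workload_golden_signals" {
  folder      = grafana_folder.golden_signals.id
  config_json = file("${path.module}/workloadgoldensignals.json")
}
```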
Run terraform init to initialize the terraform context.
Run terraform plan; you should see a long output describing the assets that are going to be created. The last line should state:
Plan: 2 to add, 0 to change, 0 to destroy.
Run terraform apply to execute the changes; you should now see a new folder in your Grafana dashboards screen with the newly created dashboard.
Run terraform destroy to revert the changes.
Here is a short video demonstrating the process.
You can read more about what you can achieve with the Grafana Terraform provider in the official docs.
ClickHouse datasource for working with logs
Prometheus datasource for working with metrics
Prometheus datasource for a metric
In this end-to-end example, we will set up an alert which triggers if the number of error logs from any workload crosses a certain threshold.
We will construct a query that uses the count() operator to get the number of error logs in the defined time window.
groundcover always saves log levels as lower-cased values, e.g. 'error', 'info'.
The GROUP BY operator will generate the labels that will be attached as part of the alert when it fires.
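As a rough sketch (the table and column names below are assumptions based on the labels mentioned above, not the exact groundcover schema), such a query would look something like:

```sql
-- Count error logs per workload over the selected time window.
SELECT
    workload,
    count() AS error_logs
FROM logs
WHERE level = 'error'                          -- log levels are stored lower-cased
  AND timestamp >= now() - INTERVAL 1 HOUR     -- matches the time range selected in Grafana
GROUP BY workload                              -- workload becomes a label on the fired alert
ORDER BY error_logs DESC
```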
Running the query returns a list of workloads and the count of error logs. Note the time range at the top of the query, which can be changed according to the use case.
Now that we have our data, we need to set an alert condition to determine when our SLO should be considered breached. In our case, we consider any amount of error logs as breaching the SLO.
We will use the Threshold expression with 0 as the threshold value, indicating that any workload that has more than 0 error logs should count as a breach.
Note the firing status for all of the returned results - all of these have more than 0 error logs in the last hour, breaching our SLO condition.
The next step is instructing Grafana on how we want this alert to be evaluated:
Evaluation group: How often we want the rule to be evaluated
Pending period: How long do we allow the SLO to be breached before firing an alert
For example, if we choose an evaluation group of 1m and a pending period of 3m, we are defining that the alert condition should be checked for breach every 1 minute, but an alert should only fire if the breach is ongoing for 3 consecutive minutes.
To give a concrete example, let's look at two different series of evaluations:

| | Eval 1 | Eval 2 | Eval 3 | Eval 4 | Eval 5 | Result |
|---|---|---|---|---|---|---|
| Example 1 | BREACHED | BREACHED | OK | BREACHED | BREACHED | OK |
| Example 2 | OK | BREACHED | BREACHED | BREACHED | BREACHED | FIRING |

Even though both examples have the same amount of evaluations that breached the SLO, only the second one is firing an alert. This is because the SLO was breached for more than the allowed pending period of 3 consecutive minutes.
The next step is to add any extra labels to the fired alert, which can be used when deciding how to handle the firing of the alert. For example, labels such as team and severity could be used to decide which contact point should be used.
In the notifications part, we can choose to either use the assigned labels to route the alert, or select a contact point directly.
A groundcover sensor must be running on every node for that node to be monitored. By default, the sensor is deployed across all installed clusters, with the exception of control-plane and Fargate nodes.
When installing groundcover using the CLI, detected taints will be displayed, along with a prompt to add the appropriate tolerations.
The following configuration values will add tolerations allowing our sensor to run on all nodes:
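A hedged sketch (the parent key is an assumption; the toleration itself is standard Kubernetes and tolerates every taint):

```yaml
agent:                    # assumed parent key -- match your chart's sensor/agent section
  tolerations:
    - operator: Exists    # tolerate all taints so the sensor can run on every node
```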
To prevent the sensor from starting on control-plane nodes, and from attempting to start on Fargate nodes, use nodeAffinity rules based on node labels:
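A hedged sketch (parent key assumed); the node labels used are the standard Kubernetes control-plane label and the EKS Fargate compute-type label:

```yaml
agent:                    # assumed parent key
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: DoesNotExist        # skip control-plane nodes
              - key: eks.amazonaws.com/compute-type
                operator: NotIn               # skip EKS Fargate nodes
                values: ["fargate"]
```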
A PriorityClass with a high priority (lower than the default node-critical and cluster-critical priority classes) and a preemption policy can be used for the sensor, allowing it to evict lower-priority pods:
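A hedged sketch of such a PriorityClass; the name and value are illustrative, with the value set high but below the built-in system-node-critical and system-cluster-critical classes:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: groundcover-sensor-priority   # hypothetical name; reference it via the sensor's priorityClassName
value: 900000000                      # high, but below the system-* critical classes
preemptionPolicy: PreemptLowerPriority
globalDefault: false
description: "High priority for the groundcover sensor so it can preempt lower-priority pods."
```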
Exceptions can be set by overriding the above values.
By default, groundcover automatically collects traces and metrics for all supported protocols, in order to generate the most comprehensive picture possible. However, in certain situations it could be useful to disable the tracing of specific protocols. This modularity is natively supported by our sensor, and this section will describe how to do so.
In order to stop the collection of a specific protocol - say, HTTP - add the following lines to the groundcover deployment values file:
In order to disable multiple protocols, simply add another env variable. If we want to disable both HTTP and Redis, for example:
The list of supported protocols:
HTTP
HTTP2 (will also disable gRPC)
REDIS
DNS
POSTGRESQL
MYSQL
KAFKA
MONGODB
To send notifications to Slack, follow these steps to generate a webhook URL for your workspace:
Go to the Slack webhook page: visit https://my.slack.com/services/new/incoming-webhook to create a new incoming webhook. Make sure you select the correct workspace in the top right corner.
Select a channel: once you're on the page, select the Slack channel where you want the notifications to be sent. You can also create a new channel by clicking “create a new channel”.
Create the Webhook: Click Add Incoming Webhook Integration. A webhook URL will be generated.
Copy the Webhook URL: After the webhook is created, copy the webhook URL. This URL will be used to configure your groundcover workflow.
Once you have the Slack webhook URL, you can configure it in your workflow to send notifications.
Go to settings page, and then go to the integrations page.
Click on the “Slack Webhook” card:
In the window, fill in the name and the URL you've created; the name will be used later when setting up workflows:
The groundcover platform was built to be an all-in-one observability solution for cloud-native environments. It was built to ingest any data source directly into groundcover's in-cloud backend using any protocol supported by OpenTelemetry or Prometheus.
It supports any log stream from sources like Fluentd, Fluentbit, Logstash, or CloudWatch logs. Metrics can be ingested via Prometheus remote-write or directly from agents like StatsD and Telegraf. Traces can be ingested from any application instrumented with OpenTelemetry or Datadog’s SDK.
groundcover's embedded Grafana dashboards capability can offer you visibility on these hundreds of sources by simply enabling the scraping of custom metrics and importing the relevant Grafana visualizations into groundcover, from a pool of over 7,000 ready-made dashboards.
Below are some of the most popular examples, but the process is exactly the same for any tool:
Set up alerts to be pushed to Slack
Get your alerts directly to your inbox
Get your alerts on PagerDuty
Centralize your groundcover alerts on Opsgenie
Easily share alerts with your team via Microsoft Teams
Easily share alerts with your team via Webex Teams
Publish your data to external Grafana sources
Publish alerts via kafka
HTTP - Fully supported
gRPC - Excluding encrypted gRPC in Java services
DNS - Fully supported
MySQL
PostgreSQL
Redis
Kafka
MongoDB
RabbitMQ
Amazon SQS
Amazon S3
GraphQL
Harbor
ActiveMQ
Aerospike
Cassandra
CloudFlare
Consul
CoreDNS
etcd
Haproxy
JMeter
k6
Loki
Nginx
Pi-hole
Postfix
RabbitMQ
Redpanda
SNMP
Solr
Tomcat
Traefik
Varnish
Vertica
Zabbix