
Monitor YAML structure

Last updated 2 months ago

While we strongly suggest building Monitors using our Wizard or Catalog, groundcover also supports building and editing your Monitors using YAML. If you choose to do so, the following sections provide the necessary definitions.

Monitor fields explained

In this section, you'll find a breakdown of the key fields used to define and configure Monitors within the groundcover platform. Each field plays a critical role in how a Monitor behaves, what data it tracks, and how it responds to specific conditions. Understanding these fields will help you set up effective Monitors to track performance, detect issues, and provide timely alerts.

Below is a detailed explanation of each field, along with examples to illustrate their usage, ensuring your team can manage and respond to incidents efficiently.

Field
Explanation
Example

Title

A string that defines the human-readable name of the Monitor. The title is what you will see in the list of all existing Monitors in the Monitors section.

Description

Additional information about the Monitor.

Severity

When triggered, this will show the severity level of the Monitor's issue. You can set any severity you want here.

s1 for Critical

s2 for High

s3 for Medium

s4 for Low

Header

This is the header of the generated issues from the Monitor.

A short string describing the condition that is being monitored. You can also use this as a pattern, using labels from your query.

“HTTP API Error {{ alert.labels.return_code }}”

ResourceHeaderLabels

A list of labels that help you identify the resources related to the Monitor. These appear as a secondary header in all Issues tables across the platform.

["span_name", "kind"] for monitors on protocol issues.

ContextHeaderLabels

A list of contextual labels that help you identify the location of the issue. This appears as a subset of the Issue’s labels, and is displayed on all Issues tables across the platform.

["cluster", "namespace", "pod_name"]
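As a rough sketch, the Header, Description, ResourceHeaderLabels and ContextHeaderLabels fields described above map onto the `display` block used in the YAML examples later on this page. The values below are illustrative only: the description text is made up, and the `return_code` label is assumed to be returned by the Monitor's query.

```yaml
# Fragment of a Monitor definition, not a complete Monitor.
display:
  # The header can reference labels returned by the Monitor's query.
  header: HTTP API Error {{ alert.labels.return_code }}
  description: Detects HTTP API errors.  # hypothetical description
  # Secondary header shown in Issues tables.
  resourceHeaderLabels:
    - span_name
    - kind
  # Labels that locate the issue.
  contextHeaderLabels:
    - cluster
    - namespace
    - pod_name
```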

Labels

A set of pre-defined labels that are attached to Issues related to the selected Monitor. Labels can be static, or dynamic using a Monitor's query results.

team: sre_team

ExecutionErrorState

Defines the actions that take place when a Monitor encounters query execution errors.

Valid options are Alerting, OK and Error.

  • When Alerting is set, query execution errors will result in a firing issue.

  • When Error is set, query execution errors will result in an error state.

  • When OK is set, query execution errors will do neither of the above. This is the default setting.

NoDataState

This defines what happens when queries in the Monitor return empty datasets.

Valid options are NoData, Alerting and OK.

  • When NoData is set, the issue instance state will be No Data.

  • When OK is set, the issue instance state will be Pending. The state will change to Alerting once the Monitor's pending period ends. This is the default setting.
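For illustration, a Monitor that overrides both defaults might include the two top-level keys below. The key names follow the YAML examples later on this page and the values come from the options listed above. Treat this as a sketch rather than a recommended configuration.

```yaml
# Fragment of a Monitor definition, not a complete Monitor.
executionErrorState: Error  # query execution errors put the Monitor in an error state
noDataState: NoData         # empty query results are reported as No Data
```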

Interval

Defines how frequently the Monitor evaluates the conditions. Common intervals could be 1m, 5m, etc.

PendingFor

Defines the period, spanning consecutive evaluation intervals, during which the threshold condition must be met before the alert is triggered.
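The YAML examples on this page place Interval and PendingFor together under `evaluationInterval`. A minimal sketch with a non-zero pending period could look like the fragment below. The `10m` value assumes that `pendingFor` accepts the same duration format as `interval`, which the examples here only show as `0s`.

```yaml
# Fragment of a Monitor definition, not a complete Monitor.
evaluationInterval:
  interval: 5m     # evaluate the Monitor's condition every five minutes
  pendingFor: 10m  # assumed format: the condition must hold for ten minutes before firing
```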

Trigger

Defines the condition under which the Monitor fires. This is the Monitor's threshold definition, consisting of an operator (op) and a value.

op: gt, value: 5

Model

Describes the queries, thresholds and data processing of the Monitor. It can have the following fields:

  • Queries: A list of one or more queries to run. Each query can be SQL over ClickHouse, PromQL over VictoriaMetrics, or SqlPipeline, and has a name by which the rest of the Monitor references it.

  • Thresholds: The thresholds of the Monitor. Each threshold has a name, an inputName that identifies the query supplying its data, an operator (one of gt, lt, within_range, outside_range), and an array of values holding the threshold values.
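As a hedged sketch of a range-based threshold, the fragment below uses the `within_range` operator listed above. It assumes that range operators take two entries in `values`, read as the lower and upper bound; this page does not confirm that. The query named by `inputName` is omitted here and would be defined under `model.queries`, as in the full examples further down.

```yaml
# Fragment of a Monitor definition; the queries list is omitted for brevity.
model:
  thresholds:
    - name: threshold_1
      inputName: threshold_input_query  # must match the name of a query under model.queries
      operator: within_range
      values:  # assumed to be [lower bound, upper bound]
        - 10
        - 100
```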

measurementType

Describes how issues of this Monitor are presented. Some Monitors count events and others track a state, and the two are displayed differently in dashboards.

  • state - Presents issues in a line chart.

  • event - Presents issues in a bar chart, counting events.

Monitor YAML Examples

Traces Based Monitors

MySQL Query Errors Monitor

title: MySQL Query Errors Monitor
display:
  header: MySQL Error {{ alert.labels.statusCode }}
  description: This monitor detects MySQL Query errors.
  resourceHeaderLabels:
    - span_name
    - role
  contextHeaderLabels:
    - cluster
    - namespace
    - workload
severity: S3
measurementType: event
model:
  queries:
    - name: threshold_input_query
      dataType: traces
      sqlPipeline:
        selectors:
          - key: _time
            origin: root
            type: string
            processors:
              - op: toStartOfInterval
                args:
                  - 1 minutes
            alias: bucket_timestamp
          - key: statusCode
            origin: root
            type: string
            alias: statusCode
          - key: span_name
            origin: root
            type: string
            alias: span_name
          - key: cluster
            origin: root
            type: string
            alias: cluster
          - key: namespace
            origin: root
            type: string
            alias: namespace
          - key: role
            origin: root
            type: string
            alias: role
          - key: workload
            origin: root
            type: string
            alias: workload
          - key: "*"
            origin: root
            type: string
            processors:
              - op: count
            alias: logs_total
        groupBy:
          - key: _time
            origin: root
            type: string
            processors:
              - op: toStartOfInterval
                args:
                  - 1 minutes
          - key: statusCode
            origin: root
            type: string
            alias: statusCode
          - key: span_name
            origin: root
            type: string
            alias: span_name
          - key: cluster
            origin: root
            type: string
            alias: cluster
          - key: namespace
            origin: root
            type: string
            alias: namespace
          - key: role
            origin: root
            type: string
            alias: role
          - key: workload
            origin: root
            type: string
            alias: workload
        orderBy:
          - selector:
              key: bucket_timestamp
              origin: root
              type: string
            direction: ASC
        limit:
        filters:
          operator: and
          conditions:
            - filters:
                - op: match
                  value: mysql
              key: eventType
              origin: root
              type: string
            - filters:
                - op: match
                  value: error
              key: status
              origin: root
              type: string
            - filters:
                - op: match
                  value: eBPF
              key: source
              origin: root
              type: string
      instantRollup: 1 minutes
  thresholds:
    - name: threshold_1
      inputName: threshold_input_query
      operator: gt
      values:
        - 0
executionErrorState: OK
noDataState: OK
evaluationInterval:
  interval: 1m
  pendingFor: 0s
labels:
  team: infra

gRPC API Errors Monitor

title: gRPC API Errors Monitor
display:
  header: gRPC API Error {{ alert.labels.statusCode }}
  description: This monitor detects gRPC API errors by identifying responses with a non-zero status code.
  resourceHeaderLabels:
    - span_name
    - role
  contextHeaderLabels:
    - cluster
    - namespace
    - workload
severity: S3
measurementType: event
model:
  queries:
    - name: threshold_input_query
      dataType: traces
      sqlPipeline:
        selectors:
          - key: _time
            origin: root
            type: string
            processors:
              - op: toStartOfInterval
                args:
                  - 1 minutes
            alias: bucket_timestamp
          - key: statusCode
            origin: root
            type: string
            alias: statusCode
          - key: span_name
            origin: root
            type: string
            alias: span_name
          - key: cluster
            origin: root
            type: string
            alias: cluster
          - key: namespace
            origin: root
            type: string
            alias: namespace
          - key: role
            origin: root
            type: string
            alias: role
          - key: workload
            origin: root
            type: string
            alias: workload
          - key: "*"
            origin: root
            type: string
            processors:
              - op: count
            alias: logs_total
        groupBy:
          - key: _time
            origin: root
            type: string
            processors:
              - op: toStartOfInterval
                args:
                  - 1 minutes
          - key: statusCode
            origin: root
            type: string
            alias: statusCode
          - key: span_name
            origin: root
            type: string
            alias: span_name
          - key: cluster
            origin: root
            type: string
            alias: cluster
          - key: namespace
            origin: root
            type: string
            alias: namespace
          - key: role
            origin: root
            type: string
            alias: role
          - key: workload
            origin: root
            type: string
            alias: workload
        orderBy:
          - selector:
              key: bucket_timestamp
              origin: root
              type: string
            direction: ASC
        limit:
        filters:
          operator: and
          conditions:
            - filters:
                - op: match
                  value: grpc
              key: eventType
              origin: root
              type: string
            - filters:
                - op: ne
                  value: "0"
              key: statusCode
              origin: root
              type: string
            - filters:
                - op: match
                  value: error
              key: status
              origin: root
              type: string
            - filters:
                - op: match
                  value: eBPF
              key: source
              origin: root
              type: string
      instantRollup: 1 minutes
  thresholds:
    - name: threshold_1
      inputName: threshold_input_query
      operator: gt
      values:
        - 0
executionErrorState: OK
noDataState: OK
evaluationInterval:
  interval: 1m
  pendingFor: 0s

Log Based Monitors

High Error Log Rate Monitor

title: High Error Log Rate Monitor
severity: S4
display:
  header: High Log Error Rate
  description: This monitor triggers an alert when the error log count exceeds 150 within one minute.
  resourceHeaderLabels:
    - workload
  contextHeaderLabels:
    - cluster
    - namespace
evaluationInterval:
  interval: 1m
  pendingFor: 0s
model:
  queries:
    - name: threshold_input_query
      dataType: logs
      sqlPipeline:
        selectors:
          - key: _time
            origin: root
            type: string
            processors:
              - op: toStartOfInterval
                args:
                  - 1 minutes
            alias: bucket_timestamp
          - key: workload
            origin: root
            type: string
            alias: workload
          - key: namespace
            origin: root
            type: string
            alias: namespace
          - key: cluster
            origin: root
            type: string
            alias: cluster
          - key: "*"
            origin: root
            type: string
            processors:
              - op: count
            alias: logs_total
        groupBy:
          - key: _time
            origin: root
            type: string
            processors:
              - op: toStartOfInterval
                args:
                  - 1 minutes
          - key: workload
            origin: root
            type: string
            alias: workload
          - key: namespace
            origin: root
            type: string
            alias: namespace
          - key: cluster
            origin: root
            type: string
            alias: cluster
        orderBy:
          - selector:
              key: bucket_timestamp
              origin: root
              type: string
            direction: ASC
        limit:
        filters:
          conditions:
            - filters:
                - op: match
                  value: error
              key: level
              origin: root
              type: string
          operator: and
      instantRollup: 1 minutes
  thresholds:
    - name: threshold_1
      inputName: threshold_input_query
      operator: gt
      values:
        - 150
noDataState: OK
measurementType: event