# Monitor YAML structure

While we strongly suggest building monitors using our [Wizard](/use-groundcover/monitors/create-a-new-monitor.md#using-the-monitor-wizard) or [Catalog](/use-groundcover/monitors/monitor-catalog-page.md), groundcover also supports building and editing monitors directly in YAML. This page documents the current schema.

For the query language used inside monitor queries, see the [gcQL Reference](/use-groundcover/querying-your-groundcover-data/groundcover-query-language/groundcover-query-language-gcql-reference.md). For monitors that use ClickHouse SQL as an escape hatch, see [SQL Based Monitors](/use-groundcover/monitors/sql-based-monitors.md).

## Top-level fields

<table data-full-width="true"><thead><tr><th width="220">Field</th><th>Description</th><th width="180">Allowed values</th></tr></thead><tbody><tr><td><strong>title</strong> <em>(required)</em></td><td>Human-readable name of the monitor. Shown in the Monitor List.</td><td>string</td></tr><tr><td><strong>display</strong></td><td>Display settings controlling how issues from this monitor are rendered. See <a href="#display">Display</a>.</td><td>object</td></tr><tr><td><strong>severity</strong></td><td>Severity reported on firing issues.</td><td><code>S1</code>, <code>S2</code>, <code>S3</code>, <code>S4</code></td></tr><tr><td><strong>measurementType</strong></td><td>Controls how issues are visualized. <code>state</code> renders as a line chart; <code>event</code> renders as a bar chart counting events.</td><td><code>state</code>, <code>event</code></td></tr><tr><td><strong>model</strong> <em>(required)</em></td><td>Queries, reducers, and thresholds that define what the monitor evaluates. See <a href="#model">Model</a>.</td><td>object</td></tr><tr><td><strong>labels</strong></td><td>Static or templated labels attached to the issue. Values can reference query results via <code>{{ $values.&#x3C;threshold_name>.Labels.&#x3C;key> }}</code>.</td><td><code>map&#x3C;string,string></code></td></tr><tr><td><strong>annotations</strong></td><td>Annotations attached to the alert, often used to wire monitors into workflows.</td><td><code>map&#x3C;string,string></code></td></tr><tr><td><strong>category</strong></td><td>Free-form category used for grouping in the Monitor List.</td><td>string</td></tr><tr><td><strong>executionErrorState</strong></td><td>State when query execution fails. <code>OK</code> suppresses; <code>Error</code> raises an error state; <code>Alerting</code> fires an issue.</td><td><code>OK</code>, <code>Error</code>, <code>Alerting</code></td></tr><tr><td><strong>noDataState</strong></td><td>State when the query returns no rows. 
<code>OK</code> stays normal; <code>NoData</code> enters a No Data state; <code>Alerting</code> fires.</td><td><code>OK</code>, <code>NoData</code>, <code>Alerting</code></td></tr><tr><td><strong>evaluationInterval</strong></td><td>Evaluation cadence and pending window. See <a href="#evaluationinterval">EvaluationInterval</a>.</td><td>object</td></tr><tr><td><strong>notificationSettings</strong></td><td>How alerts are delivered. See <a href="#notificationsettings">NotificationSettings</a>.</td><td>object</td></tr><tr><td><strong>isPaused</strong></td><td>When true, the monitor is defined but not evaluated.</td><td>boolean</td></tr></tbody></table>
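
As a rough sketch of how the top-level fields fit together, the fragment below sets only the scalar and map fields from the table. The label keys, category name, annotation key, and URL are hypothetical placeholders, and the templated label assumes a threshold named `threshold_1` whose result carries a `cluster` label. The required `model` block is left out for brevity.

```yaml
# Illustrative skeleton only; every value here is a placeholder.
title: Example Monitor
severity: S3
measurementType: state
category: platform                 # hypothetical category name
labels:
  team: payments                   # static label (hypothetical key and value)
  # Templated label; assumes a threshold named threshold_1 with a cluster label.
  cluster: "{{ $values.threshold_1.Labels.cluster }}"
annotations:
  runbook: https://example.com/runbook   # hypothetical annotation key and URL
executionErrorState: Error         # raise an error state if the query fails
noDataState: NoData                # enter a No Data state when no rows return
isPaused: false
# model: (required) queries, reducers, and thresholds would follow here.
```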

## display

<table data-full-width="true"><thead><tr><th width="260">Field</th><th>Description</th></tr></thead><tbody><tr><td><strong>header</strong></td><td>Template for the issue header. Supports alert label substitution, e.g. <code>"gRPC API Error {{ labels.status_code }}"</code>.</td></tr><tr><td><strong>description</strong></td><td>Template for the issue description. Supports the same substitutions as <code>header</code>.</td></tr><tr><td><strong>resourceHeaderLabels</strong></td><td>List of labels identifying the <em>resource</em> the issue relates to. Rendered as the secondary header across Issues tables. Example: <code>["span_name", "role"]</code>.</td></tr><tr><td><strong>contextHeaderLabels</strong></td><td>List of labels identifying the <em>location</em> of the issue. Rendered as a subset of the issue's labels. Example: <code>["cluster", "namespace", "workload"]</code>.</td></tr><tr><td><strong>templateLanguage</strong></td><td>Template engine for <code>header</code> and <code>description</code>. Set to <code>jinja2</code> to opt into Jinja2 syntax (enables blocks and filters). Omit for the default Go-template syntax.</td></tr></tbody></table>

## model

<table data-full-width="true"><thead><tr><th width="200">Field</th><th>Description</th></tr></thead><tbody><tr><td><strong>queries</strong> <em>(required)</em></td><td>One or more queries that produce the data the monitor evaluates. See <a href="#modelqueries">model.queries</a>.</td></tr><tr><td><strong>reducers</strong></td><td>Aggregations applied on top of queries before thresholds run. See <a href="#modelreducers">model.reducers</a>.</td></tr><tr><td><strong>thresholds</strong></td><td>Conditions evaluated against a query or reducer output. See <a href="#modelthresholds">model.thresholds</a>.</td></tr></tbody></table>

### model.queries

Each query describes one data source and one expression. The combination of `dataType` and the query body determines which engine runs the query.

<table data-full-width="true"><thead><tr><th width="220">Field</th><th>Description</th></tr></thead><tbody><tr><td><strong>name</strong> <em>(required)</em></td><td>Identifier used by reducers and thresholds to reference this query's output.</td></tr><tr><td><strong>dataType</strong></td><td>Data source to query. One of <code>logs</code>, <code>traces</code>, <code>events</code>, <code>rum</code>, <code>entities</code>, <code>issues</code>, <code>metrics</code>, <code>infra</code>.</td></tr><tr><td><strong>expression</strong></td><td><p>The query itself. The language depends on <code>dataType</code>:</p><ul><li><strong>gcQL</strong> for <code>logs</code>, <code>traces</code>, <code>events</code>, <code>rum</code>, <code>entities</code>, <code>issues</code>. See the <a href="/pages/0rYEo6UxOzVzcXriFK6Y">gcQL Reference</a>.</li><li><a href="https://docs.victoriametrics.com/metricsql/"><strong>MetricsQL</strong></a> for <code>metrics</code> and <code>infra</code>. MetricsQL is VictoriaMetrics' query language and is backwards-compatible with PromQL, with extra functions (<code>topk_last</code>, <code>rollup_rate</code>, etc.). It does <em>not</em> use the pipe (<code>|</code>) operator; combine operations with nested functions or arithmetic.</li></ul></td></tr><tr><td><strong>datasourceType</strong></td><td>Required for MetricsQL queries. Set to <code>prometheus</code> (the name refers to the Prometheus-compatible API served by the metrics backend).</td></tr><tr><td><strong>queryType</strong></td><td>For MetricsQL queries, use <code>instant</code>.</td></tr><tr><td><strong>filters</strong></td><td>Optional standalone gcQL filter expression, used alongside <code>sqlPipeline</code>. For modern monitors, filters belong directly inside <code>expression</code>.</td></tr><tr><td><strong>relativeTimerange</strong></td><td>Time window relative to evaluation time. Object with <code>from</code> and optional <code>to</code> durations (e.g. 
<code>from: 5m</code>).</td></tr><tr><td><strong>instantRollup</strong></td><td>Bucket size for <strong>gcQL</strong> queries (logs/traces/events/rum/entities/issues), e.g. <code>1 minutes</code>, <code>5 minutes</code>. Controls the time granularity the monitor evaluates over.</td></tr><tr><td><strong>rollup</strong></td><td>Server-side rollup for <strong>MetricsQL</strong> queries (metrics/infra). Object with <code>function</code> (<code>avg</code>, <code>max</code>, <code>min</code>, <code>sum</code>, <code>count</code>, <code>stddev</code>, <code>stdvar</code>, <code>last</code>) and <code>time</code> (duration).</td></tr><tr><td><strong>sqlPipeline</strong></td><td><em>Legacy.</em> Structured SQL pipeline object. Still accepted for backwards compatibility but new monitors should use <code>expression</code> with gcQL.</td></tr></tbody></table>

{% hint style="info" %}
**`rollup` vs `instantRollup`** is a common point of confusion. They are two different fields for two different query engines: use `rollup` (object with `function` and `time`) for MetricsQL queries, and `instantRollup` (a duration string) for gcQL queries. They are not interchangeable.
{% endhint %}
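
To make the distinction concrete, here is a small sketch with one query of each kind side by side. The query names are hypothetical; the expressions are borrowed from the examples later on this page.

```yaml
model:
  queries:
    # gcQL query (logs): instantRollup is a plain duration string.
    - name: sensor_errors            # hypothetical name
      dataType: logs
      expression: >
        container:sensor level:in(panic, fatal)
        | stats by (env, cluster, namespace, workload, pod) count() count_all_result
      instantRollup: 5 minutes
    # MetricsQL query (metrics): rollup is an object with function and time.
    - name: crash_looping            # hypothetical name
      dataType: metrics
      expression: >
        avg by (env, cluster, namespace, workload) (
          groundcover_kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}
        )
      datasourceType: prometheus
      queryType: instant
      rollup:
        function: avg
        time: 5m
```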

### model.reducers

Reducers aggregate a query's output into a single value (or per-group value) before thresholds run. This is how you turn a timeseries into a single number to compare against a threshold.

<table data-full-width="true"><thead><tr><th width="200">Field</th><th>Description</th></tr></thead><tbody><tr><td><strong>name</strong> <em>(required)</em></td><td>Identifier used by thresholds.</td></tr><tr><td><strong>inputName</strong></td><td>Name of the query (or another reducer) to read from. Required unless <code>type: math</code>.</td></tr><tr><td><strong>type</strong> <em>(required)</em></td><td>One of <code>last</code>, <code>min</code>, <code>max</code>, <code>mean</code>, <code>sum</code>, <code>count</code>, <code>math</code>.</td></tr><tr><td><strong>expression</strong></td><td>Required when <code>type: math</code>. An arithmetic expression over reducer outputs, e.g. <code>$errors / $total * 100</code>.</td></tr><tr><td><strong>relativeTimerange</strong></td><td>Optional time window specific to this reducer.</td></tr></tbody></table>
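
As an illustration of `type: math`, the fragment below sketches an error-percentage calculation using the expression from the table. It assumes two queries named `errors_query` and `total_query` exist elsewhere in `model.queries` (both names are hypothetical), and that a math expression refers to other reducers by name with a `$` prefix, as the table's example suggests.

```yaml
model:
  reducers:
    # Collapse each query's timeseries into a single number.
    - name: errors
      inputName: errors_query        # hypothetical query name
      type: sum
    - name: total
      inputName: total_query         # hypothetical query name
      type: sum
    # Combine the two reducer outputs; no inputName is needed for math.
    - name: error_rate_pct
      type: math
      expression: $errors / $total * 100
```

A threshold would then set `inputName: error_rate_pct` to compare the percentage against a limit.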

### model.thresholds

Thresholds are the final condition that determines whether the monitor fires.

<table data-full-width="true"><thead><tr><th width="200">Field</th><th>Description</th></tr></thead><tbody><tr><td><strong>name</strong> <em>(required)</em></td><td>Identifier for this threshold.</td></tr><tr><td><strong>inputName</strong> <em>(required)</em></td><td>Name of the query or reducer this threshold evaluates.</td></tr><tr><td><strong>operator</strong> <em>(required)</em></td><td>One of <code>gt</code>, <code>lt</code>, <code>gte</code>, <code>lte</code>, <code>eq</code>, <code>neq</code>, <code>within_range</code>, <code>outside_range</code>, <code>within_range_included</code>, <code>outside_range_included</code>.</td></tr><tr><td><strong>values</strong> <em>(required)</em></td><td>Array of numbers. One value for comparison operators; two values for <code>within_range</code> / <code>outside_range</code> and their inclusive variants.</td></tr><tr><td><strong>relativeTimerange</strong></td><td>Optional time window specific to this threshold.</td></tr></tbody></table>
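
The fragment below sketches a single-value and a two-value threshold. The input name `error_rate_pct` is hypothetical, and the ordering of the two range values (lower bound first, then upper bound) is an assumption rather than something this page states.

```yaml
model:
  thresholds:
    # Comparison operator: exactly one value.
    - name: threshold_1
      inputName: error_rate_pct      # hypothetical reducer or query name
      operator: gt
      values:
        - 5
    # Range operator: two values (assumed order: lower bound, upper bound).
    - name: threshold_2
      inputName: error_rate_pct
      operator: outside_range
      values:
        - 1
        - 5
```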

## evaluationInterval

<table data-full-width="true"><thead><tr><th width="200">Field</th><th>Description</th></tr></thead><tbody><tr><td><strong>interval</strong></td><td>How often the monitor is evaluated, e.g. <code>1m</code>, <code>5m</code>.</td></tr><tr><td><strong>pendingFor</strong></td><td>Duration the threshold must hold before the issue transitions from Pending to Alerting. Use <code>0s</code> to fire immediately.</td></tr></tbody></table>

## notificationSettings

<table data-full-width="true"><thead><tr><th width="240">Field</th><th>Description</th></tr></thead><tbody><tr><td><strong>method</strong></td><td>How notifications are delivered. <code>notificationRoutes</code> uses the matching routes defined in <a href="/pages/iJQsKADR1EaNDH0bsmEp">Notification Routes</a>; <code>connectedApps</code> sends directly to the apps listed in <code>connectedApps</code>, bypassing routes; <code>noNotifications</code> suppresses all notifications for this monitor.</td></tr><tr><td><strong>connectedApps</strong></td><td>List of connected app IDs. Used with <code>method: connectedApps</code>.</td></tr><tr><td><strong>renotificationInterval</strong></td><td>Duration between repeat notifications while the issue remains firing, e.g. <code>4h</code>.</td></tr><tr><td><strong>disableRenotification</strong></td><td>When true, suppresses repeat notifications.</td></tr><tr><td><strong>statusFilters</strong></td><td>List of issue statuses that trigger notifications. Used with <code>method: connectedApps</code>.</td></tr></tbody></table>
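
For instance, a monitor that bypasses notification routes and sends straight to a connected app might look roughly like the sketch below. The app ID is a placeholder, and `statusFilters` is omitted because the accepted status names are not listed on this page.

```yaml
notificationSettings:
  method: connectedApps
  connectedApps:
    - "your-connected-app-id"        # placeholder; use a real connected app ID
  renotificationInterval: 4h         # repeat every 4 hours while still firing
  disableRenotification: false
```

Setting `method: noNotifications` instead suppresses all notifications for the monitor, in which case the other fields are presumably unnecessary.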

## Examples

### Traces monitor (gcQL)

Fires when gRPC traces return a non-zero status code.

```yaml
title: gRPC API Errors Monitor
display:
  header: gRPC API Error {{ labels.status_code }}
  description: |-
    This monitor detects gRPC API errors by identifying responses with a status code indicating failure.
    Cluster: {{ labels.cluster }}
    Namespace: {{ labels.namespace }}
    Workload: {{ labels.workload }}
    Span: {{ labels.span_name }}
  resourceHeaderLabels:
    - span_name
    - role
  contextHeaderLabels:
    - env
    - cluster
    - namespace
    - workload
  templateLanguage: jinja2
severity: S3
measurementType: event
model:
  queries:
    - name: threshold_input_query
      dataType: traces
      expression: >
        span_type:grpc status_code:!=0 status:error source:ebpf
        | stats by (env, cluster, namespace, workload, status_code, span_name, role) count() errors_total
      instantRollup: 1 minutes
  thresholds:
    - name: threshold_1
      inputName: threshold_input_query
      operator: gt
      values:
        - 0
executionErrorState: OK
noDataState: OK
evaluationInterval:
  interval: 1m
  pendingFor: 0s
```

### Logs monitor (gcQL)

Fires when sensor logs contain panic or fatal errors.

```yaml
title: Sensor Panic / Fatal Errors
display:
  header: Sensor Panic or Fatal Errors
  description: |-
    Detects panic or fatal errors in sensor logs.
    Cluster: {{ labels.cluster }}
    Namespace: {{ labels.namespace }}
    Pod: {{ labels.pod }}
  contextHeaderLabels:
    - pod
    - workload
    - cluster
    - env
    - namespace
severity: S2
measurementType: event
model:
  queries:
    - name: threshold_input_query
      dataType: logs
      expression: >
        container:sensor level:in(panic, fatal)
        | stats by (env, cluster, namespace, workload, pod) count() count_all_result
      instantRollup: 5 minutes
  thresholds:
    - name: threshold_1
      inputName: threshold_input_query
      operator: gt
      values:
        - 0
executionErrorState: OK
noDataState: OK
evaluationInterval:
  interval: 5m
  pendingFor: 0s
notificationSettings:
  renotificationInterval: 4h
```

### Metrics monitor (MetricsQL)

Fires when a Kubernetes pod is in `CrashLoopBackOff` for more than 5 minutes. Uses a reducer to collapse the timeseries before the threshold runs.

{% hint style="info" %}
Metrics queries use [MetricsQL](https://docs.victoriametrics.com/metricsql/) (PromQL-compatible). kube-state-metrics names are prefixed with `groundcover_`, and the node identity label is `node_name` (not `node`).
{% endhint %}

```yaml
title: K8s Pod Crash Looping Monitor
display:
  header: K8s Pod Crash Looping
  description: Kubernetes pod has been in CrashLoopBackOff for more than 5 minutes.
  resourceHeaderLabels:
    - workload
  contextHeaderLabels:
    - env
    - cluster
    - namespace
severity: S2
measurementType: state
model:
  queries:
    - name: crash_looping_query
      dataType: metrics
      expression: >
        avg_over_time(
          avg by (env, cluster, namespace, workload) (
            groundcover_kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}
          )[5m]
        )
      datasourceType: prometheus
      queryType: instant
      rollup:
        function: avg
        time: 5m
  reducers:
    - name: crash_looping_mean
      inputName: crash_looping_query
      type: mean
  thresholds:
    - name: threshold_1
      inputName: crash_looping_mean
      operator: gt
      values:
        - 0
executionErrorState: OK
noDataState: OK
evaluationInterval:
  interval: 1m
  pendingFor: 5m
```

### ClickHouse SQL monitor

For advanced cases that need joins, CTEs, or comparisons across time windows, you can drop to ClickHouse SQL. See [SQL Based Monitors](/use-groundcover/monitors/sql-based-monitors.md) for details and examples.

