Monitors

Monitors let you define custom alerts based on groundcover data and custom metrics.

Overview of the Monitor Structure

A Monitor defines a set of rules and conditions that track the state of your system. When a monitor's conditions are met, it triggers an issue that is displayed on the Issues page and can be used for alerting using your integrations and workflows.

Key Components of the Monitor Structure

The Monitor structure is composed of several fields that define:

  1. What the monitor is checking.

  2. Scope / population of monitoring.

  3. Where the issue arises.

  4. Severity levels (i.e. criticality).

  5. Firing/Resolved states and the timeline of the issue.

Monitor fields explained

In this section, you'll find a breakdown of the key fields used to define and configure monitors within the groundcover platform. Each field plays a critical role in how a monitor behaves, what data it tracks, and how it responds to specific conditions. Understanding these fields will help you set up effective monitors to track performance, detect issues, and provide timely alerts.

Below is a detailed explanation of each field, along with examples to illustrate their usage, ensuring your team can manage and respond to incidents efficiently.

Field
Explanation
Example

Title

A string that defines the human-readable name of the Monitor. The title is what you will see in the list of all existing Monitors in the Monitors section.

Description

Additional information about the Monitor.

Severity

The severity level assigned to the issue that the Monitor raises when it triggers.

s1 for Critical

s2 for High

s3 for Medium

s4 for Low

Header

A short string describing the condition that is being monitored. It can also be a pattern that uses labels from your query.

“HTTP API Error {labels.return_code}”
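To illustrate how a header pattern like the one above could expand, here is a minimal Python sketch. It relies on Python's own `str.format` placeholder syntax, which happens to accept `{labels.return_code}`; it is not groundcover's rendering code, and the `render_header` helper and the sample label value are hypothetical.

```python
from types import SimpleNamespace


def render_header(pattern: str, labels: dict) -> str:
    """Fill {labels.<name>} placeholders with label values (hypothetical helper)."""
    # SimpleNamespace exposes the dict keys as attributes, so str.format
    # can resolve placeholders of the form {labels.<name>}.
    return pattern.format(labels=SimpleNamespace(**labels))


# Assumed sample label set, as if returned by the monitor's query.
header = render_header("HTTP API Error {labels.return_code}", {"return_code": "500"})
print(header)  # HTTP API Error 500
```

Each distinct label value produces its own header, which is why a pattern is useful when one monitor covers several return codes.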

ResourceHeaderLabels

A list of labels that help you identify the resources related to the Monitor. These appear as a secondary header in all Issues tables across the platform.

["span_name", "kind"] for monitors on protocol issues.

ContextHeaderLabels

A list of contextual labels that help you identify the location of the issue. This appears as a subset of the Issue’s labels, and is displayed on all Issues tables across the platform.

["cluster", "namespace", "pod_name"]

Labels

A set of pre-defined labels that are attached to Issues raised by the selected Monitor. Labels can be static, or dynamic, taking their values from the Monitor's query results.

team: sre_team , customer: {{ $values.query_name.Labels.customer }}

Annotations

Adds more context to your notifications, such as runbook URLs and summaries.

ExecutionErrorState

Defines what happens when a Monitor encounters query execution errors.

Valid options are Alerting, Normal and Error.

  • When Alerting is set, query execution errors will result in a firing issue.

  • When Error is set, query execution errors will result in an error state.

  • When Normal is set, query execution errors will do neither of the above.

NoDataState

This defines what happens when queries in the monitor return empty datasets.

Valid options are NoData, Alerting and Normal.

  • When NoData is set, the state of the issue instances will be No Data.

  • When Normal is set, the state of the issue instances will be Pending. The state will change to Alerting once the pending period of the monitor ends.

Interval

Defines how frequently the monitor evaluates its conditions. Typical values are 1m or 5m.

PendingFor

Defines how long the threshold condition must be met before the alert triggers.
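The interplay between Interval and PendingFor can be sketched in Python as follows. This is only an illustration under stated assumptions: the `PendingTracker` class is hypothetical, and the exact tick at which groundcover moves an issue from Pending to Alerting may differ.

```python
from dataclasses import dataclass


@dataclass
class PendingTracker:
    """Hypothetical model of one monitor's state over successive evaluations."""
    interval_s: int      # seconds between evaluations (Interval)
    pending_for_s: int   # seconds the condition must be met (PendingFor)
    breached_s: int = 0  # how long the condition has been met so far

    def evaluate(self, condition_met: bool) -> str:
        """Return the state after one evaluation tick."""
        if not condition_met:
            # Any evaluation that does not meet the condition resets the clock.
            self.breached_s = 0
            return "Normal"
        state = "Alerting" if self.breached_s >= self.pending_for_s else "Pending"
        self.breached_s += self.interval_s
        return state


# Interval of 1m and PendingFor of 2m, expressed in seconds.
tracker = PendingTracker(interval_s=60, pending_for_s=120)
states = [tracker.evaluate(met) for met in [True, True, True, False, True]]
print(states)  # ['Pending', 'Pending', 'Alerting', 'Normal', 'Pending']
```

In this sketch a longer PendingFor filters out short spikes, at the cost of a later alert.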

Trigger

Defines the condition under which the monitor fires. This is the monitor's threshold definition, made up of op (the operator) and value.

op: gt, value: 5

Model

Describes the queries, thresholds and data processing of the monitor. It can have the following fields:

  • Queries: A list of one or more queries to run. A query can be SQL over ClickHouse, PromQL over VictoriaMetrics, or LogsQL (VictoriaMetrics) over logs/traces in ClickHouse. Each query has a name, used to reference it within the monitor, and an expression, which is the query itself.

  • Reducers: A list of reducers, which roll up query results. Each reducer has a name, an inputName that identifies the query to roll up, and a type that sets the aggregation: one of min, max, mean, median or sum. Reducers are optional, since your query might already perform the aggregation.

  • Thresholds: A list of thresholds (there should be only one). This is the threshold of your monitor. Each threshold has a name, an inputName for its data input, an operator (one of gt, lt, within_range or outside_range), and an array of values holding the threshold values.
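To make the relationship between queries, reducers and thresholds concrete, the following Python sketch evaluates a Model-like structure against sample data. It is an illustration, not groundcover's evaluation engine: the lowercase top-level keys, the `evaluate` and `threshold_met` helpers, the sample names and numbers, and the inclusive range bounds are all assumptions. Only the field names (name, expression, inputName, type, operator, values) and the reducer and operator names come from the description above.

```python
import statistics

# Aggregation types listed for reducers.
REDUCERS = {
    "min": min,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
    "sum": sum,
}


def threshold_met(operator: str, value: float, values: list) -> bool:
    """Compare one reduced value against the threshold values."""
    if operator == "gt":
        return value > values[0]
    if operator == "lt":
        return value < values[0]
    if operator in ("within_range", "outside_range"):
        low, high = values
        inside = low <= value <= high  # inclusive bounds are an assumption
        return inside if operator == "within_range" else not inside
    raise ValueError(f"unknown operator: {operator}")


def evaluate(model: dict, query_results: dict) -> bool:
    """Apply each reducer to its input, then check the single threshold."""
    results = dict(query_results)
    for reducer in model.get("reducers", []):
        aggregate = REDUCERS[reducer["type"]]
        results[reducer["name"]] = aggregate(results[reducer["inputName"]])
    threshold = model["thresholds"][0]
    return threshold_met(
        threshold["operator"], results[threshold["inputName"]], threshold["values"]
    )


# Hypothetical model; the expression is a placeholder, not a real query.
model = {
    "queries": [{"name": "error_count", "expression": "<your query here>"}],
    "reducers": [{"name": "error_sum", "inputName": "error_count", "type": "sum"}],
    "thresholds": [
        {"name": "too_many_errors", "inputName": "error_sum", "operator": "gt", "values": [5]}
    ],
}

# Pretend the query returned three data points: 1 + 3 + 4 = 8, which is above 5.
fired = evaluate(model, {"error_count": [1, 3, 4]})
print(fired)  # True
```

The threshold refers to the reducer by name through inputName, and the reducer refers to the query the same way, so the three lists form a small pipeline.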
