Monitors
Monitors let you define custom alerts, which you can configure using groundcover data and custom metrics.
Overview of the Monitor Structure
A Monitor defines a set of rules and conditions that track the state of your system. When a monitor's conditions are met, it triggers an issue that is displayed on the Issues page and can be used for alerting using your integrations and workflows.
Key Components of the Monitor Structure
The Monitor structure is composed of several fields that define:
What the monitor is checking.
Scope / population of monitoring.
Where the issue arises.
Severity levels (i.e. criticality).
Firing/Resolved states and the timeline of the issue.
Monitor fields explained
In this section, you'll find a breakdown of the key fields used to define and configure monitors within the groundcover platform. Each field plays a critical role in how a monitor behaves, what data it tracks, and how it responds to specific conditions. Understanding these fields will help you set up effective monitors to track performance, detect issues, and provide timely alerts.
Below is a detailed explanation of each field, along with examples to illustrate their usage, ensuring your team can manage and respond to incidents efficiently.
Title
A string that defines the human-readable name of the Monitor. The title is what you will see in the list of all existing Monitors in the Monitors section.
Description
Additional information about the Monitor.
Severity
The severity level shown on the Monitor's issue when it is triggered. Valid values:
s1 for Critical
s2 for High
s3 for Medium
s4 for Low
Header
A short string describing the condition that is being monitored. You can also write it as a pattern that uses labels from your query.
Example: “HTTP API Error {labels.return_code}”
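To illustrate how a Header pattern could resolve against query labels, here is a minimal Python sketch. It is not groundcover's implementation: the `render_header` helper, the regular expression, and the choice to leave unknown labels untouched are all assumptions made for illustration.

```python
import re

# Hypothetical helper: the platform's own rendering logic is not documented here.
_PLACEHOLDER = re.compile(r"\{labels\.([A-Za-z_][A-Za-z0-9_]*)\}")


def render_header(pattern: str, labels: dict) -> str:
    """Replace each {labels.<name>} placeholder with the matching label value."""

    def substitute(match):
        name = match.group(1)
        # Assumption: a label missing from the query result is left as-is.
        return labels.get(name, match.group(0))

    return _PLACEHOLDER.sub(substitute, pattern)


print(render_header("HTTP API Error {labels.return_code}", {"return_code": "500"}))
# prints: HTTP API Error 500
```

Each distinct label value would produce its own header text, so a single monitor can describe several related issues.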
ResourceHeaderLabels
A list of labels that help you identify the resources related to the Monitor. These labels appear as a secondary header in all Issues tables across the platform.
Example: ["span_name", "kind"] for monitors on protocol issues.
ContextHeaderLabels
A list of contextual labels that help you identify the location of the issue. These labels appear as a subset of the Issue’s labels and are displayed on all Issues tables across the platform.
Example: ["cluster", "namespace", "pod_name"]
Labels
A set of predefined labels attached to Issues related to the selected Monitor. Labels can be static, or dynamic using a Monitor's query results.
Example: team: sre_team, customer: {{ $values.query_name.Labels.customer }}
Annotations
Add more context to your notifications, such as runbook URLs and summaries.
ExecutionErrorState
Defines what happens when a Monitor encounters query execution errors.
Valid options are Alerting, Normal and Error.
When Alerting is set, query execution errors will result in a firing issue.
When Error is set, query execution errors will result in an error state.
When Normal is set, query execution errors will do neither of the above.
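The three options can be read as a small decision table. The Python sketch below is illustrative only: the enum, the function name, and the state strings are hypothetical, and keeping the previous state under Normal is an assumption, since the text only says that neither a firing issue nor an error state results.

```python
from enum import Enum


class ExecutionErrorState(Enum):
    ALERTING = "Alerting"
    NORMAL = "Normal"
    ERROR = "Error"


def state_on_query_error(setting: ExecutionErrorState, current_state: str) -> str:
    """Return the issue state after a query execution error (illustrative only)."""
    if setting is ExecutionErrorState.ALERTING:
        return "Alerting"  # the error is treated as a firing issue
    if setting is ExecutionErrorState.ERROR:
        return "Error"  # the error is surfaced as an error state
    # Normal: neither of the above. Assumption: the previous state is kept.
    return current_state


print(state_on_query_error(ExecutionErrorState.ALERTING, "Normal"))
print(state_on_query_error(ExecutionErrorState.NORMAL, "Normal"))
```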
NoDataState
Defines what happens when queries in the monitor return empty datasets.
Valid options are NoData, Alerting and Normal.
When NoData is set, the issue instance state will be No Data.
When Normal is set, the issue instance state will be Pending. It will change to Alerting once the monitor's pending period ends.
Interval
Defines how frequently the monitor evaluates the conditions. Common intervals could be 1m, 5m, etc.
PendingFor
Defines how long the threshold condition must be continuously met before the alert is triggered.
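To make the interplay between Interval and PendingFor concrete, here is a minimal Python sketch. It assumes durations are written like 30s, 1m or 2h, and that the alert fires at the first evaluation where the condition has held without interruption for at least the pending period. Both function names are hypothetical, and the platform's exact timing rules may differ.

```python
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse a simple duration such as '30s', '1m' or '2h' (assumed format)."""
    units = {"s": "seconds", "m": "minutes", "h": "hours"}
    value, unit = int(text[:-1]), text[-1]
    return timedelta(**{units[unit]: value})


def first_firing_index(condition_met, interval: str, pending_for: str):
    """Return the index of the first evaluation at which the alert fires, or None.

    condition_met[i] says whether the threshold condition held at evaluation i;
    evaluations are spaced `interval` apart.
    """
    step = parse_duration(interval)
    pending = parse_duration(pending_for)
    held_since = None  # index where the current unbroken streak began
    for i, met in enumerate(condition_met):
        if not met:
            held_since = None  # the streak is broken, start over
            continue
        if held_since is None:
            held_since = i
        if (i - held_since) * step >= pending:
            return i
    return None


# Evaluated every minute, the condition must hold for five minutes.
print(first_firing_index([True] * 7, "1m", "5m"))  # prints: 5
```

A single evaluation where the condition is not met resets the streak, so short spikes that clear before the pending period ends never fire.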
Trigger
Defines the condition under which the monitor fires. This is the monitor's threshold definition, made up of op (the operator) and value.
Example: op: gt, value: 5
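As a rough illustration, the example trigger can be evaluated against an observed value as follows. This is a sketch, not the platform's code: `trigger_fires` is a hypothetical helper, and it assumes gt and lt are strict comparisons.

```python
import operator

# Assumption: "gt" and "lt" are strict greater-than and less-than comparisons.
OPERATORS = {"gt": operator.gt, "lt": operator.lt}


def trigger_fires(trigger: dict, observed: float) -> bool:
    """Return True when the observed value satisfies the trigger condition."""
    compare = OPERATORS[trigger["op"]]
    return compare(observed, trigger["value"])


trigger = {"op": "gt", "value": 5}
print(trigger_fires(trigger, 7))  # prints: True
print(trigger_fires(trigger, 5))  # prints: False
```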
Model
Describes the queries, thresholds and data processing of the monitor. It can have the following fields:
Queries: A list of one or more queries to run. A query can be SQL over ClickHouse, PromQL over VictoriaMetrics, or LogsQL (VictoriaMetrics) over logs/traces in ClickHouse. Each query has a name, used to reference it within the monitor, and an expression, which is the query itself.
Reducers: A list of reducers that roll up query results. Each reducer has a name, an inputName (the query to roll up), and a type (the aggregation to apply), one of min, max, mean, median and sum. Reducers are optional, since your query might already perform the aggregation.
Thresholds: A list of thresholds (there should be only one). This is the threshold of your monitor. Each threshold has a name, an inputName for its data input, an operator (one of gt, lt, within_range, outside_range), and an array of values holding the threshold values.
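The flow from queries through reducers to a threshold can be sketched in Python as below. This is an illustration under stated assumptions, not groundcover's evaluation engine. The lowercase keys queries, reducers and thresholds, the placeholder expression, inclusive range bounds, and using the last data point when no reducer is applied are all assumptions. The query results are passed in directly rather than fetched.

```python
import statistics

# Reducer types named in the text, mapped to standard Python aggregations.
REDUCERS = {
    "min": min,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
    "sum": sum,
}


def check_threshold(op: str, values, observed: float) -> bool:
    """Compare an observed value against the threshold values."""
    if op == "gt":
        return observed > values[0]
    if op == "lt":
        return observed < values[0]
    if op in ("within_range", "outside_range"):
        low, high = values
        inside = low <= observed <= high  # assumption: bounds are inclusive
        return inside if op == "within_range" else not inside
    raise ValueError(f"unknown operator: {op}")


def evaluate_model(model: dict, query_results: dict) -> bool:
    """Apply reducers to query results, then test the single threshold."""
    series = {name: list(points) for name, points in query_results.items()}
    for reducer in model.get("reducers", []):
        rolled_up = REDUCERS[reducer["type"]](series[reducer["inputName"]])
        series[reducer["name"]] = [rolled_up]
    threshold = model["thresholds"][0]  # there should be only one
    observed = series[threshold["inputName"]][-1]  # assumption: use the last value
    return check_threshold(threshold["operator"], threshold["values"], observed)


model = {
    "queries": [{"name": "error_rate", "expression": "<query text>"}],
    "reducers": [{"name": "avg_errors", "inputName": "error_rate", "type": "mean"}],
    "thresholds": [
        {"name": "too_many_errors", "inputName": "avg_errors", "operator": "gt", "values": [5]}
    ],
}

# The mean of the sample points is 6.0, which is greater than 5.
print(evaluate_model(model, {"error_rate": [2.0, 6.0, 10.0]}))  # prints: True
```

If the query already aggregates its data, the reducers list can be left out and the threshold's inputName can point at the query directly.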