SQL Based Monitors
Some use cases involve complex queries and conditions for triggering a monitor, going beyond the built-in query logic provided by the groundcover logs page query language.
An example of such a use case is comparing a group of logs to the same group in a past period. This is not something regular log search offers, but it can definitely be worth alerting on: if the number of errors for a group of logs changes dramatically compared to the previous week, that is an event to alert on and investigate.
For such use cases you can harness the powerful ClickHouse SQL language to create an SQL-based monitor within groundcover.
ClickHouse within groundcover
Whether you use groundcover as a managed InCloud or deploy it directly on your cluster, log and trace telemetry data is stored in a ClickHouse database.
You can directly query this data using SQL statements and create powerful monitors.
To create and test your SQL queries use the Grafana Explore page within the groundcover app.
Select the ClickHouse@groundcover datasource with the SQL Editor option to start crafting your SQL queries.

Start with show tables; to see all of the available tables you can use in your queries: logs and traces would be popular choices (table names are case sensitive).
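To see which columns a table exposes, you can describe it, for example (the exact column list depends on your groundcover version and schema):
DESCRIBE TABLE logs;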
Query best practices
While testing your queries, always use LIMIT to restrict the results to a small set of data.
SELECT * FROM logs LIMIT 10;
To apply the Grafana timeframe on your queries make sure to add the following conditions:
Logs: WHERE $__timeFilter(timestamp)
Traces: WHERE $__timeFilter(start_timestamp)
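For example, a quick exploration query that follows both recommendations might look like this (a sketch; timestamp, level and workload are columns used elsewhere in this guide):
SELECT timestamp, level, workload
FROM logs
WHERE $__timeFilter(timestamp)
LIMIT 10;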
Note: When querying logs with SQL, it's crucial to use efficient filters to prevent timeouts and enhance performance. Primary filters like cluster, workload, namespace, and env will significantly speed up queries. Always include these filters when writing your queries to avoid inefficient scans.
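A filtered exploration query could therefore look like the following (a sketch; the cluster, namespace and workload values are placeholders for your own):
SELECT timestamp, level, workload
FROM logs
WHERE $__timeFilter(timestamp)
AND cluster = 'my-cluster'
AND namespace = 'my-namespace'
AND workload = 'my-workload'
LIMIT 10;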
Filtering on attributes and tags
Traces and logs have rich context that is stored in dedicated columns in JSON format. Accessing this context for filtering and retrieving values is a common need when querying the data.
To get to the relevant context item, either in the attributes or in the tags, you can use the following syntax:
WHERE string_attributes['host_name'] = 'my.host'
WHERE string_tags['cloud.name'] = 'aws'
WHERE float_attributes['headline_count'] = 4
WHERE float_tags['container.size'] = 22.4
To use the float context, ensure that the relevant attributes or tags are indeed numeric. To do that, check the relevant log in JSON format and verify that the referenced field is not wrapped in quotes (for example, headline_count in the screenshot below).
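Putting it together, a query that filters on a string attribute and aggregates a float attribute could look like this (a sketch using the illustrative keys from the examples above):
SELECT workload, avg(float_attributes['headline_count']) AS avg_headline_count
FROM logs
WHERE $__timeFilter(timestamp)
AND string_attributes['host_name'] = 'my.host'
GROUP BY workload;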

SQL Query structure for a monitor
To use an SQL query for a monitor, you must make sure the query returns no more than a single numeric field - this is the monitored field on which the threshold is placed.
The query can also contain any number of "group by" fields, which are passed to the monitor as context labels.
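In its simplest form, such a query returns one numeric value alongside its "group by" columns, for example (a minimal sketch; the error count and 10 minute window are illustrative):
SELECT cluster, namespace, workload, countIf(level = 'error') AS error_count
FROM logs
WHERE timestamp >= now() - INTERVAL 10 MINUTE
GROUP BY cluster, namespace, workload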
Here is a fuller example of an SQL query that can be used for a monitor:
with engineStatusLastWeek as (
select string_attributes['tenantID'] tenantID, string_attributes['env'] env, max(float_attributes['engineStatus.numCylinders']) cylinders
from logs
where timestamp >= now() - interval 7 day
and workload = 'engine-processing'
and string_attributes['tenantID'] != ''
group by tenantID, env
),
engineStatusNow as (
select string_attributes['tenantID'] tenantID, string_attributes['env'] env, min(float_attributes['engineStatus.numCylinders']) cylinders
from logs
where timestamp >= now() - interval 10 minute
and workload = 'engine-processing'
and string_attributes['tenantID'] != ''
group by tenantID, env
)
select n.tenantID, n.env, n.cylinders/lw.cylinders AS threshold
from engineStatusNow n
left join engineStatusLastWeek lw using (tenantID, env)
where n.cylinders/lw.cylinders <= 0.5
In this query the threshold field is the ratio between a value measured over the last 10 minutes and the same value measured over the last week.
tenantID and env are the "group by" fields that are passed to the monitor as context labels.
Here is another query example, which checks the percentage of errors in a set of logs: a single numeric value is calculated and grouped by cluster, namespace and workload.
SELECT cluster, namespace, workload,
  round(100.0 * countIf(level = 'error') / nullIf(count(), 0), 2) AS error_ratio_pct
FROM "groundcover"."logs"
WHERE timestamp >= now() - INTERVAL 10 MINUTE
  AND namespace IN ('refurbished', 'interface')
GROUP BY cluster, namespace, workload
Applying the SQL query as a monitor
Applying an SQL query can only be done in YAML mode. You can use the following YAML template to add your query:
title: "[SQL] Monitor name"
display:
header: Monitor description
severity: S2
measurementType: event
model:
queries:
- name: threshold_input_query
expression: "[YOUR SQL QUERY GOES HERE]"
datasourceType: clickhouse
queryType: instant
thresholds:
- name: threshold_1
inputName: threshold_input_query
operator: lt
values:
- 0.5
annotations:
[Workflow Name]: enabled
executionErrorState: OK
noDataState: OK
evaluationInterval:
interval: 3m
pendingFor: 2m
isPaused: false
Give your monitor a name and a description
Paste your SQL query in the expression field
Set the threshold value and the relevant operator - in this example it is "lower than" 0.5 (< 0.5)
Set your workflow name in the annotations section
Set the check interval and the pending time
Save the monitor