
Logs alerting

In this end-to-end example we will set up an alert that fires when the number of error logs from any workload crosses a certain threshold.

Defining the query

We will construct a query that uses the count() function to get the number of error logs in the defined time window.

SELECT    count()   as log_count,
          workload  as workload,
          namespace as namespace
FROM      groundcover.logs
WHERE     $__timeFilter(timestamp) 
          AND level = 'error'
GROUP     BY workload, namespace

groundcover always stores log levels as lower-case values, e.g. 'error', 'info'.

The GROUP BY clause generates the labels that will be attached to the alert when it fires.

Running the query returns a list of workloads and the count of error logs for each. Note the time range above the query, which can be adjusted to fit your use case.
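
If you later manage this alert as code (for example via Grafana's file provisioning or the Terraform provider), the query becomes one data node of the rule. The sketch below follows Grafana's file-provisioning format and is illustrative only: the datasource UID is a placeholder, and the model field names (such as rawSql) depend on the ClickHouse datasource plugin in use.

# Sketch of the query node inside a file-provisioned Grafana alert rule.
# The datasource UID is a placeholder and the model field names (e.g. rawSql)
# depend on the ClickHouse datasource plugin, so treat them as illustrative.
data:
  - refId: A
    relativeTimeRange:
      from: 3600                        # look back one hour, in seconds
      to: 0
    datasourceUid: "clickhouse-uid"     # placeholder - use your datasource UID
    model:
      rawSql: |
        SELECT    count()   as log_count,
                  workload  as workload,
                  namespace as namespace
        FROM      groundcover.logs
        WHERE     $__timeFilter(timestamp)
                  AND level = 'error'
        GROUP     BY workload, namespace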

Defining the alert condition

Now that we have our data, we need to set an alert condition to determine when our SLO should be considered breached. In our case, we consider any number of error logs a breach of the SLO. We will use the Threshold expression with 0 as the threshold value, indicating that any workload with more than 0 error logs counts as a breach.

Note the Firing status for all of the returned results: each of them has more than 0 error logs in the last hour, breaching our SLO condition.
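
Expressed as code, the Threshold expression is a second node that reads the query result and compares each series against 0. The fragment below is a simplified sketch modeled on the YAML Grafana produces when exporting alert rules; exact expression fields may differ between Grafana versions.

# Sketch of the Threshold expression node, referencing the query node (refId A).
# Simplified from Grafana's exported alert-rule YAML; fields may vary by version.
data:
  - refId: B
    datasourceUid: __expr__             # Grafana server-side expressions
    model:
      type: threshold
      expression: A                     # evaluate the result of the query node
      conditions:
        - evaluator:
            type: gt                    # "is above"
            params: [0]                 # more than 0 error logs counts as a breach
condition: B                            # the rule fires based on this expression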

Defining the evaluation behavior

The next step is instructing Grafana on how we want this alert to be evaluated:

  1. Evaluation group: How often we want the rule to be evaluated

  2. Pending period: How long we allow the SLO to be breached before firing an alert

For example, if we choose an evaluation group of 1m and a pending period of 3m, the alert condition is checked every minute, but an alert fires only if the breach persists for 3 consecutive minutes.
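
In a provisioned rule group, these two settings map to the group's interval and the rule's for field. The fragment below simply mirrors the 1m / 3m example above; the group and rule names are illustrative.

# Evaluation behavior in a provisioned rule group: the group interval is the
# evaluation frequency and `for` is the pending period. Names are illustrative.
groups:
  - name: error-log-alerts              # evaluation group (illustrative name)
    interval: 1m                        # check the alert condition every minute
    rules:
      - title: Error logs per workload
        for: 3m                         # breach must persist for 3 consecutive minutes
        # condition and data nodes omitted; see the query and threshold sketches above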

To give a concrete example, let's look at two different series of evaluations:

              1m          2m          3m          4m          5m          Result
Example 1     BREACHED    BREACHED    OK          BREACHED    BREACHED    OK
Example 2     OK          BREACHED    BREACHED    BREACHED    BREACHED    FIRING

Even though both examples have the same number of evaluations that breached the SLO, only the second one fires an alert. This is because only in the second example was the SLO breached continuously for the full pending period of 3 consecutive minutes.

Defining labels and notifications

The next step is to add any extra labels to the fired alert; these can be used when deciding how to handle it. For example, labels such as team and severity could determine which contact point should be used.

In the notifications section, we can either use the assigned labels to route the alert or select a contact point directly.
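
As code, these labels sit directly on the rule, and Grafana's notification policies (or a directly chosen contact point) can route on them. The label values and the annotation below are illustrative; the workload and namespace labels referenced in the template come from the query's GROUP BY.

# Extra labels attached to the firing alert; notification policies can match on
# them to pick a contact point. Values are illustrative.
labels:
  team: payments
  severity: critical
annotations:
  summary: "{{ $labels.workload }} in {{ $labels.namespace }} is emitting error logs"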