Create a new Monitor

Learn how to create and configure monitors using the Wizard, Monitor Catalog, or Import options. The following guide will help you set up queries, thresholds, and alert routing for effective monitoring.

You can create monitors in the web application by following this guide, through our API (see: Monitors #POST /api/monitors), or with our Terraform provider (see: groundcover Terraform Provider).

In the Monitors section (left navigation bar), navigate to the Issues page or the Monitor List page to create a new Monitor. Click the “Create Monitor” button and select one of the options from the dropdown (Wizard, Monitor Catalog, or Import).

Using the Monitor Wizard

Overview

The Monitor Wizard is a guided, user-friendly approach to creating and configuring monitors tailored to your observability needs. By breaking down the process into simple steps, it ensures consistency and accuracy.

Section 1: Information

Set up the basic information for the monitor.

Monitor Title (Required):

Add a title for the monitor. The title will appear in notifications and in the Monitor List page.

Give the Monitor a clear, short name that describes its function at a high level.

Examples:

  • “Workload High API Error Rate”

  • “Workload Pods High Memory”

The title will appear in the monitors page table and be accessible in workflows and alerts.

Description (Optional):

Add a description for your monitor. The description appears when viewing the monitor details, and you can also use it in your alerts.

Section 2: Query

Select the data source, build the query and define thresholds for the monitor.

If you're unfamiliar with query building in groundcover, refer to the Query Builder section for full details on the different components.

  • Data Source (Required):

    • Select the type of data (Metrics, Infra Metrics, Logs, or Traces).

  • Query Functionality:

    • Choose how to process the data (e.g., average, count).

    • Add aggregation clauses if applicable. You MUST use aggregations if you want to add labels to your issues.

    • Examples: cluster, workload, container_name

  • Time Window (Required):

    • Specify the period over which data is aggregated.

    • Example: “Over the last 5 minutes.”

  • Threshold Conditions (Required):

    • Define when the monitor triggers. You can use:

      • Greater Than - Trigger when the value exceeds X.

      • Lower Than - Trigger when the value falls below X.

      • Within Range - Trigger when the value is between X and Y.

      • Outside Range - Trigger when the value is not between X and Y.

    • Example: “Trigger if disk space usage is greater than 10%.”

  • Visualization Type (Optional):

    • Preview the data as a Stacked Bar or Line Chart. The preview is only a visual aid while you build the monitor.
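As a rough illustration, the query settings above correspond to the model block of the Monitor YAML used by the Import option described later in this guide. The sketch below reuses the field names and metric names from that import example. The mapping of wizard fields to YAML keys (for example, the time window to the [5m] range and the aggregation to the by (...) clause) is an assumption made for illustration, not a documented equivalence.

```yaml
# Sketch only: field names are copied from the import example in this guide.
model:
  queries:
    - name: threshold_input_query
      # Aggregating by (cluster, env) is what makes these labels available to issues.
      # The [5m] range plays the role of the time window.
      expression: avg_over_time( (((sum(groundcover_node_rt_mem_requests_bytes{}) by (cluster, env)) / (sum(groundcover_node_rt_allocatable_mem_bytes{}) by (cluster, env))) * 100)[5m] )
      queryType: instant
      datasourceType: prometheus
  thresholds:
    - name: threshold_1
      inputName: threshold_input_query
      operator: gt   # "Greater Than"; the only operator value shown in this guide
      values:
        - 90         # trigger when the value exceeds 90
```

The operator values for Lower Than, Within Range, and Outside Range do not appear in this guide, so they are not shown here.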

Section 3: Display

Customize how the Monitor’s issues appear. This section also includes a live preview of how they will appear on the Issues page.

Ensure that the labels you wish to use dynamically (e.g., span_name, workload) are defined in the query configuration step (Section 2: Query).

Issue Header (Required):

Define a name for the issues that this Monitor will raise. It is useful to include labels that carry information from the query.

For example, adding {{ alert.labels.status_code }} to the header injects the status code into the name of the issue. This is especially useful when one Monitor raises multiple issues and you want to quickly understand their content without having to open each one.

Examples:

  • “HTTP API Error {{ alert.labels.status_code }}” -> HTTP API Error 500

  • “Workload {{ alert.labels.workload }} Pod Restart” -> Workload frontend Pod Restart

  • “{{ alert.labels.customer }} APIs High Latency” -> org.com APIs High Latency

If you do choose to use templated dynamic values, make sure they exist as monitor query labels.

Severity (Required):

Use severity to categorize alerts by importance.

Select a severity level (S1-S4).

Context Labels (Optional):

To use labels here, you MUST add them to the query aggregation.

These labels are displayed and filterable on the Monitors > Issues page.

We recommend using up to 5 labels for the best experience.
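The display settings above can be pictured as the display and severity fields of the Monitor YAML shown in the import example later in this guide. This is a minimal sketch: the title is hypothetical, and it assumes the YAML header accepts the same {{ alert.labels.* }} template syntax as the wizard.

```yaml
# Sketch only: hypothetical monitor, using field names from the import example.
title: Workload Pod Restart Monitor   # hypothetical title
display:
  # Quoted so the {{ ... }} template is read as a plain string.
  header: "Workload {{ alert.labels.workload }} Pod Restart"
  contextHeaderLabels:
    # Each label listed here must also appear in the query aggregation.
    - cluster
    - namespace
    - workload
severity: S2
```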

Section 4: Metadata Labels

Organize and categorize monitors with metadata labels. You can also use these labels to route issues through advanced workflows.

  • Labels (Optional):

    • Add key-value pairs for metadata.

Section 5: Evaluation Settings

Define how often the monitor evaluates its conditions.

Evaluation Interval (Required):

Specify how often the monitor evaluates the query.

Example: “Evaluate every 1 minute.”

Pending Period (Required):

Specify how long the query conditions must hold before the monitor fires. This ensures that transient conditions do not trigger alerts, reducing false positives. For example, setting this to 10 minutes means the condition must persist for at least 10 minutes before firing.

If the query conditions are not met in an evaluation during the pending period, the issue status goes back to normal.

Example: “Wait for 10 minutes before alerting.”
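In the Monitor YAML used by the Import option, these two settings appear under evaluationInterval. The fragment below uses the values from the examples above. The key names come from the import example in this guide, and the comments describe the behavior as explained in this section.

```yaml
evaluationInterval:
  interval: 1m     # evaluate the query every 1 minute
  pendingFor: 10m  # the condition must persist for 10 minutes before the issue fires
```

With these values, a condition that is met in one evaluation but not in a later one within the 10 minutes returns to normal without firing.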

Section 6: Routing

Set up how issues from this monitor will be routed.

Select Workflow (Optional):

Route issues only to the existing workflows you select. Other workflows will not process them. Use this, for example, to send alerts for a critical application to Slack or PagerDuty.

No Routing (Optional):

Any workflow without filters will process the issue.

Quick tips to create effective Monitors

Use the Monitor Catalog as much as you can

Whenever possible, use our carefully crafted monitors from the Monitor Catalog. This will save you time, ensure the Monitors are built effectively, and help you align your alerting strategy with best practices. If you can't find one that perfectly matches your needs, use them as your starting point and edit their properties to customize them to your needs.

Give the Monitor a short and clear title

Give the Monitor a clear, short name that describes its function at a high level.

Examples:

  • “Workload High API Error Rate”

  • “Workload Pods High Memory”

The title will appear in the monitors page table and be accessible in workflows and alerts.

Use a Descriptive Issue Header

Choose a clear name for the issue header that is more detailed and specific than the monitor title. The header is a property of each issue, so you can add templated dynamic values here, such as dynamic label values.

Examples:

  • “HTTP API Error {{ alert.labels.status_code }}”

  • “Workload {{ alert.labels.workload }} Pod Restart”

  • “{{ alert.labels.customer }} APIs High Latency”

If you do choose to use templated dynamic values, make sure they exist as monitor query labels.

Use up to 3 Resource Labels

We recommend using up to 3 ResourceHeaderLabels. These labels should tell your team what the subject of the issue is.

Examples:

span_name, pod_name

ResourceHeaderLabels appear as a secondary header in Issues tables across the platform.

Use up to 3 Context Labels

We recommend using up to 3 ContextHeaderLabels. These labels should tell your team where the issue happened.

Examples:

cluster, namespace, workload

ContextHeaderLabels appear in Issues tables across the platform, next to your issues.
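Put together, the header and label tips might look like the display fragment below. It is a sketch under assumptions: contextHeaderLabels is taken from the import example in this guide, while the resourceHeaderLabels key is assumed by analogy with it and does not appear elsewhere in this guide.

```yaml
# Sketch only: resourceHeaderLabels is an assumed key name.
display:
  header: "HTTP API Error {{ alert.labels.status_code }}"
  resourceHeaderLabels:   # what the issue is about (up to 3)
    - span_name
    - pod_name
  contextHeaderLabels:    # where the issue happened (up to 3)
    - cluster
    - namespace
    - workload
```

Every label used in the header or in either list has to exist as a monitor query label.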

Using the Import option

With the "Import Bulk Monitors" option, you can add multiple monitors at once by providing an array of monitors that follows the Monitor YAML structure.

Example of importing multiple monitors

monitors:
- title: K8s Cluster High Memory Requests Monitor
  display:
    header: K8s Cluster High Memory Requests
    description: Alerts when a K8s Cluster's total Container Memory Requests exceeds 90% of the Allocatable Memory of all the Nodes for 5 minutes    
    contextHeaderLabels:
      - env
      - cluster
  severity: S1
  measurementType: state
  model:
    queries:
      - name: threshold_input_query
        expression: avg_over_time( (((sum(groundcover_node_rt_mem_requests_bytes{}) by (cluster, env)) / (sum(groundcover_node_rt_allocatable_mem_bytes{}) by (cluster, env))) * 100)[5m] )
        queryType: instant
        datasourceType: prometheus
    thresholds:
      - name: threshold_1
        inputName: threshold_input_query
        operator: gt
        values:
          - 90
  noDataState: OK
  evaluationInterval:
    interval: 1m
    pendingFor: 0s
- title: K8s PVC Pending For 5 Minutes Monitor
  display:
    header: K8s PVC Pending Over 5 Minutes
    description: This monitor triggers an alert when a PVC remains in a Pending state for more than 5 minutes.
    contextHeaderLabels:
      - cluster
      - namespace
      - persistentvolumeclaim
  severity: S2
  measurementType: state
  model:
    queries:
      - name: threshold_input_query
        expression: last_over_time(max(groundcover_kube_persistentvolumeclaim_status_phase{phase="Pending"}) by (cluster, namespace, persistentvolumeclaim)[1m])
        queryType: instant
        datasourceType: prometheus
    thresholds:
      - name: threshold_1
        inputName: threshold_input_query
        operator: gt
        values:
          - 0
  executionErrorState: OK
  noDataState: OK
  evaluationInterval:
    interval: 1m
    pendingFor: 5m

Click on "Create Monitors" to create them.
