Metrics Aggregation

What is Metrics Aggregation?

Metrics aggregation is a real-time data processing feature that automatically transforms and summarizes metrics as they are ingested into the system. Instead of storing every individual metric sample with all its labels, aggregation rules combine multiple metric samples into aggregated values, reducing storage requirements and improving query performance.

groundcover applies aggregation rules in real time, as metrics flow through the system. This allows you to:

  • Reduce storage costs: By aggregating metrics and removing unnecessary labels, you store fewer unique time series.

  • Improve query performance: Aggregated metrics are pre-computed, making queries faster.

  • Simplify metric names: Transform complex metric names into cleaner, more consistent formats.

  • Remove unneeded granularity: Drop labels that aren't needed for your use case (e.g., node-level labels when you only care about cluster-level metrics).

To manage the metrics aggregation rules, go to the dedicated page in settings.

Note: Metrics aggregation rules can only be edited by account Admins.

Metrics aggregation rule format

Each aggregation rule specifies:

  • Which metrics to match: Using label selectors and metric name patterns

  • Which labels to remove: Using the without parameter to drop specific labels

  • How to aggregate: Using output functions like total_prometheus, avg, last, or count_series

  • How often to aggregate: Using the interval parameter to control aggregation frequency
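The four parts of a rule listed above can be pictured as a small data structure. The sketch below is only an illustration in Python: the class name AggregationRule, the field names match and outputs, and the selector string are assumptions made for this example, not groundcover's actual configuration schema (only without and interval are named in this page).

```python
from dataclasses import dataclass


@dataclass
class AggregationRule:
    """Illustrative model of one aggregation rule (hypothetical, not the real schema)."""

    match: list[str]       # selectors choosing which metrics the rule applies to
    without: list[str]     # labels dropped from the aggregated output
    outputs: list[str]     # output functions, e.g. "total_prometheus"
    interval: str = "30s"  # how often the aggregation runs


# Hypothetical rule: the selector syntax shown here is an assumption.
rule = AggregationRule(
    match=['{__name__=~"groundcover_network.+", node_name!=""}'],
    without=["node_name", "node"],
    outputs=["total_prometheus"],
    interval="30s",
)
print(rule)
```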

A full reference of the configuration options can be found here.

groundcover has a set of defaults installed. You can always revert to the defaults provided by groundcover by clicking on Restore Defaults.

Walkthrough: Counter Metrics Aggregation Example

Let's walk through one of the default aggregation rules to understand how it works:

Breaking Down the Rule

1. Match Criteria

This rule matches metrics where:

  • The metric name matches one of these patterns:

    • groundcover_.+_.+_counter - Any metric whose name starts with groundcover_, ends with _counter, and has at least two non-empty segments in between (for example, groundcover_http_request_counter)

    • groundcover_unavailable_count - The specific unavailable count metric

    • groundcover_network.+ - Any metric starting with groundcover_network

  • The metric has a non-empty node_name label (the node_name!="" condition)

Example metrics that would match:

  • groundcover_http_request_counter{node_name="node-1", namespace="default", service="api"}

  • groundcover_unavailable_count{node_name="node-2", namespace="production"}

  • groundcover_network_bytes_sent{node_name="node-3", pod="web-123"}
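The match criteria above can be checked with a few lines of Python. This is a minimal sketch, assuming the name patterns are applied as fully anchored regular expressions (as Prometheus-style selectors usually are); the function name rule_matches is hypothetical.

```python
import re

# Patterns taken from the list above. Full-match anchoring is an assumption.
NAME_PATTERNS = [
    r"groundcover_.+_.+_counter",
    r"groundcover_unavailable_count",
    r"groundcover_network.+",
]


def rule_matches(name: str, labels: dict[str, str]) -> bool:
    """Return True if the name fits a pattern and node_name is non-empty."""
    name_ok = any(re.fullmatch(pattern, name) for pattern in NAME_PATTERNS)
    has_node = labels.get("node_name", "") != ""
    return name_ok and has_node


print(rule_matches("groundcover_http_request_counter",
                   {"node_name": "node-1", "namespace": "default", "service": "api"}))  # True
print(rule_matches("groundcover_network_bytes_sent",
                   {"node_name": "node-3", "pod": "web-123"}))  # True
print(rule_matches("groundcover_http_request_counter",
                   {"namespace": "default"}))  # False: no node_name label
```

Under this reading, a metric without a node_name label is left untouched by the rule, even if its name matches.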

2. Label Removal

The without parameter removes the node_name and node labels from the aggregated metrics. This means:

  • Before aggregation: You might have separate time series for each node:

    • groundcover_http_request_counter{node_name="node-1", namespace="default", service="api"} = 100

    • groundcover_http_request_counter{node_name="node-2", namespace="default", service="api"} = 150

    • groundcover_http_request_counter{node_name="node-3", namespace="default", service="api"} = 75

  • After aggregation: These are combined into a single time series without node labels:

    • groundcover_http_request_counter{namespace="default", service="api"} = 325 (100 + 150 + 75)

This is useful when you don't need node-level granularity and want to see cluster-wide or namespace-wide metrics.
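The effect of dropping labels can be reproduced with a short Python sketch. It is a simplified model, not groundcover's implementation: samples that become identical once node_name and node are removed are grouped together and their values summed. The helper name aggregate_without is hypothetical.

```python
from collections import defaultdict

DROPPED_LABELS = {"node_name", "node"}

# The three per-node series from the example above.
samples = [
    ({"node_name": "node-1", "namespace": "default", "service": "api"}, 100),
    ({"node_name": "node-2", "namespace": "default", "service": "api"}, 150),
    ({"node_name": "node-3", "namespace": "default", "service": "api"}, 75),
]


def aggregate_without(samples, dropped):
    """Sum values of samples that share the same labels after dropping some."""
    totals = defaultdict(int)
    for labels, value in samples:
        # The remaining labels, in a stable order, identify the output series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in dropped))
        totals[key] += value
    return dict(totals)


result = aggregate_without(samples, DROPPED_LABELS)
for key, total in result.items():
    print(dict(key), total)  # {'namespace': 'default', 'service': 'api'} 325
```

Three input series collapse into one output series because the only label that distinguished them was node_name.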

3. Aggregation Interval

The aggregation runs every 30 seconds. This means:

  • Metrics are collected for 30 seconds

  • At the end of each 30-second window, the aggregation function is applied

  • The aggregated result is stored as a new metric
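The windowing described above can be sketched in Python as follows. The sketch assumes windows aligned to multiples of 30 seconds on the sample timestamp; how the real system aligns its windows is not stated here, and the function names are hypothetical.

```python
from collections import defaultdict

INTERVAL_SECONDS = 30


def window_start(timestamp: float) -> int:
    """Return the start of the 30-second window that contains the timestamp."""
    return int(timestamp // INTERVAL_SECONDS) * INTERVAL_SECONDS


def group_by_window(samples):
    """Group (timestamp, value) samples by the window they fall into."""
    windows = defaultdict(list)
    for timestamp, value in samples:
        windows[window_start(timestamp)].append(value)
    return dict(windows)


# Timestamps in seconds; values are arbitrary illustration data.
samples = [(0, 10), (12, 11), (29, 15), (30, 16), (45, 20), (61, 22)]
windows = group_by_window(samples)
for start, values in sorted(windows.items()):
    print(f"window [{start}, {start + INTERVAL_SECONDS}): {values}")
```

Each window is then handed to the output function, which produces one aggregated value per window.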

4. Output Function

The total_prometheus output function:

  • For counter metrics: Sums all the counter values across the matching time series

  • Creates a new aggregated metric with the combined value

  • Maintains Prometheus-compatible counter behavior

Example: If you have three nodes each reporting:

  • Node 1: groundcover_http_request_counter{node_name="node-1", namespace="api"} = 1000

  • Node 2: groundcover_http_request_counter{node_name="node-2", namespace="api"} = 2000

  • Node 3: groundcover_http_request_counter{node_name="node-3", namespace="api"} = 1500

After aggregation, you get:

  • groundcover_http_request_counter{namespace="api"} = 4500 (1000 + 2000 + 1500)
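One way to picture "Prometheus-compatible counter behavior" is an output that only ever grows, even when an input counter restarts. The Python sketch below is an assumption-heavy illustration, not the actual total_prometheus implementation: it counts the first sample of each series in full (so that it reproduces the 4500 figure above) and treats any decrease as a counter reset. The class name CounterTotal is hypothetical.

```python
class CounterTotal:
    """Running total of several input counters (illustrative sketch only)."""

    def __init__(self):
        self.last = {}   # last seen value per input series
        self.total = 0   # aggregated output counter

    def update(self, series, value):
        previous = self.last.get(series)
        if previous is None or value < previous:
            # Assumption: a new series, or a reset, contributes its full value.
            delta = value
        else:
            delta = value - previous
        self.last[series] = value
        self.total += delta
        return self.total


agg = CounterTotal()
for node, value in [("node-1", 1000), ("node-2", 2000), ("node-3", 1500)]:
    agg.update(node, value)
first_total = agg.total                    # 4500, as in the example above
after_growth = agg.update("node-1", 1100)  # node-1 grew by 100 -> 4600
after_reset = agg.update("node-2", 50)     # node-2 restarted from 0 -> 4650
print(first_total, after_growth, after_reset)
```

The aggregated value never decreases, which is what lets functions such as rate() keep working on the output.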

5. Metric Name Transformation

This relabeling rule cleans up the metric name by:

  • Matching metric names that contain a colon (:)

  • Extracting everything before the colon

  • Replacing the metric name with just the prefix

Example:

  • Original: groundcover_http_request_counter:total_prometheus

  • After relabeling: groundcover_http_request_counter

This ensures the aggregated metric has a clean, consistent name without the aggregation function suffix.
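The renaming step can be mimicked with a regular expression in Python. The pattern (.+):.+ is an assumption chosen to match the description above, not necessarily the expression used in the default rule. The function name strip_aggregation_suffix is hypothetical.

```python
import re

# Assumed pattern: capture everything before the colon, discard the rest.
SUFFIX_PATTERN = re.compile(r"(.+):.+")


def strip_aggregation_suffix(name: str) -> str:
    """Return the part of the metric name before the colon, if there is one."""
    match = SUFFIX_PATTERN.fullmatch(name)
    return match.group(1) if match else name


print(strip_aggregation_suffix("groundcover_http_request_counter:total_prometheus"))
# groundcover_http_request_counter
print(strip_aggregation_suffix("groundcover_unavailable_count"))
# groundcover_unavailable_count (no colon, left unchanged)
```

Because the capture group is greedy, a name containing several colons would be cut at the last one under this pattern.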
