# Metrics Aggregation

### What is Metrics Aggregation?

Metrics aggregation is a real-time data processing feature that automatically transforms and summarizes metrics as they are ingested into the system. Instead of storing every individual metric sample with all its labels, aggregation rules combine multiple metric samples into aggregated values, reducing storage requirements and improving query performance.

Because groundcover applies aggregation rules at ingestion time, before metrics are stored, you can:

* **Reduce storage costs**: By aggregating metrics and removing unnecessary labels, you store fewer unique time series.
* **Improve query performance**: Aggregated metrics are pre-computed, making queries faster.
* **Simplify metric names**: Transform complex metric names into cleaner, more consistent formats.
* **Remove unneeded granularity**: Drop labels that aren't needed for your use case (e.g., node-level labels when you only care about cluster-level metrics).

To manage the metrics aggregation rules, go to the dedicated page in [settings](https://app.groundcover.com/settings/aggregations).

{% hint style="info" %}
Metrics aggregation rules can only be edited by account Admins.
{% endhint %}

### Metrics aggregation rule format

Each aggregation rule specifies:

* **Which metrics to match**: Using label selectors and metric name patterns
* **Which labels to remove**: Using the `without` parameter to drop specific labels
* **How to aggregate**: Using output functions like `total_prometheus`, `avg`, `last`, or `count_series`
* **How often to aggregate**: Using the `interval` parameter to control aggregation frequency

A full reference of the configuration options is available in the [VictoriaMetrics stream aggregation documentation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/).

groundcover ships with a set of default aggregation rules. You can revert to these defaults at any time by clicking `Restore Defaults`.

### Walkthrough: Counter Metrics Aggregation Example

Let's walk through one of the default aggregation rules to understand how it works:

```yaml
- ignore_old_samples: true
  match: '{__name__=~"groundcover_.+_.+_counter|groundcover_unavailable_count|groundcover_network.+", node_name!=""}'
  without: [node_name, node]
  interval: 30s
  outputs: [total_prometheus]
  output_relabel_configs:
  - source_labels: [__name__]
    target_label: __name__
    regex: "(.+):.*"
    replacement: "$1"
```

#### Breaking Down the Rule

**1. Match Criteria**

```yaml
match: '{__name__=~"groundcover_.+_.+_counter|groundcover_unavailable_count|groundcover_network.+", node_name!=""}'
```

This rule matches metrics where:

* The metric name matches one of these patterns:
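  * `groundcover_.+_.+_counter` - Any metric of the form `groundcover_<a>_<b>_counter`, that is, a name starting with `groundcover_`, ending with `_counter`, and with at least two underscore-separated segments in between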
  * `groundcover_unavailable_count` - The specific unavailable count metric
  * `groundcover_network.+` - Any metric starting with `groundcover_network`
* The metric has a non-empty `node_name` label (the `node_name!=""` condition)

**Example metrics that would match:**

* `groundcover_http_request_counter{node_name="node-1", namespace="default", service="api"}`
* `groundcover_unavailable_count{node_name="node-2", namespace="production"}`
* `groundcover_network_bytes_sent{node_name="node-3", pod="web-123"}`
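To make the matching logic concrete, here is a minimal Python sketch that mimics the selector. It assumes Prometheus-style semantics, where the regex must match the whole metric name; `rule_matches` is a hypothetical helper written for illustration, not part of groundcover.

```python
import re

# Metric-name pattern copied from the rule's `match` selector.
NAME_PATTERN = re.compile(
    r"groundcover_.+_.+_counter|groundcover_unavailable_count|groundcover_network.+"
)

def rule_matches(name: str, labels: dict) -> bool:
    """Return True when a series satisfies both conditions of the selector."""
    # Assumption: label-selector regexes are anchored to the whole value,
    # so fullmatch is used instead of search.
    name_ok = NAME_PATTERN.fullmatch(name) is not None
    # node_name!="" means the label must be present and non-empty.
    has_node = labels.get("node_name", "") != ""
    return name_ok and has_node

# Matches: name fits the first pattern and node_name is set.
print(rule_matches("groundcover_http_request_counter", {"node_name": "node-1"}))
# Does not match: the node_name label is missing.
print(rule_matches("groundcover_http_request_counter", {"namespace": "default"}))
# Does not match: `groundcover_network.+` needs at least one extra character.
print(rule_matches("groundcover_network", {"node_name": "node-3"}))
```

Series that fail either condition are left untouched by this rule.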

**2. Label Removal**

```yaml
without: [node_name, node]
```

This removes the `node_name` and `node` labels from the aggregated metrics. This means:

* **Before aggregation**: You might have separate time series for each node:
  * `groundcover_http_request_counter{node_name="node-1", namespace="default", service="api"} = 100`
  * `groundcover_http_request_counter{node_name="node-2", namespace="default", service="api"} = 150`
  * `groundcover_http_request_counter{node_name="node-3", namespace="default", service="api"} = 75`
* **After aggregation**: These are combined into a single time series without node labels:
  * `groundcover_http_request_counter{namespace="default", service="api"} = 325` (100 + 150 + 75)

This is useful when you don't need node-level granularity and want to see cluster-wide or namespace-wide metrics.
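The effect of `without` on series identity can be sketched in a few lines of Python. This is only an illustration of the grouping idea, not the actual implementation; `output_key` and `DROPPED_LABELS` are hypothetical names.

```python
from collections import defaultdict

# Labels listed under `without` in the rule.
DROPPED_LABELS = {"node_name", "node"}

def output_key(name, labels):
    """Identity of the aggregated series: metric name plus every label that is kept."""
    kept = tuple(sorted((k, v) for k, v in labels.items() if k not in DROPPED_LABELS))
    return name, kept

# The three per-node input series from the example above.
inputs = [
    ("groundcover_http_request_counter",
     {"node_name": "node-1", "namespace": "default", "service": "api"}),
    ("groundcover_http_request_counter",
     {"node_name": "node-2", "namespace": "default", "service": "api"}),
    ("groundcover_http_request_counter",
     {"node_name": "node-3", "namespace": "default", "service": "api"}),
]

# Series that share the same key after label removal land in the same group.
groups = defaultdict(list)
for name, labels in inputs:
    groups[output_key(name, labels)].append(labels["node_name"])

for (name, kept), nodes in groups.items():
    print(name, dict(kept), "<-", nodes)
```

All three inputs collapse into one group because they differ only in the dropped labels. Series that differ in any other label, such as `namespace`, would stay separate.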

**3. Aggregation Interval**

```yaml
interval: 30s
```

The aggregation runs every 30 seconds. This means:

* Metrics are collected for 30 seconds
* At the end of each 30-second window, the aggregation function is applied
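* The aggregated result is written as a new sample of the aggregated metric

The rule also sets `ignore_old_samples: true`, which tells the aggregator to skip samples whose timestamps fall outside the current aggregation window.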
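The windowing described above can be pictured with a short Python sketch. It assumes fixed 30-second windows aligned to multiples of the interval, which is a simplification for illustration; `window_start` and the sample data are hypothetical.

```python
INTERVAL_SECONDS = 30  # mirrors `interval: 30s`

def window_start(timestamp: float) -> int:
    """Start (in seconds) of the window that a sample timestamp falls into."""
    return int(timestamp // INTERVAL_SECONDS) * INTERVAL_SECONDS

# Hypothetical samples as (timestamp in seconds, value).
samples = [(0, 1), (12, 2), (29, 3), (30, 4), (59, 5), (61, 6)]

# Collect the samples of each window; the output function would then be
# applied once per window to produce one aggregated sample.
windows = {}
for timestamp, value in samples:
    windows.setdefault(window_start(timestamp), []).append(value)

for start, values in sorted(windows.items()):
    print(f"window [{start}s, {start + INTERVAL_SECONDS}s): {values}")
```

A shorter interval yields more output samples and finer time resolution; a longer one yields fewer samples at coarser resolution.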

**4. Output Function**

```yaml
outputs: [total_prometheus]
```

The `total_prometheus` output function:

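* **For counter metrics**: Adds up the increases of all matching counter series into a single running total
* Creates a new aggregated metric with the combined value
* Maintains Prometheus-compatible counter behavior, so functions such as `rate()` and `increase()` work on the aggregated metric as they would on the original series

**Example:** Suppose three nodes report the following values, each counter having grown from zero while the rule was active:

* Node 1: `groundcover_http_request_counter{node_name="node-1", namespace="api"} = 1000`
* Node 2: `groundcover_http_request_counter{node_name="node-2", namespace="api"} = 2000`
* Node 3: `groundcover_http_request_counter{node_name="node-3", namespace="api"} = 1500`

After aggregation, you get:

* `groundcover_http_request_counter{namespace="api"} = 4500` (1000 + 2000 + 1500)

Because the output is built from increases rather than raw values, a series that is first seen with an already-large counter contributes only its growth from that point on. In that case the aggregated value can be lower than the plain sum of the raw counters.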
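The following Python sketch shows one way to think about `total_prometheus`. It is a simplified model based on our reading of the VictoriaMetrics stream aggregation documentation: the output is a running total of per-series increases, the first sample of each newly seen series only sets a baseline, and a drop in value is treated as a counter reset. `TotalPrometheusSketch` is a hypothetical class, not the real implementation.

```python
class TotalPrometheusSketch:
    """Simplified model of a running total built from counter increases."""

    def __init__(self):
        self.last_seen = {}  # input series key -> last raw counter value
        self.total = 0.0     # value of the aggregated output counter

    def push(self, series_key, value):
        previous = self.last_seen.get(series_key)
        if previous is None:
            # Assumption: the first sample of a new series is only a baseline.
            delta = 0.0
        elif value >= previous:
            delta = value - previous
        else:
            # The counter went down, so treat it as a reset that restarted from 0.
            delta = value
        self.last_seen[series_key] = value
        self.total += delta
        return self.total


agg = TotalPrometheusSketch()

# First samples establish baselines and add nothing to the total.
for node, value in [("node-1", 1000), ("node-2", 2000), ("node-3", 1500)]:
    agg.push(node, value)
print(agg.total)

# Later samples contribute only their increase: 10 + 25 + 0.
for node, value in [("node-1", 1010), ("node-2", 2025), ("node-3", 1500)]:
    agg.push(node, value)
print(agg.total)

# node-1 restarts and reports 5; the reset contributes 5 more.
agg.push("node-1", 5)
print(agg.total)
```

Under this model the aggregated value tracks growth since aggregation began, so it need not equal the plain sum of the raw counter values, while its rate of change still equals the combined rate of the inputs.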

**5. Metric Name Transformation**

```yaml
output_relabel_configs:
- source_labels: [__name__]
  target_label: __name__
  regex: "(.+):.*"
  replacement: "$1"
```

This relabeling rule cleans up the metric name by:

* Matching metric names that contain a colon (`:`)
* Extracting everything before the colon
* Replacing the metric name with just the prefix

**Example:**

* Original: `groundcover_http_request_counter:total_prometheus`
* After relabeling: `groundcover_http_request_counter`

This ensures the aggregated metric has a clean, consistent name without the aggregation function suffix.
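The same transformation can be reproduced in Python to check what the regex does. This sketch assumes Prometheus-style relabeling, where the regex must match the whole source value and `$1` refers to the first capture group; `relabel_name` is a hypothetical helper.

```python
import re

# Regex copied from `output_relabel_configs`.
RELABEL_REGEX = re.compile(r"(.+):.*")

def relabel_name(name: str) -> str:
    """Apply the rename; names the regex does not match are left unchanged."""
    match = RELABEL_REGEX.fullmatch(name)
    if match is None:
        return name
    # `$1` in the rule corresponds to the first capture group here.
    return match.expand(r"\1")

# Illustrative name with a suffix after the colon.
print(relabel_name("groundcover_http_request_counter:total_prometheus"))
# A name without a colon passes through untouched.
print(relabel_name("groundcover_http_request_counter"))
```

Because `(.+)` is greedy, a name containing several colons would be cut at the last one.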
