# Working with Metrics: Datadog vs groundcover

If you're coming from Datadog and starting to work with groundcover, one of the first things you'll notice is that metrics work differently. Not just the query syntax, but the underlying model, how aggregations behave, and what "rate" actually means. This article explains how to think about these differences so you can translate your mental model—not just your queries.

groundcover stores metrics in Prometheus format and uses [MetricsQL](https://docs.victoriametrics.com/metricsql/) (a PromQL-compatible query language) for querying. This means the concepts, operators, and query patterns differ from Datadog's approach.

## Metric name normalization

When metrics are ingested into groundcover, their names are normalized to be Prometheus-compatible. Prometheus metric names must match the regex `[a-zA-Z_:][a-zA-Z0-9_:]*`, which means:

* Dots (`.`) are replaced with underscores (`_`)
* Hyphens (`-`) are replaced with underscores (`_`)
* Names must start with a letter, underscore, or colon, never a digit

| Datadog metric name      | Prometheus metric name   |
| ------------------------ | ------------------------ |
| `system.cpu.user`        | `system_cpu_user`        |
| `http.server.requests`   | `http_server_requests`   |
| `my-app.request-count`   | `my_app_request_count`   |
| `aws.ec2.cpuutilization` | `aws_ec2_cpuutilization` |

When querying metrics that originated from Datadog or other sources with dot-notation, remember to use underscores in your Prometheus queries.
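
The rules above can be sketched as a small Python helper, which may be handy when translating many queries by hand. This is an illustration, not groundcover's ingestion code: the function name is hypothetical, and both the handling of other disallowed characters and the leading-digit guard are assumptions.

```python
import re

# Any character that is not allowed in a Prometheus metric name.
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_:]")


def normalize_metric_name(name: str) -> str:
    """Hypothetical helper: rewrite a Datadog-style name for use in a query."""
    # Replace dots, hyphens, and (assumption) any other disallowed character.
    normalized = _INVALID_CHARS.sub("_", name)
    # Assumption: a leading digit is guarded with an underscore prefix.
    if normalized and normalized[0].isdigit():
        normalized = "_" + normalized
    return normalized


for datadog_name in ["system.cpu.user", "my-app.request-count"]:
    print(datadog_name, "->", normalize_metric_name(datadog_name))
```

If a translated name returns no data, check the actual stored name in the metrics explorer rather than relying on this sketch.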

## Counters vs gauges: same names, different behaviors

Both systems have counters and gauges, but they behave differently in practice.

### Counters

A counter is a monotonically increasing value—think "total requests served" or "bytes transmitted." The value only goes up (or resets to zero on restart).

In **Datadog**, counters are often submitted as already-computed rates or deltas. When you send a counter metric via DogStatsD, the agent may compute the difference between submissions for you. The `count` type in Datadog represents "events per flush interval."

In **Prometheus**, counters are raw cumulative values. If your counter reads 1000 at t=0 and 1050 at t=60s, you have the raw numbers. To get a rate, you explicitly apply `rate()` or `increase()`.

This means that a Datadog query showing `sum:my.counter{*}` might already be showing a rate or a per-interval count, while the Prometheus query `my_counter` shows the raw cumulative value.
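
As a rough illustration of the two representations, the following Python sketch turns cumulative readings into per-interval increments, which is approximately what a delta-style submission carries. The sample values and the 60-second interval are hypothetical.

```python
def deltas_from_cumulative(samples: list[float]) -> list[float]:
    """Turn cumulative counter readings into per-interval increments."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


# Hypothetical counter scraped every 60 seconds (no restarts in this window).
cumulative = [1000, 1050, 1125, 1125, 1200]

# Roughly what a delta-style submission would carry for the same activity.
print(deltas_from_cumulative(cumulative))  # [50, 75, 0, 75]
```

Graphing the cumulative list directly produces an ever-rising line, which is why the raw Prometheus counter rarely looks like its Datadog counterpart.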

### Gauges

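Gauges behave more similarly in the two systems: they represent instantaneous values like CPU usage or memory consumption. Aggregation still differs, though: every Datadog query applies a space aggregator and an automatic time rollup, while Prometheus returns each series as stored until you aggregate explicitly.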

## Translating common Datadog operators

Here's how to think about the most common Datadog functions and their Prometheus/MetricsQL equivalents.

When writing queries in groundcover dashboards, you can use the built-in variables `$range` and `$interval` to dynamically adapt to the selected time range and step size. For example, `rate(my_counter[$range])` uses the dashboard's current time range, and `rate(my_counter[$interval])` uses the current step interval.

### as\_rate()

**In Datadog:** `.as_rate()` converts a count metric to a per-second rate. It's applied as a modifier to metrics that were submitted as counts.

**In Prometheus/MetricsQL:** `rate()` computes per-second rate from raw counter values. It handles counter resets automatically and expects a range vector.

| Datadog                        | Prometheus/MetricsQL                |
| ------------------------------ | ----------------------------------- |
| `my.counter{*}.as_rate()`      | `rate(my_counter[$interval])`       |
| `sum:my.requests{*}.as_rate()` | `sum(rate(my_requests[$interval]))` |
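
To make the reset handling concrete, here is a simplified Python sketch of a per-second rate over cumulative samples. It is not the actual `rate()` implementation, which also deals with window boundaries, and the function name is hypothetical.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second rate over (timestamp_seconds, value) pairs, oldest first.

    Simplified: real rate() implementations also handle window boundaries.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples in the window")
    total_increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            total_increase += current - previous
        else:
            # Counter reset: the process restarted, so count from zero.
            total_increase += current
    elapsed = samples[-1][0] - samples[0][0]
    return total_increase / elapsed


# The counter reads 1000 at t=0 and 1050 at t=60s: 50 / 60 per second.
print(simple_rate([(0, 1000), (60, 1050)]))

# A restart between t=60 and t=120 drops the counter to 20: (50 + 20) / 120.
print(simple_rate([(0, 1000), (60, 1050), (120, 20)]))
```

Without the reset branch, the second example would yield a negative rate, which is the kind of artifact `rate()` is designed to avoid.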

### count\_nonzero() and count\_not\_null()

Datadog's `count_nonzero()` and `count_not_null()` count, at each point in time, how many series in the query have a non-zero or non-null value. For a query grouped by a tag such as `by {host}`, that is the number of hosts reporting a non-zero (or any) value at that moment. They count series, not samples over time.

In Prometheus/MetricsQL, the closest equivalent is the `count()` aggregation, which counts the time series in a result set at each evaluation step. A comparison such as `!= 0` drops zero-valued series before counting.

| Datadog                                  | Prometheus/MetricsQL    |
| ---------------------------------------- | ----------------------- |
| `count_nonzero(my.metric{*} by {host})`  | `count(my_metric != 0)` |
| `count_not_null(my.metric{*} by {host})` | `count(my_metric)`      |

If several series share the same host, aggregate to the host level first, for example `count(sum by (host) (my_metric) != 0)`.

Don't confuse `count()` with `count_over_time()`, which counts how many raw data points (scrapes) a single time series has within a lookback window. It is the counterpart of Datadog's `.rollup(count, ...)`, not of `count_nonzero()`.

**Syntax:** `count_over_time(metric_name[duration])`

**Example:** `count_over_time(up[1h])` counts how many times the `up` metric was scraped in the last hour. If your scrape interval is 15 seconds, this returns approximately 240.
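
The difference between counting samples and counting series can be sketched in Python. The data below is hypothetical: three series of one metric, each with the samples scraped inside a single lookback window.

```python
# Hypothetical data: three series of the same metric, keyed by a host label.
# Each list holds the samples scraped inside one lookback window.
window_samples = {
    "host-a": [0.0, 1.5, 2.0, 0.0],
    "host-b": [3.0, 3.0, 3.0, 3.0],
    "host-c": [0.0, 0.0, 0.0, 0.0],
}

# count_over_time(my_metric[window]): samples per series, over time.
samples_per_series = {host: len(values) for host, values in window_samples.items()}

# count(my_metric): number of series at one instant (here, the latest sample).
series_count = len(window_samples)

# count(my_metric != 0): series whose latest value is non-zero.
nonzero_series = sum(1 for values in window_samples.values() if values[-1] != 0)

print(samples_per_series)  # {'host-a': 4, 'host-b': 4, 'host-c': 4}
print(series_count)        # 3
print(nonzero_series)      # 1
```

The first result has one value per series, while the other two collapse all series into a single number per evaluation step.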

### sum(), avg(), min(), max()

These aggregation functions are conceptually similar but apply differently.

**In Datadog**, aggregations often happen across both time and series simultaneously. `avg:my.metric{*}` averages across all series.

**In Prometheus**, you separate time aggregation from series aggregation:

| Datadog                            | Prometheus/MetricsQL                |
| ---------------------------------- | ----------------------------------- |
| `avg:my.metric{*}`                 | `avg(my_metric)`                    |
| `avg:my.metric{*}.rollup(avg, 60)` | `avg(avg_over_time(my_metric[1m]))` |
| `sum:my.metric{*} by {host}`       | `sum by (host) (my_metric)`         |
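
The two-step structure of `avg(avg_over_time(my_metric[1m]))` can be sketched in Python. The hosts and sample values are hypothetical, and the sketch ignores timestamps entirely.

```python
from statistics import mean

# Hypothetical samples from the last minute for two series of my_metric.
last_minute = {
    "host-a": [10.0, 20.0, 30.0],
    "host-b": [40.0, 50.0],
}

# Step 1, time aggregation: avg_over_time(my_metric[1m]) per series.
per_series_avg = {host: mean(values) for host, values in last_minute.items()}

# Step 2, series aggregation: avg(...) across the per-series results.
overall = mean(per_series_avg.values())

# For contrast: pooling every sample into one average weights busy series more.
pooled = mean(v for values in last_minute.values() for v in values)

print(per_series_avg)  # {'host-a': 20.0, 'host-b': 45.0}
print(overall)         # 32.5
print(pooled)          # 30.0
```

The gap between 32.5 and 30.0 shows why the order and scope of aggregation matter when series report different numbers of samples.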

### rollup()

Datadog's `.rollup()` controls time aggregation explicitly. In Prometheus, you use `*_over_time()` functions:

| Datadog               | Prometheus/MetricsQL          |
| --------------------- | ----------------------------- |
| `.rollup(avg, 300)`   | `avg_over_time(metric[5m])`   |
| `.rollup(max, 300)`   | `max_over_time(metric[5m])`   |
| `.rollup(sum, 300)`   | `sum_over_time(metric[5m])`   |
| `.rollup(count, 300)` | `count_over_time(metric[5m])` |

### as\_count()

Datadog's `.as_count()` converts a rate metric back to a count representation. In Prometheus, use `increase()` to get the total increase over a time window.

For example, if a counter `http_requests_total` goes from 1000 to 1500 over 5 minutes, `increase(http_requests_total[5m])` returns 500—the total number of new requests in that window.

| Datadog                    | Prometheus/MetricsQL              |
| -------------------------- | --------------------------------- |
| `my.counter{*}.as_count()` | `increase(my_counter[$interval])` |

### top() and bottom()

| Datadog                                  | Prometheus/MetricsQL    |
| ---------------------------------------- | ----------------------- |
| `top(my.metric{*}, 10, 'mean', 'desc')`  | `topk(10, my_metric)`   |
| `bottom(my.metric{*}, 5, 'mean', 'asc')` | `bottomk(5, my_metric)` |

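`topk()` and `bottomk()` rank series independently at each timestamp, so the selected set can change from point to point. Datadog's `top()` ranks whole series over the queried range, so MetricsQL's range-based variants are usually the closer match: `topk_avg()` for `'mean'`, and `topk_max()`, `topk_min()`, or `topk_last()` for the corresponding Datadog aggregation methods.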
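
To see why a range-based variant such as `topk_avg()` can return a different answer than `topk()`, consider this Python sketch. The series and function names are hypothetical, and the functions only mimic the ranking logic, not the full MetricsQL behavior.

```python
from statistics import mean

# Hypothetical series over three evaluation steps.
series = {
    "svc-a": [9.0, 1.0, 1.0],
    "svc-b": [5.0, 5.0, 5.0],
    "svc-c": [1.0, 6.0, 6.0],
}


def topk_per_step(k: int, data: dict[str, list[float]]) -> list[list[str]]:
    """Like topk(): pick the k largest series independently at every step."""
    steps = len(next(iter(data.values())))
    return [
        sorted(data, key=lambda name: data[name][i], reverse=True)[:k]
        for i in range(steps)
    ]


def topk_by_avg(k: int, data: dict[str, list[float]]) -> list[str]:
    """Like topk_avg(): rank whole series by their average over the range."""
    return sorted(data, key=lambda name: mean(data[name]), reverse=True)[:k]


print(topk_per_step(1, series))  # [['svc-a'], ['svc-c'], ['svc-c']]
print(topk_by_avg(1, series))    # ['svc-b']
```

Here the per-step ranking never selects `svc-b`, even though it has the highest average over the range.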

### fill()

Datadog's `fill()` handles missing data points. In Prometheus, missing data typically appears as gaps. MetricsQL provides the `default` binary operator and the `keep_last_value()` function to handle this:

| Datadog       | Prometheus/MetricsQL                     |
| ------------- | ---------------------------------------- |
| `.fill(zero)` | `my_metric default 0`                    |
| `.fill(last)` | `keep_last_value(my_metric)` (MetricsQL) |
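
The two fill strategies can be sketched in Python over a list where `None` marks a missing point. The function names are hypothetical and the sketch ignores any limit on how far a value is carried forward.

```python
from typing import Optional

Series = list[Optional[float]]


def fill_zero(points: Series) -> list[float]:
    """Like `.fill(zero)` / `default 0`: replace gaps with 0."""
    return [0.0 if p is None else p for p in points]


def fill_last(points: Series) -> Series:
    """Like `.fill(last)` / keep_last_value(): carry the last value forward."""
    filled: Series = []
    last: Optional[float] = None
    for p in points:
        if p is not None:
            last = p
        filled.append(last)  # stays None until a first value has been seen
    return filled


points = [None, 4.0, None, None, 7.0]
print(fill_zero(points))  # [0.0, 4.0, 0.0, 0.0, 7.0]
print(fill_last(points))  # [None, 4.0, 4.0, 4.0, 7.0]
```

Note that carrying the last value forward cannot fill a gap at the very start of the series, because there is nothing to carry yet.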

## Troubleshooting data discrepancies

When comparing Datadog and groundcover side-by-side during a migration, you'll likely see numbers that don't match perfectly. Here's how to diagnose common issues.

### Step size and resolution

**The problem:** Your Datadog dashboard shows 1000 requests/sec but groundcover shows 950.

**Why it happens:** Datadog and Prometheus may use different step sizes (the interval between data points). A 10-second step captures more detail than a 60-second step, and aggregations over these intervals produce different results.

**How to fix it:**

In groundcover dashboards, explicitly control the step using the `step` parameter in your query or by adjusting the time range.

Compare like-for-like by:

1. Using the same time range in both systems
2. Explicitly setting rollup/step to match
3. Understanding that Datadog auto-adjusts rollup based on time range while Prometheus requires explicit ranges
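
The effect of step size can be illustrated with a short Python sketch that averages the same samples into 10-second and 60-second steps. The sample values are hypothetical, and real systems align steps to timestamps rather than list positions.

```python
from statistics import mean

# Hypothetical request rates sampled every 10 seconds for two minutes.
samples = [900.0, 950.0, 1000.0, 1400.0, 950.0, 900.0,
           920.0, 940.0, 960.0, 980.0, 1000.0, 940.0]


def rollup_avg(values: list[float], points_per_step: int) -> list[float]:
    """Average consecutive samples into fixed-size steps."""
    return [
        mean(values[i : i + points_per_step])
        for i in range(0, len(values), points_per_step)
    ]


# 10-second step: every sample is its own point, so the spike survives.
print(max(rollup_avg(samples, 1)))  # 1400.0

# 60-second step: six samples are averaged per point, flattening the spike.
print(max(rollup_avg(samples, 6)))
```

The same data reports a peak of 1400 at the fine step and roughly 1017 at the coarse one, which is the kind of mismatch that disappears once both systems use the same step.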

### Missing filters

**The problem:** Datadog shows 500 errors/min but groundcover shows 2000.

**Why it happens:** Datadog might have implicit filters from dashboard template variables, saved views, or default scopes that aren't obvious. Tags in Datadog might map to different label names in groundcover.

**How to fix it:**

1. Check for template variables in the Datadog dashboard that apply filters
2. Review the full Datadog query including any `.filter()` calls
3. Map Datadog tags to groundcover labels—common mappings include:
   * `host` → `node_name` or `instance`
   * `env` → `env`
   * `service` → `workload`
   * `kube_namespace` → `namespace`
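
The mappings above can be applied mechanically with a small Python helper. This is a sketch with a hypothetical function name, and it only handles simple `tag:value` pairs, not negation, wildcards, or boolean scopes.

```python
# Tag-to-label mappings taken from the list above; extend for your own tags.
# "host" may map to "instance" instead, depending on the metric.
TAG_TO_LABEL = {
    "host": "node_name",
    "env": "env",
    "service": "workload",
    "kube_namespace": "namespace",
}


def translate_scope(datadog_scope: str) -> str:
    """Hypothetical helper: 'env:prod,service:api' -> PromQL label matchers."""
    matchers = []
    for pair in datadog_scope.split(","):
        tag, _, value = pair.strip().partition(":")
        label = TAG_TO_LABEL.get(tag, tag)  # unknown tags keep their name
        matchers.append(f'{label}="{value}"')
    return "{" + ", ".join(matchers) + "}"


print(translate_scope("env:prod,service:checkout,kube_namespace:shop"))
# {env="prod", workload="checkout", namespace="shop"}
```

Verify each translated label against the labels actually present on the metric, since the right target for a tag can differ between metrics.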

### Environment and cluster scope

**The problem:** Numbers are dramatically different between the two systems.

**Why it happens:** You might be looking at different environments or clusters. Datadog scopes might include production + staging, while groundcover might be filtering to a single cluster.

**How to fix it:**

Always explicitly filter by environment and cluster:

```promql
# Explicitly scope to production
sum(rate(http_requests_total{env="production", cluster="us-east-1"}[5m]))
```

Check the groundcover query builder's environment and cluster filters—they persist across sessions and might be limiting your view.

### Autofill and null handling

**The problem:** Datadog shows a smooth line while groundcover shows gaps.

**Why it happens:** Datadog's `.fill(last)` or `.fill(zero)` fills in missing data points with the previous value or with zero. Prometheus shows gaps where no data exists.

**How to fix it:**

In MetricsQL (groundcover's metrics language), use:

```promql
# Fill gaps with the last known value
keep_last_value(my_metric)

# Fill gaps with zero
my_metric default 0
```

Be aware that filling with zeros can distort averages—a gap might mean "no data" not "zero value."
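
A quick Python calculation shows the size of the distortion. The latency values are hypothetical, with `None` marking a scrape that returned no data.

```python
from statistics import mean

# Hypothetical latency samples in ms; None marks a scrape that returned no data.
points = [200.0, None, 220.0, None, 180.0]

observed = [p for p in points if p is not None]
zero_filled = [0.0 if p is None else p for p in points]

print(mean(observed))     # 200.0
print(mean(zero_filled))  # 120.0
```

Two missing scrapes are enough to pull the average from 200 ms down to 120 ms, even though no request was actually that fast.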

### Counter resets

**The problem:** You see sudden spikes or dips in rate calculations.

**Why it happens:** When a pod restarts, counters reset to zero. Prometheus's `rate()` handles this, but if your range selector is too short, you might capture artifacts. Datadog's pre-aggregation might smooth these over differently.

**How to fix it:**

Use a range selector at least 4x your scrape interval:

```promql
# If scraping every 30s, use at least a 2m range
rate(http_requests_total[2m])
```

Consider using `increase()` when you want a total count over a period rather than a per-second rate.
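
The reasoning behind the 4x guideline can be sketched by counting how many scrapes fall inside a lookback window. The function below is hypothetical and assumes perfectly regular scrapes. Prometheus's `rate()` needs at least two samples in the window, and MetricsQL may behave differently at window boundaries.

```python
def samples_in_window(scrape_interval_s: int, range_s: int, offset_s: int = 0) -> int:
    """Count scrapes that land inside a lookback window of range_s seconds.

    Scrapes happen at offset_s, offset_s + interval, and so on; the window is
    (end - range_s, end] with end fixed at an arbitrary evaluation time.
    """
    end = 600
    timestamps = range(offset_s, end + 1, scrape_interval_s)
    return sum(1 for t in timestamps if end - range_s < t <= end)


# 30s scrape interval with a 30s range: a single sample in the window.
print(samples_in_window(30, 30))   # 1

# A 2m range (4x the interval) holds four samples, leaving slack for a
# missed scrape or a restart.
print(samples_in_window(30, 120))  # 4
```

With only one or two samples per window, a single missed scrape or reset dominates the result, which is where the spikes and dips come from.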

### Metric name differences

**The problem:** The metric exists in Datadog but doesn't seem to exist in groundcover.

**Why it happens:** Metrics might have different names. groundcover uses the `groundcover_` prefix for its built-in metrics and follows Prometheus naming conventions (snake\_case with units as suffixes).

**Common mappings:**

| Datadog                      | groundcover                                   |
| ---------------------------- | --------------------------------------------- |
| `system.cpu.user`            | `groundcover_container_cpu_usage_rate_millis` |
| `kubernetes.cpu.usage.total` | `groundcover_container_cpu_usage_rate_millis` |
| `system.mem.used`            | `groundcover_node_mem_used_bytes`             |
| `kubernetes.memory.usage`    | `groundcover_container_mem_used_bytes`        |
| `kube_pod_status_phase`      | `groundcover_kube_pod_status_phase`           |

For a full list of available metrics, see the [Metrics & Labels](/use-groundcover/metrics-and-labels.md) reference.

## Thinking in PromQL vs Datadog query language

Beyond syntax translation, there's a mental model shift worth making.

**Datadog queries are often imperative:** "Take this metric, filter it, aggregate it, roll it up." You describe a sequence of operations.

**Prometheus queries are more declarative:** "Give me the 5-minute rate of this counter, summed by service." You describe the result you want.

This shows up in how you build complex queries:

**Datadog style:**

```
sum:http.requests{env:prod,status:error}.rollup(sum, 60).as_rate() / sum:http.requests{env:prod}.rollup(sum, 60).as_rate() * 100
```

**Prometheus style:**

```promql
sum(rate(http_requests_total{env="prod", status=~"5.."}[5m]))
/ 
sum(rate(http_requests_total{env="prod"}[5m])) 
* 100
```

Both compute error rate, but the Prometheus version reads more like a mathematical formula. The rate calculation and aggregation happen together, not as sequential transformations.

## What to do when migration tools don't get it right

groundcover's [migration tools](/getting-started/migrations/migrate-from-datadog.md) handle most query translations automatically. But complex queries or custom metrics might need manual adjustment.

When a migrated monitor or dashboard doesn't match expected values:

1. **Start simple:** Query the raw metric without aggregations in both systems
2. **Check cardinality:** Ensure the same number of series exist in both systems
3. **Match time ranges:** Use identical absolute time ranges, not relative ones
4. **Add aggregations incrementally:** Build up the query step-by-step, comparing at each stage
5. **Account for timing:** Datadog and groundcover might not have collected data at exactly the same moments—allow for small variations
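
When comparing exported values at each stage, a tolerance check helps separate collection jitter from real translation errors. The following Python sketch assumes both series are already aligned to the same timestamps. The function name and the 5% tolerance are hypothetical choices.

```python
import math


def mismatches(
    datadog: list[float], groundcover: list[float], rel_tol: float = 0.05
) -> list[int]:
    """Return indexes where two aligned series differ by more than rel_tol."""
    if len(datadog) != len(groundcover):
        raise ValueError("series must cover the same points")
    return [
        i
        for i, (a, b) in enumerate(zip(datadog, groundcover))
        if not math.isclose(a, b, rel_tol=rel_tol)
    ]


# Hypothetical values exported from both systems for the same absolute range.
dd_values = [1000.0, 1010.0, 990.0, 1005.0]
gc_values = [998.0, 1012.0, 700.0, 1004.0]

print(mismatches(dd_values, gc_values))  # [2]
```

Small differences pass, while the third point stands out as worth investigating with the troubleshooting steps earlier in this article.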

## Further reading

* [Migrate from Datadog](/getting-started/migrations/migrate-from-datadog.md) — the procedural migration guide
* [Monitor YAML structure](/use-groundcover/monitors/monitor-yaml-structure.md) — writing MetricsQL queries in monitors
* [MetricsQL documentation](https://docs.victoriametrics.com/metricsql/) — VictoriaMetrics' extended PromQL
* [Metrics & Labels reference](/use-groundcover/metrics-and-labels.md) — all available groundcover metrics


