Working with Metrics: Datadog vs groundcover
If you're coming from Datadog and starting to work with groundcover, one of the first things you'll notice is that metrics work differently. Not just the query syntax, but the underlying model, how aggregations behave, and what "rate" actually means. This article explains how to think about these differences so you can translate your mental model—not just your queries.
groundcover stores metrics in Prometheus format and uses MetricsQL (a PromQL-compatible query language) for querying. This means the concepts, operators, and query patterns differ from Datadog's approach.
Metric name normalization
When metrics are ingested into groundcover, their names are normalized to be Prometheus-compatible. Prometheus metric names must match the regex [a-zA-Z_:][a-zA-Z0-9_:]*, which means:
Dots (.) are replaced with underscores (_)
Hyphens (-) are replaced with underscores (_)
Names must start with a letter or underscore
system.cpu.user → system_cpu_user
http.server.requests → http_server_requests
my-app.request-count → my_app_request_count
aws.ec2.cpuutilization → aws_ec2_cpuutilization
When querying metrics that originated from Datadog or other sources with dot-notation, remember to use underscores in your Prometheus queries.
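The normalization rules above can be sketched as a small helper. This is an approximation for checking names by hand, not groundcover's actual implementation: the function name is hypothetical, and both the handling of characters other than dots and hyphens and the leading-digit rule are assumptions.

```python
import re

# Any character outside the Prometheus metric-name alphabet.
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_:]")


def normalize_metric_name(name: str) -> str:
    """Approximate Prometheus-compatible normalization of a metric name."""
    # Dots and hyphens become underscores. Treating every other disallowed
    # character the same way is an assumption of this sketch.
    normalized = _INVALID_CHARS.sub("_", name)
    # Names may not start with a digit. Prefixing an underscore is also an
    # assumption, not documented groundcover behavior.
    if normalized and normalized[0].isdigit():
        normalized = "_" + normalized
    return normalized


print(normalize_metric_name("my-app.request-count"))  # my_app_request_count
```

Running a Datadog metric name through a helper like this before searching for it in groundcover is a quick way to rule out naming as the cause of a "missing" metric.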
Counters vs gauges: same names, different behaviors
Both systems have counters and gauges, but they behave differently in practice.
Counters
A counter is a monotonically increasing value—think "total requests served" or "bytes transmitted." The value only goes up (or resets to zero on restart).
In Datadog, counters are often submitted as already-computed rates or deltas. When you send a counter metric via DogStatsD, the agent may compute the difference between submissions for you. The count type in Datadog represents "events per flush interval."
In Prometheus, counters are raw cumulative values. If your counter reads 1000 at t=0 and 1050 at t=60s, you have the raw numbers. To get a rate, you explicitly apply rate() or increase().
This means that a Datadog query showing sum:my.counter{*} might already be showing a rate, while a Prometheus query showing my_counter is showing the raw cumulative value.
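To make the difference concrete, here is a small worked example with made-up sample values. It derives per-interval deltas (closer to what a Datadog count represents) and per-second rates from raw cumulative samples. Real rate() additionally handles resets and extrapolation, which this sketch leaves out.

```python
# Cumulative counter samples as (timestamp_seconds, value), Prometheus style.
samples = [(0, 1000), (60, 1050), (120, 1110)]

# Pair each sample with the one that follows it.
pairs = list(zip(samples, samples[1:]))

# Per-interval deltas: roughly "events per interval".
deltas = [later[1] - earlier[1] for earlier, later in pairs]

# Per-second rates: the delta divided by the elapsed time.
rates = [
    (later[1] - earlier[1]) / (later[0] - earlier[0])
    for earlier, later in pairs
]

print(deltas)  # [50, 60]
```

The raw values (1000, 1050, 1110) are what a bare my_counter query returns. The deltas and rates only appear once you ask for them.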
Gauges
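Gauges behave more similarly in the two systems: they represent instantaneous values like CPU usage or memory consumption. However, aggregation behavior still differs. Datadog rolls points up over time automatically as the time range grows, while Prometheus returns the most recent sample at each step unless you apply an *_over_time() function.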
Translating common Datadog operators
Here's how to think about the most common Datadog functions and their Prometheus/MetricsQL equivalents.
When writing queries in groundcover dashboards, you can use the built-in variables $range and $interval to dynamically adapt to the selected time range and step size. For example, rate(my_counter[$range]) uses the dashboard's current time range, and rate(my_counter[$interval]) uses the current step interval.
as_rate()
In Datadog: .as_rate() converts a count metric to a per-second rate. It's applied as a modifier to metrics that were submitted as counts.
In Prometheus/MetricsQL: rate() computes per-second rate from raw counter values. It handles counter resets automatically and expects a range vector.
my.counter{*}.as_rate() → rate(my_counter[$interval])
sum:my.requests{*}.as_rate() → sum(rate(my_requests_total[$interval]))
count_nonzero() and count_not_null()
Datadog's count_nonzero() and count_not_null() count the number of non-zero or non-null data points across a time period.
In Prometheus/MetricsQL, use count_over_time() to count how many raw data points (scrapes) occurred for a specific time series within a lookback window.
Syntax: count_over_time(metric_name[duration])
Example: count_over_time(up[1h]) counts how many times the up metric was scraped in the last hour. If your scrape interval is 15 seconds, this returns approximately 240.
count_nonzero(my.metric{*}) → count_over_time((my_metric > 0)[1h])
count_not_null(my.metric{*}) → count_over_time(my_metric[1h])
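Note that Prometheus's count() (without _over_time) is different: it counts the number of time series in a result set at each timestamp, not the number of samples over time. Datadog evaluates count_nonzero() on a grouped query by counting the tag values that have non-zero data at each point, so for that case count(my_metric != 0) may be the closer match.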
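The distinction between counting samples and counting series is easy to mix up, so here is a sketch that mimics both on hypothetical in-memory data. The series names and values are invented for illustration.

```python
# Hypothetical samples per series inside one lookback window.
window = {
    "pod-a": [0.0, 2.5, 0.0, 1.0],
    "pod-b": [3.0, 3.0, 3.0, 3.0],
    "pod-c": [0.0, 0.0, 0.0, 0.0],
}

# Like count_over_time(my_metric[1h]): number of samples per series.
samples_per_series = {name: len(values) for name, values in window.items()}

# Like count_over_time((my_metric > 0)[1h]): non-zero samples per series.
# A series with no matching samples drops out of the result entirely.
nonzero_per_series = {
    name: sum(1 for v in values if v > 0)
    for name, values in window.items()
    if any(v > 0 for v in values)
}

# Like count(my_metric): number of series present at a single timestamp.
series_count = len(window)

print(samples_per_series)  # {'pod-a': 4, 'pod-b': 4, 'pod-c': 4}
print(nonzero_per_series)  # {'pod-a': 2, 'pod-b': 4}
print(series_count)        # 3
```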
sum(), avg(), min(), max()
These aggregation functions are conceptually similar but apply differently.
In Datadog, aggregations often happen across both time and series simultaneously. avg:my.metric{*} averages across all series.
In Prometheus, you separate time aggregation from series aggregation:
avg:my.metric{*} → avg(my_metric)
avg:my.metric{*}.rollup(avg, 60) → avg(avg_over_time(my_metric[1m]))
sum:my.metric{*} by {host} → sum by (host) (my_metric)
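The two-step structure of avg(avg_over_time(my_metric[1m])) matters because averaging per series first and then across series is not the same as pooling every sample. The following sketch uses invented values to show the gap.

```python
from statistics import mean

# Hypothetical raw samples for two series within a 1-minute window.
series = {
    "host-1": [10.0, 20.0, 30.0],
    "host-2": [40.0],
}

# Step 1, time aggregation: avg_over_time(my_metric[1m]) per series.
per_series_avg = {host: mean(values) for host, values in series.items()}

# Step 2, series aggregation: avg(...) across the per-series results.
avg_of_avgs = mean(list(per_series_avg.values()))

# Pooling all samples weights host-1 three times as heavily as host-2.
pooled_avg = mean([v for values in series.values() for v in values])

print(avg_of_avgs, pooled_avg)  # 30.0 25.0
```

If a translated query is off by a margin like this, check whether the two sides aggregate in the same order.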
rollup()
Datadog's .rollup() controls time aggregation explicitly. In Prometheus, you use *_over_time() functions:
.rollup(avg, 300) → avg_over_time(metric[5m])
.rollup(max, 300) → max_over_time(metric[5m])
.rollup(sum, 300) → sum_over_time(metric[5m])
.rollup(count, 300) → count_over_time(metric[5m])
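Since rollup intervals are given in seconds and range selectors use duration strings, a small converter can help when translating many queries by hand. This is a sketch only: the helper names are hypothetical and it covers just the four rollup methods listed above.

```python
# Rollup methods from the mapping above and their *_over_time() counterparts.
ROLLUP_TO_OVER_TIME = {
    "avg": "avg_over_time",
    "max": "max_over_time",
    "sum": "sum_over_time",
    "count": "count_over_time",
}


def seconds_to_duration(seconds: int) -> str:
    """Render a positive number of seconds as a duration such as 5m or 90s."""
    if seconds <= 0:
        raise ValueError("rollup interval must be positive")
    # Use the largest unit that divides the interval evenly.
    for unit_seconds, suffix in ((86400, "d"), (3600, "h"), (60, "m")):
        if seconds % unit_seconds == 0:
            return f"{seconds // unit_seconds}{suffix}"
    return f"{seconds}s"


def translate_rollup(metric: str, method: str, seconds: int) -> str:
    """Build the *_over_time() expression for a Datadog-style rollup."""
    func = ROLLUP_TO_OVER_TIME[method]
    return f"{func}({metric}[{seconds_to_duration(seconds)}])"


print(translate_rollup("my_metric", "avg", 300))  # avg_over_time(my_metric[5m])
```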
as_count()
Datadog's .as_count() converts a rate metric back to a count representation. In Prometheus, use increase() to get the total increase over a time window.
For example, if a counter http_requests_total goes from 1000 to 1500 over 5 minutes, increase(http_requests_total[5m]) returns 500—the total number of new requests in that window.
my.counter{*}.as_count() → increase(my_counter[$interval])
top() and bottom()
top(my.metric{*}, 10, 'mean', 'desc') → topk(10, my_metric)
bottom(my.metric{*}, 5, 'mean', 'asc') → bottomk(5, my_metric)
MetricsQL also provides topk_last(), topk_avg(), and similar variants that may better match Datadog's behavior for specific aggregation methods.
fill()
Datadog's fill() handles missing data points. In Prometheus, missing data typically appears as gaps. MetricsQL provides the default binary operator and the keep_last_value() function to handle this:
.fill(zero) → my_metric default 0
.fill(last) → keep_last_value(my_metric) (MetricsQL)
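The two fill strategies behave quite differently on the same gaps. The sketch below mimics them on a hypothetical list of points, where None stands for a missing sample. It illustrates the idea only and does not reproduce how MetricsQL evaluates these functions internally.

```python
def fill_zero(points):
    """Mimic `my_metric default 0`: replace each gap with zero."""
    return [0.0 if p is None else p for p in points]


def fill_last(points):
    """Mimic keep_last_value(my_metric): carry the last seen value forward."""
    filled, last = [], None
    for p in points:
        if p is not None:
            last = p
        # A gap before the first real sample stays a gap (None).
        filled.append(last)
    return filled


points = [5.0, None, None, 7.0, None]
print(fill_zero(points))  # [5.0, 0.0, 0.0, 7.0, 0.0]
print(fill_last(points))  # [5.0, 5.0, 5.0, 7.0, 7.0]
```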
Troubleshooting data discrepancies
When comparing Datadog and groundcover side-by-side during a migration, you'll likely see numbers that don't match perfectly. Here's how to diagnose common issues.
Step size and resolution
The problem: Your Datadog dashboard shows 1000 requests/sec but groundcover shows 950.
Why it happens: Datadog and Prometheus may use different step sizes (the interval between data points). A 10-second step captures more detail than a 60-second step, and aggregations over these intervals produce different results.
How to fix it:
In groundcover dashboards, explicitly control the step using the step parameter in your query or by adjusting the time range.
Compare like-for-like by:
Using the same time range in both systems
Explicitly setting rollup/step to match
Understanding that Datadog auto-adjusts rollup based on time range while Prometheus requires explicit ranges
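To see why step size alone can explain a gap like 1000 versus 950, consider this sketch. It averages the same invented 10-second data into coarser buckets. The values are chosen for illustration, and the helper name is hypothetical.

```python
from statistics import mean

# Hypothetical requests/sec values at 10-second resolution over one minute.
per_10s = [1000.0, 1040.0, 980.0, 900.0, 880.0, 900.0]


def downsample(values, factor):
    """Average consecutive groups of `factor` points into one point each."""
    return [
        mean(values[i:i + factor])
        for i in range(0, len(values), factor)
    ]


# The peak a graph shows shrinks as the step grows.
peak_10s = max(downsample(per_10s, 1))  # 10-second step
peak_30s = max(downsample(per_10s, 3))  # 30-second step
peak_60s = max(downsample(per_10s, 6))  # 60-second step

print(peak_10s, peak_60s)  # 1040.0 950.0
```

Both numbers describe the same traffic. Only the resolution differs, which is why matching rollup and step comes before any deeper debugging.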
Missing filters
The problem: Datadog shows 500 errors/min but groundcover shows 2000.
Why it happens: Datadog might have implicit filters from dashboard template variables, saved views, or default scopes that aren't obvious. Tags in Datadog might map to different label names in groundcover.
How to fix it:
Check for template variables in the Datadog dashboard that apply filters
Review the full Datadog query including any .filter() calls
Map Datadog tags to groundcover labels. Common mappings include:
host → node_name or instance
env → env
service → workload
kube_namespace → namespace
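The tag mappings above can be applied mechanically when translating a Datadog scope into a label selector. This is a rough sketch: the function name is hypothetical, it picks node_name for host (instance may be the right choice in your setup), and it handles only simple comma-separated key:value scopes.

```python
# Tag-to-label mappings from the list above. Choosing node_name over
# instance for host is an assumption of this sketch.
TAG_TO_LABEL = {
    "host": "node_name",
    "env": "env",
    "service": "workload",
    "kube_namespace": "namespace",
}


def datadog_scope_to_selector(scope: str) -> str:
    """Turn 'env:prod,service:web' into '{env="prod", workload="web"}'."""
    if scope.strip() in ("", "*"):
        return ""  # no filters at all
    matchers = []
    for pair in scope.split(","):
        tag, _, value = pair.strip().partition(":")
        # Tags without a known mapping are passed through unchanged.
        label = TAG_TO_LABEL.get(tag, tag)
        matchers.append(f'{label}="{value}"')
    return "{" + ", ".join(matchers) + "}"


print("my_metric" + datadog_scope_to_selector("env:prod,service:web"))
# my_metric{env="prod", workload="web"}
```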
Environment and cluster scope
The problem: Numbers are dramatically different between the two systems.
Why it happens: You might be looking at different environments or clusters. Datadog scopes might include production + staging, while groundcover might be filtering to a single cluster.
How to fix it:
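Always explicitly filter by environment and cluster, for example my_metric{env="production", cluster="my-cluster"}. The label names and values here are placeholders; use the ones present in your data.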
Check the groundcover query builder's environment and cluster filters—they persist across sessions and might be limiting your view.
Autofill and null handling
The problem: Datadog shows a smooth line while groundcover shows gaps.
Why it happens: Datadog's .fill(last) or .fill(zero) interpolates missing data points. Prometheus shows gaps where no data exists.
How to fix it:
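In MetricsQL (groundcover's metrics language), use my_metric default 0 to fill gaps with zeros, or keep_last_value(my_metric) to carry the last seen value forward.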
Be aware that filling with zeros can distort averages—a gap might mean "no data" not "zero value."
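A quick calculation with invented values shows how large the distortion can be.

```python
from statistics import mean

# Hypothetical gauge samples; None marks a point where no data arrived.
points = [5.0, None, None, 7.0, None]

# Average over the samples that actually exist.
observed = [p for p in points if p is not None]
avg_observed = mean(observed)

# Average after zero-filling, as `my_metric default 0` would produce.
zero_filled = [0.0 if p is None else p for p in points]
avg_zero_filled = mean(zero_filled)

print(avg_observed, avg_zero_filled)  # 6.0 2.4
```

Three missing points pull the average from 6.0 down to 2.4, even though the metric never reported a value below 5.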
Counter resets
The problem: You see sudden spikes or dips in rate calculations.
Why it happens: When a pod restarts, counters reset to zero. Prometheus's rate() handles this, but if your range selector is too short, you might capture artifacts. Datadog's pre-aggregation might smooth these over differently.
How to fix it:
Use a range selector at least 4x your scrape interval. With a 15-second scrape interval, for example, that means rate(my_counter[1m]) or longer.
Consider using increase() for total counts over a period rather than instantaneous rates.
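The reset handling can be illustrated with a simplified sketch. It treats any drop in value as a restart from zero, which is the core idea. The real rate() and increase() also extrapolate to the window boundaries, so their results will differ slightly from this approximation. The function names are hypothetical.

```python
def simple_increase(values):
    """Total increase across samples, treating any drop as a counter reset."""
    total = 0.0
    for prev, curr in zip(values, values[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # The counter restarted from zero, so everything it has
            # accumulated since the restart counts as new.
            total += curr
    return total


def simple_rate(values, window_seconds):
    """Per-second rate derived from the reset-aware increase."""
    return simple_increase(values) / window_seconds


# A pod restart between the third and fourth scrape resets the counter.
values = [100, 130, 160, 10, 40]

print(values[-1] - values[0])   # -60  (naive last-minus-first)
print(simple_increase(values))  # 100.0
```

The naive subtraction yields a negative number, while the reset-aware version recovers the 100 events that actually occurred.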
Metric name differences
The problem: The metric exists in Datadog but doesn't seem to exist in groundcover.
Why it happens: Metrics might have different names. groundcover uses the groundcover_ prefix for its built-in metrics and follows Prometheus naming conventions (snake_case with units as suffixes).
Common mappings:
system.cpu.user → groundcover_container_cpu_usage_rate_millis
kubernetes.cpu.usage.total → groundcover_container_cpu_usage_rate_millis
system.mem.used → groundcover_node_mem_used_bytes
kubernetes.memory.usage → groundcover_container_mem_used_bytes
kube_pod_status_phase → groundcover_kube_pod_status_phase
For a full list of available metrics, see the Metrics & Labels reference.
Thinking in PromQL vs Datadog query language
Beyond syntax translation, there's a mental model shift worth making.
Datadog queries are often imperative: "Take this metric, filter it, aggregate it, roll it up." You describe a sequence of operations.
Prometheus queries are more declarative: "Give me the 5-minute rate of this counter, summed by service." You describe the result you want.
This shows up in how you build complex queries:
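Datadog style (an illustrative query; http.requests and the status tag are placeholder names):
sum:http.requests{status:error}.as_rate() / sum:http.requests{*}.as_rate()
Prometheus style (the same illustration, using a placeholder http_requests_total counter):
sum(rate(http_requests_total{status="error"}[5m])) / sum(rate(http_requests_total[5m]))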
Both compute error rate, but the Prometheus version reads more like a mathematical formula. The rate calculation and aggregation happen together, not as sequential transformations.
What to do when migration tools don't get it right
groundcover's migration tools handle most query translations automatically. But complex queries or custom metrics might need manual adjustment.
When a migrated monitor or dashboard doesn't match expected values:
Start simple: Query the raw metric without aggregations in both systems
Check cardinality: Ensure the same number of series exist in both systems
Match time ranges: Use identical absolute time ranges, not relative ones
Add aggregations incrementally: Build up the query step-by-step, comparing at each stage
Account for timing: Datadog and groundcover might not have collected data at exactly the same moments—allow for small variations
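When comparing at each stage, it helps to separate small timing noise from real mismatches. The sketch below assumes you have exported both results as timestamp-to-value mappings. The function name, the 5% tolerance, and the sample data are all hypothetical.

```python
def find_mismatches(datadog, groundcover, rel_tolerance=0.05):
    """Return (timestamp, datadog_value, groundcover_value) for each shared
    timestamp where the values differ by more than the relative tolerance."""
    mismatches = []
    # Only timestamps present in both exports can be compared.
    for ts in sorted(datadog.keys() & groundcover.keys()):
        a, b = datadog[ts], groundcover[ts]
        baseline = max(abs(a), abs(b))
        if baseline and abs(a - b) / baseline > rel_tolerance:
            mismatches.append((ts, a, b))
    return mismatches


dd = {0: 100.0, 60: 102.0, 120: 500.0}
gc = {0: 99.0, 60: 102.0, 120: 2000.0, 180: 50.0}

print(find_mismatches(dd, gc))  # [(120, 500.0, 2000.0)]
```

The 1% difference at timestamp 0 is tolerated as collection jitter, while the 500 versus 2000 gap is flagged for investigation, for instance as a missing filter.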
Further reading
Migrate from Datadog — the procedural migration guide
Monitor YAML structure — writing MetricsQL queries in monitors
MetricsQL documentation — VictoriaMetrics' extended PromQL
Metrics & Labels reference — all available groundcover metrics