Spans to Metrics
Overview
Transform your trace data into queryable metrics for long-term monitoring, alerting, and cost-effective analysis. Spans-to-metrics extracts numerical data from span attributes and converts them into time-series metrics that can be visualized, alerted on, and retained at a fraction of the cost.
Sampling matters. For eBPF-captured traces, the pipeline runs after sampling, so metrics generated from eBPF spans reflect only sampled data, not total traffic. For ingested traces, the pipeline runs before sampling, so metrics reflect the full, unsampled dataset. Keep this distinction in mind when interpreting counts and sums.
Why Use Spans to Metrics?
Traces are perfect for debugging specific requests, but they become expensive and unwieldy at scale. Metrics, on the other hand, are:
Cost-effective - Store aggregated data instead of every span
Fast to query - Optimized for time-series analysis
Perfect for alerting - Track trends and thresholds over time
The Transformation
Think of it like converting request traces into spreadsheet rows:
A Span (request):
POST /api/orders — duration: 120ms, status: Ok, workload: order-service

Becomes Metrics (structured data):

| Timestamp | Metric | Value | Labels |
| --- | --- | --- | --- |
| [now] | order_requests_total | 1 | workload:order-service, endpoint:/api/orders |
| [now] | order_request_duration_ms | 120 | workload:order-service, endpoint:/api/orders |
You're essentially turning trace data into countable, measurable data points.
When to Use Spans to Metrics
Spans to metrics doesn't replace traces; it complements them. Use it for these scenarios:
1. Monitoring Latency Distribution
Track response time trends to understand system performance.
Use cases:
Average, minimum, and maximum response times per endpoint
Service latency degradation detection
Comparing latency across workloads or protocols
Example:
Create a metric request_duration_ns with sum, min, max, and count, then calculate average latency:
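A hedged sketch of such a rule and the follow-up query. The rule schema, the `duration` field, and the label source are illustrative assumptions, not the authoritative groundcover syntax:

```yaml
# Illustrative sketch — verify field names against the pipeline reference.
- statements:
    - set(s2m["workload"], resource.attributes["workload"])   # label dimension
    - span_to_metric_sum("request_duration_ns", Double(duration))
    - span_to_metric_min("request_duration_ns", Double(duration))
    - span_to_metric_max("request_duration_ns", Double(duration))
    - span_to_metric_count("request_duration_ns")
```

With the automatic operation suffixes, average latency is then:

```promql
rate(request_duration_ns_sum[5m]) / rate(request_duration_ns_count[5m])
```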
2. Tracking Error Ratios
Derive error rates by comparing error span counts to total span counts.
Use cases:
Error ratio per service or endpoint
Detecting degradation trends over time
Protocol-level error comparison (HTTP vs gRPC)
Example:
Create separate count metrics for all spans and error spans, then compute the ratio:
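In PromQL terms (metric names here are illustrative; the stored names may also carry the automatic operation suffix, e.g. `_count`):

```promql
sum(rate(http_spans_errors[5m])) / sum(rate(http_spans_total[5m]))
```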
3. Enrichment-Based Metrics
Generate metrics from data you've extracted or enriched in earlier pipeline rules — such as parsed response fields, custom attributes, or header values.
Use cases:
Metrics derived from JSON response body fields (e.g. cache["order_total"])
Counting spans by custom attributes set in transform rules
Aggregating values extracted from request/response headers
Example:
After a transform rule parses order_total from the response body into cache, create a sum metric:
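A hedged sketch, assuming an OTTL-style statement list and that the earlier transform rule has already written the parsed value into `cache` (the rule schema and label key are illustrative):

```yaml
# Illustrative only — assumes the transform rule ran earlier in the pipeline.
- statements:
    - set(s2m["workload"], resource.attributes["workload"])            # label dimension
    - span_to_metric_sum("order_total", Double(cache["order_total"]))  # sum the parsed value
```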
4. Counting Ingested (OTel) Request Volume
For ingested traces only (not eBPF), you can count absolute request volume because the pipeline runs before sampling.
Use cases:
Total API request counts per endpoint
Service-to-service call frequency
Request volume per protocol type
Absolute counts from eBPF spans reflect sampled data only. Use them for relative comparisons and trend analysis, not for exact volume measurement.
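A count rule for ingested spans might look like this hedged sketch (the rule schema and attribute key are assumptions):

```yaml
# Illustrative — attribute keys are guesses, not the authoritative schema.
- statements:
    - set(s2m["endpoint"], attributes["http.route"])          # assumed OTel attribute key
    - set(s2m["workload"], resource.attributes["workload"])
    - span_to_metric_count("api_requests")                    # stored with the _count suffix
```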
How Spans-to-Metrics Works
Spans-to-metrics uses the special s2m map to define what metrics to create and how to aggregate them.
Available Operations
groundcover supports four metric aggregation operations:
| Operation | Description | Typical uses |
| --- | --- | --- |
| span_to_metric_count | Count spans matching criteria | Request counts, event occurrences |
| span_to_metric_sum | Sum extracted values | Total duration, total payload size |
| span_to_metric_max | Maximum value observed | Peak response time, largest payload |
| span_to_metric_min | Minimum value observed | Fastest response time, smallest payload |
groundcover automatically adds a _gc_op suffix with the operation type to generated metrics (e.g., _sum, _min, _max, _count).
Basic Structure
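The structure block itself is not reproduced here; the following is a hedged sketch of what a rule typically contains, assuming OTTL-style statements (the wrapper fields, condition syntax, and attribute keys are illustrative, not the authoritative schema). The pattern is: set label dimensions in the s2m map, then call one or more aggregation operations.

```yaml
# Illustrative sketch — verify against the groundcover pipeline reference.
- conditions:
    - attributes["protocol_type"] == "http"                    # scope the rule
  statements:
    - set(s2m["workload"], resource.attributes["workload"])    # label dimension
    - set(s2m["endpoint"], attributes["http.route"])           # label dimension (assumed key)
    - span_to_metric_count("http_requests")                    # stored as http_requests_count
    - span_to_metric_sum("http_request_duration_ns", Double(duration))  # stored with _sum suffix
```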
Best Practices
Choose meaningful metric names - Use descriptive names that indicate what's being measured
Good: http_requests_total, order_processing_duration
Bad: metric1, counter
Use appropriate labels - Add dimensions that help you slice and dice the data
Common labels: workload, endpoint, protocol_type, status, namespace
Avoid high-cardinality labels (unique trace IDs, user IDs, span IDs)
Use conditions to scope rules - Only generate metrics from relevant spans to minimize overhead
Combine operations - Use count, sum, min, and max together for comprehensive insights
Count requests + sum duration = average latency
Min/max provide performance bounds
Use type conversion - Always convert values to Double() for sum/min/max operations: use Double(attributes["duration"]), not attributes["duration"]
Prefer ratios over absolute counts for eBPF - Since eBPF spans are sampled, ratios (e.g. error rate) are more reliable than raw counts
Viewing Your Metrics
After creating spans-to-metrics rules:
Metrics appear in Metrics Explorer within minutes
Use PromQL to query your custom metrics
Create dashboards to visualize trends
Set up monitors for alerting on thresholds
Common Use Cases
Tracking Request Duration
Monitor response times with min, max, and sum. Duration metrics are accurate even on sampled eBPF data.
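A hedged sketch of such a rule (the schema, the `duration` field, and the attribute key are illustrative assumptions):

```yaml
# Illustrative sketch — field names are assumptions.
- statements:
    - set(s2m["endpoint"], attributes["http.route"])    # assumed attribute key
    - span_to_metric_sum("http_request_duration_ns", Double(duration))
    - span_to_metric_min("http_request_duration_ns", Double(duration))
    - span_to_metric_max("http_request_duration_ns", Double(duration))
    - span_to_metric_count("http_request_duration_ns")
```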
Output metrics: one time series per operation, named with the automatic operation suffix (_sum, _min, _max, _count).
💡 What it does: Tracks latency distribution per endpoint. Calculate average latency with rate(sum) / rate(count).
Tracking Error Ratios
Compare error spans to total spans for reliable error rate monitoring.
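A hedged sketch using two rules, assuming an OTTL-style status check (the condition syntax is an assumption):

```yaml
# Illustrative — the error condition syntax is a guess.
- statements:
    - span_to_metric_count("http_spans_total")      # every matching span
- conditions:
    - status.code == STATUS_CODE_ERROR              # assumed error check
  statements:
    - span_to_metric_count("http_spans_errors")     # error spans only
```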
💡 What it does: Creates two count metrics. Calculate error rate with rate(http_spans_errors) / rate(http_spans_total). This ratio is reliable even on sampled eBPF data.
Monitoring Payload Size
Track request and response body sizes from span attributes.
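A hedged sketch, assuming the OTel semantic-convention attribute key for request body size (both the key and the rule schema are illustrative):

```yaml
# Illustrative — attribute key is an assumption.
- statements:
    - set(s2m["workload"], resource.attributes["workload"])
    - span_to_metric_sum("request_body_bytes", Double(attributes["http.request.body.size"]))
    - span_to_metric_max("request_body_bytes", Double(attributes["http.request.body.size"]))
    - span_to_metric_min("request_body_bytes", Double(attributes["http.request.body.size"]))
```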
💡 What it does: Tracks payload size distribution. Min/max values are accurate regardless of sampling.
Key Functions
s2m Map
The s2m map stores the labels (dimensions) for your metrics.
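For example (attribute keys are assumptions; each key set in s2m becomes a label on the emitted metric):

```yaml
- set(s2m["workload"], resource.attributes["workload"])   # becomes label workload=...
- set(s2m["endpoint"], attributes["http.route"])          # becomes label endpoint=...
```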
span_to_metric_count
Counts the number of spans matching the rule.
Syntax:
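The original syntax block is not reproduced here; as a hedged sketch, assuming the function takes only the metric name and reads labels from the s2m map:

```
span_to_metric_count("metric_name")
```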
Use for: Request counts, event occurrences, error counts
span_to_metric_sum
Sums numerical values from spans.
Syntax:
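A hedged sketch, assuming the function takes the metric name plus a numeric value (note the Double() conversion the best practices call for):

```
span_to_metric_sum("metric_name", Double(attributes["some_value"]))
```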
Use for: Total duration, total bytes, cumulative values
span_to_metric_max
Tracks the maximum value observed.
Syntax:
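A hedged sketch, assuming the same shape as the sum operation (metric name plus a Double() value):

```
span_to_metric_max("metric_name", Double(attributes["some_value"]))
```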
Use for: Peak response times, largest payloads
span_to_metric_min
Tracks the minimum value observed.
Syntax:
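A hedged sketch, again assuming a metric name plus a Double() value:

```
span_to_metric_min("metric_name", Double(attributes["some_value"]))
```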
Use for: Fastest response times, smallest payloads