Spans to Metrics

Overview

Transform your trace data into queryable metrics for long-term monitoring, alerting, and cost-effective analysis. Spans-to-metrics extracts numerical data from span attributes and converts them into time-series metrics that can be visualized, alerted on, and retained at a fraction of the cost.


Why Use Spans to Metrics?

Traces are perfect for debugging specific requests, but they become expensive and unwieldy at scale. Metrics, on the other hand, are:

  • Cost-effective - Store aggregated data instead of every span

  • Fast to query - Optimized for time-series analysis

  • Perfect for alerting - Track trends and thresholds over time

The Transformation

Think of it like converting request traces into spreadsheet rows:

A Span (request):

POST /api/orders — duration: 120ms, status: Ok, workload: order-service

Becomes Metrics (structured data):

Timestamp | Metric Name               | Value | Labels
----------|---------------------------|-------|---------------------------------------------
[now]     | order_requests_total      | 1     | workload:order-service, endpoint:/api/orders
[now]     | order_request_duration_ms | 120   | workload:order-service, endpoint:/api/orders

You're essentially turning trace data into countable, measurable data points.

When to Use Spans to Metrics

Spans to metrics doesn't replace traces; it complements them. Use it for these scenarios:

1. Monitoring Latency Distribution

Track response time trends to understand system performance.

Use cases:

  • Average, minimum, and maximum response times per endpoint

  • Service latency degradation detection

  • Comparing latency across workloads or protocols

Example:

Create a metric request_duration_ns with sum, min, max, and count, then calculate average latency:
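The original example was not preserved here; the following is a hedged sketch of what such a rule could look like, assuming an OTTL-style rule with condition and statement lists (the field names, the duration path, and the http.path attribute key are illustrative, not confirmed API):

```yaml
# Illustrative rule shape — field names and attribute keys are assumptions
- ruleName: request-duration
  conditions:
    - attributes["protocol_type"] == "http"
  statements:
    - set(s2m["workload"], workload)                  # label: emitting workload
    - set(s2m["endpoint"], attributes["http.path"])   # label: request endpoint
    - span_to_metric_count(s2m, "request_duration_ns")
    - span_to_metric_sum(s2m, "request_duration_ns", Double(duration))
    - span_to_metric_min(s2m, "request_duration_ns", Double(duration))
    - span_to_metric_max(s2m, "request_duration_ns", Double(duration))
```

Average latency could then be computed in PromQL as the rate of the sum metric divided by the rate of the count metric.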

2. Tracking Error Ratios

Derive error rates by comparing error span counts to total span counts.

Use cases:

  • Error ratio per service or endpoint

  • Detecting degradation trends over time

  • Protocol-level error comparison (HTTP vs gRPC)

Example:

Create separate count metrics for all spans and error spans, then compute the ratio:
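A hedged sketch of the two count rules, assuming an OTTL-style schema (field names and the error condition are illustrative):

```yaml
# Illustrative rule shapes — field names and the status condition are assumptions
- ruleName: http-spans-total
  conditions:
    - attributes["protocol_type"] == "http"
  statements:
    - set(s2m["workload"], workload)
    - span_to_metric_count(s2m, "http_spans_total")

- ruleName: http-spans-errors
  conditions:
    - attributes["protocol_type"] == "http"
    - status.code == STATUS_CODE_ERROR
  statements:
    - set(s2m["workload"], workload)
    - span_to_metric_count(s2m, "http_spans_errors")
```

The error ratio is then the rate of the error count divided by the rate of the total count.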

3. Enrichment-Based Metrics

Generate metrics from data you've extracted or enriched in earlier pipeline rules — such as parsed response fields, custom attributes, or header values.

Use cases:

  • Metrics derived from JSON response body fields (e.g. cache["order_total"])

  • Counting spans by custom attributes set in transform rules

  • Aggregating values extracted from request/response headers

Example:

After a transform rule parses order_total from the response body into cache, create a sum metric:
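A hedged sketch of such a rule, assuming the parsed value is available under cache["order_total"] as described above (the rule field names are illustrative):

```yaml
# Illustrative rule shape — field names are assumptions
- ruleName: order-total-sum
  conditions:
    - cache["order_total"] != nil          # only spans where enrichment succeeded
  statements:
    - set(s2m["workload"], workload)
    - span_to_metric_sum(s2m, "order_total", Double(cache["order_total"]))
```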

4. Counting Ingested (OTel) Request Volume

For ingested traces only (not eBPF), you can count absolute request volume because the pipeline runs before sampling.

Use cases:

  • Total API request counts per endpoint

  • Service-to-service call frequency

  • Request volume per protocol type


How Spans-to-Metrics Works

Spans-to-metrics uses the special s2m map to define what metrics to create and how to aggregate them.

Available Operations

groundcover supports four metric aggregation operations:

Function
Description
Use Case

span_to_metric_count

Count spans matching criteria

Request counts, event occurrences

span_to_metric_sum

Sum extracted values

Total duration, total payload size

span_to_metric_max

Maximum value observed

Peak response time, largest payload

span_to_metric_min

Minimum value observed

Fastest response time, smallest payload

groundcover automatically adds a _gc_op suffix with the operation type to generated metrics (e.g., _sum, _min, _max, _count).

Basic Structure
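The structure example was not preserved here. As a hedged sketch, a spans-to-metrics rule generally (a) scopes which spans it applies to with a condition, (b) sets labels on the s2m map, and (c) calls one of the span_to_metric_* functions (the field names below are illustrative, not confirmed API):

```yaml
# Illustrative structure only — the exact rule schema may differ
- ruleName: my-metric-rule
  conditions:                                        # which spans this rule matches
    - attributes["protocol_type"] == "http"
  statements:
    - set(s2m["workload"], workload)                 # labels (dimensions)
    - set(s2m["endpoint"], attributes["http.path"])
    - span_to_metric_count(s2m, "http_requests")     # aggregation + metric name
```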

Best Practices

  1. Choose meaningful metric names - Use descriptive names that indicate what's being measured

    • Good: http_requests_total, order_processing_duration

    • Bad: metric1, counter

  2. Use appropriate labels - Add dimensions that help you slice and dice the data

    • Common labels: workload, endpoint, protocol_type, status, namespace

    • Avoid high-cardinality labels (unique trace IDs, user IDs, span IDs)

  3. Use conditions to scope rules - Only generate metrics from relevant spans to minimize overhead

  4. Combine operations - Use count, sum, min, and max together for comprehensive insights

    • Count requests + sum duration = average latency

    • Min/max provide performance bounds

  5. Use type conversion - Always convert values to Double() for sum/min/max operations

    • Double(attributes["duration"]) not attributes["duration"]

  6. Prefer ratios over absolute counts for eBPF - Since eBPF spans are sampled, ratios (e.g. error rate) are more reliable than raw counts

Viewing Your Metrics

After creating spans-to-metrics rules:

  1. Metrics appear in Metrics Explorer within minutes

  2. Use PromQL to query your custom metrics

  3. Create dashboards to visualize trends

  4. Set up monitors for alerting on thresholds

Common Use Cases

Tracking Request Duration

Monitor response times with min, max, and sum. Duration metrics are accurate even on sampled eBPF data.
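The rule example was not preserved; a hedged sketch, assuming an OTTL-style schema (field names and attribute keys are illustrative):

```yaml
# Illustrative rule shape — field names and attribute keys are assumptions
- ruleName: request-duration-by-endpoint
  conditions:
    - attributes["protocol_type"] == "http"
  statements:
    - set(s2m["workload"], workload)
    - set(s2m["endpoint"], attributes["http.path"])
    - span_to_metric_count(s2m, "http_request_duration_ms")
    - span_to_metric_sum(s2m, "http_request_duration_ms", Double(duration))
    - span_to_metric_min(s2m, "http_request_duration_ms", Double(duration))
    - span_to_metric_max(s2m, "http_request_duration_ms", Double(duration))
```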

Output metrics:

💡 What it does: Tracks latency distribution per endpoint. Calculate average latency with rate(sum) / rate(count).

Tracking Error Ratios

Compare error spans to total spans for reliable error rate monitoring.
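A hedged sketch of the rule pair (field names and the error condition are illustrative, not confirmed API):

```yaml
# Illustrative rule shapes — field names and the status condition are assumptions
- ruleName: count-all-http-spans
  conditions:
    - attributes["protocol_type"] == "http"
  statements:
    - set(s2m["workload"], workload)
    - set(s2m["endpoint"], attributes["http.path"])
    - span_to_metric_count(s2m, "http_spans_total")

- ruleName: count-http-error-spans
  conditions:
    - attributes["protocol_type"] == "http"
    - status.code == STATUS_CODE_ERROR
  statements:
    - set(s2m["workload"], workload)
    - set(s2m["endpoint"], attributes["http.path"])
    - span_to_metric_count(s2m, "http_spans_errors")
```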

💡 What it does: Creates two count metrics. Calculate error rate with rate(http_spans_errors) / rate(http_spans_total). This ratio is reliable even on sampled eBPF data.

Monitoring Payload Size

Track request and response body sizes from span attributes.
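A hedged sketch, assuming the body size is exposed as a span attribute (the body.size key and rule field names are illustrative):

```yaml
# Illustrative rule shape — attribute key and field names are assumptions
- ruleName: payload-size
  conditions:
    - attributes["body.size"] != nil
  statements:
    - set(s2m["workload"], workload)
    - span_to_metric_sum(s2m, "request_body_bytes", Double(attributes["body.size"]))
    - span_to_metric_min(s2m, "request_body_bytes", Double(attributes["body.size"]))
    - span_to_metric_max(s2m, "request_body_bytes", Double(attributes["body.size"]))
```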

💡 What it does: Tracks payload size distribution. Min/max values are accurate regardless of sampling.

Key Functions

s2m Map

The s2m map stores the labels (dimensions) for your metrics.
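For example, labels would be populated with set() statements before calling an aggregation function (the label keys here are illustrative):

```
set(s2m["workload"], workload)
set(s2m["endpoint"], attributes["http.path"])
```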

span_to_metric_count

Counts the number of spans matching the rule.

Syntax:
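The original syntax block was not preserved; based on usage elsewhere in this page, the signature presumably takes the s2m label map and a metric name:

```
span_to_metric_count(s2m, "metric_name")
```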

Use for: Request counts, event occurrences, error counts

span_to_metric_sum

Sums numerical values from spans.

Syntax:
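The original syntax block was not preserved; presumably it takes the s2m label map, a metric name, and a numeric value converted with Double():

```
span_to_metric_sum(s2m, "metric_name", Double(value))
```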

Use for: Total duration, total bytes, cumulative values

span_to_metric_max

Tracks the maximum value observed.

Syntax:
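The original syntax block was not preserved; presumably it mirrors the sum form:

```
span_to_metric_max(s2m, "metric_name", Double(value))
```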

Use for: Peak response times, largest payloads

span_to_metric_min

Tracks the minimum value observed.

Syntax:
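The original syntax block was not preserved; presumably it mirrors the sum form:

```
span_to_metric_min(s2m, "metric_name", Double(value))
```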

Use for: Fastest response times, smallest payloads
