# Spans to Metrics

#### Overview

Transform your trace data into queryable metrics for long-term monitoring, alerting, and cost-effective analysis. Spans-to-metrics extracts numerical data from span attributes and converts it into time-series metrics that can be visualized, alerted on, and retained at a fraction of the cost of raw traces.

{% hint style="warning" %}
**Sampling matters.** For **eBPF-captured traces**, the pipeline runs **after sampling**, so metrics generated from eBPF spans reflect only sampled data, not total traffic. For **ingested traces**, the pipeline runs **before sampling**, so metrics reflect the full, unsampled dataset. Keep this distinction in mind when interpreting counts and sums.
{% endhint %}

#### Why Use Spans to Metrics?

Traces are perfect for debugging specific requests, but they become expensive and unwieldy at scale. Metrics, on the other hand, are:

* **Cost-effective** - Store aggregated data instead of every span
* **Fast to query** - Optimized for time-series analysis
* **Perfect for alerting** - Track trends and thresholds over time

**The Transformation**

Think of it like converting request traces into spreadsheet rows:

**A Span (request):**

```
POST /api/orders — duration: 120ms, status: Ok, workload: order-service
```

**Becomes Metrics (structured data):**

| Timestamp | Metric Name                 | Value | Labels                                         |
| --------- | --------------------------- | ----- | ---------------------------------------------- |
| `[now]`   | `order_requests_total`      | `1`   | `workload:order-service, endpoint:/api/orders` |
| `[now]`   | `order_request_duration_ms` | `120` | `workload:order-service, endpoint:/api/orders` |

You're essentially turning trace data into countable, measurable data points.
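
In rule form, the transformation above might look like the following minimal sketch (names are illustrative; groundcover appends the operation suffix, so this emits `order_requests_count` and `order_request_duration_ns_sum`, with duration in nanoseconds as in the examples later on this page):

```yaml
ottlRules:
  - ruleName: s2m-order-requests
    conditions:
      - workload == "order-service"
    statements:
      # Labels shared by both metrics
      - set(s2m["workload"], workload)
      - set(s2m["endpoint"], span_name)
      # One count metric and one duration metric per matching span
      - span_to_metric_count("order_requests", s2m)
      - span_to_metric_sum("order_request_duration_ns", s2m, Double(duration))
```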

#### When to Use Spans to Metrics

Spans to metrics doesn't replace traces; it complements them. Use it for these scenarios:

**1. Monitoring Latency Distribution**

Track response time trends to understand system performance.

**Use cases:**

* Average, minimum, and maximum response times per endpoint
* Service latency degradation detection
* Comparing latency across workloads or protocols

**Example:**

Create a metric `request_duration_ns` with sum, min, max, and count, then calculate average latency:

```
rate(request_duration_ns_sum[5m]) / rate(request_duration_ns_count[5m])
```

**2. Tracking Error Ratios**

Derive error rates by comparing error span counts to total span counts.

**Use cases:**

* Error ratio per service or endpoint
* Detecting degradation trends over time
* Protocol-level error comparison (HTTP vs gRPC)

**Example:**

Create separate count metrics for all spans and error spans, then compute the ratio:

```
rate(span_errors_total[5m]) / rate(span_requests_total[5m])
```

**3. Enrichment-Based Metrics**

Generate metrics from data you've extracted or enriched in earlier pipeline rules — such as parsed response fields, custom attributes, or header values.

**Use cases:**

* Metrics derived from JSON response body fields (e.g. `cache["order_total"]`)
* Counting spans by custom attributes set in transform rules
* Aggregating values extracted from request/response headers

**Example:**

After a transform rule parses `order_total` from the response body into `cache`, create a sum metric:

```yaml
- 'span_to_metric_sum("order_total_amount", s2m, Double(cache["order_total"]))'
```
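
A fuller sketch showing both steps in one rule (the `response_body` attribute name is hypothetical, and this assumes OTTL's `ParseJSON` converter is available in the pipeline):

```yaml
ottlRules:
  - ruleName: s2m-order-total
    conditions:
      - workload == "order-service"
      - attributes["response_body"] != nil
    conditionLogicOperator: "and"
    statements:
      # Parse the JSON response body into the cache map (hypothetical attribute name)
      - set(cache, ParseJSON(attributes["response_body"]))
      # Label and sum the extracted value
      - set(s2m["workload"], workload)
      - span_to_metric_sum("order_total_amount", s2m, Double(cache["order_total"]))
```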

**4. Counting Ingested (OTel) Request Volume**

For **ingested traces only** (not eBPF), you can count absolute request volume because the pipeline runs before sampling.

**Use cases:**

* Total API request counts per endpoint
* Service-to-service call frequency
* Request volume per protocol type

{% hint style="warning" %}
Absolute counts from **eBPF spans** reflect sampled data only. Use them for relative comparisons and trend analysis, not for exact volume measurement.
{% endhint %}
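
For example, a minimal rule counting ingested HTTP requests per endpoint (field and metric names follow the conventions used elsewhere on this page):

```yaml
ottlRules:
  - ruleName: s2m-otel-request-volume
    conditions:
      - protocol_type == "http"
    statements:
      - set(s2m["endpoint"], span_name)
      - set(s2m["workload"], workload)
      # Emits otel_requests_count (the count suffix is added automatically)
      - span_to_metric_count("otel_requests", s2m)
```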

#### How Spans-to-Metrics Works

Spans-to-metrics uses the special `s2m` map to define metric labels, together with `span_to_metric_*` functions that specify which metrics to create and how values are aggregated.

**Available Operations**

groundcover supports four metric aggregation operations:

| Function               | Description                   | Use Case                                |
| ---------------------- | ----------------------------- | --------------------------------------- |
| `span_to_metric_count` | Count spans matching criteria | Request counts, event occurrences       |
| `span_to_metric_sum`   | Sum extracted values          | Total duration, total payload size      |
| `span_to_metric_max`   | Maximum value observed        | Peak response time, largest payload     |
| `span_to_metric_min`   | Minimum value observed        | Fastest response time, smallest payload |

{% hint style="info" %}
groundcover automatically appends the operation type as a suffix to each generated metric name (e.g., `_sum`, `_min`, `_max`, `_count`).
{% endhint %}

**Basic Structure**

```yaml
ottlRules:
  - ruleName: s2m-example
    conditions:
      - workload == "my-service"
    statements:
      # 1. Define metric labels in s2m map
      - set(s2m["label_name"], attributes["field"])
      
      # 2. Create metrics with aggregations
      - span_to_metric_count("metric_name", s2m)
      - span_to_metric_sum("metric_name", s2m, Double(attributes["value"]))
```

#### Best Practices

1. **Choose meaningful metric names** - Use descriptive names that indicate what's being measured
   * Good: `http_requests_total`, `order_processing_duration`
   * Bad: `metric1`, `counter`
2. **Use appropriate labels** - Add dimensions that help you slice and dice the data
   * Common labels: `workload`, `endpoint`, `protocol_type`, `status`, `namespace`
   * Avoid high-cardinality labels (unique trace IDs, user IDs, span IDs)
3. **Use conditions to scope rules** - Only generate metrics from relevant spans to minimize overhead (see the sketch after this list)
4. **Combine operations** - Use count, sum, min, and max together for comprehensive insights
   * Count requests + sum duration = average latency
   * Min/max provide performance bounds
5. **Use type conversion** - Always convert values to `Double()` for sum/min/max operations
   * `Double(attributes["duration"])` not `attributes["duration"]`
6. **Prefer ratios over absolute counts for eBPF** - Since eBPF spans are sampled, ratios (e.g. error rate) are more reliable than raw counts
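
A minimal sketch of a well-scoped rule with low-cardinality labels (the `production` namespace is illustrative):

```yaml
ottlRules:
  - ruleName: s2m-scoped-example
    conditions:
      - namespace == "production"   # only process spans you actually need
    statements:
      - set(s2m["workload"], workload)   # bounded set of values
      - set(s2m["status"], status)       # bounded set of values
      # Avoid labels such as trace or user IDs: every unique value creates a new time series
      - span_to_metric_count("prod_spans", s2m)
```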

#### Viewing Your Metrics

After creating spans-to-metrics rules:

1. Metrics appear in [**Metrics Explorer**](https://app.groundcover.com/explore/data-explorer) within minutes
2. Use PromQL to query your custom metrics
3. Create [dashboards](https://app.groundcover.com/dashboards) to visualize trends
4. Set up [monitors](https://app.groundcover.com/monitors) for alerting on thresholds

#### Common Use Cases

**Tracking Request Duration**

Monitor response times with min, max, and sum. Each sampled span's duration is measured accurately, so these latency metrics stay representative even on sampled eBPF data.

```yaml
ottlRules:
  - ruleName: s2m-request-duration
    conditions:
      - workload == "api-gateway"
      - protocol_type == "http"
    conditionLogicOperator: "and"
    statements:
      - set(s2m["endpoint"], span_name)
      - set(s2m["workload"], workload)
      - span_to_metric_sum("request_duration_ns", s2m, Double(duration))
      - span_to_metric_max("request_duration_ns", s2m, Double(duration))
      - span_to_metric_min("request_duration_ns", s2m, Double(duration))
      - span_to_metric_count("request_duration_ns", s2m)
```

**Output metrics:**

```
request_duration_ns_sum{endpoint="GET /api/users", workload="api-gateway"} = 165000000
request_duration_ns_max{endpoint="GET /api/users", workload="api-gateway"} = 120000000
request_duration_ns_min{endpoint="GET /api/users", workload="api-gateway"} = 45000000
request_duration_ns_count{endpoint="GET /api/users", workload="api-gateway"} = 2
```

💡 **What it does:** Tracks latency distribution per endpoint. Calculate average latency with `rate(request_duration_ns_sum[5m]) / rate(request_duration_ns_count[5m])`.

**Tracking Error Ratios**

Compare error spans to total spans for reliable error rate monitoring.

```yaml
ottlRules:
  - ruleName: s2m-all-requests
    conditions:
      - protocol_type == "http"
    statements:
      - set(s2m["workload"], workload)
      - set(s2m["namespace"], namespace)
      - span_to_metric_count("http_spans_total", s2m)

  - ruleName: s2m-error-requests
    conditions:
      - protocol_type == "http"
      - status == "Error"
    conditionLogicOperator: "and"
    statements:
      - set(s2m["workload"], workload)
      - set(s2m["namespace"], namespace)
      - span_to_metric_count("http_spans_errors", s2m)
```

💡 **What it does:** Creates two count metrics. Divide their rates to get the error rate, as shown below. This ratio is reliable even on sampled eBPF data.
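
A sample error-rate query (the 5-minute window is illustrative; the `_count` suffix follows the naming convention shown earlier):

```
sum by (workload, namespace) (rate(http_spans_errors_count[5m]))
/
sum by (workload, namespace) (rate(http_spans_total_count[5m]))
```

Aggregating with `sum by (...)` keeps identical label sets on both sides, so the division matches series one-to-one.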

**Monitoring Payload Size**

Track request and response body sizes from span attributes.

```yaml
ottlRules:
  - ruleName: s2m-payload-size
    conditions:
      - workload == "api-gateway"
      - attributes["response_size"] != nil
    conditionLogicOperator: "and"
    statements:
      - set(s2m["endpoint"], span_name)
      - set(s2m["workload"], workload)
      - span_to_metric_sum("response_bytes", s2m, Double(attributes["response_size"]))
      - span_to_metric_max("response_bytes", s2m, Double(attributes["response_size"]))
      - span_to_metric_min("response_bytes", s2m, Double(attributes["response_size"]))
      - span_to_metric_count("response_bytes", s2m)
```

💡 **What it does:** Tracks payload size distribution. Min/max values are accurate regardless of sampling.
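
To chart the average response size, divide the sum rate by the count rate (the 5-minute window is illustrative):

```
rate(response_bytes_sum[5m]) / rate(response_bytes_count[5m])
```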

#### Key Functions

**s2m Map**

The `s2m` map stores the labels (dimensions) for your metrics.

```yaml
- set(s2m["label_name"], "value")
- set(s2m["endpoint"], span_name)
- set(s2m["workload"], workload)
```

**span\_to\_metric\_count**

Counts the number of spans matching the rule.

**Syntax:**

```yaml
- span_to_metric_count("metric_name", s2m)
```

**Use for:** Request counts, event occurrences, error counts

**span\_to\_metric\_sum**

Sums numerical values from spans.

**Syntax:**

```yaml
- span_to_metric_sum("metric_name", s2m, Double(attributes["value"]))
```

**Use for:** Total duration, total bytes, cumulative values

**span\_to\_metric\_max**

Tracks the maximum value observed.

**Syntax:**

```yaml
- span_to_metric_max("metric_name", s2m, Double(attributes["value"]))
```

**Use for:** Peak response times, largest payloads

**span\_to\_metric\_min**

Tracks the minimum value observed.

**Syntax:**

```yaml
- span_to_metric_min("metric_name", s2m, Double(attributes["value"]))
```

**Use for:** Fastest response times, smallest payloads
