Logs-to-Metrics

Overview

Transform your log data into queryable metrics for long-term monitoring, alerting, and cost-effective analysis. Logs-to-metrics parsing extracts numerical data from unstructured log messages and converts them into time-series metrics that can be visualized, alerted on, and retained at a fraction of the cost.

Why Use Logs-to-Metrics?

Logs are perfect for debugging specific events, but they become expensive and unwieldy at scale. Metrics, on the other hand, are:

  • Cost-effective - Store aggregated data instead of every log line

  • Fast to query - Optimized for time-series analysis

  • Perfect for alerting - Track trends and thresholds over time

The Transformation

Think of it like converting sentences into spreadsheet rows:

A Log (sentence):

INFO: HTTP GET /api/users request completed in 55ms with status 200

Becomes Metrics (structured data):

Timestamp | Metric Name              | Value | Labels
[now]     | http_requests_total      | 1     | method:GET, endpoint:/api/users, status:200
[now]     | http_request_duration_ms | 55    | method:GET, endpoint:/api/users, status:200

You're essentially turning descriptive text into countable, measurable data points.

When to Use Logs-to-Metrics

Logs-to-metrics doesn't replace logs—it complements them. Use it for these scenarios:

1. Counting Business Events

Track how often specific business or application events occur.

Use cases:

  • User logins, registrations, or logouts

  • Payment transactions (successful, failed, pending)

  • Items added to cart or checkout completions

  • Feature usage or API endpoint calls

Example:

Create a metric user_logins_total by counting every log containing "User successfully authenticated."
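A minimal rule for this might look like the following sketch. The workload name and matched log text are assumptions, and note that the count operation appends its own suffix to the metric name you pass in:

ottlRules:
  - ruleName: "l2m-user-logins"
    conditions:
      - 'workload == "auth-service"'
      - 'IsMatch(body, "User successfully authenticated")'
    conditionLogicOperator: "and"
    statements:
      # count every matching log; no extra labels needed
      - 'log_to_metric_count("user_logins", l2m)'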

2. Monitoring Error Rates

Track error frequency to understand system health and set up alerts.

Use cases:

  • Application ERROR or FATAL level logs

  • Failed database queries or timeout errors

  • Service degradation indicators

Example:

Create a metric requests_total with status code labels, then calculate error rate:

sum(rate(requests_total{status=~"5.."}[5m])) / sum(rate(requests_total[5m]))
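The rule feeding a query like that could be sketched as follows. The workload name and attribute key are assumptions, and keep in mind that groundcover appends the operation suffix, so query the emitted metric name rather than the bare name passed in:

ottlRules:
  - ruleName: "l2m-request-errors"
    conditions:
      - 'workload == "my-service"'
    statements:
      # label every request by its status code so error rate can be derived
      - 'set(l2m["status"], attributes["status"])'
      - 'log_to_metric_count("requests", l2m)'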

3. Monitoring Legacy or Third-Party Applications

Extract metrics from applications you can't modify to add instrumentation.

Use cases:

  • Legacy systems that only write to log files

  • Third-party tools without native metrics

  • Vendor applications without metric exporters

  • Containerized apps without metric endpoints

Example:

Parse log files from an old Java application to extract performance indicators like active connections or tasks processed.
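Assuming an earlier parsing rule has already extracted fields such as active_connections and tasks_processed into attributes (the workload and field names here are hypothetical), the metric rule could look like:

ottlRules:
  - ruleName: "l2m-legacy-java"
    conditions:
      - 'workload == "legacy-billing-app"'
    statements:
      - 'set(l2m["pool"], attributes["pool"])'
      # peak concurrent connections observed in the interval
      - 'log_to_metric_max("active_connections", l2m, Double(attributes["active_connections"]))'
      # cumulative work done
      - 'log_to_metric_sum("tasks_processed", l2m, Double(attributes["tasks_processed"]))'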

How Logs-to-Metrics Works

Logs-to-metrics parsing uses the special l2m map to define what metrics to create and how to aggregate them.

Available Operations

groundcover supports four metric aggregation operations:

Function            | Description                  | Use Case
log_to_metric_count | Count logs matching criteria | Request counts, event occurrences
log_to_metric_sum   | Sum extracted values         | Total bytes transferred, total duration
log_to_metric_max   | Maximum value observed       | Peak response time, largest payload
log_to_metric_min   | Minimum value observed       | Fastest response time, smallest payload

groundcover automatically appends the operation type and a _gc_op suffix to generated metric names (e.g., _sum_gc_op, _min_gc_op, _max_gc_op, _count_gc_op).

Basic Structure

ottlRules:
  - ruleName: "l2m-example"
    conditions:
      - 'workload == "my-service"'
    statements:
      # 1. Define metric labels in l2m map
      - 'set(l2m["label_name"], attributes["field"])'
      
      # 2. Create metrics with aggregations
      - 'log_to_metric_count("metric_name", l2m)'
      - 'log_to_metric_sum("metric_name", l2m, Double(attributes["value"]))'

Best Practices

  1. Choose meaningful metric names - Use descriptive names that indicate what's being measured

    • Good: http_requests_total, payment_transactions_count

    • Bad: metric1, counter

  2. Use appropriate labels - Add dimensions that help you slice and dice the data

    • Common labels: endpoint, method, status, region, user_type

    • Avoid high-cardinality labels (user IDs, unique transaction IDs)

  3. Parse before creating metrics - Extract and clean data in earlier rules, then create metrics

    • Parse JSON or GROK patterns first

    • Create l2m metrics in a separate rule

  4. Combine operations - Use count, sum, min, and max together for comprehensive insights

    • Count requests + sum duration = average latency

    • Min/max provide performance bounds

  5. Use type conversion - Always convert values to Double() for sum/min/max operations

    • Double(attributes["duration"]) not attributes["duration"]

  6. Test with Parsing Playground - Verify your l2m rules extract the right data before deploying

Viewing Your Metrics

After creating logs-to-metrics rules:

  1. Metrics appear in Metrics Explorer within minutes

  2. Use PromQL to query your custom metrics

  3. Create dashboards to visualize trends

  4. Set up monitors for alerting on thresholds

Common Use Cases

Tracking Request Duration

Monitor response times with min, max, and sum.

ottlRules:
  - ruleName: "l2m-request-duration"
    conditions:
      - 'workload == "api-gateway"'
    statements:
      - 'set(l2m["endpoint"], attributes["endpoint"])'
      - 'set(l2m["method"], attributes["method"])'
      - 'log_to_metric_sum("request_duration_ms", l2m, Double(attributes["duration"]))'
      - 'log_to_metric_max("request_duration_ms", l2m, Double(attributes["duration"]))'
      - 'log_to_metric_min("request_duration_ms", l2m, Double(attributes["duration"]))'
      - 'log_to_metric_count("request_duration_ms", l2m)'

Input logs:

GET /api/users - duration: 45ms
GET /api/users - duration: 120ms
POST /api/orders - duration: 89ms

Output metrics:

request_duration_ms_sum{endpoint="/api/users", method="GET"} = 165
request_duration_ms_max{endpoint="/api/users", method="GET"} = 120
request_duration_ms_min{endpoint="/api/users", method="GET"} = 45
request_duration_ms_count{endpoint="/api/users", method="GET"} = 2
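Because the sum and count are recorded together, average latency can be derived at query time. A PromQL sketch (verify the exact emitted metric names, including any _gc_op suffix, in Metrics Explorer first):

# average request latency per endpoint over the last 5 minutes
rate(request_duration_ms_sum[5m]) / rate(request_duration_ms_count[5m])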

Monitoring Data Transfer Volume

Track bytes sent/received.

ottlRules:
  - ruleName: "l2m-data-transfer"
    conditions:
      - 'container_name == "proxy"'
    statements:
      - 'set(l2m["request_path"], attributes["request.path"])'
      - 'set(l2m["cluster"], cluster)'
      - 'set(l2m["status_code"], attributes["status_code"])'
      - 'log_to_metric_sum("request_bytes", l2m, Double(attributes["bytes"]))'
      - 'log_to_metric_max("request_bytes", l2m, Double(attributes["bytes"]))'
      - 'log_to_metric_min("request_bytes", l2m, Double(attributes["bytes"]))'
      - 'log_to_metric_count("request_bytes", l2m)'

Input logs:

10.1.139.127 - [07/Aug/2025:12:07:00 +0000] "GET /api/data HTTP/2.0" 403 19
10.1.139.127 - [07/Aug/2025:12:08:00 +0000] "GET /api/data HTTP/2.0" 403 189
10.1.139.127 - [07/Aug/2025:12:10:00 +0000] "GET /api/data HTTP/2.0" 403 6

Output metrics:

request_bytes_sum = 214
request_bytes_max = 189
request_bytes_min = 6
request_bytes_count = 3

Counting Business Events

Track user actions and business metrics.

ottlRules:
  - ruleName: "l2m-user-actions"
    conditions:
      - 'workload == "user-service"'
    statements:
      - 'set(l2m["action"], attributes["action"])'
      - 'set(l2m["user_type"], attributes["user_type"])'
      - 'set(l2m["region"], attributes["region"])'
      - 'log_to_metric_count("user_actions", l2m)'

Input logs:

User action: login, type: premium, region: us-east
User action: purchase, type: premium, region: us-east
User action: login, type: free, region: eu-west

Output metrics:

user_actions_count{action="login", user_type="premium", region="us-east"} = 1
user_actions_count{action="purchase", user_type="premium", region="us-east"} = 1
user_actions_count{action="login", user_type="free", region="eu-west"} = 1
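These counters can then be sliced by any label at query time. For example, login rate per region (a PromQL sketch; verify the emitted metric name first):

sum by (region) (rate(user_actions_count{action="login"}[5m]))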

Complete Example

Here's a comprehensive example parsing Kong access logs:

ottlRules:
  - ruleName: "l2m-kong"
    conditions:
      - 'container_name == "proxy"'
      - 'workload == "groundcover-incloud-kong"'
    conditionLogicOperator: "and"
    statements:
      # Define labels
      - 'set(l2m["request_path"], attributes["request.path"])'
      - 'set(l2m["cluster"], cluster)'
      - 'set(l2m["status_code"], attributes["status_code"])'
      
      # Create metrics for request volume (bytes)
      - 'log_to_metric_sum("kong_request_volume", l2m, Double(attributes["bytes"]))'
      - 'log_to_metric_min("kong_request_volume", l2m, Double(attributes["bytes"]))'
      - 'log_to_metric_max("kong_request_volume", l2m, Double(attributes["bytes"]))'
      
      # Count access logs
      - 'log_to_metric_count("kong_access_log_metrics", l2m)'
    statementsErrorMode: "propagate"

Input logs:

10.1.139.127 - - [07/Aug/2025:12:07:00 +0000] "GET /fleet-manager/api/client/config HTTP/2.0" 403 19 "-" "Go-http-client/2.0"
10.1.139.127 - - [07/Aug/2025:12:08:00 +0000] "GET /fleet-manager/api/client/config HTTP/2.0" 403 189 "-" "Go-http-client/2.0"
10.1.139.127 - - [07/Aug/2025:12:10:00 +0000] "GET /fleet-manager/api/client/config HTTP/2.0" 403 6 "-" "Go-http-client/2.0"

Output metrics:

Metric                              | Value
kong_request_volume_sum_gc_op       | 214
kong_request_volume_min_gc_op       | 6
kong_request_volume_max_gc_op       | 189
kong_access_log_metrics_count_gc_op | 3

These metrics are now available in the Metrics Explorer for querying, visualization, and alerting!

Key Functions

l2m Map

The l2m map stores the labels (dimensions) for your metrics.

- 'set(l2m["label_name"], "value")'
- 'set(l2m["endpoint"], attributes["path"])'
- 'set(l2m["cluster"], cluster)'

log_to_metric_count

Counts the number of logs matching the rule.

Syntax:

- 'log_to_metric_count("metric_name", l2m)'

Use for: Request counts, event occurrences, error counts

log_to_metric_sum

Sums numerical values from logs.

Syntax:

- 'log_to_metric_sum("metric_name", l2m, Double(attributes["value"]))'

Use for: Total bytes, total duration, cumulative values

log_to_metric_max

Tracks the maximum value observed.

Syntax:

- 'log_to_metric_max("metric_name", l2m, Double(attributes["value"]))'

Use for: Peak response times, largest payloads

log_to_metric_min

Tracks the minimum value observed.

Syntax:

- 'log_to_metric_min("metric_name", l2m, Double(attributes["value"]))'

Use for: Fastest response times, smallest payloads
