Log Pipelines

Overview

groundcover supports the configuration of log pipelines to process and customize your logs before ingestion. With log parsing pipelines, you gain full flexibility to transform data as it flows into the platform.

Log parsing in groundcover is powered by OpenTelemetry Transformation Language (OTTL), a powerful and flexible language designed for transforming telemetry data. OTTL provides a rich set of functions and operators that allow you to parse, filter, enrich, and transform your logs with precision.

What You Can Do

Log parsing pipelines give you a structured way to:

  • Parse - Extract structured data from unstructured log messages (JSON, GROK patterns, key-value pairs)

  • Filter - Drop noisy or irrelevant logs to reduce costs and focus on what matters

  • Obfuscate - Mask or remove sensitive data to maintain privacy and compliance

  • Convert to Metrics - Transform logs into time-series metrics for long-term monitoring and alerting

Each pipeline is made up of transformation steps—each step defines a specific operation like parsing JSON, extracting key-value pairs, or modifying attributes. You can configure these transformations directly in your groundcover deployment.
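
For illustration, a single-step pipeline that parses JSON-formatted logs into attributes might look like the following sketch (the rule name is illustrative; the full structure is covered under Pipeline Structure below):

ottlRules:
  - ruleName: "parse_json_body"
    conditions:
      - 'format == "json"'
    statements:
      # Parse the JSON body and add its fields as attributes
      - 'merge_maps(attributes, ParseJSON(body), "insert")'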

Parsing Playground

The Parsing Playground is an interactive testing environment where you can develop and validate parsing rules before deploying them to production.

Accessing the Playground

To access the Parsing Playground:

  1. Navigate to the Logs view

  2. Click on any log entry to view its details

  3. Click the Parsing Playground button in the top right corner of the log detail view

The playground opens with the selected log pre-loaded, allowing you to write and test transformation rules in real-time.

Using the Playground

In the Parsing Playground, you can:

  • Write transformation rules - Create and edit parsing statements

  • Test in real-time - See immediate results as you modify your rules

  • View extracted attributes - See exactly what fields are being extracted

  • Validate syntax - Get instant feedback on rule syntax and errors

  • Add rules - Once satisfied, add the rule to deploy it in your pipelines

Writing Parsing Rules

Available Fields and Keys

When writing parsing rules, you have access to various fields that represent different aspects of the log. Understanding which fields are writable and which are read-only is crucial for effective rule creation.

Writable Fields

These fields can be modified in your parsing rules:

| Field | Type | Description |
| --- | --- | --- |
| cache | Map | Temporary storage for intermediate values. Access sub-keys with cache["key"] |
| l2m | Map | Logs-to-metrics data. Access sub-keys with l2m["key"] |
| time | Time | Timestamp of the log |
| body | String | The main log message content |
| attributes | Map | Custom attributes extracted from the log. Access sub-keys with attributes["key"] |
| trace_id | String | Trace identifier for distributed tracing correlation |
| span_id | String | Span identifier for distributed tracing |
| level | String | Log severity level (info, error, debug, etc.) |
| format | String | Log format (json, clf, unknown, etc.) |
| drop | Boolean | Set to true to drop/filter the log |
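
For illustration, the statements below touch a few of these writable fields; the attribute and cache keys are hypothetical:

statements:
  # Normalize the severity level based on the message content
  - 'set(level, "error") where IsMatch(body, "(?i)exception")'
  # Promote a value extracted earlier into cache to a permanent attribute
  - 'set(attributes["request_id"], cache["request_id"])'
  # Record the detected format
  - 'set(format, "json")'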

Read-Only Fields

These fields provide context but cannot be modified. Attempting to set them will result in an error:

| Field | Type | Description |
| --- | --- | --- |
| workload | String | Name of the Kubernetes workload (deployment, statefulset, etc.) |
| source | String | Source of the log data |
| cluster | String | Kubernetes cluster name |
| env | String | Environment name |
| container_name | String | Container that generated the log |
| namespace | String | Kubernetes namespace |
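
Although read-only fields cannot be set, they are useful for scoping statements and conditions. A minimal sketch, assuming a hypothetical dev namespace and workload name:

statements:
  # Drop noisy logs from a specific workload in the dev namespace
  - 'set(drop, true) where namespace == "dev" and workload == "load-generator"'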

Special Fields

cache - Temporary Storage

The cache field is particularly useful for multi-step transformations:

statements:
  # Extract to cache first
  - 'set(cache, ExtractGrokPatterns(body, "pattern"))'
  # Process cached values
  - 'replace_pattern(cache["field"], "old", "new")'
  # Merge into attributes
  - 'merge_maps(attributes, cache, "insert")'

attributes - Custom Fields

The attributes map is where you store extracted structured data:

statements:
  - 'set(attributes["user_id"], "12345")'
  - 'set(attributes["action"], "login")'
  - 'set(attributes["ip"], "192.168.1.1")'

drop - Log Filtering

The drop field is a special boolean flag for filtering logs:

statements:
  # Drop unconditionally
  - 'set(drop, true)'
  
  # Drop conditionally
  - 'set(drop, true) where IsMatch(body, "/healthz")'

Setting Conditions

Use conditions to apply transformations only when specific attributes match. This ensures your pipeline runs efficiently and only on relevant logs.

Common fields you can use (the full list appears above; a short example follows this list):

  • workload – Name of the service or app

  • container_name – Container where the log originated

  • level – Log severity (e.g., info, error, debug)

  • format – Log format (e.g., JSON, CLF, unknown)

  • namespace – Kubernetes namespace

  • pod – Pod name
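
For example, the following rule only runs on error logs from a specific service and parses key=value pairs from the body (the workload name is illustrative):

ottlRules:
  - ruleName: "parse_checkout_errors"
    conditions:
      - 'workload == "checkout" and level == "error"'
    statements:
      # Extract key=value pairs into attributes
      - 'merge_maps(attributes, ParseKeyValue(body), "insert")'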

Common Functions

Some commonly used functions in groundcover pipelines:

  • ExtractGrokPatterns – Extract structured data using GROK patterns

  • ParseJSON – Parse JSON strings into structured attributes

  • ParseKeyValue – Parse key=value formatted strings

  • replace_pattern – Replace text patterns (useful for obfuscation)

  • delete_key – Remove specific attributes

  • merge_maps – Merge extracted data into attributes

  • set – Set attribute values

  • IsMatch – Match patterns in log fields

For a complete list of available OTTL functions, see the OTTL Functions Reference.
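
A short sketch combining several of these functions (the field names are illustrative):

statements:
  # Parse the JSON body into temporary storage
  - 'set(cache["parsed"], ParseJSON(body))'
  # Promote the parsed fields into attributes
  - 'merge_maps(attributes, cache["parsed"], "upsert")'
  # Obfuscate the email domain
  - 'replace_pattern(attributes["email"], "@.*", "@****")'
  # Remove an attribute that is not needed downstream
  - 'delete_key(attributes, "internal_debug")'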

Automatically Generate Parsing Rules

From a Single Log

When using the Parsing Playground, you can click Suggest Configuration to have the platform attempt to generate a parsing rule for the log currently loaded in the playground.

The platform will suggest a rule and apply it, showing the extracted attributes. The rule can then be added to the pipeline via Add Rule.

From Multiple Patterns

groundcover can automatically generate parsing rules for you based on Log Patterns. This powerful feature analyzes the structure of your logs and creates optimized parsing rules automatically.

How It Works

The Generate Parsing Rules feature:

  1. Analyzes log patterns in your current view

  2. Identifies common structures and fields

  3. Automatically generates parsing rules for each pattern

  4. Creates rules optimized for your specific log formats

Requirements

To use Generate Parsing Rules:

  • Your logs must have fewer than 10 distinct patterns in the filtered view

  • Patterns must be clearly identifiable and consistent

If you have 10 or more patterns, apply filters (workload, namespace, level, etc.) to narrow down your logs before generating rules.

How To Generate Rules

  1. In the Logs view, click the Actions dropdown in the top right

  2. Select Generate Parsing Rules

  3. Review the generated rules

  4. Make any necessary adjustments

  5. Deploy the rules to your pipeline

The generated rules are automatically added to your pipeline configuration and will start processing logs immediately.

The Rules Pipeline

In groundcover, all parsing rules are part of a single pipeline that processes your logs. Each rule you create becomes a step in this pipeline, executed sequentially from top to bottom. This unified pipeline approach ensures consistent processing and makes it easy to understand the flow of transformations applied to your logs.

Viewing Your Pipelines

After you create parsing rules, they are deployed to Settings → Pipelines (https://app.groundcover.com/settings/pipelines), where you can monitor their performance and effectiveness.

The Pipelines page provides a comprehensive view of all your log parsing rules with real-time metrics:

Pipeline Metrics

For each rule in the Pipelines page, you'll see four key metrics:

Process Rate

The throughput of logs being processed by this rule, measured in Logs/s (logs per second). This shows how many logs are actively being transformed or evaluated by the rule.

Drop Rate

The rate at which logs are being filtered out by this rule, measured in Logs/s. This metric helps you understand the effectiveness of your filtering rules in reducing log volume.

Error Rate

The percentage of logs that encounter errors during processing, shown as a percentage (%). This includes both condition evaluation errors and transformation execution errors. A healthy rule should have an error rate close to 0%.

Processing Latency

The average time it takes to process a single log through this rule, typically displayed in nanoseconds (ns) or microseconds (μs). Lower values indicate better performance.

Understanding the Metrics

The Pipelines page shows real-time statistics to help you monitor the health and performance of your parsing rules.

What Good Metrics Look Like

Process Rate:

  • Should match your expected log volume for rules that apply to all logs

  • Will be lower for rules with specific conditions

  • Zero process rate may indicate the rule conditions never match

Drop Rate:

  • For filtering rules: Should show significant volume if working correctly

  • For parsing rules: Should typically be 0 (unless explicitly dropping logs)

  • High unexpected drop rates may indicate rule errors

Error Rate:

  • Should be 0% or very close to it

  • Any error rate above 1% indicates a problem that needs investigation

Processing Latency:

  • Typically ranges from 100ns to 1ms per log

  • Complex parsing operations may take longer

  • Values consistently above 10ms indicate performance issues

If you see high processing latency, consider simplifying regex patterns or splitting complex rules into multiple steps.
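
One common way to reduce latency is to gate an expensive parse behind a cheap condition so it only runs on the logs it can actually match; a sketch (the format value and GROK pattern are illustrative):

ottlRules:
  # Cheap condition first: only CLF-formatted logs reach the GROK step
  - ruleName: "parse_access_logs"
    conditions:
      - 'format == "clf"'
    statements:
      - 'set(cache, ExtractGrokPatterns(body, "%{COMMONAPACHELOG}"))'
      - 'merge_maps(attributes, cache, "insert")'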

Rule Order Matters

Why Rule Order Is Important

  1. Earlier rules process more logs - A rule at position 1 sees ALL logs, while a rule at position 10 only sees logs that weren't dropped by rules 1-9

  2. Dropping affects downstream rules - If rule #3 drops 80% of logs, rule #4 will only process the remaining 20%

  3. Transformations are cumulative - Fields extracted in rule #1 are available to rule #2 and beyond

  4. Performance optimization - Place expensive parsing operations AFTER filter rules to process fewer logs

Best Practices for Rule Ordering

Recommended order:

  1. Drop rules first - Filter out noisy logs before expensive parsing

  2. Quick parsing - Fast, simple extractions

  3. Complex parsing - Resource-intensive transformations

  4. Obfuscation rules last - Mask sensitive data after all useful fields are extracted

Example of good ordering:

ottlRules:
  # 1. Drop health checks (filters out 60% of logs)
  - ruleName: "drop_health_checks"
    statements:
      - 'set(drop, true) where IsMatch(body, "/healthz")'
  
  # 2. Drop debug logs (filters another 20%)
  - ruleName: "drop_debug"
    conditions:
      - 'level == "debug"'
    statements:
      - 'set(drop, true)'
  
  # 3. Fast parsing for remaining 20%
  - ruleName: "extract_basic_fields"
    statements:
      - 'set(cache, ExtractGrokPatterns(body, "simple pattern"))'
  
  # 4. Complex parsing (now only processing 20% of original volume)
  - ruleName: "parse_json_payload"
    statements:
      - 'set(cache["parsed"], ParseJSON(body))'
  
  # 5. Obfuscate sensitive data
  - ruleName: "mask_emails"
    statements:
      - 'replace_pattern(attributes["email"], "pattern", "****")'

Pipeline Structure

Rules are defined as a list of steps that are executed one after another. Admins can view and edit the rules in Settings → Pipelines (https://app.groundcover.com/settings/pipelines).

The rules run in groundcover's sensors. This is ideal for cost savings, as the original, unprocessed logs are not sent to or stored in the backend. This is particularly useful when the pipeline is used to drop logs.

For the rules to take effect, groundcover sensors must be configured with Ingestion Keys that allow them to pull remote configuration from the Fleet Manager.

The pipeline is stored in YAML format and can be edited in the UI. The resulting YAML can be exported for use with groundcover's Terraform provider if you prefer to manage it in a version control system (e.g., git). Note that the pipeline is a singleton resource, so your Terraform configuration must define only one.

Each rule must have a unique and non-empty ruleName

Basic Structure

ottlRules:
  - ruleName: "rule1"
    conditions:
      - 'workload == "service1" or workload == "service2"'
    statements:
      - statement1
      - statement2
  - ruleName: "rule2"
    conditions:
      - 'level == "debug" or container_name == "test"'
    statements:
      - statement1
      - statement2

Required Attributes

To define a pipeline rule, make sure to include the following fields (a complete example follows the list):

  • ruleName – Unique identifier for the rule (required)

  • statements – List of transformations to apply

  • conditions – Logic for when the rule should trigger

  • statementsErrorMode – How to handle errors (e.g., skip, propagate, fail)

  • conditionLogicOperator – Used when you define multiple conditions (and, or)
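
Putting these together, a rule using all of the fields above might look like the sketch below (the rule name, field values, and statement are illustrative):

ottlRules:
  - ruleName: "parse_payment_errors"
    conditionLogicOperator: "and"
    conditions:
      - 'workload == "payments"'
      - 'level == "error"'
    statementsErrorMode: "propagate"
    statements:
      # Parse the JSON body and add its fields as attributes
      - 'merge_maps(attributes, ParseJSON(body), "insert")'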

Troubleshooting Rule Issues

High error rates:

  • Test the rule in Parsing Playground with sample logs

  • Check for regex syntax errors

  • Verify field names exist in your logs

High processing latency:

  • Simplify complex regex patterns

  • Split multi-step rules into separate rules

  • Place drop rules earlier to reduce processing volume

Low condition met percentage:

  • Verify your conditions match actual log attributes

  • Check if upstream rules are dropping logs unexpectedly

  • Review filter conditions for typos

Unexpected drop rates:

  • Review the rule logic and conditions

  • Check if drop statements have unintended where clauses

  • Verify the rule order isn't causing issues
