# Logs Pipeline

## Overview

groundcover supports the configuration of log pipelines to process and customize your logs before ingestion. With log parsing pipelines, you gain full flexibility to transform data as it flows into the platform.

Log parsing in groundcover is powered by [OpenTelemetry Transformation Language (OTTL)](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/ottl), a powerful and flexible language designed for transforming telemetry data. OTTL provides a rich set of functions and operators that allow you to parse, filter, enrich, and transform your logs with precision.

#### What You Can Do

Log parsing pipelines give you a structured way to:

* **Parse** - Extract structured data from unstructured log messages (JSON, GROK patterns, key-value pairs)
* **Filter** - Drop noisy or irrelevant logs to reduce costs and focus on what matters
* **Obfuscate** - Mask or remove sensitive data to maintain privacy and compliance
* **Convert to Metrics** - Transform logs into time-series metrics for long-term monitoring and alerting

Each pipeline is made up of transformation steps—each step defines a specific operation like parsing JSON, extracting key-value pairs, or modifying attributes. You can configure these transformations directly in your groundcover deployment.
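As a sketch, a single transformation step might parse JSON log bodies into structured attributes. The rule name and condition below are illustrative:

```yaml
ottlRules:
  - ruleName: "parse_api_json"   # illustrative name
    conditions:
      - 'format == "json"'
    statements:
      # Parse the JSON body and merge the resulting keys into attributes
      - 'merge_maps(attributes, ParseJSON(body), "insert")'
```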

## Parsing Playground

<figure><img src="https://2771001740-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUHgqKYgCiRKdOpWQdi52%2Fuploads%2Fgit-blob-63d1db9f8b5cc2eacb7bc98062259b4b2fd5ad26%2FScreenshot%202025-11-19%20at%2011.26.48.png?alt=media" alt=""><figcaption></figcaption></figure>

The **Parsing Playground** is an interactive testing environment where you can develop and validate parsing rules before deploying them to production.

#### Accessing the Playground

To access the Parsing Playground:

1. Navigate to the **Logs** view
2. Click on any log entry to view its details
3. Click the **Parsing Playground** button in the top right corner of the log detail view

The playground opens with the selected log pre-loaded, allowing you to write and test transformation rules in real-time.

#### Using the Playground

In the Parsing Playground, you can:

* **Write transformation rules** - Create and edit parsing statements
* **Test in real-time** - See immediate results as you modify your rules
* **View extracted attributes** - See exactly what fields are being extracted
* **Validate syntax** - Get instant feedback on rule syntax and errors
* **Add rules** - Once satisfied, add the rule to deploy it in your pipelines

{% hint style="success" %}
Always test your parsing rules in the Playground before deploying them to production. This helps catch errors and ensures your rules work as expected.
{% endhint %}

## Writing Parsing Rules

### Available Fields and Keys

When writing parsing rules, you have access to various fields that represent different aspects of the log. Understanding which fields are writable and which are read-only is crucial for effective rule creation.

#### Writable Fields

These fields can be modified in your parsing rules:

| Field        | Type    | Description                                                                        |
| ------------ | ------- | ---------------------------------------------------------------------------------- |
| `cache`      | Map     | Temporary storage for intermediate values. Access sub-keys with `cache["key"]`     |
| `l2m`        | Map     | Logs-to-metrics data. Access sub-keys with `l2m["key"]`                            |
| `time`       | Time    | Timestamp of the log                                                               |
| `body`       | String  | The main log message content                                                       |
| `attributes` | Map     | Custom attributes extracted from the log. Access sub-keys with `attributes["key"]` |
| `trace_id`   | String  | Trace identifier for distributed tracing correlation                               |
| `span_id`    | String  | Span identifier for distributed tracing                                            |
| `level`      | String  | Log severity level (info, error, debug, etc.)                                      |
| `format`     | String  | Log format (json, clf, unknown, etc.)                                              |
| `drop`       | Boolean | Set to `true` to drop/filter the log                                               |
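For instance, writable fields such as `level` and `trace_id` can be set directly in a statement. The match pattern and cache key below are illustrative:

```yaml
statements:
  # Normalize severity based on the message content
  - 'set(level, "error") where IsMatch(body, "(?i)exception|fatal")'
  # Attach a trace ID previously extracted into the cache
  - 'set(trace_id, cache["trace_id"]) where cache["trace_id"] != nil'
```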

#### Read-Only Fields

These fields provide context but **cannot be modified**. Attempting to set them will result in an error:

| Field            | Type   | Description                                                     |
| ---------------- | ------ | --------------------------------------------------------------- |
| `workload`       | String | Name of the Kubernetes workload (deployment, statefulset, etc.) |
| `source`         | String | Source of the log data                                          |
| `cluster`        | String | Kubernetes cluster name                                         |
| `env`            | String | Environment name                                                |
| `container_name` | String | Container that generated the log                                |
| `namespace`      | String | Kubernetes namespace                                            |

#### Special Fields

**cache** - Temporary Storage

The `cache` field is particularly useful for multi-step transformations:

```yaml
statements:
  # Extract to cache first
  - 'set(cache, ExtractGrokPatterns(body, "pattern"))'
  # Process cached values
  - 'replace_pattern(cache["field"], "old", "new")'
  # Merge into attributes
  - 'merge_maps(attributes, cache, "insert")'
```

**attributes** - Custom Fields

The `attributes` map is where you store extracted structured data:

```yaml
statements:
  - 'set(attributes["user_id"], "12345")'
  - 'set(attributes["action"], "login")'
  - 'set(attributes["ip"], "192.168.1.1")'
```

**drop** - Log Filtering

The `drop` field is a special boolean flag for filtering logs:

```yaml
statements:
  # Drop unconditionally
  - 'set(drop, true)'
  
  # Drop conditionally
  - 'set(drop, true) where IsMatch(body, "/healthz")'
```

#### Setting Conditions

Use `conditions` to apply transformations only when specific attributes match. This ensures your pipeline runs efficiently and only on relevant logs.

Common fields you can use (see the [full list above](#available-fields-and-keys)):

* `workload` – Name of the service or app
* `container_name` – Container where the log originated
* `level` – Log severity (e.g., info, error, debug)
* `format` – Log format (e.g., JSON, CLF, unknown)
* `namespace` – Kubernetes namespace
* `pod` – Pod name
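For example, a rule can be scoped to one workload's JSON logs. The workload name below is hypothetical:

```yaml
ottlRules:
  - ruleName: "parse_checkout_logs"   # hypothetical rule
    conditions:
      # Only run on JSON logs from the "checkout" workload
      - 'workload == "checkout" and format == "json"'
    statements:
      - 'merge_maps(attributes, ParseJSON(body), "upsert")'
```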

#### Common Functions

Some commonly used functions in groundcover pipelines:

* `ExtractGrokPatterns` – Extract structured data using GROK patterns
* `ParseJSON` – Parse JSON strings into structured attributes
* `ParseKeyValue` – Parse key=value formatted strings
* `replace_pattern` – Replace text patterns (useful for obfuscation)
* `delete_key` – Remove specific attributes
* `merge_maps` – Merge extracted data into attributes
* `set` – Set attribute values
* `IsMatch` – Match patterns in log fields
* `multiline_merge` – Merge multiline log entries
* `obfuscate_pii` – Detect and redact sensitive data

For a complete list of available OTTL functions, see the [OTTL Functions Reference](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/ottl/ottlfuncs/README.md).
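A sketch combining several of these functions, parsing key=value pairs, dropping a sensitive key, and merging the rest into attributes (the field names are illustrative):

```yaml
statements:
  # body like: user=alice action=login password=secret
  - 'set(cache, ParseKeyValue(body, "=", " "))'
  # Remove the sensitive key before it reaches attributes
  - 'delete_key(cache, "password")'
  - 'merge_maps(attributes, cache, "upsert")'
```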

## Automatically Generate Parsing Rules

### From a Single Log

When using the Parsing Playground, you can click **Suggest Configuration** to ask the platform to generate a parsing rule for the log currently loaded into the playground.

The platform suggests a rule and applies it, showing the extracted attributes. The rule can then be added to the pipeline via **Add Rule**.

<figure><img src="https://2771001740-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUHgqKYgCiRKdOpWQdi52%2Fuploads%2Fgit-blob-6cb31abdbf3145bf9075a3641806fa1c838cc486%2Fimage.png?alt=media" alt=""><figcaption></figcaption></figure>

### From Multiple Patterns

<figure><img src="https://2771001740-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUHgqKYgCiRKdOpWQdi52%2Fuploads%2Fgit-blob-a507c5e93b2ccd6e17feedc35bb6d5bc17d32d2a%2Fimage.png?alt=media" alt=""><figcaption></figcaption></figure>

groundcover can automatically generate parsing rules for you based on Log Patterns. This powerful feature analyzes the structure of your logs and creates optimized parsing rules automatically.

#### How It Works

The **Generate Parsing Rules** feature:

1. Analyzes log patterns in your current view
2. Identifies common structures and fields
3. Automatically generates parsing rules for each pattern
4. Creates rules optimized for your specific log formats

#### Requirements

To use Generate Parsing Rules:

* Your logs must have **fewer than 10 distinct patterns** in the filtered view
* Patterns must be clearly identifiable and consistent

{% hint style="info" %}
If you have more than 10 patterns, apply filters (workload, namespace, level, etc.) to narrow down your logs before generating rules.
{% endhint %}

#### How To Generate Rules

1. In the Logs view, click the **Actions** dropdown in the top right
2. Select **Generate Parsing Rules**
3. Review the generated rules
4. Make any necessary adjustments
5. Deploy the rules to your pipeline

The generated rules are automatically added to your pipeline configuration and will start processing logs immediately.

## The Rules Pipeline

<figure><img src="https://2771001740-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUHgqKYgCiRKdOpWQdi52%2Fuploads%2Fgit-blob-7953ed11ab00c2d72a5ae680b0a0f97c531a3438%2FScreenshot%202025-11-18%20at%2016.56.04.png?alt=media" alt=""><figcaption></figcaption></figure>

In groundcover, all parsing rules are part of a **single pipeline** that processes your logs. Each rule you create becomes a step in this pipeline, executed sequentially from top to bottom. This unified pipeline approach ensures consistent processing and makes it easy to understand the flow of transformations applied to your logs.

#### Viewing Your Pipelines

After creating parsing rules, they are deployed to **Settings → Pipelines** where you can monitor their performance and effectiveness.\
<https://app.groundcover.com/settings/pipelines>

The Pipelines page provides a comprehensive view of all your log parsing rules with real-time metrics:

#### Pipeline Metrics

For each rule in the Pipelines page, you'll see four key metrics:

**Process Rate**

The throughput of logs being processed by this rule, measured in **Logs/s** (logs per second). This shows how many logs are actively being transformed or evaluated by the rule.

**Drop Rate**

The rate at which logs are being filtered out by this rule, measured in **Logs/s**. This metric helps you understand the effectiveness of your filtering rules in reducing log volume.

**Error Rate**

The percentage of logs that encounter errors during processing, shown as a **percentage (%)**. This includes both condition evaluation errors and transformation execution errors. A healthy rule should have an error rate close to 0%.

**Processing Latency**

The average time it takes to process a single log through this rule, typically displayed in **nanoseconds (ns)** or **microseconds (μs)**. Lower values indicate better performance.

### Understanding the Metrics

The Pipelines page shows real-time statistics to help you monitor the health and performance of your parsing rules.

#### What Good Metrics Look Like

**Process Rate:**

* Should match your expected log volume for rules that apply to all logs
* Will be lower for rules with specific conditions
* Zero process rate may indicate the rule conditions never match

**Drop Rate:**

* For filtering rules: Should show significant volume if working correctly
* For parsing rules: Should typically be 0 (unless explicitly dropping logs)
* High unexpected drop rates may indicate rule errors

**Error Rate:**

* Should be **0%** or very close to it
* Any error rate above **1%** indicates a problem that needs investigation

{% hint style="danger" %}
High error rates indicate problems with your rule configuration. Test the rule in the Parsing Playground to identify issues.
{% endhint %}

**Processing Latency:**

* Typically ranges from **100ns to 1ms** per log
* Complex parsing operations may take longer
* Values consistently above **10ms** indicate performance issues

{% hint style="info" %}
If you see high processing latency, consider simplifying regex patterns or splitting complex rules into multiple steps.
{% endhint %}

#### Rule Order Matters

{% hint style="warning" %}
**The order of parsing rules is critical!** Rules are executed sequentially from top to bottom, and this order significantly impacts behavior.
{% endhint %}

**Why Rule Order Is Important**

1. **Earlier rules process more logs** - A rule at position 1 sees ALL logs, while a rule at position 10 only sees logs that weren't dropped by rules 1-9
2. **Dropping affects downstream rules** - If rule #3 drops 80% of logs, rule #4 will only process the remaining 20%
3. **Transformations are cumulative** - Fields extracted in rule #1 are available to rule #2 and beyond
4. **Performance optimization** - Place expensive parsing operations AFTER filter rules to process fewer logs

**Best Practices for Rule Ordering**

**Recommended order:**

1. **Drop rules first** - Filter out noisy logs before expensive parsing
2. **Quick parsing** - Fast, simple extractions
3. **Complex parsing** - Resource-intensive transformations
4. **Obfuscation rules last** - Mask sensitive data after all useful fields are extracted

**Example of good ordering:**

```yaml
ottlRules:
  # 1. Drop health checks (filters out 60% of logs)
  - ruleName: "drop_health_checks"
    statements:
      - 'set(drop, true) where IsMatch(body, "/healthz")'
  
  # 2. Drop debug logs (filters another 20%)
  - ruleName: "drop_debug"
    conditions:
      - 'level == "debug"'
    statements:
      - 'set(drop, true)'
  
  # 3. Fast parsing for remaining 20%
  - ruleName: "extract_basic_fields"
    statements:
      - 'set(cache, ExtractGrokPatterns(body, "simple pattern"))'
  
  # 4. Complex parsing (now only processing 20% of original volume)
  - ruleName: "parse_json_payload"
    statements:
      - 'set(cache["parsed"], ParseJSON(body))'
  
  # 5. Obfuscate sensitive data
  - ruleName: "mask_emails"
    statements:
      - 'replace_pattern(attributes["email"], "pattern", "****")'
```

### Pipeline Structure

Rules are defined as a list of steps which are executed one after another. The rules can be viewed and edited by admins in the settings tab, under "Pipelines".\
<https://app.groundcover.com/settings/pipelines>

The rules run in groundcover's sensors. Because transformations happen at the edge, the original logs are never sent to or stored in the backend, which is ideal for cost savings. This is particularly valuable when the pipeline is used to drop logs.

{% hint style="info" %}
For the rules to take effect, groundcover sensors need to be configured with Ingestion Keys that allow them to pull remote configuration from the Fleet Manager.
{% endhint %}

The pipeline is stored in `yaml` format and can be edited in the UI. The resulting `yaml` can also be exported for use with groundcover's terraform provider, if you prefer to manage it in a version control system (e.g. `git`). Note that the pipeline is a singleton resource, so your terraform configuration must define exactly one.

{% hint style="info" %}
Each rule must have a unique and non-empty `ruleName`
{% endhint %}

{% hint style="warning" %}
The logs pipeline was previously editable using helm values. Rules defined in the sensor's helm values are executed before those received via remote configuration, and are not visible in the UI.
{% endhint %}

#### Basic Structure

```yaml
ottlRules:
  - ruleName: "rule1"
    conditions:
      - 'workload == "service1" or workload == "service2"'
    statements:
      - statement1
      - statement2
  - ruleName: "rule2"
    conditions:
      - 'level == "debug" or container_name == "test"'
    statements:
      - statement1
      - statement2
```

#### Rule Attributes

A pipeline rule supports the following fields; only `ruleName` is strictly required:

* `ruleName` – Unique identifier for the rule (required)
* `statements` – List of transformations to apply
* `conditions` – Logic for when the rule should trigger
* `statementsErrorMode` – How to handle errors (e.g., `skip`, `propagate`, `fail`)
* `conditionLogicOperator` – Used when you define multiple conditions (`and`, `or`)
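Putting these fields together, a complete rule might look like the sketch below. The `statementsErrorMode` and `conditionLogicOperator` values are taken from the lists above; the rule itself is illustrative:

```yaml
ottlRules:
  - ruleName: "tag_failures"
    conditionLogicOperator: "or"   # match if ANY condition is true
    conditions:
      - 'level == "error"'
      - 'level == "fatal"'
    statementsErrorMode: "skip"    # skip a failing statement instead of failing the rule
    statements:
      - 'set(attributes["failure"], "true")'
```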

#### Troubleshooting Rule Issues

**High error rates:**

* Test the rule in Parsing Playground with sample logs
* Check for regex syntax errors
* Verify field names exist in your logs

**High processing latency:**

* Simplify complex regex patterns
* Split multi-step rules into separate rules
* Place drop rules earlier to reduce processing volume

**Low condition met percentage:**

* Verify your conditions match actual log attributes
* Check if upstream rules are dropping logs unexpectedly
* Review filter conditions for typos

**Unexpected drop rates:**

* Review the rule logic and conditions
* Check if drop statements have unintended `where` clauses
* Verify the rule order isn't causing issues
