# Merging Multiline Logs

#### Overview

Many applications write **multiline logs**: stack traces, pretty-printed JSON, or human-readable errors split across several stdout/stderr lines. Kubernetes and most log collectors treat each line as a separate log record by default, which breaks search, correlation, and parsing.

Use the OTTL function **`multiline_merge`** in the groundcover logs pipeline to **reassemble** continuation lines into a single log record whose `body` contains embedded newlines.

For the broader parsing workflow and other OTTL examples, see Parsing Logs. This page focuses on multiline behavior, common first-line patterns, and how to scope merge rules.

#### What Are Multiline Logs?

A **multiline log** is one logical event that spans multiple physical lines. Common cases:

* Stack traces (Java, Python, Node.js, Go)
* Pretty-printed or indented JSON
* System messages where only the first line carries a timestamp

The application is not misbehaving: each physical line is valid output. The problem is where the collector draws **ingestion boundaries** between log records.

#### Why Merge Multiline Logs?

Without merging:

* Stack frames appear as unrelated log rows
* Only the first line may match time or level parsers
* JSON split across lines cannot be parsed as a single object
* Noise increases (many low-value rows per incident)

After **`multiline_merge`**, one record represents the full event, so downstream parsing, `ExtractGrokPatterns`, JSON parsing, and log-based metrics behave as intended.

#### How `multiline_merge` Works

`multiline_merge` takes a **first-line regex**. Any line **matching** the pattern starts a **new** buffered block. Lines that **do not** match are treated as **continuations** of the current block until the buffer flushes, which happens when the next first line arrives or when `max_lines` or `max_time` is reached (see the `multiline_merge` section of Parsing Logs).
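The buffering rule above can be sketched locally in Python. This is a simplified model for reasoning about patterns, not groundcover's implementation: `merge_multiline` is a hypothetical helper, the time-based flush is not modeled, and the sketch assumes that lines arriving before any first line are grouped into a block of their own. Using Python's `re` is also an assumption; the pipeline's regex engine may differ in edge cases.

```python
import re


def merge_multiline(lines, first_line_pattern, max_lines=128):
    """Hypothetical local model of the merge rule; not groundcover's code."""
    first_line = re.compile(first_line_pattern)
    records, block = [], []
    for line in lines:
        # Flush the open block when a new first line arrives or the block is full.
        if block and (first_line.search(line) or len(block) >= max_lines):
            records.append("\n".join(block))
            block = []
        block.append(line)
    if block:
        # End of input stands in for the time-based flush, which is not modeled.
        records.append("\n".join(block))
    return records


lines = [
    "2026-01-15 10:30:00 ERROR java.lang.NullPointerException: user is null",
    "  at com.example.UserService.getUser(UserService.java:42)",
    "2026-01-15 10:30:01 INFO request served",
]
merged = merge_multiline(lines, r"^\d{4}-\d{2}-\d{2}")
print(len(merged))  # 2
```

The stack frame is folded into the first record because it does not match the pattern, and the second timestamped line opens a new record.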

**Important**

* Put **`multiline_merge` first** in the rule’s `statements` list so later statements see the merged `body` when the buffer flushes.
* **Conditions must include continuation lines.** Scope by `workload` or `namespace`, for example, not by a pattern that matches only the first line of a stack trace. See Parsing Logs.
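To see why the condition has to cover continuation lines, the sketch below models a rule as "filter by condition, then merge". It assumes that records failing the condition bypass the rule untouched, which is a simplification of the real pipeline; `apply_rule` and the record fields are hypothetical names used only for illustration.

```python
import re


def apply_rule(records, condition, first_line_pattern):
    """Hypothetical model: only records passing `condition` reach the merge step."""
    first_line = re.compile(first_line_pattern)
    scoped = [r["body"] for r in records if condition(r)]
    # Assumption: records outside the condition are left unmerged.
    bypassed = [r["body"] for r in records if not condition(r)]
    merged, block = [], []
    for body in scoped:
        if block and first_line.search(body):
            merged.append("\n".join(block))
            block = []
        block.append(body)
    if block:
        merged.append("\n".join(block))
    return merged, bypassed


records = [
    {"workload": "my-java-app", "body": "2026-01-15 10:30:00 ERROR java.lang.NullPointerException: user is null"},
    {"workload": "my-java-app", "body": "  at com.example.UserService.getUser(UserService.java:42)"},
    {"workload": "my-java-app", "body": "  at com.example.ApiHandler.handle(ApiHandler.java:128)"},
]
pattern = r"^\d{4}-\d{2}-\d{2}"

# Scoped by workload: all three lines reach the merge step.
by_workload = apply_rule(records, lambda r: r["workload"] == "my-java-app", pattern)
print(len(by_workload[0]), len(by_workload[1]))  # 1 0

# Scoped by a first-line-only condition: the stack frames never reach the merge.
by_level = apply_rule(records, lambda r: "ERROR" in r["body"], pattern)
print(len(by_level[0]), len(by_level[1]))  # 1 2
```

In the second call the "merged" record contains only the first line, and the two stack frames remain separate rows.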

Arguments (only `first_line_pattern` is required):

| Argument             | Meaning                                    | Default      |
| -------------------- | ------------------------------------------ | ------------ |
| `first_line_pattern` | Regex for the **start** of a new block     | *(required)* |
| `max_lines`          | Flush after this many lines in one block   | `128`        |
| `max_time`           | Max wait for continuation (e.g. `"800ms"`) | `"800ms"`    |

#### Common First-Line Patterns

The exact regex depends on how the **first** line of each logical event is formatted. Below are common anchors. Escape backslashes in YAML strings (e.g. `^\\d{4}`).

**Java Stack Traces**

Often the first line includes a timestamp and level, then the exception:

**Before merge (three separate ingested lines)**

```json
{ "body": "2026-01-15 10:30:00 ERROR java.lang.NullPointerException: user is null" }
```

```json
{ "body": "  at com.example.UserService.getUser(UserService.java:42)" }
```

```json
{ "body": "  at com.example.ApiHandler.handle(ApiHandler.java:128)" }
```

**Example rule (same style as Parsing Logs)**

```yaml
ottlRules:
  - ruleName: "java-stacktrace"
    conditions:
      - 'workload == "my-java-app"'
    statements:
      - 'multiline_merge("^\\d{4}-\\d{2}-\\d{2}")'
```

**After merge (one record)**

```json
{
  "body": "2026-01-15 10:30:00 ERROR java.lang.NullPointerException: user is null\n  at com.example.UserService.getUser(UserService.java:42)\n  at com.example.ApiHandler.handle(ApiHandler.java:128)"
}
```

If the first line is ISO-8601 with `T`, tighten the pattern, for example:

* `'multiline_merge("^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}")'`

**Python Tracebacks**

The block usually starts with `Traceback (most recent call last):`:

```yaml
ottlRules:
  - ruleName: "python-traceback"
    conditions:
      - 'workload == "my-python-worker"'
    statements:
      - 'multiline_merge("^Traceback \\(most recent call last\\):")'
```

**Go Panics**

Typical first line: `panic: ...` or `fatal error: ...`:

```yaml
ottlRules:
  - ruleName: "go-panic"
    conditions:
      - 'workload == "my-go-service"'
    statements:
      - 'multiline_merge("^(panic:|fatal error:)")'
```

Adjust if your runtime prints a different leader (for example a timestamp prefix on the same line).
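As one possible adjustment, the sketch below makes a leading timestamp optional in front of the Go leaders. The extended pattern and the sample lines are illustrative assumptions, checked here with Python's `re`; validate the final pattern against real output from your runtime.

```python
import re

# Pattern from the rule above, and a hypothetical variant that tolerates
# an optional "YYYY-MM-DD<space or T><time> " prefix on the same line.
plain = re.compile(r"^(panic:|fatal error:)")
prefixed = re.compile(r"^(\d{4}-\d{2}-\d{2}[ T]\S+\s+)?(panic:|fatal error:)")

samples = [
    "panic: runtime error: index out of range [3] with length 3",
    "2026-01-15T10:30:00Z panic: runtime error: index out of range [3] with length 3",
    "goroutine 1 [running]:",
]
for line in samples:
    # Prints whether each pattern would treat the line as the start of a block.
    print(bool(plain.search(line)), bool(prefixed.search(line)))
```

Inside a rule, the backslashes in the variant would be doubled in the same way as in the other examples on this page.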

**Node.js Error Stacks**

Often `Error:` or `TypeError:` on the first line, then `at ...` lines:

```yaml
ottlRules:
  - ruleName: "nodejs-stack"
    conditions:
      - 'workload == "my-node-api"'
    statements:
      - 'multiline_merge("^[A-Za-z]*Error:\\s")'
```

Validate in the Parsing Playground. Some stacks start with `UnhandledPromiseRejectionWarning` or custom prefixes, so prefer the most stable prefix your app emits.

**Multiline JSON**

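If each event starts with `{` on its own line and the continuation lines are indented (as in pretty-printed JSON), a simple anchor is the opening brace at **column 1**. The rule below matches only `{`; to also cover events that open with `[`, extend the pattern, for example to `^[\\[{]`: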

```yaml
ottlRules:
  - ruleName: "multiline-json"
    conditions:
      - 'workload == "batch-exporter"'
    statements:
      - 'multiline_merge("^\\{")'
```

This can collide with single-line JSON logs that also start with `{`. Narrow with **`workload`**, **`namespace`**, or a more specific prefix (for example a known `{"level":` header).
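The sketch below illustrates why merging matters for pretty-printed JSON: individual lines are not valid JSON, while each merged body parses as one object. It uses a simplified local model of the merge (the hypothetical `merge_multiline` helper, with no time or size limits) and made-up sample lines.

```python
import json
import re


def merge_multiline(lines, first_line_pattern):
    """Hypothetical local model: a matching line opens a new block."""
    first_line = re.compile(first_line_pattern)
    records, block = [], []
    for line in lines:
        if block and first_line.search(line):
            records.append("\n".join(block))
            block = []
        block.append(line)
    if block:
        records.append("\n".join(block))
    return records


lines = [
    "{",
    '  "level": "info",',
    '  "rows": 42',
    "}",
    "{",
    '  "level": "error",',
    '  "rows": 0',
    "}",
]
# Each merged body is a complete JSON document and can be parsed as a whole.
events = [json.loads(body) for body in merge_multiline(lines, r"^\{")]
print(events)
```

The closing `}` lines do not match `^\{`, so they stay attached to the block they close.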

**Timestamp-Prefixed Logs**

Any scheme where **only new events** begin with a timestamp works well:

```yaml
ottlRules:
  - ruleName: "timestamp-prefixed"
    conditions:
      - 'workload == "my-service"'
    statements:
      - 'multiline_merge("^\\d{4}-\\d{2}-\\d{2}[ T]\\d{2}:\\d{2}:\\d{2}")'
```

Tune for comma versus dot milliseconds and timezone tokens.
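When tuning for milliseconds, one option is to accept either separator explicitly. The stricter pattern below is an illustrative assumption rather than a recommended default, and the sample lines are made up; the check uses Python's `re`.

```python
import re

# Pattern from the rule above: it stops at whole seconds, so it accepts
# whatever follows (milliseconds, a timezone token, or nothing).
base = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")
# Hypothetical stricter variant: requires three millisecond digits after "," or ".".
with_millis = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}[.,]\d{3}")

samples = [
    "2026-01-15 10:30:00,123 ERROR request failed",
    "2026-01-15T10:30:00.123Z ERROR request failed",
    "2026-01-15 10:30:00 ERROR request failed",
    "    caused by: connection reset",
]
for line in samples:
    # Prints whether each pattern would treat the line as the start of a block.
    print(bool(base.search(line)), bool(with_millis.search(line)))
```

The stricter variant rejects the third sample, which has no milliseconds. A pattern that is too strict turns real first lines into continuations, so the looser anchor is often the safer choice.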

**Bracket-Prefixed Leaders**

Some services print a **tag line** (for example `[INFO]`, `[ERROR]`, `=== summary`) and continue with indented lines that are **not** timestamp-prefixed. A timestamp-only `multiline_merge` pattern **never** treats those tag lines as a new block, so each continuation line becomes its own log row.

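Use a first-line pattern that matches those leaders. The example also passes the optional arguments positionally: `128` for `max_lines` and `"2s"` for `max_time`, which waits longer for continuation lines than the `"800ms"` default: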

```yaml
ottlRules:
  - ruleName: "bracket-leader"
    conditions:
      - 'workload == "my-app"'
    statements:
      - 'multiline_merge("^(\\[[A-Z]+\\]|===)", 128, "2s")'
```

Escape backslashes as required by your YAML layer. The pipeline editor and the API-returned YAML often show **doubled** backslashes before `\d`, `\[`, and similar tokens.

#### Mixed Multiline Types in One Workload

If one process emits Java stacks, Python tracebacks, and JSON blobs, a single regex rarely fits all first lines. Practical options:

1. **Emit an explicit block header line** — for example a fixed prefix such as `^\d{4}-\d{2}-\d{2}T... \[my-app\] channel=` before each multiline block, then use one `multiline_merge` scoped to that workload.
2. **Split by container** — different `container_name` values with one rule each.
3. **Multiple rules** with disjoint `conditions` — only if each log line can match at most one rule’s condition set for the whole merge window (avoid conditions that only match the first line).

