Merging Multiline Logs

Overview

Many applications write multiline logs: stack traces, pretty-printed JSON, or human-readable errors split across several stdout/stderr lines. Kubernetes and most log collectors treat each line as a separate log record by default, which breaks search, correlation, and parsing.

Use the OTTL function multiline_merge in the groundcover logs pipeline to reassemble continuation lines into a single log record whose body contains embedded newlines.

For the broader parsing workflow and other OTTL examples, see Parsing Logs. This page focuses on multiline behavior, common first-line patterns, and how to scope merge rules.

What Are Multiline Logs?

A multiline log is one logical event that spans multiple physical lines. Common cases:

  • Stack traces (Java, Python, Node.js, Go)

  • Pretty-printed or indented JSON

  • System messages where only the first line carries a timestamp

The application writes the event correctly; the problem is that the collector draws a record boundary at every newline.

Why Merge Multiline Logs?

Without merging:

  • Stack frames appear as unrelated log rows

  • Only the first line may match time or level parsers

  • JSON split across lines cannot be parsed as a single object

  • Noise increases (many low-value rows per incident)

After multiline_merge, one record represents the full event, so downstream parsing, ExtractGrokPatterns, JSON parsing, and log-based metrics behave as intended.

How multiline_merge Works

multiline_merge takes a first-line regex. Any line matching the pattern starts a new buffered block. Lines that do not match are treated as continuations of the current block until the buffer flushes: when the next first line arrives, or when max_lines or max_time is reached (see the multiline_merge section of Parsing Logs).
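
The grouping rule described above can be sketched in a few lines of Python. This is an illustration of the idea, not the pipeline's implementation: merge_blocks is a hypothetical helper, the sample lines are invented, time-based flushing is ignored, and lines seen while no block is open are assumed to pass through unchanged. Python's re module stands in for the pipeline's regex engine, which may differ in small details.

```python
import re

def merge_blocks(lines, first_line_pattern):
    """Group physical lines into logical records.

    A line matching first_line_pattern opens a new block; any other line is
    appended to the open block. Lines seen while no block is open are
    emitted unchanged (an assumption of this sketch).
    """
    first_line = re.compile(first_line_pattern)
    records = []
    block = None  # lines of the currently open block, or None
    for line in lines:
        if first_line.search(line):
            if block is not None:
                records.append("\n".join(block))  # flush the previous block
            block = [line]
        elif block is not None:
            block.append(line)
        else:
            records.append(line)
    if block is not None:
        records.append("\n".join(block))  # flush at end of input
    return records

# Invented sample: one error with two continuation lines, then a new event.
lines = [
    "2024-05-01 12:00:00 ERROR request failed",
    "    at handler (app.js:10)",
    "    at server (app.js:42)",
    "2024-05-01 12:00:01 INFO request ok",
]
records = merge_blocks(lines, r"^\d{4}-\d{2}-\d{2}")
```

Here the four input lines become two records; the first keeps its continuation lines as embedded newlines in the body.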

Important

  • Put multiline_merge first in the rule’s statements list so later statements see the merged body when the buffer flushes.

  • Rule conditions must match continuation lines as well as first lines. Scope by workload or namespace, not by a pattern that matches only the first line of a stack trace; otherwise the continuation lines never reach multiline_merge. See Parsing Logs.
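
To see why the condition matters, the following Python sketch compares two ways of selecting records. The record shape (a dict with workload and body keys), the workload names, and the log lines are all hypothetical; the point is only which lines each kind of condition lets through.

```python
import re

# Hypothetical ingested lines, each tagged with the workload that emitted it.
records = [
    {"workload": "checkout", "body": "2024-05-01 12:00:00 ERROR java.lang.IllegalStateException: boom"},
    {"workload": "checkout", "body": "\tat com.example.Checkout.pay(Checkout.java:42)"},
    {"workload": "checkout", "body": "\tat com.example.Main.main(Main.java:7)"},
    {"workload": "billing", "body": "2024-05-01 12:00:00 INFO started"},
]

first_line = re.compile(r"^\d{4}-\d{2}-\d{2}")

# A condition that matches only first lines leaves the stack frames out.
by_pattern = [r for r in records if first_line.search(r["body"])]

# A condition scoped by workload selects the first line and its continuations.
by_workload = [r for r in records if r["workload"] == "checkout"]
```

With the pattern-based condition, the two stack-frame lines are never selected, so there is nothing to merge them into. The workload-based condition selects all three checkout lines.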

Arguments (only first_line_pattern is required):

Argument            Meaning                                    Default
first_line_pattern  Regex for the start of a new block         (required)
max_lines           Flush after this many lines in one block   128
max_time            Max wait for continuation (e.g. "800ms")   "800ms"

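The effect of the two limits can be simulated with explicit arrival times. The Python sketch below is an approximation under stated assumptions: merge_with_limits is a hypothetical helper, the wait is measured from the previous line of the open block, and lines that arrive after a flush with no open block pass through unchanged. The real buffering behavior may differ.

```python
import re

def merge_with_limits(events, first_line_pattern, max_lines=128, max_time_ms=800):
    """events: iterable of (arrival_ms, line) tuples in arrival order."""
    first_line = re.compile(first_line_pattern)
    records, block, last_ms = [], [], None

    def flush():
        if block:
            records.append("\n".join(block))
            block.clear()

    for arrival_ms, line in events:
        # Assumption: the wait is measured from the previous line of the block.
        if block and arrival_ms - last_ms > max_time_ms:
            flush()
        if first_line.search(line):
            flush()
            block.append(line)
        elif block:
            block.append(line)
        else:
            records.append(line)  # no open block: pass the line through
        last_ms = arrival_ms
        if len(block) >= max_lines:
            flush()
    return records

# Invented timeline: four lines in quick succession, then one late line.
events = [
    (0, "ERROR start"),
    (10, "  frame 1"),
    (20, "  frame 2"),
    (30, "  frame 3"),
    (2000, "  late line"),
]
capped = merge_with_limits(events, r"^ERROR", max_lines=3)
timed_out = merge_with_limits(events, r"^ERROR")
```

With max_lines=3 the block is cut after three lines and the remaining lines are emitted on their own. With the defaults, the first four lines merge and only the late line is left out. Raising either limit trades latency and memory for more complete blocks.
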
Common First-Line Patterns

The exact regex depends on how the first line of each logical event is formatted. Below are common anchors. Escape backslashes in YAML strings (e.g. ^\\d{4}).

Java Stack Traces

Often the first line includes a timestamp and level, then the exception:

Before merge (three separate ingested lines)

Example rule (same style as Parsing Logs)

After merge (one record)
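
A minimal Python sketch of the before/after effect, using hypothetical log lines and a hypothetical date-only first-line pattern (your own first line decides the real pattern):

```python
import re

# Hypothetical leader: any line that begins with a yyyy-mm-dd date.
FIRST_LINE = re.compile(r"^\d{4}-\d{2}-\d{2}")

# Before merge: three separately ingested lines (invented sample).
before = [
    "2024-05-01 12:00:00,123 ERROR com.example.Checkout - payment failed",
    "java.lang.IllegalStateException: card declined",
    "\tat com.example.Checkout.pay(Checkout.java:42)",
]

blocks = []
for line in before:
    # "not blocks" only guards against input that does not start with a leader.
    if FIRST_LINE.search(line) or not blocks:
        blocks.append([line])
    else:
        blocks[-1].append(line)

# After merge: one record whose body contains embedded newlines.
after = ["\n".join(block) for block in blocks]
```

The exception line and the stack frame do not start with a date, so they are folded into the record opened by the timestamped line.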

If the first line is ISO-8601 with T, tighten the pattern, for example:

  • 'multiline_merge("^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}")'

Python Tracebacks

A traceback block usually starts with the literal header line Traceback (most recent call last): (the trailing colon is part of the header).
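
One possible first-line pattern for this header, checked in Python against an invented traceback. The parentheses are regex metacharacters and must be escaped; in YAML the backslashes may need doubling as noted earlier.

```python
import re

# Parentheses are escaped so they match literally.
FIRST_LINE = re.compile(r"^Traceback \(most recent call last\):")

sample = [
    "Traceback (most recent call last):",
    '  File "app.py", line 12, in <module>',
    "    main()",
    '  File "app.py", line 8, in main',
    "    return 1 / 0",
    "ZeroDivisionError: division by zero",
]

# Indexes of the lines that would open a new block.
starts = [i for i, line in enumerate(sample) if FIRST_LINE.search(line)]
```

Only the header opens a block, so the final exception line stays attached to the frames above it. Chained exceptions print a second Traceback header and would start a second block under this pattern.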

Go Panics

Typical first line: panic: ... or fatal error: ...:

Adjust if your runtime prints a different leader (for example a timestamp prefix on the same line).
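
A sketch of both cases in Python. The sample output and both patterns are illustrative assumptions; the prefixed variant assumes a yyyy/mm/dd hh:mm:ss leader such as the one Go's standard log package prints by default.

```python
import re

# Leader at column 1: "panic: ..." or "fatal error: ..."
FIRST_LINE = re.compile(r"^(panic|fatal error): ")

# Variant that also accepts an optional timestamp prefix on the same line.
PREFIXED = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} )?(panic|fatal error): ")

sample = [
    "panic: runtime error: index out of range [5] with length 3",
    "",
    "goroutine 1 [running]:",
    "main.main()",
    "\t/app/main.go:8 +0x1d",
    "exit status 2",
]
starts = [i for i, line in enumerate(sample) if FIRST_LINE.search(line)]

prefixed_line = "2024/05/01 12:00:00 panic: boom"
plain_matches = bool(FIRST_LINE.search(prefixed_line))
variant_matches = bool(PREFIXED.search(prefixed_line))
```

The anchored plain pattern misses the prefixed line, which is why the pattern has to follow whatever leader the runtime actually prints.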

Node.js Error Stacks

Often Error: or TypeError: on the first line, then at ... lines:

Validate in the Parsing Playground: some stacks start with UnhandledPromiseRejectionWarning or custom prefixes — prefer the most stable prefix your app emits.
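
As an illustration, the Python check below uses a hypothetical pattern that accepts any name ending in Error at column 1. The sample lines are invented, including the warning-prefixed line.

```python
import re

# Matches Error:, TypeError:, RangeError:, ... at the start of the line.
FIRST_LINE = re.compile(r"^[A-Za-z]*Error: ")

sample = [
    "TypeError: Cannot read properties of undefined (reading 'id')",
    "    at getUser (/app/users.js:14:22)",
    "    at processTicksAndRejections (node:internal/process/task_queues:95:5)",
]
starts = [i for i, line in enumerate(sample) if FIRST_LINE.search(line)]

# A stack that starts with a different prefix is not recognized as a leader.
warning_line = "(node:1234) UnhandledPromiseRejectionWarning: Error: boom"
warning_matches = bool(FIRST_LINE.search(warning_line))
```

The at ... frames never match, so they stay with the error line. The warning-prefixed line shows why the pattern should be validated against real output before it is deployed.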

Multiline JSON

If each event starts with { or [ on its own line and continuation lines are indented or additional JSON lines, a simple anchor is the opening brace at column 1:

This can collide with single-line JSON logs that also start with {. Narrow with workload, namespace, or a more specific prefix (for example a known {"level": header).
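
The Python sketch below shows the happy path and one way a collision can show up. The merge helper and all sample lines are hypothetical, and the brace anchor is written as ^\{ in regex form.

```python
import json
import re

FIRST_LINE = re.compile(r"^\{")

def merge(lines):
    """Open a block at each line starting with '{'; append everything else."""
    blocks = []
    for line in lines:
        # "not blocks" only guards against input that does not start with '{'.
        if FIRST_LINE.search(line) or not blocks:
            blocks.append([line])
        else:
            blocks[-1].append(line)
    return ["\n".join(block) for block in blocks]

# Pretty-printed JSON: four physical lines become one parseable record.
pretty = ["{", '  "level": "error",', '  "msg": "payment failed"', "}"]
parsed = json.loads(merge(pretty)[0])

# Collision: a plain-text line after single-line JSON is absorbed into it.
mixed = ['{"level": "info", "msg": "started"}', "plain text warning from a library"]
collided = merge(mixed)[0]
try:
    json.loads(collided)
    parseable = True
except json.JSONDecodeError:
    parseable = False
```

In the second case the single-line JSON record would have parsed on its own, but after merging it carries trailing text and no longer parses. Scoping the rule to the workload that actually emits pretty-printed JSON avoids this.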

Timestamp-Prefixed Logs

Any scheme where only new events begin with a timestamp works well:

Tune for comma versus dot milliseconds and timezone tokens.
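
One way to cover those variations is a single tolerant pattern. The Python check below is a sketch: the pattern and the candidate lines are assumptions, and you should narrow the pattern to the format your application really emits.

```python
import re

# Date, space or T, time, optional .mmm or ,mmm, optional zone (Z, +hh:mm, +hhmm),
# then whitespace before the rest of the line.
FIRST_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}([.,]\d{3})?(Z|[+-]\d{2}:?\d{2})?\s"
)

# Invented lines mapped to whether they should open a new block.
candidates = {
    "2024-05-01 12:00:00,123 ERROR boom": True,
    "2024-05-01T12:00:00.123Z ERROR boom": True,
    "2024-05-01T12:00:00+02:00 INFO ok": True,
    "    at com.example.Main.main(Main.java:7)": False,
    "Caused by: java.io.IOException: 2024-05-01 disk full": False,
}
results = {line: bool(FIRST_LINE.search(line)) for line in candidates}
```

The ^ anchor matters: the last candidate contains a date in the middle of the line, and it must not open a new block.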

Bracket-Prefixed Leaders

Some services print a tag line (for example [INFO], [ERROR], === summary) and continue with indented lines that are not timestamp-prefixed. A timestamp-only multiline_merge pattern never treats those tag lines as a new block, so each continuation line becomes its own log row.

Use a first-line pattern that matches those leaders, for example:

Escape backslashes as required by your YAML layer — the pipeline editor and the API-returned YAML often show doubled backslashes before \d, \[, and similar tokens.
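
The difference between the two kinds of pattern can be checked in Python. Both patterns and the sample lines are hypothetical; in YAML the leader pattern may appear with doubled backslashes, for example ^(\\[(INFO|ERROR)\\]|=== ).

```python
import re

TIMESTAMP_ONLY = re.compile(r"^\d{4}-\d{2}-\d{2}")
# Brackets are escaped so [INFO] and [ERROR] match literally.
LEADERS = re.compile(r"^(\[(INFO|ERROR)\]|=== )")

sample = [
    "[ERROR] sync failed",
    "    reason: upstream timeout",
    "    retries: 3",
    "=== summary",
    "    processed: 120",
]

def block_starts(pattern, lines):
    """Return the indexes of lines that would open a new block."""
    return [i for i, line in enumerate(lines) if pattern.search(line)]

timestamp_starts = block_starts(TIMESTAMP_ONLY, sample)
leader_starts = block_starts(LEADERS, sample)
```

The timestamp-only pattern finds no block start in this sample, while the leader pattern opens a block at the [ERROR] line and another at the === summary line.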

Mixed Multiline Types in One Workload

If one process emits Java stacks, Python tracebacks, and JSON blobs, a single regex rarely fits all first lines. Practical options:

  1. Emit an explicit block header line — for example a fixed prefix such as ^\d{4}-\d{2}-\d{2}T... \[my-app\] channel= before each multiline block, then use one multiline_merge scoped to that workload.

  2. Split by container — different container_name values with one rule each.

  3. Multiple rules with disjoint conditions — only if each log line can match at most one rule’s condition set for the whole merge window (avoid conditions that only match the first line).
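
Option 1 can be sketched in Python as follows. The header format, the channel names, and the sample lines are hypothetical; the sketch only shows that a single pattern is enough once every block begins with the same explicit header.

```python
import re

# Hypothetical header: "<timestamp> [my-app] channel=<name>"
HEADER = re.compile(r"^\d{4}-\d{2}-\d{2}T\S+ \[my-app\] channel=(\w+)")

lines = [
    "2024-05-01T12:00:00Z [my-app] channel=java",
    "java.lang.IllegalStateException: boom",
    "\tat com.example.Main.main(Main.java:7)",
    "2024-05-01T12:00:01Z [my-app] channel=python",
    "Traceback (most recent call last):",
    "ZeroDivisionError: division by zero",
    "2024-05-01T12:00:02Z [my-app] channel=json",
    "{",
    '  "ok": false',
    "}",
]

blocks = []
for line in lines:
    match = HEADER.search(line)
    if match:
        # The header opens a block and records which kind of payload follows.
        blocks.append({"channel": match.group(1), "lines": [line]})
    elif blocks:
        blocks[-1]["lines"].append(line)

channels = [block["channel"] for block in blocks]
sizes = [len(block["lines"]) for block in blocks]
```

Because the header is the only thing the pattern looks for, the Java, Python, and JSON payloads never need their own first-line regex.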
