LLM Observability

Overview

What is LLM Observability

LLM Observability is the practice of monitoring, understanding, and analyzing interactions with Large Language Models (LLMs), such as OpenAI or Anthropic models, across your systems. It focuses on capturing critical data around LLM prompts, responses, performance, and cost, allowing teams to gain visibility into how LLMs are used, what they return, and how they affect application behavior and user experience.

LLM observability extends beyond traditional metrics and logs. It provides structured traces of LLM API calls, including the prompt content, response content, metadata, latency, and token usage. This enables deeper analysis into behavior patterns, failures, anomalies, and even hallucinations.

How groundcover Provides Out-of-the-Box LLM Observability

groundcover delivers LLM observability with zero instrumentation.

Thanks to its unique eBPF-based sensor, groundcover automatically detects and traces LLM API calls without requiring any SDKs, wrappers, or code changes!

When your workloads interact with providers like OpenAI or Anthropic, groundcover intercepts the encrypted traffic at the kernel level, extracts key data points, and transforms each request into structured spans and metrics. This includes:

  • Full prompt and response bodies (configurable redaction and sampling)

  • Token usage (input, output, total)

  • Model details, temperature, and parameters

  • Latency and completion time

  • Error messages and finish reasons

LLM traces are seamlessly integrated into your existing observability data, alongside workload spans, logs, and metrics, giving you full visibility into every interaction and its impact.

Out-of-the-box LLM tracing is available starting from sensor version: 1.9.563
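
For example, a workload can keep calling its provider with the vanilla SDK and still be traced end to end. The sketch below is purely illustrative (the model name, prompt, and environment-based API key are placeholders); it contains nothing groundcover-specific, since the eBPF sensor picks the call up at the kernel level.

from openai import OpenAI  # standard OpenAI SDK, no groundcover wrapper or SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A plain Chat Completions call; groundcover traces it with no code changes
response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize today's error logs."},
    ],
    temperature=0.0,
)

print(response.choices[0].message.content)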

Generative AI Span Structure

groundcover automatically transforms every GenAI API call into a structured span, no instrumentation required.

These spans follow the OpenTelemetry GenAI Semantic Conventions, allowing you to reason about LLM interactions using a consistent schema. This structure enables correlation with other telemetry signals (e.g., container traces, logs, metrics), while unlocking deep insights about LLM behavior.

Each span contains key metadata extracted from the API call, such as which model was used, how many tokens were consumed, and why the generation stopped. This allows for rich filtering, searching, monitoring, and cost analysis, without any manual tagging.

Below is a sample of the attributes captured for each LLM span:

| Attribute | Description | Example |
| --- | --- | --- |
| gen_ai.system | The Generative AI product as identified by the client or server instrumentation | openai |
| gen_ai.request.model | The name of the GenAI model a request is being made to | gpt-4 |
| gen_ai.response.usage.input_tokens | The number of tokens used in the GenAI input (prompt) | 100 |
| gen_ai.response.usage.output_tokens | The number of tokens used in the GenAI response (completion) | 100 |
| gen_ai.response.usage.total_tokens | The total number of tokens used in the GenAI request and response (prompt + completion) | 200 |
| gen_ai.response.model | The name of the model that generated the response | gpt-4-0613 |
| gen_ai.response.finish_reason | Reason the model stopped generating tokens | stop; length |
| gen_ai.response.choice_count | The target number of candidate completions to return | 3 |
| gen_ai.response.system_fingerprint | A fingerprint to track any eventual change in the Generative AI environment | fp_44709d6fcb |
| gen_ai.response.tools_used | Number of tools used in the API call | 2 |
| gen_ai.request.temperature | The temperature setting for the GenAI request | 0.0 |
| gen_ai.error.code | The error code for the response | |
| gen_ai.error.message | A human-readable description of the error | |
| gen_ai.error.type | Describes a class of error the operation ended with | timeout; java.net.UnknownHostException; server_certificate_invalid; 500 |
| gen_ai.operation.name | The name of the operation being performed | chat; generate_content; text_completion |
| gen_ai.request.max_tokens | The maximum number of tokens the model generates for a request | 100 |
| gen_ai.request.message_count | Count of messages in the API request | 1 |
| gen_ai.request.stream | Boolean flag indicating whether streaming was enabled in the request | false |
| gen_ai.request.system_prompt | Boolean flag indicating whether a system prompt was used in the request prompts | true |
| gen_ai.request.tools_used | Boolean flag indicating whether any tools were used in the request | true |
| gen_ai.request.top_p | The top_p sampling setting for the GenAI request | 1.0 |
| gen_ai.response.message_id | Unique ID of the message created by the server | |

These attributes are available for every span involving a supported LLM provider. They can be queried in search, used in dashboards, or referenced in monitor conditions.
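
As a rough illustration of the cost analysis these attributes enable, the sketch below multiplies the token-usage attributes by per-token prices. It is not a groundcover API: the price table is a hypothetical placeholder, and the span attributes are passed in as a plain dictionary.

# Hypothetical sketch: estimating spend from GenAI span attributes.
# PRICE_PER_1K holds placeholder values; real prices depend on your provider.
PRICE_PER_1K = {
    "gpt-4": {"input": 0.03, "output": 0.06},  # USD per 1K tokens (illustrative)
}

def estimate_cost(span_attributes: dict) -> float:
    """Estimate the cost of a single LLM call from its span attributes."""
    model = span_attributes.get("gen_ai.request.model", "")
    prices = PRICE_PER_1K.get(model)
    if prices is None:
        return 0.0
    input_tokens = span_attributes.get("gen_ai.response.usage.input_tokens", 0)
    output_tokens = span_attributes.get("gen_ai.response.usage.output_tokens", 0)
    return (input_tokens / 1000) * prices["input"] + (output_tokens / 1000) * prices["output"]

# Example: a span with 100 input and 100 output tokens on gpt-4
print(estimate_cost({
    "gen_ai.request.model": "gpt-4",
    "gen_ai.response.usage.input_tokens": 100,
    "gen_ai.response.usage.output_tokens": 100,
}))  # -> 0.009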

Generative AI Metrics

groundcover automatically exposes key usage metrics for all detected LLM traffic — with no instrumentation or code changes. These metrics are built directly from the traces generated by your LLM API calls and follow the OpenTelemetry GenAI conventions, while being enriched with deep Kubernetes and workload context.

| Metric Name | Description |
| --- | --- |
| groundcover_workload_gen_ai_response_usage_input_tokens | Number of tokens sent in prompts, broken down by workload |
| groundcover_workload_gen_ai_response_usage_output_tokens | Number of tokens generated in completions, broken down by workload |
| groundcover_workload_gen_ai_response_usage_total_tokens | Total token usage (input + output), broken down by workload |
| groundcover_gen_ai_response_usage_input_tokens | Number of tokens sent in prompts |
| groundcover_gen_ai_response_usage_output_tokens | Number of tokens generated in completions |
| groundcover_gen_ai_response_usage_total_tokens | Total token usage (input + output) |

Each of these metrics is enriched with a detailed set of labels (keys), such as:

  • workload

  • namespace

  • cluster

  • gen_ai_request_model

  • gen_ai_response_model

  • gen_ai_system

  • client

  • server

  • status_code

  • and more
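
Because the metric names and labels above follow Prometheus naming conventions, per-workload token usage can be summarized with a PromQL-style query. The sketch below is an illustration only: the query endpoint and time window are placeholders, and the exact way you query groundcover metrics depends on your setup.

import requests  # assumes a Prometheus-compatible query endpoint is available

QUERY_URL = "https://metrics.example.com/api/v1/query"  # placeholder endpoint
QUERY = (
    "sum by (workload, gen_ai_request_model) "
    "(increase(groundcover_workload_gen_ai_response_usage_total_tokens[1h]))"
)

resp = requests.get(QUERY_URL, params={"query": QUERY}, timeout=10)
resp.raise_for_status()

# Print total token usage per workload and model over the last hour
for series in resp.json()["data"]["result"]:
    labels = series["metric"]
    _, value = series["value"]
    print(labels.get("workload"), labels.get("gen_ai_request_model"), value)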

Configuration

Sampling Configurations

Obfuscation Configuration

As with any application monitoring system, the data collected by groundcover is by nature sensitive and contains full request and query payloads. Raw traces can go a long way in a troubleshooting process, but you can choose to obfuscate their payloads.

See Obfuscation Configuration for full details on obfuscation in groundcover.

By default groundcover does not obfuscate LLM payloads.

Obfuscating LLM Request Prompts Configuration

This configuration obfuscates request prompts, while keeping metadata such as model and token usage intact:

httphandler:
  obfuscationConfig:
    keyValueConfig:
      enabled: true
      mode: "ObfuscateSpecificValues"
      specificKeys:
        - "messages"

Obfuscating LLM Response Configuration

This configuration obfuscates response data, while keeping metadata such as model and token usage intact:

httphandler:
  obfuscationConfig:
    keyValueConfig:
      enabled: true
      mode: "ObfuscateSpecificValues"
      specificKeys:
        - "choices"

Supported Providers

We currently support OpenAI Chat Completions API calls. The following are on the short-term roadmap:

  • Anthropic Chat Completion API calls

  • Bedrock APIs
