LLM Observability

Overview

LLM Observability is the practice of monitoring, analyzing, and troubleshooting interactions with Large Language Models (LLMs) across distributed systems. It focuses on capturing prompt content, response quality, latency, and token usage and cost.

groundcover provides a unified view of your GenAI traffic by combining two powerful data collection methods: zero-instrumentation eBPF tracing and native OpenTelemetry ingestion.

eBPF-Based Tracing - Zero Instrumentation

groundcover automatically detects and traces LLM API calls without requiring SDKs, wrappers, or code modification.

The sensor captures traffic at the kernel level, extracting key data points and transforming requests into structured spans and metrics. This allows for instant visibility into third-party providers without altering application code. This method captures:

  • Payloads: Full prompt and response bodies (supports redaction).

  • Usage: Token counts (input, output, total).

  • Metadata: Model versions, temperature, and parameters.

  • Performance: Latency and completion time.

  • Status: Error messages and finish reasons.

Requirement: Out-of-the-box LLM tracing for OpenAI and Anthropic is available starting from sensor version 1.9.563; AWS Bedrock support is available starting from sensor version 1.11.158.

OpenTelemetry Instrumentation Support

In addition to auto-detection, groundcover supports the ingestion of traces generated by manual OpenTelemetry instrumentation.

If your applications are already instrumented using OpenTelemetry SDKs (e.g., using the OpenTelemetry Python or JavaScript instrumentation for OpenAI/LangChain), groundcover will seamlessly ingest, process, and visualize these spans alongside your other telemetry data.
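As a minimal sketch of this path: a workload that is already instrumented with an OpenTelemetry SDK only needs its OTLP exporter pointed at the endpoint exposed by your groundcover installation. The snippet below uses only standard OpenTelemetry SDK environment variables; the endpoint hostname is a placeholder, not a real groundcover address.

```yaml
# Sketch: route OTLP traces from an already-instrumented workload to
# groundcover. Replace the placeholder endpoint with the OTLP endpoint
# exposed by your groundcover installation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-app
spec:
  selector:
    matchLabels:
      app: llm-app
  template:
    metadata:
      labels:
        app: llm-app
    spec:
      containers:
        - name: app
          image: llm-app:latest   # hypothetical image
          env:
            - name: OTEL_SERVICE_NAME            # standard OTel SDK variable
              value: llm-app
            - name: OTEL_EXPORTER_OTLP_ENDPOINT  # standard OTel SDK variable
              value: "http://<groundcover-otlp-endpoint>:4317"  # placeholder
```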

Generative AI Span Structure

When groundcover captures traffic via eBPF, it automatically transforms the data into structured spans that adhere to the OpenTelemetry GenAI Semantic Conventions.

This standardization allows LLM traces to correlate with existing application telemetry. Below are the attributes captured for each eBPF-generated LLM span:

| Attribute | Description | Example |
| --- | --- | --- |
| `gen_ai.system` | The Generative AI provider | `openai` |
| `gen_ai.request.model` | The model name requested by the client | `gpt-4` |
| `gen_ai.response.model` | The name of the model that generated the response | `gpt-4-0613` |
| `gen_ai.response.usage.input_tokens` | Tokens consumed by the input (prompt) | `100` |
| `gen_ai.response.usage.output_tokens` | Tokens generated in the response | `100` |
| `gen_ai.response.usage.total_tokens` | Total token usage for the interaction | `200` |
| `gen_ai.response.finish_reason` | Reason the model stopped generating | `stop`; `length` |
| `gen_ai.response.choice_count` | Target number of candidate completions | `3` |
| `gen_ai.response.system_fingerprint` | Fingerprint used to track backend environment changes | `fp_44709d6fcb` |
| `gen_ai.response.tools_used` | Number of tools used in the API call | `2` |
| `gen_ai.request.temperature` | The temperature setting | `0.0` |
| `gen_ai.request.max_tokens` | Maximum tokens allowed for the request | `100` |
| `gen_ai.request.top_p` | The top_p sampling setting | `1.0` |
| `gen_ai.request.stream` | Whether streaming was enabled (boolean) | `false` |
| `gen_ai.response.message_id` | Unique ID of the message created by the server | |
| `gen_ai.error.code` | The error code for the response | |
| `gen_ai.error.message` | A human-readable description of the error | |
| `gen_ai.error.type` | The class of error the operation ended with | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` |
| `gen_ai.operation.name` | The name of the operation being performed | `chat`; `generate_content`; `text_completion` |
| `gen_ai.request.message_count` | Count of messages in the API request | `1` |
| `gen_ai.request.system_prompt` | Whether a system prompt was included in the request (boolean) | `true` |
| `gen_ai.request.tools_used` | Whether any tools were used in the request (boolean) | `true` |
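For illustration, the attribute set on a single eBPF-generated span for a chat completion might look like the following (all values are illustrative):

```yaml
# Illustrative only: attributes of one eBPF-generated chat-completion span
gen_ai.system: openai
gen_ai.operation.name: chat
gen_ai.request.model: gpt-4
gen_ai.request.temperature: 0.0
gen_ai.request.max_tokens: 100
gen_ai.request.stream: false
gen_ai.response.model: gpt-4-0613
gen_ai.response.usage.input_tokens: 100
gen_ai.response.usage.output_tokens: 58
gen_ai.response.usage.total_tokens: 158
gen_ai.response.finish_reason: stop
```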

Generative AI Metrics

groundcover automatically generates rate, error, duration, and usage metrics from LLM traces. These metrics adhere to OpenTelemetry GenAI conventions and are enriched with Kubernetes context (cluster, namespace, workload, etc.).

| Metric Name | Description |
| --- | --- |
| `groundcover_workload_gen_ai_response_usage_input_tokens` | Input token count, aggregated by K8s workload |
| `groundcover_workload_gen_ai_response_usage_output_tokens` | Output token count, aggregated by K8s workload |
| `groundcover_workload_gen_ai_response_usage_total_tokens` | Total token usage, aggregated by K8s workload |
| `groundcover_gen_ai_response_usage_input_tokens` | Global input token count (cluster-wide) |
| `groundcover_gen_ai_response_usage_output_tokens` | Global output token count (cluster-wide) |
| `groundcover_gen_ai_response_usage_total_tokens` | Global total token usage (cluster-wide) |

Available Labels:

Metrics can be filtered by: workload, namespace, cluster, gen_ai_request_model, gen_ai_system, client, server, and status_code.
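For example, assuming these metrics are exposed as counters and queryable with PromQL, a per-workload breakdown of input token consumption by requested model might look like the following sketch:

```promql
# Sketch: input tokens consumed per workload over the last hour,
# split by requested model. Assumes the metric is a counter.
sum by (workload, gen_ai_request_model) (
  increase(groundcover_workload_gen_ai_response_usage_input_tokens[1h])
)
```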

Configuration

Obfuscation Configuration

LLM payloads often contain sensitive data (PII, secrets). By default, groundcover collects full payloads to aid in debugging. You can configure the sensor to obfuscate specific fields within prompts or responses using the httphandler configuration in your values.yaml.

See Sensitive data obfuscation for full details on obfuscation in groundcover.

By default, groundcover does not obfuscate LLM payloads.

Obfuscating Request Prompts

This configuration obfuscates request prompts while keeping metadata such as the model and token counts, as sketched below.
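The exact values.yaml schema depends on your groundcover version, so the keys nested under httphandler below are assumptions; treat this as a sketch and consult the Sensitive data obfuscation docs for the authoritative keys.

```yaml
# Hedged sketch: the nesting and key names under httphandler are
# assumptions, not the authoritative schema. Intent: redact request
# prompt bodies while keeping metadata (model, token counts) intact.
sensor:
  config:
    httphandler:
      obfuscation:
        llm:
          obfuscateRequestPrompts: true   # hypothetical flag
```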

Obfuscating Response Prompts

This configuration obfuscates response data while keeping metadata such as the model and token counts, as sketched below.
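Under the same assumptions as the request-side sketch above, the response-side equivalent could look like:

```yaml
# Hedged sketch (same caveats as above): hypothetical flag that redacts
# response bodies while keeping metadata such as model and token counts.
sensor:
  config:
    httphandler:
      obfuscation:
        llm:
          obfuscateResponsePrompts: true   # hypothetical flag
```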

Supported Providers

groundcover currently supports the following providers via auto-detection:

  • OpenAI (Chat Completions API)

  • Anthropic (Messages API)

  • AWS Bedrock APIs

For providers not listed above, manual OpenTelemetry instrumentation can be used to send data to groundcover.
