LLM Observability
Overview
What is LLM Observability
LLM Observability is the practice of monitoring, understanding, and analyzing interactions with Large Language Models (LLMs), such as OpenAI or Anthropic models, across your systems. It focuses on capturing critical data around LLM prompts, responses, performance, and cost, allowing teams to gain visibility into how LLMs are used, what they return, and how they affect application behavior and user experience.
LLM observability extends beyond traditional metrics and logs. It provides structured traces of LLM API calls, including the prompt content, response content, metadata, latency, and token usage. This enables deeper analysis into behavior patterns, failures, anomalies, and even hallucinations.
How groundcover Provides Out-of-the-Box LLM Observability
groundcover delivers LLM observability with zero instrumentation.
Thanks to its unique eBPF-based sensor, groundcover automatically detects and traces LLM API calls without requiring any SDKs, wrappers, or code changes!
When your workloads interact with providers like OpenAI or Anthropic, groundcover intercepts the encrypted traffic at the kernel level, extracts key data points, and transforms each request into structured spans and metrics. This includes:
Full prompt and response bodies (configurable redaction and sampling)
Token usage (input, output, total)
Model details, temperature, and parameters
Latency and completion time
Error messages and finish reasons
LLM traces are seamlessly integrated into your existing observability data, alongside workload spans, logs, and metrics, giving you full visibility into every interaction and its impact.
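For illustration, a completely standard OpenAI SDK call like the sketch below is all groundcover needs; the eBPF sensor traces it from the kernel side, with no wrapper or callback added to the code. The model name and prompt here are arbitrary examples, not values required by groundcover.

# A plain OpenAI SDK call -- no groundcover SDK, wrapper, or code change involved.
# The eBPF sensor captures the request and response at the kernel level, so this
# code stays exactly as you would write it anyway.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    temperature=0.0,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize our Q3 incident report."},
    ],
)

print(response.choices[0].message.content)
print(response.usage.total_tokens)  # the same token counts appear on the generated span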
Generative AI Span Structure
groundcover automatically transforms every GenAI API call into a structured span, no instrumentation required.
These spans follow the OpenTelemetry GenAI Semantic Conventions, allowing you to reason about LLM interactions using a consistent schema. This structure enables correlation with other telemetry signals (e.g., container traces, logs, metrics), while unlocking deep insights about LLM behavior.
Each span contains key metadata extracted from the API call, such as which model was used, how many tokens were consumed, and why the generation stopped. This allows for rich filtering, searching, monitoring, and cost analysis, without any manual tagging.
Below is a sample of the attributes captured for each LLM span:
Attribute | Description | Example
gen_ai.system | The Generative AI product as identified by the client or server instrumentation | openai
gen_ai.request.model | The name of the GenAI model a request is being made to | gpt-4
gen_ai.response.usage.input_tokens | The number of tokens used in the GenAI input (prompt) | 100
gen_ai.response.usage.output_tokens | The number of tokens used in the GenAI response (completion) | 100
gen_ai.response.usage.total_tokens | The total number of tokens used in the GenAI request and response (prompt + completion) | 200
gen_ai.response.model | The name of the model that generated the response | gpt-4-0613
gen_ai.response.finish_reason | Reason the model stopped generating tokens | stop; length
gen_ai.response.choice_count | The target number of candidate completions to return | 3
gen_ai.response.system_fingerprint | A fingerprint to track any eventual change in the Generative AI environment | fp_44709d6fcb
gen_ai.response.tools_used | Number of tools used in the API call | 2
gen_ai.request.temperature | The temperature setting for the GenAI request | 0.0
gen_ai.error.code | The error code for the response |
gen_ai.error.message | A human-readable description of the error |
gen_ai.error.type | Describes a class of error the operation ended with | timeout; java.net.UnknownHostException; server_certificate_invalid; 500
gen_ai.operation.name | The name of the operation being performed | chat; generate_content; text_completion
gen_ai.request.max_tokens | The maximum number of tokens the model generates for a request | 100
gen_ai.request.message_count | Count of messages in the API request | 1
gen_ai.request.stream | Boolean flag indicating whether streaming was set in the request | false
gen_ai.request.system_prompt | Boolean flag indicating whether a system prompt was used in the request prompts | true
gen_ai.request.tools_used | Boolean flag indicating whether any tools were used in the request | true
gen_ai.request.top_p | The top_p sampling setting for the GenAI request | 1.0
gen_ai.response.message_id | Unique ID of the message created by the server |
These attributes are available for every span involving a supported LLM provider. They can be queried in search, used in dashboards, or referenced in monitor conditions.
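As a rough sketch of what the token attributes enable, the snippet below estimates the cost of a single call from its input and output token counts. The price table is a made-up placeholder (not real provider pricing), and the span attributes are shown as a plain dictionary purely for illustration, not as any specific export format.

# Hypothetical per-1K-token prices -- placeholder numbers, not real pricing.
PRICE_PER_1K = {
    "gpt-4": {"input": 0.03, "output": 0.06},
}

# A span's GenAI attributes, shown here as a plain dict for illustration only.
span_attributes = {
    "gen_ai.request.model": "gpt-4",
    "gen_ai.response.usage.input_tokens": 100,
    "gen_ai.response.usage.output_tokens": 100,
}

def estimate_cost(attrs: dict) -> float:
    """Estimate the cost of a single LLM call from its span attributes."""
    prices = PRICE_PER_1K[attrs["gen_ai.request.model"]]
    input_cost = attrs["gen_ai.response.usage.input_tokens"] / 1000 * prices["input"]
    output_cost = attrs["gen_ai.response.usage.output_tokens"] / 1000 * prices["output"]
    return input_cost + output_cost

print(f"estimated cost: ${estimate_cost(span_attributes):.4f}")  # ~$0.0090 with the placeholder prices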
Generative AI Metrics
groundcover automatically exposes key usage metrics for all detected LLM traffic — with no instrumentation or code changes. These metrics are built directly from the traces generated by your LLM API calls and follow the OpenTelemetry GenAI conventions, while being enriched with deep Kubernetes and workload context.
Metric | Description
groundcover_workload_gen_ai_response_usage_input_tokens | Number of tokens sent in prompts, broken down by workload
groundcover_workload_gen_ai_response_usage_output_tokens | Number of tokens generated in completions, broken down by workload
groundcover_workload_gen_ai_response_usage_total_tokens | Total token usage (input + output), broken down by workload
groundcover_gen_ai_response_usage_input_tokens | Number of tokens sent in prompts
groundcover_gen_ai_response_usage_output_tokens | Number of tokens generated in completions
groundcover_gen_ai_response_usage_total_tokens | Total token usage (input + output)
Each of these metrics carries a rich set of labels (keys), such as:
workload
namespace
cluster
gen_ai_request_model
gen_ai_response_model
gen_ai_system
client
server
status_code
and more
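For example, if these metrics are scraped into a Prometheus-compatible backend, a small script could break token usage down by workload and model. This is a sketch under assumptions: the endpoint URL is a placeholder, your metrics backend may differ, and the query treats the total-tokens metric as a counter.

# Query a Prometheus-compatible backend for token usage broken down by
# workload and request model. The URL is a placeholder; adjust to your setup.
import requests

PROM_URL = "http://prometheus.example.internal:9090"  # hypothetical endpoint

# Sum the token-usage rate over the last 5 minutes, grouped by workload and model
# (assumes the metric behaves as a counter in your backend).
query = (
    "sum by (workload, gen_ai_request_model) "
    "(rate(groundcover_workload_gen_ai_response_usage_total_tokens[5m]))"
)

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": query}, timeout=10)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    labels = series["metric"]
    _, value = series["value"]
    print(f'{labels.get("workload")} / {labels.get("gen_ai_request_model")}: {value} tokens/s')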
Configuration
Sampling Configurations
Obfuscation Configuration
As with any application monitoring system, the data collected by groundcover is sensitive by nature and contains full request and query payloads. Raw traces can go a long way in a troubleshooting process, but you can choose to obfuscate their payloads.
See Obfuscation Configuration for full details on obfuscation in groundcover.
Obfuscating LLM Request Prompts Configuration
This configuration will obfuscate request prompts, while keeping metadata such as the model and token counts.
httphandler:
  obfuscationConfig:
    keyValueConfig:
      enabled: true
      mode: "ObfuscateSpecificValues"
      specificKeys:
        - "messages"
Obfuscating LLM Response Configuration
This configuration will obfuscate response data, while keeping metadata such as the model and token counts.
httphandler:
  obfuscationConfig:
    keyValueConfig:
      enabled: true
      mode: "ObfuscateSpecificValues"
      specificKeys:
        - "choices"
Supported Providers
We currently support OpenAI Chat Completion API calls. On the short-term roadmap:
Anthropic Chat Completion API calls
Bedrock APIs