LLM Observability
Overview
LLM Observability is the practice of monitoring, analyzing, and troubleshooting interactions with Large Language Models (LLMs) across distributed systems. It focuses on capturing data about prompt content, response quality, latency, and token costs.
groundcover provides a unified view of your GenAI traffic by combining two powerful data collection methods: zero-instrumentation eBPF tracing and native OpenTelemetry ingestion.
eBPF-Based Tracing - Zero Instrumentation
groundcover automatically detects and traces LLM API calls without requiring SDKs, wrappers, or code modification.
The sensor captures traffic at the kernel level, extracting key data points and transforming requests into structured spans and metrics. This allows for instant visibility into third-party providers without altering application code. This method captures:
Payloads: Full prompt and response bodies (supports redaction).
Usage: Token counts (input, output, total).
Metadata: Model versions, temperature, and parameters.
Performance: Latency and completion time.
Status: Error messages and finish reasons.
OpenTelemetry Instrumentation Support
In addition to auto-detection, groundcover supports the ingestion of traces generated by manual OpenTelemetry instrumentation.
If your applications are already instrumented using OpenTelemetry SDKs (e.g., using the OpenTelemetry Python or JavaScript instrumentation for OpenAI/LangChain), groundcover will seamlessly ingest, process, and visualize these spans alongside your other telemetry data.
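For example, a minimal Python setup might look like the sketch below. The opentelemetry-instrumentation-openai package and the collector endpoint are assumptions; any instrumentation that emits spans over OTLP is ingested the same way.

```python
# Minimal sketch: auto-instrument the OpenAI SDK and export spans over OTLP.
# The endpoint below is a placeholder for your collector / ingestion address.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai import OpenAIInstrumentor

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
)
trace.set_tracer_provider(provider)

# Patches the OpenAI client so each chat/completion call emits a GenAI span.
OpenAIInstrumentor().instrument()
```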
Generative AI Span Structure
When groundcover captures traffic via eBPF, it automatically transforms the data into structured spans that adhere to the OpenTelemetry GenAI Semantic Conventions.
This standardization allows LLM traces to correlate with existing application telemetry. Below are the attributes captured for each eBPF-generated LLM span:
| Attribute | Description | Example |
| --- | --- | --- |
| gen_ai.system | The Generative AI provider | openai |
| gen_ai.request.model | The model name requested by the client | gpt-4 |
| gen_ai.response.model | The name of the model that generated the response | gpt-4-0613 |
| gen_ai.response.usage.input_tokens | Tokens consumed by the input (prompt) | 100 |
| gen_ai.response.usage.output_tokens | Tokens generated in the response | 100 |
| gen_ai.response.usage.total_tokens | Total token usage for the interaction | 200 |
| gen_ai.response.finish_reason | Reason the model stopped generating | stop; length |
| gen_ai.response.choice_count | Target number of candidate completions | 3 |
| gen_ai.response.system_fingerprint | Fingerprint used to track backend environment changes | fp_44709d6fcb |
| gen_ai.response.tools_used | Number of tools used in the API call | 2 |
| gen_ai.request.temperature | The temperature setting | 0.0 |
| gen_ai.request.max_tokens | Maximum tokens allowed for the request | 100 |
| gen_ai.request.top_p | The top_p sampling setting | 1.0 |
| gen_ai.request.stream | Boolean indicating whether streaming was enabled | false |
| gen_ai.response.message_id | Unique ID of the message created by the server | |
| gen_ai.error.code | The error code for the response | |
| gen_ai.error.message | A human-readable description of the error | |
| gen_ai.error.type | The class of error the operation ended with | timeout; java.net.UnknownHostException; server_certificate_invalid; 500 |
| gen_ai.operation.name | The name of the operation being performed | chat; generate_content; text_completion |
| gen_ai.request.message_count | Count of messages in the API request | 1 |
| gen_ai.request.system_prompt | Boolean flag indicating whether a system prompt was included in the request | true |
| gen_ai.request.tools_used | Boolean flag indicating whether any tools were used in the request | true |
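Put together, a single eBPF-captured chat completion might surface as a span carrying an attribute set like the following (all values illustrative):

```yaml
# Illustrative attribute set for one eBPF-generated LLM span (values made up).
gen_ai.system: openai
gen_ai.operation.name: chat
gen_ai.request.model: gpt-4
gen_ai.response.model: gpt-4-0613
gen_ai.request.temperature: 0.0
gen_ai.request.stream: false
gen_ai.response.usage.input_tokens: 100
gen_ai.response.usage.output_tokens: 100
gen_ai.response.usage.total_tokens: 200
gen_ai.response.finish_reason: stop
```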
Generative AI Metrics
groundcover automatically generates rate, error, duration, and usage metrics from LLM traces. These metrics adhere to OpenTelemetry GenAI conventions and are enriched with Kubernetes context (cluster, namespace, workload, etc.).
| Metric | Description |
| --- | --- |
| groundcover_workload_gen_ai_response_usage_input_tokens | Input token count, aggregated by K8s workload |
| groundcover_workload_gen_ai_response_usage_output_tokens | Output token count, aggregated by K8s workload |
| groundcover_workload_gen_ai_response_usage_total_tokens | Total token usage, aggregated by K8s workload |
| groundcover_gen_ai_response_usage_input_tokens | Global input token count (cluster-wide) |
| groundcover_gen_ai_response_usage_output_tokens | Global output token count (cluster-wide) |
| groundcover_gen_ai_response_usage_total_tokens | Global total token usage (cluster-wide) |
Available Labels:
Metrics can be filtered by: workload, namespace, cluster, gen_ai_request_model, gen_ai_system, client, server, and status_code.
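As a rough sketch, assuming these metrics behave as counters and are queryable with PromQL, hourly token spend per workload and model could be charted with something like:

```promql
sum by (workload, gen_ai_request_model) (
  increase(groundcover_workload_gen_ai_response_usage_total_tokens[1h])
)
```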
Configuration
Obfuscation Configuration
LLM payloads often contain sensitive data (PII, secrets). By default, groundcover collects full payloads to aid in debugging. You can configure the agent to obfuscate specific fields within the prompts or responses using the httphandler configuration in your values.yaml.
See Sensitive data obfuscation for full details on obfuscation in groundcover.
Obfuscating Request Prompts
This configuration obfuscates request prompts while keeping metadata such as the model and token counts intact:
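A hypothetical values.yaml sketch (the nesting and key names below are illustrative placeholders, not the exact schema; see the Sensitive data obfuscation page for the authoritative reference):

```yaml
# Hypothetical sketch only: key names are illustrative placeholders.
# Obfuscate the JSON fields carrying request prompts; metadata stays intact.
httphandler:
  obfuscation:
    requestJsonKeys:
      - messages   # OpenAI-style chat prompts
      - prompt     # completion-style prompts
```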
Obfuscating Response Prompts
This configuration obfuscates response data while keeping metadata such as the model and token counts intact:
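Again a hypothetical sketch with illustrative key names, not the exact schema:

```yaml
# Hypothetical sketch only: key names are illustrative placeholders.
# Obfuscate the JSON fields carrying response content; metadata stays intact.
httphandler:
  obfuscation:
    responseJsonKeys:
      - choices    # OpenAI-style response messages
      - content    # Anthropic-style response blocks
```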
Supported Providers
groundcover currently supports the following providers via auto-detection:
OpenAI (Chat Completion API)
Anthropic (Chat Completion API)
AWS Bedrock APIs