# LLM Observability ## Overview LLM Observability is the practice of monitoring, analyzing, and troubleshooting interactions with Large Language Models (LLMs) across distributed systems. It focuses on capturing data regarding prompt content, response quality, performance latency, and token costs. groundcover provides a unified view of your GenAI traffic by combining two powerful data collection methods: zero-instrumentation eBPF tracing and native OpenTelemetry ingestion. ### eBPF-Based Tracing - Zero Instrumentation groundcover automatically detects and traces LLM API calls without requiring SDKs, wrappers, or code modification. The sensor captures traffic at the kernel level, extracting key data points and transforming requests into structured spans and metrics. This allows for instant visibility into third-party providers without altering application code. This method captures: * **Payloads:** Full prompt and response bodies (supports redaction). * **Usage:** Token counts (input, output, total). * **Metadata:** Model versions, temperature, and parameters. * **Performance:** Latency and completion time. * **Status:** Error messages and finish reasons. {% hint style="info" %} **Requirement:** Out-of-the-box LLM tracing for (OpenAI and Anthropic) is available starting from sensor version **1.9.563**. Bedrock available starting from sensor version **1.11.158** {% endhint %} ### OpenTelemetry Instrumentation Support In addition to auto-detection, groundcover supports the ingestion of traces generated by manual OpenTelemetry instrumentation. If your applications are already instrumented using OpenTelemetry SDKs (e.g., using the OpenTelemetry Python or JavaScript instrumentation for OpenAI/LangChain), groundcover will seamlessly ingest, process, and visualize these spans alongside your other telemetry data. ## Generative AI Span Structure When groundcover captures traffic via eBPF, it automatically transforms the data into structured spans that adhere to the [OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/). This standardization allows LLM traces to correlate with existing application telemetry. Below are the attributes captured for each **eBPF-generated** LLM span:

Attribute	Description	Example
`gen_ai.system`	The Generative AI provider	`openai`
`gen_ai.request.model`	The model name requested by the client	`gpt-4`
`gen_ai.response.model`	The name of the model that generated the response	`gpt-4-0613`
`gen_ai.response.usage.input_tokens`	Tokens consumed by the input (prompt)	`100`
`gen_ai.response.usage.output_tokens`	Tokens generated in the response	`100`
`gen_ai.response.usage.total_tokens`	Total token usage for the interaction	`200`
`gen_ai.response.finish_reason`	Reason the model stopped generating	`stop` ; `length`
`gen_ai.response.choice_count`	Target number of candidate completions	`3`
`gen_ai.response.system_fingerprint`	Fingerprint to track backend environment changes	`fp_44709d6fcb`
`gen_ai.response.tools_used`	Number of tools used in API call	`2`
`gen_ai.request.temperature`	The temperature setting	`0.0`
`gen_ai.request.max_tokens`	Maximum tokens allowed for the request	`100`
`gen_ai.request.top_p`	The top_p sampling setting	`1.0`
`gen_ai.request.stream`	Boolean indicating if streaming was enabled	`false`
`gen_ai.response.message_id`	Unique ID of the message created by the server
`gen_ai.error.code`	The error code for the response
`gen_ai.error.message`	A human-readable description of the error
`gen_ai.error.type`	Describes a class of error the operation ended with	`timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500`
`gen_ai.operation.name`	The name of the operation being performed	`chat`; `generate_content`; `text_completion`
`gen_ai.request.message_count`	Count of messages in API response	`1`
`gen_ai.request.system_prompt`	Boolean flag whether system prompt was used in request prompts	`true`
`gen_ai.request.tools_used`	Boolean flag whether any tools were used in requests	`true`

## Generative AI Metrics groundcover automatically generates rate, errors, duration and usage metrics from the LLM traces. These metrics adhere to [OpenTelemetry GenAI conventions](https://opentelemetry.io/docs/specs/semconv/ai/) and are enriched with Kubernetes context (cluster, namespace, workload, etc).

Metric Name	Description
`groundcover_workload_gen_ai_response_usage_input_tokens`	Input token count, aggregated by K8s workload
`groundcover_workload_gen_ai_response_usage_output_tokens`	Output token count, aggregated by K8s workload
`groundcover_workload_gen_ai_response_usage_total_tokens`	Total token usage, aggregated by K8s workload
`groundcover_gen_ai_response_usage_input_tokens`	Global input token count (cluster-wide)
`groundcover_gen_ai_response_usage_output_tokens`	Global output token count (cluster-wide)
`groundcover_gen_ai_response_usage_total_tokens`	Global total token usage (cluster-wide)

**Available Labels:** Metrics can be filtered by: `workload`, `namespace`, `cluster`, `gen_ai_request_model`, `gen_ai_system`, `client`, `server`, and `status_code`. ## Configuration ### Obfuscation Configuration LLM payloads often contain sensitive data (PII, secrets). By default, groundcover collects full payloads to aid in debugging. You can configure the agent to obfuscate specific fields within the prompts or responses using the `httphandler` configuration in your `values.yaml`. See [Sensitive data obfuscation](/~/revisions/wppaVAUQcsAgts3lmC6b/customization/customize-usage/sensitive-data-obfuscation.md) for full details on obfuscation in groundcover. {% hint style="info" %} By default groundcover does **not** obfuscate LLM payloads. {% endhint %} #### Obfuscating Request Prompts This configuration will obfuscate request prompts, while keeping metadata like model, tokens, etc {% code title="values.yaml" %} ```yaml httphandler: obfuscationConfig: keyValueConfig: enabled: true mode: "ObfuscateSpecificValues" specificKeys: - "messages" - "inputText" - "prompt" ``` {% endcode %} #### Obfuscating Response Prompts This configuration will obfuscate response data, while keeping metadata like model, tokens, etc {% code title="values.yaml" %} ```yaml httphandler: obfuscationConfig: keyValueConfig: enabled: true mode: "ObfuscateSpecificValues" specificKeys: - "choices" - "output" - "content" - "outputs" - "results" - "generation" ``` {% endcode %} ### Supported Providers groundcover currently supports the following providers via auto-detection: * OpenAI (Chat Completion API) * Anthropic (Chat Completion API) * AWS Bedrock APIs {% hint style="info" %} For providers not listed above, manual OpenTelemetry instrumentation can be used to send data to groundcover. {% endhint %} --- # Agent Instructions: Querying This Documentation If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question. Perform an HTTP GET request on the current page URL with the `ask` query parameter: ``` GET https://docs.groundcover.com/~/revisions/wppaVAUQcsAgts3lmC6b/capabilities/llm-observability.md?ask= ``` The question should be specific, self-contained, and written in natural language. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation. Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.