# LLM Observability

## Overview

LLM Observability is the practice of monitoring, analyzing, and troubleshooting interactions with Large Language Models (LLMs) across distributed systems. It focuses on capturing data regarding prompt content, response quality, performance latency, and token costs.

groundcover provides a unified view of your GenAI traffic by combining two powerful data collection methods: zero-instrumentation eBPF tracing and native OpenTelemetry ingestion.

### eBPF-Based Tracing - Zero Instrumentation

groundcover automatically detects and traces LLM API calls without requiring SDKs, wrappers, or code modification.

The sensor captures traffic at the kernel level, extracting key data points and transforming requests into structured spans and metrics. This allows for instant visibility into third-party providers without altering application code. This method captures:

* **Payloads:** Full prompt and response bodies (supports redaction).
* **Usage:** Token counts (input, output, total).
* **Metadata:** Model versions, temperature, and parameters.
* **Performance:** Latency and completion time.
* **Status:** Error messages and finish reasons.

{% hint style="info" %}
**Requirement:** Out-of-the-box LLM tracing for (OpenAI and Anthropic) is available starting from sensor version **1.9.563**. Bedrock available starting from sensor version **1.11.158**
{% endhint %}

### OpenTelemetry Instrumentation Support

In addition to auto-detection, groundcover supports the ingestion of traces generated by manual OpenTelemetry instrumentation.

If your applications are already instrumented using OpenTelemetry SDKs (e.g., using the OpenTelemetry Python or JavaScript instrumentation for OpenAI/LangChain), groundcover will seamlessly ingest, process, and visualize these spans alongside your other telemetry data.

## Generative AI Span Structure

When groundcover captures traffic via eBPF, it automatically transforms the data into structured spans that adhere to the [OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/).

This standardization allows LLM traces to correlate with existing application telemetry. Below are the attributes captured for each **eBPF-generated** LLM span:

<table><thead><tr><th width="323.15625">Attribute</th><th width="211">Description</th><th>Example</th></tr></thead><tbody><tr><td><code>gen_ai.system</code></td><td>The Generative AI provider</td><td><code>openai</code></td></tr><tr><td><code>gen_ai.request.model</code></td><td>The model name requested by the client</td><td><code>gpt-4</code></td></tr><tr><td><code>gen_ai.response.model</code></td><td>The name of the model that generated the response</td><td><code>gpt-4-0613</code></td></tr><tr><td><code>gen_ai.response.usage.input_tokens</code></td><td>Tokens consumed by the input (prompt)</td><td><code>100</code></td></tr><tr><td><code>gen_ai.response.usage.output_tokens</code></td><td>Tokens generated in the response</td><td><code>100</code></td></tr><tr><td><code>gen_ai.response.usage.total_tokens</code></td><td>Total token usage for the interaction</td><td><code>200</code></td></tr><tr><td><code>gen_ai.response.finish_reason</code></td><td>Reason the model stopped generating</td><td><code>stop</code> ; <code>length</code></td></tr><tr><td><code>gen_ai.response.choice_count</code></td><td>Target number of candidate completions</td><td><code>3</code></td></tr><tr><td><code>gen_ai.response.system_fingerprint</code></td><td>Fingerprint to track backend environment changes</td><td><code>fp_44709d6fcb</code></td></tr><tr><td><code>gen_ai.response.tools_used</code></td><td>Number of tools used in API call</td><td><code>2</code></td></tr><tr><td><code>gen_ai.request.temperature</code></td><td>The temperature setting</td><td><code>0.0</code></td></tr><tr><td><code>gen_ai.request.max_tokens</code></td><td>Maximum tokens allowed for the request</td><td><code>100</code></td></tr><tr><td><code>gen_ai.request.top_p</code></td><td>The top_p sampling setting</td><td><code>1.0</code></td></tr><tr><td><code>gen_ai.request.stream</code></td><td>Boolean indicating if streaming was enabled</td><td><code>false</code></td></tr><tr><td><code>gen_ai.response.message_id</code></td><td>Unique ID of the message created by the server</td><td></td></tr><tr><td><code>gen_ai.error.code</code></td><td>The error code for the response</td><td></td></tr><tr><td><code>gen_ai.error.message</code></td><td>A human-readable description of the error</td><td></td></tr><tr><td><code>gen_ai.error.type</code></td><td>Describes a class of error the operation ended with</td><td><code>timeout</code>; <code>java.net.UnknownHostException</code>; <code>server_certificate_invalid</code>; <code>500</code></td></tr><tr><td><code>gen_ai.operation.name</code></td><td>The name of the operation being performed</td><td><code>chat</code>; <code>generate_content</code>; <code>text_completion</code></td></tr><tr><td><code>gen_ai.request.message_count</code></td><td>Count of messages in API response</td><td><code>1</code></td></tr><tr><td><code>gen_ai.request.system_prompt</code></td><td>Boolean flag whether system prompt was used in request prompts</td><td><code>true</code></td></tr><tr><td><code>gen_ai.request.tools_used</code></td><td>Boolean flag whether any tools were used in requests</td><td><code>true</code></td></tr></tbody></table>

## Generative AI Metrics

groundcover automatically generates rate, errors, duration and usage metrics from the LLM traces. These metrics adhere to [OpenTelemetry GenAI conventions](https://opentelemetry.io/docs/specs/semconv/ai/) and are enriched with Kubernetes context (cluster, namespace, workload, etc).

<table><thead><tr><th width="501.15234375">Metric Name</th><th width="246.54296875">Description</th></tr></thead><tbody><tr><td><code>groundcover_workload_gen_ai_response_usage_input_tokens</code></td><td>Input token count, aggregated by K8s workload</td></tr><tr><td><code>groundcover_workload_gen_ai_response_usage_output_tokens</code></td><td>Output token count, aggregated by K8s workload</td></tr><tr><td><code>groundcover_workload_gen_ai_response_usage_total_tokens</code></td><td>Total token usage, aggregated by K8s workload</td></tr><tr><td><code>groundcover_gen_ai_response_usage_input_tokens</code></td><td>Global input token count (cluster-wide)</td></tr><tr><td><code>groundcover_gen_ai_response_usage_output_tokens</code></td><td>Global output token count (cluster-wide)</td></tr><tr><td><code>groundcover_gen_ai_response_usage_total_tokens</code></td><td>Global total token usage (cluster-wide)</td></tr></tbody></table>

**Available Labels:**

Metrics can be filtered by: `workload`, `namespace`, `cluster`, `gen_ai_request_model`, `gen_ai_system`, `client`, `server`, and `status_code`.

## Configuration

### Obfuscation Configuration

LLM payloads often contain sensitive data (PII, secrets). By default, groundcover collects full payloads to aid in debugging. You can configure the agent to obfuscate specific fields within the prompts or responses using the `httphandler` configuration in your `values.yaml`.

See [Sensitive data obfuscation](/~/revisions/wppaVAUQcsAgts3lmC6b/customization/customize-usage/sensitive-data-obfuscation.md) for full details on obfuscation in groundcover.

{% hint style="info" %}
By default groundcover does **not** obfuscate LLM payloads.
{% endhint %}

#### Obfuscating Request Prompts

This configuration will obfuscate request prompts, while keeping metadata like model, tokens, etc

{% code title="values.yaml" %}

```yaml
httphandler:
  obfuscationConfig:
    keyValueConfig:
      enabled: true
      mode: "ObfuscateSpecificValues"
      specificKeys:
        - "messages"
        - "inputText"
        - "prompt"
```

{% endcode %}

#### Obfuscating Response Prompts

This configuration will obfuscate response data, while keeping metadata like model, tokens, etc

{% code title="values.yaml" %}

```yaml
httphandler:
  obfuscationConfig:
    keyValueConfig:
      enabled: true
      mode: "ObfuscateSpecificValues"
      specificKeys:
        - "choices"
        - "output"
        - "content"
        - "outputs"
        - "results"
        - "generation"
```

{% endcode %}

### Supported Providers

groundcover currently supports the following providers via auto-detection:

* OpenAI (Chat Completion API)
* Anthropic (Chat Completion API)
* AWS Bedrock APIs

{% hint style="info" %}
For providers not listed above, manual OpenTelemetry instrumentation can be used to send data to groundcover.
{% endhint %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.groundcover.com/~/revisions/wppaVAUQcsAgts3lmC6b/capabilities/llm-observability.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
