> For the complete documentation index, see [llms.txt](https://docs.groundcover.com/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.groundcover.com/~/revisions/ETrLpNk6KtHjyaVUTLoE/capabilities/llm-observability.md).

# LLM Observability

## Overview

### What is LLM Observability&#x20;

LLM Observability is the practice of monitoring, understanding, and analyzing interactions with Large Language Models (LLMs), such as OpenAI or Anthropic models across your systems. It focuses on capturing critical data around LLM prompts, responses, performance, and cost, allowing teams to gain visibility into how LLMs are used, what they return, and how they affect application behavior and user experience.

LLM observability extends beyond traditional metrics and logs. It provides structured traces of LLM API calls, including the prompt content, response content, metadata, latency, and token usage. This enables deeper analysis into behavior patterns, failures, anomalies, and even hallucinations.

### How groundcover Provides Out-of-the-Box LLM Observability

groundcover delivers LLM observability with zero instrumentation.

Thanks to its unique eBPF-based sensor, groundcover automatically detects and traces LLM API calls without requiring any SDKs, wrappers, or code changes!

When your workloads interact with providers like OpenAI or Anthropic , groundcover intercepts the encrypted traffic at the kernel level, extracts key data points, and transforms each request into a structured spans and metrics. This includes:

* Full prompt and response bodies (configurable redaction and sampling)
* Token usage (input, output, total)
* Model details, temperature, and parameters
* Latency and completion time
* Error messages and finish reasons

LLM traces are seamlessly integrated into your existing observability data, alongside workload spans, logs, and metrics, giving you full visibility into every interaction and its  impact.

{% hint style="info" %}
Out-of-the-box LLM tracing is available starting from sensor version: 1.9.563
{% endhint %}

## Generative AI Span Structure

groundcover automatically transforms every GenAI API call into a structured span, no instrumentation required.

These spans follow the [OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/), allowing you to reason about LLM interactions using a consistent schema. This structure enables correlation with other telemetry signals (e.g., container traces, logs, metrics), while unlocking deep insights about LLM behavior.

Each span contains key metadata extracted from the API call, such as which model was used, how many tokens were consumed, and why the generation stopped. This allows for rich filtering, searching, monitoring, and cost analysis, without any manual tagging.

Below is a sample of the attributes captured for each LLM span:

<table><thead><tr><th width="323.15625">Attribute</th><th width="211">Description</th><th>Example</th></tr></thead><tbody><tr><td><code>gen_ai.system</code></td><td>The Generative AI product as identified by the client or server instrumentation</td><td><code>openai</code></td></tr><tr><td><code>gen_ai.request.model</code></td><td>The name of the GenAI model a request is being made to</td><td><code>gpt-4</code></td></tr><tr><td><code>gen_ai.response.usage.input_tokens</code></td><td>The number of tokens used in the GenAI input (prompt)</td><td><code>100</code></td></tr><tr><td><code>gen_ai.response.usage.output_tokens</code></td><td>The number of tokens used in the GenAI response (completion)</td><td><code>100</code></td></tr><tr><td><code>gen_ai.response.usage.total_tokens</code></td><td>The number of tokens used in the GenAI request+response (completion)</td><td><code>200</code></td></tr><tr><td><code>gen_ai.response.model</code></td><td>The name of the model that generated the response</td><td><code>gpt-4-0613</code></td></tr><tr><td><code>gen_ai.response.finish_reason</code></td><td>Reason the model stopped generating tokens</td><td><code>stop</code> ; <code>lenght</code> </td></tr><tr><td><code>gen_ai.response.choice_count</code></td><td>The target number of candidate completions to return</td><td><code>3</code></td></tr><tr><td><code>gen_ai.response.system_fingerprint</code></td><td>A fingerprint to track any eventual change in the Generative AI environment</td><td><code>fp_44709d6fcb</code></td></tr><tr><td><code>gen_ai.response.tools_used</code></td><td>Number of tools used in API call</td><td><code>2</code></td></tr><tr><td><code>gen_ai.request.temperature</code></td><td>The temperature setting for the Gen AI request</td><td><code>0.0</code></td></tr><tr><td><code>gen_ai.error.code</code></td><td>The error code for the response</td><td></td></tr><tr><td><code>gen_ai.error.message</code></td><td>A human-readable description of the error</td><td></td></tr><tr><td><code>gen_ai.error.type</code></td><td>Describes a class of error the operation ended with</td><td><code>timeout</code>; <code>java.net.UnknownHostException</code>; <code>server_certificate_invalid</code>; <code>500</code></td></tr><tr><td><code>gen_ai.operation.name</code></td><td>The name of the operation being performed</td><td><code>chat</code>; <code>generate_content</code>; <code>text_completion</code></td></tr><tr><td><code>gen_ai.request.max_tokens</code></td><td>The maximum number of tokens the model generates for a request.</td><td><code>100</code></td></tr><tr><td><code>gen_ai.request.message_count</code></td><td>Count of messages in API response</td><td><code>1</code></td></tr><tr><td><code>gen_ai.request.stream</code></td><td>Boolean flag wether stream was set in request</td><td><code>false</code></td></tr><tr><td><code>gen_ai.request.system_prompt</code></td><td>Boolean flag wether system prompt was used in request prompts</td><td><code>true</code></td></tr><tr><td><code>gen_ai.request.tools_used</code></td><td>Boolean flag wether any tools were used in requsts</td><td><code>true</code></td></tr><tr><td><code>gen_ai.request.top_p</code></td><td>The top_p sampling setting for the GenAI request</td><td><code>1.0</code></td></tr><tr><td><code>gen_ai.response.message_id</code></td><td>Unique id off message created by server</td><td></td></tr></tbody></table>

These attributes are available for every span involving a supported LLM provider. They can be queried in search, used in dashboards, or referenced in monitor conditions.

## Generative AI Metrics

groundcover automatically exposes key usage metrics for all detected LLM traffic — with no instrumentation or code changes. These metrics are built directly from the traces generated by your LLM API calls and follow the [OpenTelemetry GenAI conventions](https://opentelemetry.io/docs/specs/semconv/ai/), while being enriched with deep Kubernetes and workload context.

<table><thead><tr><th width="501.15234375">Metric Name</th><th width="246.54296875">Description</th></tr></thead><tbody><tr><td><code>groundcover_workload_gen_ai_response_usage_input_tokens</code></td><td>Number of tokens sent in prompts, broken down by workload</td></tr><tr><td><code>groundcover_workload_gen_ai_response_usage_output_tokens</code></td><td>Number of tokens generated in completions, broken down by workload</td></tr><tr><td><code>groundcover_workload_gen_ai_response_usage_total_tokens</code></td><td>Total token usage (input + output), broken down by workload</td></tr><tr><td><code>groundcover_gen_ai_response_usage_input_tokens</code></td><td>Number of tokens sent in prompts</td></tr><tr><td><code>groundcover_gen_ai_response_usage_output_tokens</code></td><td>Number of tokens generated in completions</td></tr><tr><td><code>groundcover_gen_ai_response_usage_total_tokens</code></td><td>Total number of token usage</td></tr></tbody></table>

Each of these metrics is enriched with a rich set of labels (keys) such as:

* `workload`&#x20;
* `namespace`
* `cluster`
* `gen_ai_request_model`
* `gen_ai_response_model`
* `gen_ai_system`
* `client`
* `server`
* `status_code`
* and more

## Configuration

### Sampling Configurations

### Obfuscation Configuration

As any application monitoring system, the data collected by groundcover is by nature sensitive and contains payloads of full requests and queries. Raw traces can go a long way in a troubleshooting process, but you can choose to obfuscate their payload.

See [#obfuscation-configuration](#obfuscation-configuration "mention") for full details on obfuscation in groundcover.

{% hint style="info" %}
By default groundcover does **not** obfuscate LLM payloads.
{% endhint %}

#### Obfuscating LLM Request Prompts Configuration

This configuration will obfuscate request prompts, while keeping metadata like model, tokens, etc

```
httphandler:
  obfuscationConfig:
    keyValueConfig:
      enabled: true
      mode: "ObfuscateSpecificValues"
      specificKeys:
        - "messages"
```

#### Obfuscating LLM Response Configuration

This configuration will obfuscate response data, while keeping metadata like model, tokens, etc

```
httphandler:
  obfuscationConfig:
    keyValueConfig:
      enabled: true
      mode: "ObfuscateSpecificValues"
      specificKeys:
        - "choices"
```

## Supported Providers

We currently support OpenAI Chat Completion API calls, currently on short-term roadmap:

* Anthropic Chat Completion API calls
* Bedrock APIs


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://docs.groundcover.com/~/revisions/ETrLpNk6KtHjyaVUTLoE/capabilities/llm-observability.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
