# Obfuscate Traces

**Overview**

Protect sensitive data in your traces by masking or removing it before storage. By integrating data obfuscation directly into your traces pipeline, you maintain privacy and meet compliance requirements while still retaining the necessary operational details.

**Why Obfuscate Traces?**

Traces often contain sensitive information in payloads, headers, and attributes:

* **Personally Identifiable Information (PII)** - emails, names, addresses in request/response bodies
* **Credentials** - API keys, tokens, passwords in headers or payloads
* **Financial data** - credit card numbers, account numbers in payment spans
* **Internal system details** - internal IPs, service tokens in headers

Obfuscating this data helps you:

* **Meet compliance requirements** (GDPR, PCI-DSS, HIPAA, etc.)
* **Protect customer privacy**
* **Reduce security risks** from leaked credentials
* **Maintain audit trails** while removing sensitive details

**Obfuscation Approaches**

There are three approaches to obfuscating sensitive data in traces:

**1. Automatic PII Detection with obfuscate\_pii**

Automatically detect and redact sensitive data without writing any regex. This is the **recommended approach** for broad coverage with minimal configuration.

**Best for:** Broad PII protection across many pattern types with zero regex effort

**2. Masking with replace\_pattern**

Replace parts of a string with a masking token (e.g., replacing email characters with asterisks). Use this when you want to preserve the field structure while hiding the sensitive value.

**Best for:** Custom patterns not covered by `obfuscate_pii`, partial masking with capture groups

**3. Removing with delete\_key**

Remove fields that contain sensitive data entirely. Use this when the field is not required for downstream analysis.

**Best for:** API keys, passwords, tokens, unnecessary PII in attributes

**Best Practices**

1. **Apply obfuscation to the right scope** - Target specific workloads, headers, or body fields rather than applying broadly
2. **Be specific with patterns** - Avoid over-matching by using precise regex patterns
3. **Test thoroughly** - Run rules against representative sample payloads and review the output before deploying
4. **Document your rules** - Use clear `ruleName` values to explain what each rule protects
5. **Balance utility and privacy** - Mask data in a way that preserves operational value
6. **Combine approaches** - Use `obfuscate_pii` for broad coverage and `replace_pattern` for custom patterns
7. **Order matters** - Place obfuscation rules after transformation rules so useful fields are extracted first

**Automatic PII Obfuscation**

The `obfuscate_pii` function detects and redacts sensitive data across 16 built-in patterns, with no regex to write. It scans the specified field and replaces any detected PII in place.

{% hint style="success" %}
`obfuscate_pii` is designed for **zero allocations** when no PII is detected, making it safe for high-throughput pipelines.
{% endhint %}

**Supported Patterns**

| Pattern            | Category          | Example                           | Min Match Length |
| ------------------ | ----------------- | --------------------------------- | ---------------- |
| `email`            | personal\_info    | `user@example.com`                | 6                |
| `credit_card`      | credit\_card      | `4111-1111-1111-1111`             | 13               |
| `ipv4_address`     | network\_info     | `192.168.1.1`                     | 7                |
| `ipv6_address`     | network\_info     | `::1`                             | 3                |
| `mac_address`      | network\_info     | `11:22:33:44:55:66`               | 17               |
| `url`              | network\_info     | `https://example.com/path`        | 10               |
| `jwt`              | auth\_token       | `eyJhbGciOi...`                   | 20               |
| `bearer_token`     | auth\_token       | `Bearer abc123xyz`                | 10               |
| `aws_credential`   | cloud\_credential | `AKIAIOSFODNN7EXAMPLE`            | 20               |
| `azure_credential` | cloud\_credential | `azure_key=ABCDE...`              | 15               |
| `github_token`     | api\_token        | `ghp_xxxx...`                     | 40               |
| `gitlab_token`     | api\_token        | `glpat-xxxx...`                   | 26               |
| `slack_token`      | api\_token        | `xoxb-xxxx...`                    | 15               |
| `google_api_key`   | api\_token        | `AIzaXXXX...`                     | 39               |
| `stripe_key`       | api\_token        | `sk_live_xxxx...`                 | 24               |
| `private_key`      | private\_key      | `-----BEGIN RSA PRIVATE KEY-----` | 50               |

**Usage**

```yaml
obfuscate_pii(field, replacement, patterns)
```

**Arguments:**

| Argument      | Required | Description                                                                                                                                             |
| ------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `field`       | Yes      | The field to scan and obfuscate (e.g. `request_body`)                                                                                                   |
| `replacement` | Yes      | The string used to redact detected PII. A single ASCII character (e.g. `"*"`) enables **mask mode**; a multi-character string uses **replacement mode** |
| `patterns`    | Yes      | Comma-separated list of pattern names to enable                                                                                                         |

{% hint style="info" %}
**Two replacement modes:**

* **Mask mode** (single ASCII character, e.g. `"*"`): overwrites each byte of the match with the mask character, preserving the original length. Recommended for structured payloads, since the surrounding JSON stays byte-aligned.
* **Replacement mode** (multi-character, e.g. `"[R]"`): replaces each match with the literal string and compacts the buffer, so the output is never longer than the input. The replacement's length must be ≤ the shortest min match length among the enabled patterns, or compilation fails.

Non-ASCII single runes (e.g. `"•"`) are rejected since mask mode is byte-level.
{% endhint %}
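The mode-selection rules above can be modeled in a few lines, which is handy for checking a replacement string against a pattern list before deploying a rule. This is a hypothetical Python sketch, not groundcover's implementation: `select_mode` and `MIN_MATCH_LENGTH` are illustrative names, the lengths are copied from the **Supported Patterns** table, and rejecting empty, unknown, or non-ASCII multi-character inputs is an assumption.

```python
# Minimum match lengths copied from the Supported Patterns table.
MIN_MATCH_LENGTH = {
    "email": 6, "credit_card": 13, "ipv4_address": 7, "ipv6_address": 3,
    "mac_address": 17, "url": 10, "jwt": 20, "bearer_token": 10,
    "aws_credential": 20, "azure_credential": 15, "github_token": 40,
    "gitlab_token": 26, "slack_token": 15, "google_api_key": 39,
    "stripe_key": 24, "private_key": 50,
}


def select_mode(replacement: str, patterns: str) -> str:
    """Return "mask" or "replace", or raise ValueError for a rejected configuration."""
    names = [p.strip() for p in patterns.split(",") if p.strip()]
    unknown = [n for n in names if n not in MIN_MATCH_LENGTH]
    if not names or unknown:
        # Assumption: empty or unknown pattern lists are rejected.
        raise ValueError(f"invalid pattern list: {patterns!r}")
    if not replacement or not replacement.isascii():
        # Non-ASCII single runes such as "•" are documented as rejected;
        # rejecting empty or non-ASCII multi-character strings is an assumption.
        raise ValueError(f"replacement must be non-empty ASCII: {replacement!r}")
    if len(replacement) == 1:
        return "mask"  # length-preserving, byte-level overwrite
    shortest = min(MIN_MATCH_LENGTH[n] for n in names)
    if len(replacement) > shortest:
        raise ValueError(
            f"replacement {replacement!r} is longer than the shortest "
            f"min match length ({shortest}) among {names}"
        )
    return "replace"  # literal substitution with buffer compaction
```

For example, `select_mode("[REDACTED]", "email")` raises, because the 10-character replacement exceeds the 6-character minimum for `email`, while `select_mode("[R]", "email,credit_card")` returns `"replace"`.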

{% hint style="warning" %}
**Behavior change for single-character `replacement`** (`obfuscate_pii` v1.11.814+)

A single ASCII character now selects **mask mode** (length-preserving) instead of replacement mode. If you were previously using `replacement: "*"` and relied on the buffer being compacted (each match collapsed to a single `*`), the output will now be a run of `*`s the same length as the original match.

To keep the **old buffer-shrinking behavior**, use a multi-character ASCII replacement whose length is ≤ the shortest min match length among the enabled patterns, for example `"**"` or `"[R]"`. Each match then collapses to that literal string (e.g. `**`) rather than to a single `*`.

```yaml
# Before (old behavior: each match collapsed to a single "*")
- 'obfuscate_pii(request_body, "*", "email")'

# After: same string, NEW behavior: each byte of the match overwritten with "*"
- 'obfuscate_pii(request_body, "*", "email")'

# To keep buffer compaction (multi-character → replacement mode; each match becomes "**")
- 'obfuscate_pii(request_body, "**", "email")'
```

The same contract applies to logs — see [Obfuscate Logs](/use-groundcover/data-pipelines/log-pipelines/obfuscate-logs.md) for details.
{% endhint %}

**Mode Comparison: Before / After**

The same input run through each mode produces different outputs. Given this request body:

```json
{"user":"alice@example.com","card":"4111-1111-1111-1111"}
```

**Mask mode** with `"*"` — every PII byte is overwritten in place; quotes, commas, and braces stay at the same byte offsets, so the result is still valid JSON of the same length:

```json
{"user":"*****************","card":"*******************"}
```

**Replacement mode** with `"[R]"` — each match is replaced by the literal string and the buffer compacts (output is shorter):

```json
{"user":"[R]","card":"[R]"}
```

For trace bodies and headers that share storage with attribute strings, prefer mask mode so byte offsets are preserved and JSON parsing downstream is unaffected.
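To see why mask mode keeps JSON byte-aligned, here is a small Python model of both modes applied to the sample body from the comparison. The regexes are deliberately simplified, hypothetical stand-ins for the built-in `email` and `credit_card` detectors, and `obfuscate` is an illustrative name. The model works on Python characters, which coincide with bytes only for ASCII input like this one.

```python
import json
import re

# Hypothetical, simplified detectors; the built-in patterns are more thorough.
SIMPLE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "credit_card": re.compile(r"\b(?:\d{4}-){3}\d{4}\b"),
}


def obfuscate(text: str, replacement: str, patterns: list[str]) -> str:
    """Apply mask mode (single character) or replacement mode (multi-character)."""
    for name in patterns:
        regex = SIMPLE_PATTERNS[name]
        if len(replacement) == 1:
            # Mask mode: overwrite every character of the match, keeping its length.
            text = regex.sub(lambda m: replacement * len(m.group(0)), text)
        else:
            # Replacement mode: swap the whole match for the literal string.
            text = regex.sub(lambda m: replacement, text)
    return text


body = '{"user":"alice@example.com","card":"4111-1111-1111-1111"}'
masked = obfuscate(body, "*", ["email", "credit_card"])
replaced = obfuscate(body, "[R]", ["email", "credit_card"])

print(masked, len(masked) == len(body))     # mask mode preserves length
print(replaced, len(replaced) < len(body))  # replacement mode compacts
print(json.loads(masked)["user"])           # still parseable JSON
```

The `len(masked) == len(body)` check is the property the mask-mode recommendation relies on: every quote, comma, and brace stays at its original offset.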

**When PII is detected, `obfuscate_pii` automatically:**

1. Replaces the matched content in the target field
2. Sets `pii_<pattern>_detected = "true"` as a span attribute
3. Sets `is_pii = true` on the span
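The side effects can be pictured with a hypothetical Python model. It represents a span as a plain dict with an `attributes` map and an `is_pii` flag (both modeling assumptions; `record_detections` is an illustrative name). Note that each detection attribute holds the string `"true"`, whereas `is_pii` is a boolean, as listed above.

```python
def record_detections(span: dict, detected: set[str]) -> dict:
    """Model of the span updates made after a scan found the given pattern names."""
    for name in sorted(detected):
        # One attribute per detected pattern, holding the string "true".
        span["attributes"][f"pii_{name}_detected"] = "true"
    if detected:
        # The span-level flag is a boolean.
        span["is_pii"] = True
    return span


# Hypothetical span with one unrelated attribute.
span = {"attributes": {"http.method": "POST"}, "is_pii": False}
record_detections(span, {"email", "aws_credential"})
print(span["attributes"])
print(span["is_pii"])
```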

**Common Use Cases**

**Obfuscate Request Body (mask mode — recommended for JSON payloads)**

Scan request payloads for emails and cloud credentials. Single-character replacement (`"*"`) selects mask mode, so the body's byte length is preserved and any JSON structure stays parseable downstream.

```yaml
ottlRules:
  - ruleName: obfuscate_request_body
    conditions:
      - request_body != nil
    statements:
      - obfuscate_pii(request_body, "*", "email,aws_credential")
```

💡 **What it does:** Scans the request body for emails and AWS credentials, overwriting each matched byte with `*`. Automatically sets `is_pii = true` and detection attributes.

**Obfuscate Response Body (replacement mode example)**

Same idea, but with a multi-character replacement (`"[R]"`). This is **replacement mode**: each match collapses to the literal string `[R]` and the buffer compacts. Use it only when you don't need to preserve byte offsets or the original field lengths.

```yaml
ottlRules:
  - ruleName: obfuscate_response_body
    conditions:
      - response_body != nil
      - workload == "user-service"
    conditionLogicOperator: "and"
    statements:
      - obfuscate_pii(response_body, "[R]", "email,credit_card,ipv4_address")
```

💡 **What it does:** Scans response bodies from the user service for emails, credit cards, and IP addresses. Each match is replaced with `[R]` and the buffer is compacted (output shorter than input).

**Obfuscate Authorization Headers**

Redact bearer tokens and JWTs in request headers (mask mode).

```yaml
ottlRules:
  - ruleName: obfuscate_auth_header
    conditions:
      - request_headers["authorization"] != nil
    statements:
      - obfuscate_pii(request_headers["authorization"], "*", "bearer_token,jwt")
```

💡 **What it does:** Detects and redacts bearer tokens and JWTs in the Authorization header, byte-for-byte.

**Scan All Request Headers**

Scan every entry in request headers for sensitive data (mask mode).

```yaml
ottlRules:
  - ruleName: "scan_all_headers"
    conditions:
      - request_headers != nil
    statements:
      - obfuscate_pii(request_headers, "*", "bearer_token,jwt,aws_credential")
```

💡 **What it does:** Scans all request header values for tokens and credentials, without needing to specify individual header names.
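When the target is a whole map such as `request_headers`, every value is scanned. The Python sketch below models that in mask mode with a hypothetical, simplified bearer-token regex; `mask_all_values` is an illustrative name, and leaving header names untouched is an assumption of the model rather than documented behavior.

```python
import re

# Hypothetical, simplified stand-in for the built-in bearer_token detector.
BEARER = re.compile(r"Bearer [A-Za-z0-9._~+/=-]+")


def mask_all_values(headers: dict[str, str], mask: str = "*") -> dict[str, str]:
    """Mask-mode scan of every header value; header names are kept as-is."""
    return {
        name: BEARER.sub(lambda m: mask * len(m.group(0)), value)
        for name, value in headers.items()
    }


headers = {"authorization": "Bearer abc123xyz", "x-request-id": "42"}
print(mask_all_values(headers))
```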

**Obfuscate Span Attributes**

Redact sensitive values stored in span attributes (mask mode).

```yaml
ottlRules:
  - ruleName: "obfuscate_user_email"
    conditions:
      - attributes["user_email"] != nil
    statements:
      - obfuscate_pii(attributes["user_email"], "*", "email")
```

💡 **What it does:** Detects and redacts email addresses in the `user_email` attribute, preserving the attribute's byte length.

***

**Manual Obfuscation with replace\_pattern**

For custom patterns not covered by `obfuscate_pii`, use `replace_pattern` with regex.

**Mask SSN in Attributes**

```yaml
ottlRules:
  - ruleName: mask_ssn
    conditions:
      - attributes["ssn"] != nil
    statements:
      - set(is_pii, true)
      - replace_pattern(attributes["ssn"], "[0-9]{3}-[0-9]{2}-[0-9]{4}", "XXX-XX-XXXX")
```

💡 **Example:** `123-45-6789` → `XXX-XX-XXXX`
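Before deploying, it helps to exercise the regex on a few sample values. This Python sketch uses the same pattern as `mask_ssn` and assumes the pipeline's regex engine treats these simple character classes and quantifiers the way Python's `re` does. Because the pattern is unanchored, it also rewrites SSN-shaped substrings inside longer digit runs; adding `\b` word boundaries, where the engine supports them, avoids that. The names `SSN`, `SSN_BOUNDED`, and `mask` are illustrative.

```python
import re

SSN = r"[0-9]{3}-[0-9]{2}-[0-9]{4}"              # pattern from the mask_ssn rule
SSN_BOUNDED = r"\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b"  # stricter variant with word boundaries


def mask(value: str, pattern: str) -> str:
    return re.sub(pattern, "XXX-XX-XXXX", value)


# Compare the original and bounded patterns on a few sample values.
for sample in ["123-45-6789", "id=123-45-6789;", "1123-45-67890"]:
    print(f"{sample!r:18} {mask(sample, SSN)!r:18} {mask(sample, SSN_BOUNDED)!r}")
```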

**Redact Passwords in Request Body**

```yaml
ottlRules:
  - ruleName: redact_passwords
    conditions:
      - request_body != nil
      - IsMatch(request_body, "password")
    conditionLogicOperator: "and"
    statements:
      - replace_pattern(request_body, "(\"password\"\\s*:\\s*\")[^\"]+\"", "$1[REDACTED]\"")
```

💡 **Example:** `{"password": "secret123"}` → `{"password": "[REDACTED]"}`
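The same kind of local check works for the password rule. With the string escaping removed, the rule's regex reads `("password"\s*:\s*")[^"]+"`; Python spells the group reference `\g<1>` where the rule writes `$1`. The sketch (with a hypothetical helper, `redact_password`) also surfaces a limitation worth knowing: `[^"]+` stops at the first quote character, so a value containing an escaped quote leaves its tail unredacted.

```python
import re

# The rule's pattern and replacement, rewritten for Python's re module.
PASSWORD = r'("password"\s*:\s*")[^"]+"'
REPLACEMENT = r'\g<1>[REDACTED]"'


def redact_password(body: str) -> str:
    return re.sub(PASSWORD, REPLACEMENT, body)


print(redact_password('{"password": "secret123"}'))
print(redact_password('{"password":"a","user":"bob"}'))
print(redact_password('{"password": "pa\\"ss"}'))  # escaped quote: tail "ss" leaks
```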

**Redact Entire Response from Sensitive Server**

```yaml
ottlRules:
  - ruleName: redact_secret_server_responses
    conditions:
      - workload == "vault-service"
    statements:
      - set(response_body, "<REDACTED>")
      - set(is_pii, true)
```

💡 **What it does:** Completely replaces the response body for all spans from the vault service.

**Redact Cookie Headers**

```yaml
ottlRules:
  - ruleName: redact_cookies
    statements:
      - set(request_headers["cookie"], "[REDACTED]") where request_headers["cookie"] != nil
      - set(response_headers["set-cookie"], "[REDACTED]") where response_headers["set-cookie"] != nil
```

💡 **What it does:** Replaces the `cookie` request header and the `set-cookie` response header with `[REDACTED]` whenever they are present.

**Removing Sensitive Attributes**

Use `delete_key` to completely remove sensitive fields.

```yaml
ottlRules:
  - ruleName: remove_sensitive_attributes
    conditions:
      - workload == "auth-service"
    statements:
      - delete_key(attributes, "api_key")
      - delete_key(attributes, "secret_token")
      - delete_key(attributes, "password_hash")
```

💡 **What it does:** Removes the entire `api_key`, `secret_token`, and `password_hash` attributes from auth service spans.

**Key Functions**

**obfuscate\_pii**

Automatically detects and redacts sensitive data across built-in PII patterns.

**Syntax:**

```yaml
- 'obfuscate_pii(field, "replacement", "pattern1,pattern2,...")'
```

**Supported targets:**

* `request_body` / `response_body` — scan payloads
* `attributes["key"]` — scan a specific attribute
* `attributes` — scan ALL attributes
* `request_headers["key"]` — scan a specific header
* `request_headers` / `response_headers` — scan ALL headers

**Available patterns:** `email`, `credit_card`, `ipv4_address`, `ipv6_address`, `mac_address`, `url`, `jwt`, `bearer_token`, `aws_credential`, `azure_credential`, `github_token`, `gitlab_token`, `slack_token`, `google_api_key`, `stripe_key`, `private_key`

**replace\_pattern**

Replaces text matching a regular expression with a replacement string.

**Syntax:**

```yaml
- 'replace_pattern(target_field, "pattern", "replacement")'
```

**With capture groups:**

```yaml
- 'replace_pattern(target_field, "(keep_this)remove_this(keep_this)", "$1****$2")'
```
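To preview a `replace_pattern` rule locally, including its `$N` capture-group references, the replacement template has to be translated into Python's `\g<N>` form. The helpers below (`to_python_template`, `preview_replace_pattern`) are a hypothetical sketch: they handle `$N` and `${N}`, escape literal backslashes, and assume the pipeline uses the `$N` convention shown in the examples on this page. Regex engines differ on a reference immediately followed by a letter or digit (such as `$1x`), so test such templates in the pipeline itself.

```python
import re

# Matches ${N} or $N group references in a replacement template.
GROUP_REF = re.compile(r"\$(?:\{(\d+)\}|(\d+))")


def to_python_template(template: str) -> str:
    """Translate $N / ${N} references into Python's \\g<N> syntax."""
    escaped = template.replace("\\", "\\\\")  # keep literal backslashes literal
    return GROUP_REF.sub(lambda m: "\\g<" + (m.group(1) or m.group(2)) + ">", escaped)


def preview_replace_pattern(value: str, pattern: str, template: str) -> str:
    return re.sub(pattern, to_python_template(template), value)


print(preview_replace_pattern(
    "keep_thisremove_thiskeep_this", "(keep_this)remove_this(keep_this)", "$1****$2"))
# Partial masking: keep only the last four digits of a card number.
print(preview_replace_pattern(
    "4111-1111-1111-1234", r"[0-9]{4}-[0-9]{4}-[0-9]{4}-([0-9]{4})", "****-****-****-$1"))
```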

**delete\_key**

Completely removes a field from the attributes.

**Syntax:**

```yaml
- 'delete_key(attributes, "field_name")'
```

