# Obfuscate Logs

#### Overview

Protect sensitive data in your logs by masking or removing it before ingestion. By integrating data obfuscation directly into your log processing pipelines, you maintain privacy and meet compliance requirements while still retaining the necessary operational details.

#### Why Obfuscate Logs?

Logs often contain sensitive information that needs to be protected:

* **Personally Identifiable Information (PII)** - emails, names, addresses
* **Credentials** - API keys, tokens, passwords
* **Financial data** - credit card numbers, account numbers
* **Internal system details** - internal IPs, service IDs

Obfuscating this data helps you:

* **Meet compliance requirements** (GDPR, PCI-DSS, HIPAA, etc.)
* **Protect customer privacy**
* **Reduce security risks** from leaked credentials
* **Maintain audit trails** while removing sensitive details

{% hint style="success" %}
Obfuscation happens **in the sensor** before logs are sent to storage, ensuring sensitive data never leaves your cluster.
{% endhint %}

#### Obfuscation Approaches

There are three approaches to obfuscating sensitive data:

**1. Automatic PII Detection with obfuscate\_pii**

Automatically detect and redact sensitive data across 16 built-in PII patterns — no regex required. This is the **recommended approach** for broad coverage with minimal configuration. It also supports built-in **Logs-to-Metrics** for tracking detections.

**Best for:** Broad PII protection across many pattern types with zero regex effort

**2. Masking with replace\_pattern**

Replace parts of a string with a masking token (e.g., replacing email characters with asterisks). Use this when you want to preserve the field structure while hiding the sensitive value.

**Best for:** Custom patterns not covered by `obfuscate_pii`, partial masking with capture groups

**3. Removing with delete\_key**

Remove fields that contain sensitive data entirely. Use this when the field is not required for downstream processing or analysis.

**Best for:** API keys, passwords, tokens, unnecessary PII
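
As a rough sketch of how the three approaches might be layered, the rules below apply each one in turn. The statements are taken from examples elsewhere on this page. Two things are assumptions here: that the rules run in the order listed, and that the `api_key` attribute exists in your logs.

```yaml
ottlRules:
  # 1. Broad coverage: built-in detectors in mask mode
  - ruleName: "auto_pii"
    statements:
      - 'obfuscate_pii(body, "*", "email,credit_card")'
  # 2. Custom pattern not covered by obfuscate_pii: keep the last 4 digits of an SSN
  - ruleName: "mask_ssn"
    statements:
      - 'replace_pattern(body, "\\b(\\d{3}-\\d{2}-)(\\d{4})\\b", "***-**-$2")'
  # 3. Field not needed downstream: remove it entirely
  - ruleName: "remove_api_key"
    conditions:
      - 'attributes["api_key"] != nil'
    statements:
      - 'delete_key(attributes, "api_key")'
```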

#### Best Practices

1. **Apply obfuscation early** - Process at the sensor level before data is stored
2. **Be specific with patterns** - Avoid over-matching by using precise regex patterns
3. **Test thoroughly** - Use the Parsing Playground to verify obfuscation rules
4. **Document your rules** - Use clear `ruleName` values to explain what each rule protects
5. **Balance utility and privacy** - Mask data in a way that preserves operational value
6. **Use conditions wisely** - Only apply obfuscation where necessary to minimize overhead
7. **Combine with dropping** - Consider dropping entire logs containing sensitive data when appropriate
8. **Audit regularly** - Periodically review logs to ensure obfuscation is working correctly
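
A minimal sketch of practices 2, 4, and 6 applied together: a descriptive `ruleName`, a single narrowly chosen pattern, and a condition that limits the rule to one workload. It assumes `obfuscate_pii` can be combined with `conditions` in the same way as the other functions on this page. The workload name `payment-service` is illustrative.

```yaml
ottlRules:
  # The name states what is protected and where
  - ruleName: "mask_card_numbers_in_payment_service"
    conditions:
      # Only scan logs from the workload expected to emit card numbers
      - 'workload == "payment-service"'
    statements:
      # Enable only the pattern this workload needs
      - 'obfuscate_pii(body, "*", "credit_card")'
```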

#### Automatic PII Obfuscation

The `obfuscate_pii` function detects and redacts sensitive data across 16 built-in patterns without requiring any regex. It runs as a custom OTTL function in the log pipeline, scanning the specified field and replacing any detected PII in place.

{% hint style="success" %}
`obfuscate_pii` is designed for **zero allocations** when no PII is detected, making it safe for high-throughput pipelines.
{% endhint %}

{% hint style="info" %}
`obfuscate_pii` is available from groundcover version **1.11.481** and above.
{% endhint %}

**Supported Patterns**

| Pattern            | Category          | Example                           | Min Match Length |
| ------------------ | ----------------- | --------------------------------- | ---------------- |
| `email`            | personal\_info    | `user@example.com`                | 6                |
| `credit_card`      | credit\_card      | `4111-1111-1111-1111`             | 13               |
| `ipv4_address`     | network\_info     | `192.168.1.1`                     | 7                |
| `ipv6_address`     | network\_info     | `::1`                             | 3                |
| `mac_address`      | network\_info     | `11:22:33:44:55:66`               | 17               |
| `url`              | network\_info     | `https://example.com/path`        | 10               |
| `jwt`              | auth\_token       | `eyJhbGciOi...`                   | 20               |
| `bearer_token`     | auth\_token       | `Bearer abc123xyz`                | 10               |
| `aws_credential`   | cloud\_credential | `AKIAIOSFODNN7EXAMPLE`            | 20               |
| `azure_credential` | cloud\_credential | `azure_key=ABCDE...`              | 15               |
| `github_token`     | api\_token        | `ghp_xxxx...`                     | 40               |
| `gitlab_token`     | api\_token        | `glpat-xxxx...`                   | 26               |
| `slack_token`      | api\_token        | `xoxb-xxxx...`                    | 15               |
| `google_api_key`   | api\_token        | `AIzaXXXX...`                     | 39               |
| `stripe_key`       | api\_token        | `sk_live_xxxx...`                 | 24               |
| `private_key`      | private\_key      | `-----BEGIN RSA PRIVATE KEY-----` | 50               |

**Usage**

```yaml
obfuscate_pii(field, replacement, patterns, "pii_detections_count")
```

**Arguments:**

| Argument                 | Required | Description                                                                                                                                             |
| ------------------------ | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `field`                  | Yes      | The field to scan and obfuscate (e.g. `body`)                                                                                                           |
| `replacement`            | Yes      | The string used to redact detected PII. A single ASCII character (e.g. `"*"`) enables **mask mode**; a multi-character string uses **replacement mode** |
| `patterns`               | Yes      | Comma-separated list of pattern names to enable                                                                                                         |
| `"pii_detections_count"` | No       | Enables a Logs-to-Metrics counter that tracks detection counts per pattern                                                                              |

{% hint style="info" %}
**Two replacement modes:**

* **Mask mode** (single ASCII character, e.g. `"*"`) — overwrites each byte of the match with the mask char, preserving the original length. Recommended for JSON logs since attributes that share storage with the body stay byte-aligned.
* **Replacement mode** (multi-character, e.g. `"[R]"`) — replaces each match with the literal string and compacts the buffer (output is shorter than input). Length must be ≤ the shortest enabled pattern's min match length, or compilation fails.

Non-ASCII single runes (e.g. `"•"`) are rejected since mask mode is byte-level.
{% endhint %}
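
The replacement-length constraint can be worked through against the Min Match Length column of the table above. The statements below are a sketch derived from that rule, not output from a real compile. The `[REDACTED]` token (10 characters) is an illustrative choice.

```yaml
# "[R]" is 3 characters; the shortest enabled pattern is ipv6_address (min 3), and 3 <= 3, so this should be accepted
- 'obfuscate_pii(body, "[R]", "email,ipv6_address")'

# "[REDACTED]" is 10 characters; email has min 6, and 10 > 6, so per the rule above compilation is expected to fail
- 'obfuscate_pii(body, "[REDACTED]", "email")'

# "[REDACTED]" with jwt (min 20) and github_token (min 40): 10 <= 20, so this should be accepted
- 'obfuscate_pii(body, "[REDACTED]", "jwt,github_token")'
```

If you need a long replacement token together with short patterns such as `ipv6_address`, one option is to split the patterns across separate statements, each with a replacement that fits.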

{% hint style="warning" %}
**Behavior change for single-character `replacement`** (`obfuscate_pii` v1.11.814+)

A single ASCII character now selects **mask mode** (length-preserving) instead of replacement mode. If you were previously using `replacement: "*"` and relied on the buffer being compacted (each match collapsed to a single `*`), the output will now be a run of `*`s the same length as the original match.

To keep the **old buffer-shrinking behavior**, use a multi-character ASCII replacement that satisfies the min-match-length constraint (≤ the shortest enabled pattern), for example `"**"` or `"[R]"`.

```yaml
# Before (old behavior: each match collapsed to a single "*")
- 'obfuscate_pii(body, "*", "email")'

# After — same string, NEW behavior: each byte of the match overwritten with "*"
- 'obfuscate_pii(body, "*", "email")'

# To keep old behavior (multi-character → replacement mode + buffer compaction)
- 'obfuscate_pii(body, "**", "email")'
```

{% endhint %}

**Mode Comparison: Before / After**

The same input run through each mode produces different outputs. Given this log line:

```
user=alice@example.com card=4111-1111-1111-1111
```

**Mask mode** with `"*"` — every PII byte is overwritten in place; surrounding text and JSON delimiters keep their byte positions:

```
user=***************** card=*******************
```

**Replacement mode** with `"[R]"` — each match is replaced by the literal string and the line gets shorter:

```
user=[R] card=[R]
```

For JSON-bodied logs, prefer mask mode so quote characters, commas, and braces don't shift:

```json
// input
{"user":"alice@example.com","card":"4111-1111-1111-1111"}
// after obfuscate_pii(body, "*", "email,credit_card") — still valid JSON, same length
{"user":"*****************","card":"*******************"}
```

**Basic Configuration**

Obfuscate emails, credit cards, and AWS credentials without metrics. The example below uses mask mode (`"*"`) — the recommended default for JSON-structured logs:

```yaml
ottlRules:
  - ruleName: "pii_obfuscation"
    statements:
      - 'obfuscate_pii(body, "*", "email,credit_card,aws_credential")'
```

**With Logs-to-Metrics**

Enable detection count metrics so you can track how often PII is detected, broken down by pattern, workload, and namespace:

```yaml
ottlRules:
  - ruleName: "pii_obfuscation"
    statements:
      - 'obfuscate_pii(body, "*", "email,credit_card,aws_credential", "pii_detections_count")'
```

The metric `pii_detections_count` is emitted per detected pattern with the following labels:

| Label                  | Description                                                            |
| ---------------------- | ---------------------------------------------------------------------- |
| `pii_pattern_type`     | The specific pattern detected (e.g. `email`, `credit_card`)            |
| `pii_pattern_category` | The category of the pattern (e.g. `personal_info`, `cloud_credential`) |
| `workload`             | Source workload that emitted the log                                   |
| `namespace`            | Source namespace                                                       |
| `cluster`              | Source cluster name                                                    |
| `env_type`             | Environment type (e.g. `k8s`)                                          |

{% hint style="info" %}
Use these metrics to build dashboards for monitoring PII detections across your services. You can query them with PromQL like any other Logs-to-Metrics counter.
{% endhint %}

**All Patterns Example**

Use the following rule to enable **all 16 supported patterns** with Logs-to-Metrics. This is a ready-to-use configuration you can copy directly into your pipeline:

```yaml
ottlRules:
  - ruleName: "groundcover_obfuscate"
    statements:
      - >-
        obfuscate_pii(body, "*",
        "email,credit_card,ipv4_address,ipv6_address,mac_address,url,jwt,bearer_token,aws_credential,azure_credential,github_token,gitlab_token,slack_token,google_api_key,stripe_key,private_key",
        "pii_detections_count")
```

{% hint style="success" %}
This is the recommended starting point. It covers all PII categories — personal info, credentials, network info, tokens, and private keys — with a single rule and built-in metrics.
{% endhint %}

**Log Attributes**

When PII is detected, `obfuscate_pii` automatically sets attributes on the log record for each matched pattern:

```
pii_<pattern>_detected: "true"
```

The log body is modified in place, with each detected value masked or replaced according to the configured `replacement`. A `metadata` field is also set to indicate that the rule executed successfully.

**Example:** A log line containing an IP address before and after obfuscation.

Before (raw log line):

```
INFO:     10.0.41.5:55410 - "GET /books/title?title=Pride%20and%20Prejudice HTTP/1.1" 200 OK
```

After — **mask mode** (`obfuscate_pii(body, "*", "ipv4_address")`). The 9-byte IP `10.0.41.5` is overwritten byte-for-byte with `*`, so the rest of the line stays at the same byte offsets:

```json
{
  "body": "INFO:     *********:55410 - \"GET /books/title?title=Pride%20and%20Prejudice HTTP/1.1\" 200 OK",
  "attributes": {
    "pii_ipv4_address_detected": "true"
  },
  "metadata": {
    "OTTL_obfuscate_pii_new": "ok"
  }
}
```

After — **replacement mode** (`obfuscate_pii(body, "***", "ipv4_address")`). The IP is replaced with the literal `***` and the line is compacted (6 bytes shorter):

```json
{
  "body": "INFO:     ***:55410 - \"GET /books/title?title=Pride%20and%20Prejudice HTTP/1.1\" 200 OK",
  "attributes": {
    "pii_ipv4_address_detected": "true"
  },
  "metadata": {
    "OTTL_obfuscate_pii_new": "ok"
  }
}
```

In both cases the attribute `pii_ipv4_address_detected` is set to `"true"` and the metadata confirms the rule ran successfully.

These attributes can be used in downstream rules, filters, or queries to identify logs that originally contained sensitive data.
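
For instance, a later rule might key off a detection attribute. The sketch below assumes that rules run in the order listed and that attributes set by `obfuscate_pii` are visible to subsequent rules. The `client_ip` attribute is illustrative.

```yaml
ottlRules:
  - ruleName: "pii_obfuscation"
    statements:
      - 'obfuscate_pii(body, "*", "ipv4_address")'
  # If an IP was found in the body, also drop the structured copy of it
  - ruleName: "remove_client_ip_when_ip_detected"
    conditions:
      - 'attributes["pii_ipv4_address_detected"] == "true"'
    statements:
      - 'delete_key(attributes, "client_ip")'
```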

***

#### Common Use Cases

**Masking Email Addresses**

Preserve email structure while hiding most characters.

```yaml
ottlRules:
  - ruleName: "mask_email"
    conditions:
      - 'attributes["email"] != nil'
    statements:
      - 'replace_pattern(attributes["email"], "(.{2}).+(@.*)", "$1****$2")'
```

💡 **Example:** `user@example.com` → `us****@example.com`

**Obfuscating Credit Card Numbers**

Hide credit card digits while keeping the format recognizable.

```yaml
ottlRules:
  - ruleName: "obfuscate_credit_card"
    conditions:
      - 'container_name == "payment-service"'
    statements:
      - 'replace_pattern(body, "(credit card:)[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}", "$1****-****-****-****")'
```

💡 **Example:** `credit card:1234-5678-9012-3456` → `credit card:****-****-****-****`
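
If support teams need to identify a card without seeing the full number, a variant of the rule above can keep the last four digits by capturing them in a second group. This is a sketch under the same assumptions as the original rule (the `credit card:` prefix and the dash-separated format).

```yaml
ottlRules:
  - ruleName: "obfuscate_credit_card_keep_last4"
    conditions:
      - 'container_name == "payment-service"'
    statements:
      # $1 = the "credit card:" prefix, $2 = the final four digits
      - 'replace_pattern(body, "(credit card:)[0-9]{4}-[0-9]{4}-[0-9]{4}-([0-9]{4})", "$1****-****-****-$2")'
```

With this variant, `credit card:1234-5678-9012-3456` would become `credit card:****-****-****-3456`.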

**Removing API Keys**

Completely remove API key fields from logs.

```yaml
ottlRules:
  - ruleName: "remove_api_key"
    conditions:
      - 'attributes["api_key"] != nil'
    statements:
      - 'delete_key(attributes, "api_key")'
```

💡 **What it does:** Removes the entire `api_key` field from the log attributes.

**Masking IP Addresses**

Partially mask IP addresses for privacy while maintaining usefulness.

```yaml
ottlRules:
  - ruleName: "mask_ip"
    conditions:
      - 'attributes["client_ip"] != nil'
    statements:
      - 'replace_pattern(attributes["client_ip"], "(\\d+\\.\\d+)\\.\\d+\\.\\d+", "$1.xxx.xxx")'
```

💡 **Example:** `192.168.1.100` → `192.168.xxx.xxx`

**Obfuscating Passwords in Logs**

Remove password values from log messages.

```yaml
ottlRules:
  - ruleName: "remove_passwords"
    conditions:
      - 'IsMatch(body, "password")'
    statements:
      - 'replace_pattern(body, "(password[\"\\s:=]+)([^\"\\s,}]+)", "$1[REDACTED]")'
```

💡 **Example:** `{"username": "john", "password": "secret123"}` → `{"username": "john", "password": "[REDACTED]"}`

**Masking Social Security Numbers**

Mask SSN while keeping the last 4 digits.

```yaml
ottlRules:
  - ruleName: "mask_ssn"
    conditions:
      - 'workload == "user-service"'
    statements:
      - 'replace_pattern(body, "\\b(\\d{3}-\\d{2}-)(\\d{4})\\b", "***-**-$2")'
```

💡 **Example:** `123-45-6789` → `***-**-6789`

**Removing Authentication Tokens**

Strip bearer tokens and authorization headers.

```yaml
ottlRules:
  - ruleName: "remove_auth_tokens"
    conditions:
      - 'IsMatch(body, "Bearer") or IsMatch(body, "Authorization")'
    statements:
      - 'replace_pattern(body, "(Bearer\\s+)[A-Za-z0-9\\-._~+/]+=*", "$1[REDACTED]")'
      - 'replace_pattern(body, "(Authorization[\"\\s:=]+)[^\\s,}]+", "$1[REDACTED]")'
```

💡 **What it does:** Replaces token values while preserving the field names.

**Comprehensive PII Protection**

Combine multiple obfuscation statements in a single rule for broader coverage.

```yaml
ottlRules:
  - ruleName: "protect_pii"
    conditions:
      - 'workload == "user-service"'
    statements:
      # Mask emails
      - 'replace_pattern(body, "([a-zA-Z0-9._%+-]{2})[a-zA-Z0-9._%+-]+(@[a-zA-Z0-9.-]+)", "$1****$2")'
      # Mask phone numbers
      - 'replace_pattern(body, "(\\+?\\d{1,3}[-.\\s]?)(\\d{3})[-.\\s]?(\\d{3})[-.\\s]?(\\d{4})", "$1***-***-$4")'
      # Mask credit cards
      - 'replace_pattern(body, "\\b\\d{4}[\\s-]?\\d{4}[\\s-]?\\d{4}[\\s-]?\\d{4}\\b", "****-****-****-****")'
```

💡 **What it does:** Applies multiple obfuscation patterns in a single rule.

#### Key Functions

**obfuscate\_pii**

Automatically detects and redacts sensitive data across built-in PII patterns. Supports optional Logs-to-Metrics integration.

**Syntax:**

```yaml
- 'obfuscate_pii(field, "replacement", "pattern1,pattern2,...")'
```

**With metrics:**

```yaml
- 'obfuscate_pii(field, "replacement", "pattern1,pattern2,...", "pii_detections_count")'
```

**Available patterns:** `email`, `credit_card`, `ipv4_address`, `ipv6_address`, `mac_address`, `url`, `jwt`, `bearer_token`, `aws_credential`, `azure_credential`, `github_token`, `gitlab_token`, `slack_token`, `google_api_key`, `stripe_key`, `private_key`

See the Automatic PII Obfuscation section above for full details, supported patterns, and configuration examples.

**replace\_pattern**

Replaces text matching a pattern with a replacement string.

**Syntax:**

```yaml
- 'replace_pattern(target_field, "pattern", "replacement")'
```

**With capture groups:**

```yaml
- 'replace_pattern(target_field, "(keep_this)remove_this(keep_this)", "$1****$2")'
```

**Common patterns:**

* `.+` - Match one or more characters (greedy)
* `.*` - Match zero or more characters (greedy)
* `[A-Za-z0-9]+` - Match alphanumeric characters
* `\\d+` - Match digits
* `\\s+` - Match whitespace
* `[^\\s]+` - Match non-whitespace

**delete\_key**

Completely removes a key from a map, such as `attributes` or `cache`.

**Syntax:**

```yaml
- 'delete_key(attributes, "field_name")'
```

**Example:**

```yaml
- 'delete_key(attributes, "api_key")'
- 'delete_key(attributes, "password")'
- 'delete_key(cache, "temporary_field")'
```

#### Regular Expression Tips

**Capture Groups**

Use parentheses to capture parts you want to keep:

```yaml
# Pattern with capture groups
"(prefix)(middle)(suffix)"

# Replacement using $1, $2, $3
"$1[MASKED]$3"
```
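
One detail is worth checking when a group reference is followed directly by literal text. Assuming `replace_pattern` follows Go's `regexp` template rules, as upstream OTTL does, a reference such as `$1xxx` is read as a group named `1xxx` rather than group 1 followed by `xxx`, and it expands to an empty string. The sketch below shows two ways around this. The `${1}` form is standard Go template syntax, but whether it passes through your configuration loader unescaped is an assumption to verify.

```yaml
# Ambiguous: "$1xxx" refers to a group called "1xxx", which does not exist
# "(\\d+\\.\\d+\\.)(\\d+\\.\\d+)"  with  "$1xxx.xxx"

# Safer: follow the reference with a non-word character such as "." or "*"
"(\\d+\\.\\d+)\\.\\d+\\.\\d+"
"$1.xxx.xxx"

# Alternative: braces delimit the group index explicitly
"${1}xxx.xxx"
```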

**Common Regex Patterns**

**Email:**

```yaml
"([a-zA-Z0-9._%+-]{2})[a-zA-Z0-9._%+-]+(@[a-zA-Z0-9.-]+)"
```

**Credit Card:**

```yaml
"\\b\\d{4}[\\s-]?\\d{4}[\\s-]?\\d{4}[\\s-]?\\d{4}\\b"
```

**IP Address:**

```yaml
"\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\b"
```

**Phone Number:**

```yaml
"\\+?\\d{1,3}[-.\\s]?\\d{3}[-.\\s]?\\d{3}[-.\\s]?\\d{4}"
```
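
To show how these bare patterns fit into a rule, here is a sketch that drops two of them into `replace_pattern` statements. The `[IP]` and `[PHONE]` tokens and the rule name are illustrative. Note that the phone pattern has no word boundaries and can match inside longer digit runs, so it is worth verifying against sample logs in the Parsing Playground before rollout.

```yaml
ottlRules:
  - ruleName: "redact_ip_and_phone"
    statements:
      # IP address pattern from above, replaced with a fixed token
      - 'replace_pattern(body, "\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\b", "[IP]")'
      # Phone number pattern from above, replaced with a fixed token
      - 'replace_pattern(body, "\\+?\\d{1,3}[-.\\s]?\\d{3}[-.\\s]?\\d{3}[-.\\s]?\\d{4}", "[PHONE]")'
```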

