# Skills

Skills are reusable instructions you write once and the Agent follows whenever they apply. Use them to encode team runbooks, naming conventions, investigation playbooks, or any context you'd otherwise repeat in every prompt.

{% hint style="info" %}
**Skills are user-level, not organization-level.** Each Skill belongs to the user who created it and is only visible and usable in that user's own Agent conversations. Skills are **not** shared across your team, tenant, or organization - if a teammate should have the same Skill, they need to create their own copy. There is no concept of a "team Skill" or "org Skill" today.
{% endhint %}

## Creating a Skill

Open the **Skills** page from the Agent sidebar. Click **New Skill** and fill in:

| Field            | Required | Purpose                                                                                                                   |
| ---------------- | -------- | ------------------------------------------------------------------------------------------------------------------------- |
| **Name**         | Yes      | A short identifier shown in the picker (e.g. `Incident Triage`).                                                          |
| **When to use**  | Yes      | Plain-language description of when this Skill should apply. The Agent reads this to decide whether to activate the Skill. |
| **Description**  | No       | Short summary shown alongside the name in the picker.                                                                     |
| **Instructions** | Yes      | Markdown describing how the Agent should operate when this Skill is active.                                               |

Click **Save**. The Skill is available as soon as you send your next Agent message.
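
As a rough illustration, the smallest useful Skill fills in only the three required fields. The values below are hypothetical, and the `Name:` / `When to use:` / `Instructions:` labels are only a way to lay the fields out on this page, not a file format that the product reads:

```markdown
Name: Incident Triage

When to use: Use when I ask for incident triage or RCA follow-ups.

Instructions:
1. Pull error logs from the last 30 minutes, grouped by pattern.
2. Correlate with deploy events in the same window.
3. Lead the answer with a one-line summary.
```

The optional **Description** can be added later; it only affects what is shown next to the name in the picker.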

## Ask the Agent to draft a Skill for you

You don't have to write Skills from scratch. Once the Agent has done useful work in a conversation - an RCA, a triage walk-through, a parsing-rule investigation, a dashboard build - you can ask it to turn that work into a Skill without leaving the chat. The Agent drafts a Name, When to use, and Instructions from the current conversation, and you review and save.

The better your ask, the better the resulting Skill. Thin asks produce thin Skills:

> create a skill for investigating DB-related errors for checkout-service

A better ask spells out the activation trigger, the scope, and which parts of the conversation are the reusable pattern:

> Turn this investigation into a Skill. Call it "Postgres slow-query triage for checkout". Activate it when I ask about checkout-service latency or errors that look database-bound. The Instructions should keep the steps we just did: pull `pg_stat_statements` top queries over the incident window, correlate with the deploy timeline of `checkout-api`, then trace one failing request end-to-end. Default lookback is 1 hour, and always use P95 for latency.

Or:

> Save this conversation as a Skill for Redis authentication failures. Name it "Redis auth troubleshooting". Activate it when Redis clients in `cache-*` namespaces start failing auth after a deploy or config change. Instructions should capture exactly what we just checked - Redis config history, client library version, and TLS cert expiry - in that order. Skip the side tangent about node labels.

The Agent produces a better Skill when you tell it:

* **The activation condition** - which services, namespaces, alert shapes, or user phrasings should trigger it. A vague activation condition causes the Skill to either miss relevant prompts or activate on unrelated questions.
* **Which parts of the current conversation are the reusable pattern** vs. incidental details. "Keep the three checks we did, drop the bit about node labels."
* **Conventions that should survive** - your preferred metric (P95 vs P99), default lookback, output format, namespaces to ignore, services to exclude.
* **A Name** - or let the Agent propose one. "Postgres slow-query triage" beats "DB stuff."

The draft is a starting point - edit any field before saving, or tweak the Skill from the Skills page later.
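
For illustration only, the Instructions drafted from the Postgres ask above might look roughly like this, with the Name set to "Postgres slow-query triage for checkout" and When to use covering checkout-service latency or database-bound errors. This is a hand-written sketch that assumes the Agent keeps the requested steps and conventions as stated; real drafts will differ in wording and structure:

```markdown
# Goal
Triage checkout-service latency or errors that look database-bound.

# Steps
1. Pull the top queries from `pg_stat_statements` over the incident window.
2. Correlate with the deploy timeline of `checkout-api`.
3. Trace one failing request end-to-end.

# Conventions
- Default lookback is 1 hour.
- Always use P95 for latency.
```

Comparing a draft against the ask in this way is a quick check that every step and convention you named made it into the Skill before you save.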

## Using a Skill

There are two ways a Skill gets activated in a conversation:

**Explicit** - type **`/`** in the Agent input. A picker shows your Skills; select one to attach it to your next message.

> `/incident-triage` investigate the alert on checkout-service

**Automatic** - the Agent reads the **When to use** field of all your Skills and activates the relevant ones based on your prompt. For example, if your Skill's When to use says "Use when I ask for incident triage or RCA follow-ups", asking the Agent to investigate an incident will pull it in automatically.

When a Skill is active, its Instructions are injected into the Agent's context alongside the base system prompt. The Agent will mention which Skills it used, and you can see them attached to the message.

## Writing Good Instructions

The Instructions field is Markdown - structure it like a runbook or spec. A few patterns that work well:

**Goals and scope**

```markdown
# Goal
Run a 5-whys root cause analysis for production incidents.

# Scope
Only for workloads in the `prod-*` namespaces.
```

**Step-by-step procedures**

```markdown
# Steps
1. Pull error logs from the last 30 minutes, grouped by pattern.
2. Correlate with deploy events in the same window.
3. Check upstream and downstream services for related errors.
4. Summarize findings with Who / What / Where / When / Why.
```

**Conventions and preferences**

```markdown
# Output format
- Lead with a one-line summary.
- Use a bulleted timeline for the chronology.
- Always include the monitor name and namespace.

# Query style
- Prefer P95 over P99 for latency charts.
- Default lookback is 1 hour unless I say otherwise.
```

### Tips

* **Be specific about when it applies** - the Agent uses *When to use* to decide whether to pull the Skill in. Vague descriptions lead to unexpected activations or missed ones.
* **Write for the Agent, not for a human reader** - imperative instructions ("Always include X", "Never do Y") are followed more reliably than prose.
* **Keep it focused** - one Skill per workflow. Two Skills with overlapping triggers are fine; one giant Skill covering every scenario is harder to maintain.
* **Iterate** - if the Agent doesn't behave as expected, refine the Instructions and try again. Changes take effect on the next message.
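
To make the first tip concrete, compare two hypothetical *When to use* values for the same Skill. The `checkout-*` prefix is a placeholder, and the `Vague:` / `Specific:` labels are annotations for this page rather than text to paste into the field:

```markdown
Vague:
Use for incidents.

Specific:
Use when I ask to triage an alert or do an RCA on a service in the `checkout-*` namespaces.
```

The vague version gives the Agent nothing to rule out, so it may activate on any incident question. The specific version names both the kinds of prompts and the namespace prefix.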

## A Full Example: Payments Incident Triage

Here's a complete, copy-ready Skill that codifies how one engineer wants the Agent to handle incidents on their payments service. Use it as a starting point and adapt it to your own stack.

**Name**

```
Payments Incident Triage
```

**When to use**

```
Use when I ask to triage an alert, investigate an incident, or do an RCA on any service in the `payments-*` namespaces.
```

**Description**

```
5-whys triage playbook for the payments platform.
```

**Instructions**

````markdown
# Goal
Triage a production incident on the payments platform end-to-end and produce a written summary I can paste into the incident channel.

# Scope
Only for workloads in the `payments-api`, `payments-worker`, `payments-ledger`, and `payments-gateway` namespaces. If the alert is for a different service, say so and stop.

# Procedure
1. **Anchor the timeline.** Find when the alert first fired and when symptoms started in the data. Use the earlier of the two as t=0.
2. **Check the blast radius.** For the affected workload, report:
   - Error rate (last 1h, broken down by endpoint)
   - P95 latency (last 1h vs. the prior 24h baseline)
   - Pod-level restarts, OOMKills, and crashloops in the window
3. **Look for change correlation.** In a ±30 min window around t=0, list:
   - Deploys (image tag changes) for any payments-* workload
   - ConfigMap / Secret updates
   - HPA scaling events
   - Upstream incidents (check `auth-*` and `fraud-*` dependencies)
4. **Trace a bad request.** Pull one representative failing trace and walk through the span tree. Identify the first span that shows the error.
5. **Check the database.** Query latency and error rate on the Postgres calls from `payments-ledger`. Flag any query pattern that crossed 500ms P95.
6. **Summarize** in this exact format:

   ```
   **Impact:** <one line>
   **Likely cause:** <one line, cite the evidence>
   **Timeline:**
   - HH:MM — <event>
   - HH:MM — <event>
   **Suggested next step:** <one action>
   **Evidence:** <links to the groundcover views you used>
   ```

# Conventions
- Always use P95 for latency, never average.
- Default lookback is 1 hour; extend to 6 hours only if the alert first fired more than 1 hour ago.
- Ignore `payments-staging-*` namespaces.
- If you can't find deploy events, say so explicitly — don't guess.
````

### Why this Skill works

A few things make this Skill reliable in practice:

* **The *When to use* is specific.** It names the namespace prefix and the types of prompts that should trigger it. The Agent uses this field to decide whether to auto-activate the Skill, so vague wording like "use for incidents" would cause it to fire on unrelated services.
* **The scope is bounded.** Explicitly listing the four payments namespaces and telling the Agent to stop if the alert is elsewhere prevents it from applying payments-specific assumptions to, say, an auth service outage.
* **Steps are imperative and ordered.** "Anchor the timeline", "Check the blast radius", "Trace a bad request" - each step is a concrete action the Agent can execute against groundcover data. Prose like "investigate thoroughly" produces inconsistent results.
* **The output format is pinned.** Giving the Agent an exact template (Impact / Likely cause / Timeline / Next step / Evidence) means every triage summary looks the same and is safe to paste into an incident channel without reformatting.
* **Defaults and exceptions are explicit.** "Default lookback is 1 hour", "Ignore staging namespaces", "Don't guess if deploys aren't found" — each of these is a decision the Agent would otherwise make inconsistently from one run to the next.

## Managing Skills

On the Skills page you can:

* **Edit** a Skill - changes apply to new messages immediately.
* **Delete** a Skill - removes it from the picker and stops auto-activation.
* **Search** your Skills by name.

Skills are versioned - a message uses the Skill revision that existed when the message was sent, so later edits do not change what the Agent saw in earlier messages.

