---
name: forge-mcp-tool-design
description: Designing individual MCP tools that LLMs use correctly on the first try. Naming, descriptions written for model consumption, JSON Schema discipline, error returns that the model can recover from, examples in the schema, idempotency, dry-run flags. Contains worked tool schemas for common patterns. Use when adding a tool to an MCP server, or when designing function-calling tools for any LLM agent.
license: MIT
---

# forge-mcp-tool-design

You are designing a tool whose primary reader is an LLM. The model reads the name, reads the description, infers when to call it from context, fills the arguments from the user's request, parses your response, and decides what to do next. Every word in the schema is in the model's working memory at every decision point.

Default agent-written tools have vague names, multi-purpose descriptions, missing examples, untyped parameters, and free-form text responses that the model has to re-parse. This skill exists to fix all of that.

## Quick reference (the things you must never ship)

1. A tool named `do_thing`, `process_request`, or anything without a verb-object structure.
2. A parameter without a `description`.
3. `type: "string"` for a value that has a closed set of valid options (use `enum`).
4. A tool that does two things ("create_or_update_user").
5. A tool description that starts with "this tool" or "this function."
6. JSON Schema without `additionalProperties: false` on object types.
7. A long-form text response that the model has to re-parse to extract a value.
8. A destructive tool without a `dry_run` flag.
9. A tool that returns 50KB of JSON on a small input (no truncation).
10. A tool that returns `{ "error": "not found" }` as a string instead of as `isError: true` content.

## Hard rules

### Naming

**1. Tool names are verb_object, snake_case, present tense.**

```
GOOD                          BAD
search_users                  userSearch
create_order                  user_create_thing
get_invoice_by_id             do_invoice
list_active_subscriptions     subscriptions
archive_user                  user
```

**2. One verb per tool.** A tool called `create_or_update_user` should be two tools: `create_user` and `update_user`. The model can sequence; it cannot easily decompose.

**3. Disambiguate similar tools with the object scope, not the verb.**

```
GOOD                          BAD
list_active_users             list_users (filter='active')
list_archived_users           list_users (filter='archived')
```

The model picks the right tool from the name alone.

### Descriptions

**4. Description starts with the verb in third person.**

```
GOOD: "Searches the user directory by name or email."
BAD:  "This tool can be used to search..."
BAD:  "Searches for users in the directory."  (no parameter hint)
```

**5. Three-sentence pattern: what + when + when-not.**

```
Searches the user directory by name or email.
Use when the user asks to find a person by partial information.
Do not use for exact ID lookups; use get_user_by_id instead.
```

**6. Mention sibling tools by name.** The model has the full tool list in context. "See also: `get_user_by_id`" reduces wrong-tool calls measurably.

**7. No marketing language.** "Powerful," "comprehensive," "flexible" take up token budget without affecting tool selection.

**8. Description token budget: 50-150 tokens per tool.** Too short, the model misses context. Too long, tool selection slows for every other tool.

### JSON Schema for parameters

**9. Every parameter has a `description`.**

```json
{
  "type": "object",
  "properties": {
    "email": {
      "type": "string",
      "description": "Email of the user to search for. Partial match supported."
    }
  }
}
```

Without the description, the model often passes the wrong shape.

**10. Use `enum` for closed value sets.**

```json
"status": {
  "type": "string",
  "enum": ["active", "archived", "banned"],
  "description": "Current state of the user."
}
```

The model frequently invents values you did not anticipate if you leave it as `type: "string"`.

**11. Mark `required` explicitly.** Optional parameters should genuinely be optional; the model otherwise feels obligated to fill them with junk.

**12. Use `default` instead of `required` where reasonable.**

```json
"limit": {
  "type": "integer",
  "minimum": 1,
  "maximum": 100,
  "default": 20,
  "description": "Max results to return."
}
```

A `limit` with `default: 20` is better than requiring the model to remember to pass a value.

**13. `additionalProperties: false` on object types.** The model sometimes invents extra fields. Reject them at the boundary instead of silently ignoring.

```json
{
  "type": "object",
  "properties": { ... },
  "required": [ ... ],
  "additionalProperties": false
}
```

**14. Date/time formats are ISO 8601, declared explicitly.**

```json
"start_date": {
  "type": "string",
  "format": "date-time",
  "description": "Start of the date range, ISO 8601 (e.g. 2026-05-22T14:30:00Z)."
}
```

Otherwise the model produces a mix of `2026-05-22`, `2026/05/22`, `May 22 2026`, and Unix timestamps.

### Examples in the schema

**15. Include 1-3 examples for non-obvious parameters.** Especially free-text query params, regex patterns, structured filters.

```json
"filter": {
  "type": "string",
  "description": "Filter expression. Examples: 'status:active', 'created_at>2026-01-01', 'email:*@example.com'.",
  "examples": ["status:active", "created_at>2026-01-01"]
}
```

**16. For complex objects, include a full example call in the description.** A 50-token example saves multiple round-trips.

### Return values

**17. Return JSON, not free-form text.** Even if the natural output is prose, wrap it: `{ "answer": "...", "sources": [...] }`. The model can parse JSON deterministically; prose it has to re-interpret.

**18. Successful responses include a stable shape.** Same top-level fields on every successful call. Optional fields present with `null`, not omitted.

```ts
// GOOD - stable shape
return ok({ users: [...], next_cursor: null, has_more: false });

// BAD - shape varies
return rows.length ? ok({ users: rows }) : ok({});
```

**19. Truncate long responses with an explicit marker.**

```json
{
  "data": [...],
  "truncated": true,
  "total_size_bytes": 204800,
  "continuation_token": "..."
}
```

**20. Empty results return `{ "data": [] }`, not `{ "error": "not found" }`.** Empty is a normal outcome, not an error.

### Error returns

**21. Errors return structured JSON with `code`, `message`, and `retryable` flag.** Same shape as your REST API errors.

```ts
return {
  content: [{
    type: "text",
    text: JSON.stringify({
      error: {
        code: "invalid_filter_field",
        message: "Field 'createdAt' is not a valid filter. Valid fields: 'created_at', 'updated_at', 'status'.",
        retryable: false,
      },
    }),
  }],
  isError: true,
};
```

**22. Error messages name the specific problem AND suggest a fix.**

```
BAD:  "Invalid request."
GOOD: "Field 'email' is required for create_user."
```

**23. Distinguish "tool called wrong" from "external system failed."**

| Code | Meaning | Model's action |
| --- | --- | --- |
| `invalid_argument` | The model's fault | Fix call, retry |
| `not_found` | Resource missing | Different lookup or report |
| `permission_denied` | No access | Stop, report to user |
| `external_unavailable` | Upstream down | Retry with backoff |
| `internal_error` | Unknown server bug | Stop, surface |

### Side effects and idempotency

**24. Tools that mutate state announce it in the description.** Phrases like "Creates and persists...", "Permanently deletes..." flag side effects so the model can confirm with the user when appropriate.

**25. Destructive tools support a `dry_run` flag.**

```json
{
  "type": "object",
  "properties": {
    "user_id": { "type": "string", "format": "uuid" },
    "dry_run": {
      "type": "boolean",
      "default": false,
      "description": "If true, returns what would be deleted without actually deleting."
    }
  },
  "required": ["user_id"],
  "additionalProperties": false
}
```

**26. Mutating tools are idempotent where possible.** Use natural keys or accept an `idempotency_key`.

## Common AI-output patterns to reject

| Pattern | Why wrong | Fix |
| --- | --- | --- |
| Tool name `do_X` or `process_Y` | No verb-object | `create_X`, `search_Y` |
| `name: "user"` (noun only) | Looks like an attribute | `get_user`, `list_users`, etc. |
| `create_or_update_user` | Two verbs | Split into two tools |
| Parameter without description | Model invents shape | Description on every property |
| `type: "string"` for status | Model invents enum values | `enum: ["active", ...]` |
| Missing `additionalProperties: false` | Model invents extra fields | Set explicitly |
| Description starts with "this tool" | Filler | "Searches the user directory..." |
| Long prose return wrapping the answer | Model re-parses | JSON wrapper |
| `{ "error": "not found" }` for empty | Misclassifies | `{ "data": [] }` |
| Destructive tool no dry_run | High-risk default | Accept `dry_run: boolean` |

## Worked example: full tool definition

```ts
{
  name: "search_users",
  description:
    "Searches the user directory by name or email. " +
    "Use when the user asks to find a person by partial information. " +
    "Do not use for exact ID lookups; use get_user_by_id instead.",
  inputSchema: {
    type: "object",
    properties: {
      query: {
        type: "string",
        description: "Full-text query against name and email fields. Partial match.",
        minLength: 1,
        maxLength: 200,
        examples: ["anna", "anna@example.com", "Petrova"],
      },
      status: {
        type: "string",
        enum: ["active", "archived", "banned"],
        description: "Filter by user status. Omit to include all statuses.",
      },
      limit: {
        type: "integer",
        minimum: 1,
        maximum: 100,
        default: 20,
        description: "Maximum number of results to return.",
      },
    },
    required: ["query"],
    additionalProperties: false,
  },
}

// Return on success:
//   { "data": [{"id": "...", "name": "...", "email": "...", "status": "active"}],
//     "count": 3, "has_more": false }
//
// Return on no matches:
//   { "data": [], "count": 0, "has_more": false }
//
// Return on bad input (with isError: true):
//   { "error": { "code": "invalid_argument", "message": "...", "retryable": false } }
```

## Worked example: destructive tool with dry_run

```ts
{
  name: "delete_user",
  description:
    "Permanently deletes a user and their associated data. " +
    "Use only after the user has explicitly confirmed. " +
    "Set dry_run=true to preview what would be deleted without actually deleting.",
  inputSchema: {
    type: "object",
    properties: {
      user_id: {
        type: "string",
        format: "uuid",
        description: "ULID of the user to delete.",
      },
      dry_run: {
        type: "boolean",
        default: false,
        description: "If true, returns the affected resources without performing the deletion.",
      },
    },
    required: ["user_id"],
    additionalProperties: false,
  },
}

// dry_run response:
//   { "would_delete": { "user": {...}, "orders": 12, "sessions": 3 }, "dry_run": true }
//
// real response:
//   { "deleted": { "user_id": "..." }, "dry_run": false }
```

## Workflow

When designing a tool:

1. **State the one-sentence purpose.** If you cannot, the tool is too broad.
2. **Write the name and description first, before any code.** Read aloud: would I pick this tool from a list of 15 based on its description alone?
3. **Sketch the JSON Schema. Every field has a description.**
4. **Write 2-3 example calls with expected responses.** Use as test fixtures.
5. **Decide the error vocabulary.** Which codes, and what should the model do for each?
6. **Implement.** Run against a real LLM client. Iterate the description before the code.

## Verification

```bash
bash skills/mcp/forge-mcp-tool-design/verify/check_tool_schema.sh path/to/tools.ts
```

Flags: object schemas without `additionalProperties: false`, primitive types with no `description` anywhere in the file.

## When to skip this skill

- Internal tools used only programmatically (not LLM-called).
- Quick prototype function-calling demos.
- Tools whose contract is fixed by an external API you do not control.

## Related skills

- [`forge-mcp-server`](../forge-mcp-server/SKILL.md) - the server that hosts the tool.
- [`forge-mcp-resource`](../forge-mcp-resource/SKILL.md) - when a tool should be a resource instead.
- [`forge-tool-use`](../../llm/forge-tool-use/SKILL.md) - the client side of tool calling.
- [`forge-api-design`](../../backend/forge-api-design/SKILL.md) - error shape parallels.
