Skills and MCP servers are the two halves of every interesting Claude Code workflow. Skills tell Claude what to do and how to think about a problem; MCP servers give Claude the actual hands and eyes to touch your systems. The moment you compose them, you stop having an autocomplete and start having an agent.
This guide walks through the composition model end to end: how a skill declares the tools it is allowed to call, how Claude's loop picks one over another, and three worked examples — Notion, Postgres, and GitHub — that I run in production. We will also cover the gotchas: same-named tools across servers, write-vs-read separation, and the debugging patterns I reach for when Claude refuses to call a tool or, worse, calls the wrong one.
If you have been around Claude Code for more than a week, you have probably heard the terms skill, MCP server, and plugin used interchangeably. They are not interchangeable. Getting them straight is the first step to designing a workflow that does not collapse the moment a tool call surprises you.
A skill is a Markdown file — conventionally SKILL.md — that lives under ~/.claude/skills/<slug>/. It carries a YAML frontmatter block (name, description, optionally allowed-tools and others) followed by free-form Markdown the model reads as instructions. Skills are behavioural: they tell Claude when to engage a particular workflow, what conventions to follow, what edge cases to flag. A skill on its own cannot reach out and touch the world. It can only shape how Claude thinks and responds.
An MCP server — Model Context Protocol server — is a separate process that exposes tools Claude can invoke. Each tool has a name (search, execute_sql, create_pull_request_comment), a JSON schema for its arguments, and a handler that does real work: hitting an API, querying a database, mutating a filesystem. MCP servers are capability: they extend what Claude can do, not how it thinks.
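To make the three parts of a tool concrete (a name, an argument schema, a handler), here is a minimal sketch in plain Python. It is not the MCP SDK: `Tool`, `missing_required`, and the stubbed `execute_sql` handler are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    """Illustrative stand-in for what an MCP server exposes per tool."""
    name: str                                  # e.g. "execute_sql"
    input_schema: dict[str, Any]               # JSON Schema for the arguments
    handler: Callable[[dict[str, Any]], Any]   # the code that does real work


def missing_required(tool: Tool, args: dict[str, Any]) -> list[str]:
    """Return required argument names (per the schema) absent from args."""
    required = tool.input_schema.get("required", [])
    return [key for key in required if key not in args]


# Hypothetical tool: the handler is a stub, not a real database call.
execute_sql = Tool(
    name="execute_sql",
    input_schema={
        "type": "object",
        "properties": {"sql": {"type": "string"}},
        "required": ["sql"],
    },
    handler=lambda args: f"would run: {args['sql']}",
)

print(missing_required(execute_sql, {}))          # ['sql']
print(execute_sql.handler({"sql": "SELECT 1"}))   # would run: SELECT 1
```

The point of the sketch is the split: the schema describes what a valid call looks like, and the handler is the only part that touches a real system.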
A plugin, in the Claude Code sense, is a packaging convention. A plugin can bundle one or more skills, one or more MCP server entries, and the configuration that wires them together — typically under a .claude-plugin/ directory with a plugin.json manifest. Plugins are the distribution unit when you want someone else to install your composition with a single command rather than copying files around.
The pithy summary I have ended up using with teammates is this: skills are instructions, MCP servers are tools, plugins are packaging. When something misfires, naming which of the three is misbehaving will save you an hour of debugging. A skill that never gets activated is a different problem from an MCP server whose tool schema is wrong, which is again different from a plugin whose manifest forgot to register a dependency.
One practical implication: you can mix and match freely. A single skill can reference tools from three different MCP servers. A single MCP server can be used by twenty different skills, each with its own allowed-tools list. And a plugin can ship a skill that has zero MCP dependencies — pure prompt engineering, no capability — or an MCP server with no accompanying skill, leaving the model to figure out when to use it. The composition is yours to design.
Composition happens at two layers: declaration (what the skill says it is allowed to do) and runtime (what Claude's agent loop actually does on a given turn).
At declaration time, a skill optionally lists tools in its frontmatter under allowed-tools. This is the access-control list. If allowed-tools is present, Claude will only invoke tools on that list when this skill is the active behavioural context. If it is absent, the skill inherits the project-wide tool set — typically every tool from every configured MCP server, which is rarely what you actually want.
---
name: summarise-meeting-notes
description: Pull a Notion page and produce a structured action-items summary.
allowed-tools:
- mcp__notion__search
- mcp__notion__fetch_page
- mcp__notion__append_block
---
The tool names follow a convention: mcp__<server-name>__<tool-name>. The double underscores are not aesthetic: they are how Claude's tool router disambiguates two MCP servers that happen to expose a tool with the same identifier. We will come back to this in the naming section.
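The declaration rules can be modelled in a few lines of Python. This is a sketch of the semantics as described here, not Claude Code's implementation; `split_tool_name` and `tools_in_scope` are hypothetical helpers, and the parsing assumes server names contain no double underscore.

```python
from typing import Optional


def split_tool_name(full_name: str) -> tuple[str, str]:
    """Split 'mcp__<server>__<tool>' into (server, tool)."""
    prefix, sep, rest = full_name.partition("__")
    if prefix != "mcp" or not sep:
        raise ValueError(f"not a namespaced MCP tool name: {full_name!r}")
    server, sep, tool = rest.partition("__")
    if not sep or not server or not tool:
        raise ValueError(f"missing server or tool part: {full_name!r}")
    return server, tool


def tools_in_scope(allowed: Optional[list[str]], available: list[str]) -> list[str]:
    """Model the declaration rule: a list restricts, an absent list inherits."""
    if allowed is None:
        return list(available)
    return [name for name in available if name in allowed]


available = [
    "mcp__notion__search",
    "mcp__notion__update_page",
    "mcp__slack__search",
]
print(split_tool_name("mcp__notion__search"))               # ('notion', 'search')
print(tools_in_scope(["mcp__notion__search"], available))   # ['mcp__notion__search']
print(len(tools_in_scope(None, available)))                 # 3
```

Passing `None` stands in for a skill with no allowed-tools key, which is the case where every configured tool ends up in scope.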
At runtime, Claude's loop is roughly: read the user's message, read the active skills, build a prompt that includes the skill bodies relevant to the request, decide whether to call a tool, call it, observe the result, decide again. Each cycle is one turn. A turn can include zero tool calls (Claude just answers) or many (Claude searches, then fetches, then summarises, then writes).
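The decide, call, observe cycle can be sketched as a toy loop. Everything here is an assumption made for illustration: `run_turn`, the scripted `decide` policy standing in for the model, the stub tools, and the argument names `query` and `page_id`. It shows the shape of the loop, not how Claude Code is built.

```python
from typing import Callable, Optional

# A decision is either a tool call (name, args) or None, meaning "answer now".
Decision = Optional[tuple[str, dict]]


def run_turn(
    decide: Callable[[list[str]], Decision],
    tools: dict[str, Callable[[dict], str]],
    user_message: str,
    max_calls: int = 10,
) -> list[str]:
    """Loop: decide, call, observe, decide again, until no call is requested."""
    transcript = [f"user: {user_message}"]
    for _ in range(max_calls):
        decision = decide(transcript)
        if decision is None:                        # no further tool call
            break
        name, args = decision
        result = tools[name](args)                  # call the tool
        transcript.append(f"{name} -> {result}")    # observe the result
    transcript.append("assistant: done")
    return transcript


def scripted_decide(transcript: list[str]) -> Decision:
    """Stand-in for the model: search first, then fetch, then stop."""
    calls_so_far = len(transcript) - 1
    if calls_so_far == 0:
        return ("mcp__notion__search", {"query": "weekly sync"})
    if calls_so_far == 1:
        return ("mcp__notion__fetch_page", {"page_id": "page-1"})
    return None


stub_tools = {
    "mcp__notion__search": lambda args: "1 hit: page-1",
    "mcp__notion__fetch_page": lambda args: "page body",
}

for line in run_turn(scripted_decide, stub_tools, "summarise the weekly sync"):
    print(line)
```

The `max_calls` cap is a design choice of the sketch: it guarantees the loop terminates even if the policy never returns `None`.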
The decision of which tool to call is mostly textual. Claude reads the skill body and the tool descriptions and picks based on what looks most relevant to the user's request. This is why a skill body that says 'when the user asks for a summary, call mcp__notion__search first, then mcp__notion__fetch_page for each top hit' outperforms one that just says 'use Notion'. The more prescriptive your skill body, the more deterministic the composition becomes.
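Two practical patterns I rely on heavily are the recipe, where the skill body spells out the steps and the output structure, and the philosophy, where it states principles and leaves the sequencing to Claude. Real production skills usually blend both: recipe scaffolding for the predictable parts, philosophy for the branches where the data dictates the next move.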
The first composition I deployed in earnest was a meeting-notes summariser. The team kept raw notes in Notion, one page per meeting, and nobody had time to extract action items by hand. A skill plus the Notion MCP server turned this into a single command.
The MCP server side is small: install the Notion MCP server (there are several maintained variants on npm and pypi), register it in your Claude Code config, and authenticate it with a Notion integration token scoped to the right workspace. The server exposes tools roughly along the lines of search, fetch_page, append_block, update_page, create_page.
The skill, summarise-meeting-notes, then looks like this in essence:
---
name: summarise-meeting-notes
description: When the user asks to summarise a Notion meeting, pull the page, extract decisions and action items, and write a structured summary back.
allowed-tools:
- mcp__notion__search
- mcp__notion__fetch_page
- mcp__notion__append_block
---
# Summarise meeting notes
When the user names a meeting, do this:
1. Call `mcp__notion__search` with the meeting name as the query. Filter to pages updated in the last 7 days unless the user names a date.
2. If more than one page returns, ask the user which one before fetching.
3. Call `mcp__notion__fetch_page` on the chosen page id.
4. Produce a summary with three sections: **Decisions**, **Action items** (each with owner and due date if mentioned), **Open questions**.
5. Append the summary to the bottom of the same page via `mcp__notion__append_block`, prefixed with a heading like `## Summary (generated)`.
Never overwrite existing content. Never create a new page unless the user explicitly asks.
Two things make this work well in practice. The 'ask before fetching when ambiguous' rule keeps Claude from confidently summarising the wrong meeting, a failure mode I hit twice before adding it. The 'never overwrite' rule is the security backstop: even though update_page is in the MCP server's tool set, it is deliberately not in allowed-tools, so the skill physically cannot blow away a page even if Claude misreads an instruction.
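One way to keep the frontmatter and the body in agreement is a small lint that compares the tools the body mentions against allowed-tools. A minimal sketch follows; `undeclared_tools` is a hypothetical helper, and the regular expression assumes tool names use only letters, digits, and underscores.

```python
import re

# Matches namespaced references such as mcp__notion__search inside prose.
TOOL_REF = re.compile(r"mcp__[A-Za-z0-9_]+")


def undeclared_tools(allowed_tools: list[str], body: str) -> set[str]:
    """Tools the body tells the model to call that the frontmatter omits."""
    return set(TOOL_REF.findall(body)) - set(allowed_tools)


allowed = [
    "mcp__notion__search",
    "mcp__notion__fetch_page",
    "mcp__notion__append_block",
]
body = (
    "1. Call `mcp__notion__search` with the meeting name.\n"
    "2. Call `mcp__notion__fetch_page` on the chosen page id.\n"
    "3. Replace the page via `mcp__notion__update_page`.\n"
)
print(undeclared_tools(allowed, body))   # {'mcp__notion__update_page'}
```

An empty result means every tool the body names is declared; a non-empty one flags an instruction the skill could never carry out.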
The composition shines on edge cases. A teammate once asked it to summarise a meeting that had no action items, only decisions. Because the skill body specifies the three sections explicitly, Claude wrote a summary with an empty Action items section and a note explaining why, which is exactly the behaviour I wanted. A philosophy-only skill would probably have hidden the gap and made the summary look complete.
The second composition is the one I lose sleep over getting right: a production database investigator. The Postgres MCP server exposes tools like list_databases, list_tables, describe_table, and execute_sql. That last one is the obvious foot-gun.
The skill, diagnose-slow-query, runs against a read replica I keep specifically for this purpose. The composition is built around the principle that an investigator should never be able to modify the thing it is investigating.
---
name: diagnose-slow-query
description: Given a slow Postgres query, gather plan and stats, identify the likely cause, and suggest an index or rewrite.
allowed-tools:
- mcp__pg_readonly__list_tables
- mcp__pg_readonly__describe_table
- mcp__pg_readonly__execute_sql
---
# Diagnose a slow query
The MCP server `pg_readonly` is connected to a read replica with a role that has `SELECT` only. Do not attempt INSERT, UPDATE, DELETE, ALTER, CREATE, or DROP statements — they will fail and you will look foolish.
When the user pastes a query:
1. Run `EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) <query>` via `mcp__pg_readonly__execute_sql`.
2. Identify the top cost node. If it is a Seq Scan over more than 50k rows, that is your suspect.
3. Pull `pg_stats` for the columns in the WHERE clause via `mcp__pg_readonly__execute_sql`.
4. Pull existing indexes for the table via `mcp__pg_readonly__describe_table`.
5. Suggest either (a) a new index, (b) a query rewrite, or (c) a statistics target bump. Show the exact DDL but **do not execute it**.
Report in this order: query summary, plan summary, suspect node, evidence, suggestion, exact DDL.
Notice what is missing: there is no mcp__pg_writable__execute_sql in allowed-tools. I also run a separate MCP server, pg_writable, used by a different skill for migrations. The two are deliberately separate processes with separate database roles. Even if I accidentally name them similarly in conversation, Claude cannot reach across the boundary because the allowed-tools list draws the line.
The phrasing in the skill body matters too. 'Do not attempt INSERT… you will look foolish' is not just snark. Strong, slightly informal language about what not to do has empirically worked better for me than polite hedges. Whether that is a model quirk or just my taste, the skill behaves more reliably when the prohibitions are sharp.
The output structure — query summary, plan summary, suspect node, evidence, suggestion, DDL — is the recipe pattern. Every diagnosis looks the same, which means I can scan ten of them in a row without re-orienting each time. That payoff compounds.
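Step 2 of the skill above (finding a sequential scan over a large table) is something the model does by reading the plan, but the same check can be sketched in Python against the JSON that `EXPLAIN (ANALYZE, FORMAT JSON)` returns. The helper names and the sample plan, including the `orders` table, are hypothetical. Because a node's Total Cost includes its children, the sketch ranks suspects by rows read, not by cost.

```python
from typing import Any, Iterator


def walk_plan(node: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield a plan node and all of its descendants, depth first."""
    yield node
    for child in node.get("Plans", []):
        yield from walk_plan(child)


def rows_read(node: dict[str, Any]) -> int:
    """Rows returned plus rows filtered out (per-loop figures; loops ignored)."""
    return node.get("Actual Rows", 0) + node.get("Rows Removed by Filter", 0)


def suspect_seq_scans(
    explain_json: list[dict[str, Any]], min_rows: int = 50_000
) -> list[tuple[str, int]]:
    """Return (relation, rows read) for Seq Scans above min_rows, largest first."""
    root = explain_json[0]["Plan"]
    hits = [
        (node.get("Relation Name", "?"), rows_read(node))
        for node in walk_plan(root)
        if node.get("Node Type") == "Seq Scan" and rows_read(node) > min_rows
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical, heavily trimmed plan for illustration only.
sample = [{
    "Plan": {
        "Node Type": "Aggregate", "Total Cost": 2500.0, "Actual Rows": 1,
        "Plans": [{
            "Node Type": "Seq Scan", "Relation Name": "orders",
            "Total Cost": 2400.0, "Actual Rows": 1200,
            "Rows Removed by Filter": 98800,
        }],
    }
}]
print(suspect_seq_scans(sample))   # [('orders', 100000)]
```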
The third composition is the one that has saved my team the most actual time. The GitHub MCP server exposes a wide tool set: get_pull_request, get_pull_request_files, get_file_contents, create_pull_request_review, create_pull_request_comment, list_pull_requests, and many more. A code-review skill pairs naturally with this.
The trick with GitHub is that the MCP tool set is much bigger than what any single skill should be allowed to use. A reviewer skill should be able to read everything and comment, but it should never merge, never close, never approve. That separation is encoded in allowed-tools.
---
name: review-pull-request
description: Review a GitHub PR for correctness, style, and risk. Post comments inline. Never approve or merge.
allowed-tools:
- mcp__github__get_pull_request
- mcp__github__get_pull_request_files
- mcp__github__get_file_contents
- mcp__github__create_pull_request_review_comment
---
# Review a pull request
When the user gives a PR URL or number:
1. Call `mcp__github__get_pull_request` to get the title, body, and diff stats.
2. Call `mcp__github__get_pull_request_files` to get the per-file diff.
3. For each changed file, decide whether you need the surrounding context. If yes, call `mcp__github__get_file_contents` for the head ref.
4. Identify issues in three categories: **Correctness** (bugs, edge cases, race conditions), **Style** (readability, naming, consistency with the rest of the file), **Risk** (security, performance, blast radius if deployed).
5. For each non-trivial issue, post a review comment via `mcp__github__create_pull_request_review_comment` on the specific line. Use the `body` field for the comment text; reference the line by `path` and `line`.
6. Summarise the review at the end in chat. Do not post a top-level review (no `create_pull_request_review` in allowed-tools — that is intentional, because it can carry an APPROVE state).
If the PR is more than 500 lines changed, ask the user to scope you to specific files before diving in.
The line-level comments via create_pull_request_review_comment are the difference between a useful reviewer and a noisy one. A summary at the bottom of a PR is easy to ignore; a comment on line 47 of auth.py next to the actual issue is much harder to dismiss.
The 'ask before diving in if >500 lines' rule is there because I once let the skill loose on a 4,000-line refactor and it posted forty-three comments, many of which were variations of the same observation. Scoping the review to the parts that matter produces better commentary and uses fewer tokens.
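The size gate is simple arithmetic over the per-file diff stats. The sketch below assumes each file entry carries `filename`, `additions`, and `deletions` fields, as entries in GitHub's REST listing of pull request files do; what a given MCP server returns may be shaped differently, and the function names and sample data are hypothetical.

```python
def changed_lines(files: list[dict]) -> int:
    """Total added plus deleted lines across all files in the PR."""
    return sum(f.get("additions", 0) + f.get("deletions", 0) for f in files)


def needs_scoping(files: list[dict], limit: int = 500) -> bool:
    """True when the PR is large enough to ask the user to narrow the review."""
    return changed_lines(files) > limit


def largest_files(files: list[dict], top: int = 3) -> list[str]:
    """Filenames with the most changed lines, useful when proposing a scope."""
    ranked = sorted(
        files,
        key=lambda f: f.get("additions", 0) + f.get("deletions", 0),
        reverse=True,
    )
    return [f["filename"] for f in ranked[:top]]


# Hypothetical diff stats for illustration.
files = [
    {"filename": "auth.py", "additions": 320, "deletions": 110},
    {"filename": "README.md", "additions": 4, "deletions": 1},
    {"filename": "models.py", "additions": 90, "deletions": 15},
]
print(changed_lines(files))       # 540
print(needs_scoping(files))       # True
print(largest_files(files, 2))    # ['auth.py', 'models.py']
```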
Note the deliberately excluded tool. create_pull_request_review can attach an APPROVE or REQUEST_CHANGES state to a review. I do not want an automated skill to ever take that action. Inline comments are advisory; review states are gating. Drawing the line at allowed-tools makes the boundary unambiguous.
The mcp__<server>__<tool> naming convention exists because tool names collide more often than you would expect. Two MCP servers — one for Postgres, one for SQLite — might both expose a tool called execute_sql. A Notion MCP and a Slack MCP might both expose search. Without namespacing, Claude has no clean way to disambiguate, and you end up with either accidental routing or a refusal to call anything.
The server name is whatever you registered the MCP server under in your Claude Code config. If you registered the Notion server as notion, its search tool is mcp__notion__search. If you registered it as notion_personal to distinguish it from a notion_work server pointing at a different workspace, the tool becomes mcp__notion_personal__search. The server name is yours to choose, but be deliberate — once skills are written against it, renaming the server is a breaking change.
A practical pattern I use: when I have read-only and writable variants of the same underlying capability, I name them <thing>_readonly and <thing>_writable. So pg_readonly and pg_writable, or fs_readonly and fs_writable. This makes the security boundary visible in every tool reference. A skill body that says mcp__pg_readonly__execute_sql is self-documenting; a skill body that says mcp__pg__execute_sql leaves you guessing.
Same-named tools across servers will both be visible to Claude if both servers are registered and both tools are in scope (either explicitly in allowed-tools or by virtue of no allowed-tools being set). When the skill body just says 'call search', Claude will pick based on context — usually correctly, sometimes not. The fix is to always reference tools by their full namespaced name in the skill body. 'Call mcp__notion__search, not the Slack search' is uglier prose but produces deterministic behaviour.
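Collisions are easy to detect mechanically once you have the list of namespaced tool names in scope. A minimal sketch, with `find_collisions` as a hypothetical helper that expects every name to follow the mcp__<server>__<tool> form:

```python
from collections import defaultdict


def find_collisions(tool_names: list[str]) -> dict[str, list[str]]:
    """Map each bare tool name exposed by 2+ servers to those servers."""
    servers_by_tool: dict[str, list[str]] = defaultdict(list)
    for full_name in tool_names:
        # "mcp__notion__search" -> ["mcp", "notion", "search"]
        _, server, tool = full_name.split("__", 2)
        servers_by_tool[tool].append(server)
    return {
        tool: sorted(servers)
        for tool, servers in servers_by_tool.items()
        if len(servers) > 1
    }


in_scope = [
    "mcp__notion__search",
    "mcp__slack__search",
    "mcp__notion__fetch_page",
]
print(find_collisions(in_scope))   # {'search': ['notion', 'slack']}
```

Any tool that shows up in the result is one the skill body should only ever mention by its full namespaced name.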
One subtlety: the MCP server's own tool name (what it exposes over the protocol) might contain dots, slashes, or other characters that Claude Code normalises. Most servers stick to snake_case, which is the path of least surprise. If you are writing your own MCP server, do not get clever with tool names. create_pull_request_comment is good; github.pr.comment.create is asking for routing failures.
A small migration note. If you rename a server in your config from pg to pg_readonly, every skill that referenced mcp__pg__execute_sql now points at a tool that does not exist. Claude will refuse to call it with a tool-not-found error, which surfaces in the chat as a polite shrug. Run grep -r 'mcp__pg__' ~/.claude/skills/ before any rename. The cost of the search is five seconds; the cost of a silent break is a confused afternoon.
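The same pre-rename search can be scripted if you would rather get structured results than grep output. This is a sketch under the assumption that every skill lives in a file named SKILL.md somewhere below the directory you pass in; `stale_references` is a hypothetical helper.

```python
from pathlib import Path


def stale_references(skills_dir: Path, old_server: str) -> list[tuple[str, int, str]]:
    """Find (path, line number, line) for every reference to the old server."""
    needle = f"mcp__{old_server}__"
    if not skills_dir.is_dir():
        return []
    hits = []
    for path in sorted(skills_dir.rglob("SKILL.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Mirrors: grep -r 'mcp__pg__' ~/.claude/skills/
    for path, lineno, text in stale_references(Path.home() / ".claude" / "skills", "pg"):
        print(f"{path}:{lineno}: {text}")
```

Because the needle ends with a double underscore, a search for `pg` does not match references to `pg_readonly`, so the check stays accurate when you re-run it after the rename.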
Every composition introduces blast radius. The skill plus MCP server combination above can read your Notion workspace, run SQL against a database, and post comments on GitHub PRs. Each of those is a real action with real consequences if the model gets it wrong.
The single most effective security control is allowed-tools. If a skill does not list a tool, it cannot call that tool, regardless of how the conversation goes. This is a hard enforcement, not a soft preference. Use it on every skill that touches anything mutable.
The second most effective control is splitting MCP servers by privilege level. The pg_readonly / pg_writable pattern from the Postgres example generalises. Run the Notion MCP server with a token scoped to read-only access from one configured entry, and a second instance with write access from another. Run the GitHub MCP server with a token limited to read-only repository permissions from one entry and a token with write permissions from another. The skill picks its lane by which server it references; you cannot accidentally cross over.
The third control is the skill body itself. Strong language about what not to do — 'never overwrite', 'never approve', 'never run write SQL' — works better than polite hedges. It also helps to name the consequence: 'never approve, because an approval skips human review and can ship code' gives the model a reason to remember the prohibition. Reasons compose; rules do not.
A few patterns I now treat as defaults:
allowed-tools. Adding a write tool is a deliberate decision that triggers a re-read of the skill body.
One nuance worth naming. The blast radius is not just the immediate tool call. A skill that has read access to your production database can dump customer data into the chat transcript. A skill that can read GitHub can read private repositories. Even read-only tools carry exfiltration risk if the chat transcript is shared. Plan your composition with the assumption that anything Claude reads might end up in a log somewhere.
Compositions fail in three characteristic ways. Recognising which one you are looking at saves debugging time.
Failure mode 1: Claude refuses to call a tool. The chat says something like 'I do not have access to that tool' or 'the tool you mentioned is not available'. Ninety percent of the time, this is an allowed-tools mismatch. The tool name in the skill frontmatter does not exactly match the registered server-plus-tool name. Common causes: server registered as notion but skill references mcp__notion_api__search; tool name has a typo (fetch_pag instead of fetch_page); skill was copied from another project where servers had different names. Fix: check what is actually registered (claude mcp list shows the configured servers, and the /mcp view inside a session shows each server's tools) and copy-paste the exact names into allowed-tools.
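Near-miss names are easy to catch with a fuzzy comparison between what the skill declares and what is registered. A minimal sketch using the standard library's `difflib`; `suggest_fixes` is a hypothetical helper, and the 0.6 cutoff is simply `difflib`'s default, not a tuned value.

```python
import difflib
from typing import Optional


def suggest_fixes(
    allowed_tools: list[str], registered_tools: list[str]
) -> dict[str, Optional[str]]:
    """For each declared name that is not registered, propose the closest match."""
    suggestions: dict[str, Optional[str]] = {}
    for name in allowed_tools:
        if name in registered_tools:
            continue
        matches = difflib.get_close_matches(name, registered_tools, n=1, cutoff=0.6)
        suggestions[name] = matches[0] if matches else None
    return suggestions


registered = ["mcp__notion__search", "mcp__notion__fetch_page"]
declared = [
    "mcp__notion__search",        # exact match, not reported
    "mcp__notion__fetch_pag",     # typo in the tool name
    "mcp__notion_api__search",    # wrong server name
]
for wrong, right in suggest_fixes(declared, registered).items():
    print(f"{wrong} -> {right}")
```

A `None` suggestion means nothing registered is even close, which usually points at a server that is not configured at all.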
Failure mode 2: Claude calls the wrong tool. The composition runs, but the result is not what you wanted. The skill says 'search Notion' and Claude searches Slack instead, because both MCP servers expose a search tool and the skill body did not specify which one. Fix: reference tools by their full namespaced name in the skill body. 'Call mcp__notion__search' instead of 'search Notion'. The prose is uglier; the behaviour is deterministic.
Failure mode 3: Claude calls the right tool with the wrong arguments. The composition runs, the tool returns, but the result is nonsense. Usually the skill body is too vague about how to call the tool. 'Use execute_sql to get the plan' is too vague; 'call mcp__pg_readonly__execute_sql with the literal string EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) <query>' is specific enough to be reliable. Tools have schemas, but Claude is constructing the arguments from text; if the text is ambiguous, the arguments will be too.
A diagnostic move I use constantly: ask Claude to list the tools it thinks it has access to as the first turn of a debugging session. The reply is usually accurate and immediately exposes mismatches. If Claude lists mcp__notion__search but your skill references mcp__notion_workspace__search, you have a server-name typo. If Claude lists nothing at all, your MCP servers are not started or not registered. If Claude lists tools you did not expect, your skill is inheriting tools because allowed-tools is missing or empty.
A second diagnostic move: add a logging instruction to the skill body during debugging. 'Before calling each tool, state which tool you are about to call and why.' This makes the agent loop legible. You can see the reasoning chain in the chat output and catch the moment where Claude picks the wrong tool or constructs a bad argument. Remove the instruction once the skill is stable; the production version should be quieter.
If a skill used to work and now does not, the cause is almost always upstream. The MCP server got an update that renamed a tool; the Claude Code config drifted; an authentication token expired. Bisect by reverting the skill to a known-good version and adding pieces back. Composition failures rarely come from the skill text itself if you have not edited the skill text.