Most engineers I see installing Claude Code skills for the first time install too many. They scroll the catalog, see twenty things that look useful, run twenty git clone commands into ~/.claude/skills/, and then watch Claude get worse at routine work because three of those skills now fight for the same trigger phrases. The reality is that the engineering starter stack is small — ten to twelve skills, chosen for non-overlap, covering the work an engineer actually does in a week. Beyond that, you're better off writing one bespoke skill that matches how your team ships than installing two more that nearly fit.
This page is the stack I recommend to engineers joining a team that already runs Claude Code. It is not the catalog's top-100 by quality score. It is editorial: the skills I'd want loaded if I had to do code review, debugging, refactoring, test generation, security passes, doc writing, and deployment checks for the next quarter. For each one I'll give the install command, what it actually does, and the prompt phrasing that triggers it cleanly. Then I'll show how three of them chain into workflows that are noticeably better than vanilla Claude Code, flag the anti-stack of skills that look tempting but degrade discovery, and close with a maintenance routine that keeps the stack from rotting.
Claude Code's skill discovery is YAML-driven. Every skill in ~/.claude/skills/<slug>/SKILL.md declares a name: and a description:. When you write a prompt, Claude scans the descriptions of every loaded skill, ranks them by how well they match your request, and picks the strongest match. This is fast, deterministic, and almost entirely textual — there is no embedding model rescoring you in the background.
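To see what the router has to work with, it helps to print the name and description of every installed skill. The following is a minimal sketch, not Claude Code's own parser: it assumes the frontmatter is a block delimited by `---` lines at the top of SKILL.md and only handles single-line `key: value` pairs. The helper names `read_frontmatter` and `list_skills` are hypothetical.

```python
from pathlib import Path


def read_frontmatter(text: str) -> dict[str, str]:
    """Return top-level `key: value` pairs from a SKILL.md frontmatter block.

    Assumption: the file opens with a line containing only `---` and the
    block is closed by another `---`. Multi-line values are ignored.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line.startswith((" ", "\t")) or ":" not in line:
            continue  # skip nested or continuation lines
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip("\"'")
    return fields


def list_skills(root: Path) -> list[tuple[str, str]]:
    """Collect (name, description) for every <slug>/SKILL.md under root."""
    skills = []
    for skill_file in sorted(root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_file.read_text(encoding="utf-8"))
        # Fall back to the directory name when `name:` is missing.
        skills.append((meta.get("name", skill_file.parent.name),
                       meta.get("description", "")))
    return skills


if __name__ == "__main__":
    for name, description in list_skills(Path.home() / ".claude" / "skills"):
        print(f"{name}: {description}")
```

Reading the output top to bottom is a quick way to see the descriptions side by side, the same way they compete for a prompt.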
The consequence is mechanical: two skills whose descriptions overlap will fight. If you have a code-review skill that triggers on "review this code" and a security-audit skill whose description starts with "audit code for security issues", a prompt like "audit this PR" can route to either one — and the wrong choice cascades through the rest of the conversation. The model doesn't know it picked badly. You'll get a security-flavored answer to a code-review question, or vice versa.
So the goal of the starter stack isn't coverage. It's disjoint coverage. Each skill should own a corner of your engineering work with descriptions distinctive enough that ambiguous prompts route somewhere predictable. Ten skills with clean boundaries beat twenty with fuzzy ones every time.
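One rough way to spot descriptions that are likely to fight is to compare their wording directly. The sketch below is a crude heuristic of my own, not how Claude ranks skills: it scores each pair of descriptions by the Jaccard similarity of their keyword sets and reports pairs above a threshold. The stopword list, the 0.3 threshold, and the example skill `pr-helper` are all illustrative assumptions.

```python
import re
from itertools import combinations

# Filler words ignored when comparing descriptions (illustrative, not exhaustive).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "this", "to", "with"}


def keywords(description: str) -> set[str]:
    """Lowercase word set of a description, minus common filler words."""
    return set(re.findall(r"[a-z0-9]+", description.lower())) - STOPWORDS


def overlapping_pairs(descriptions: dict[str, str], threshold: float = 0.3):
    """Return (skill_a, skill_b, score) for pairs whose keyword-set Jaccard
    similarity is at or above the threshold, highest score first."""
    pairs = []
    for (a, desc_a), (b, desc_b) in combinations(sorted(descriptions.items()), 2):
        words_a, words_b = keywords(desc_a), keywords(desc_b)
        union = words_a | words_b
        if not union:
            continue  # both descriptions empty; nothing to compare
        score = len(words_a & words_b) / len(union)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    stack = {
        "code-review": "Review a pull request diff and report blockers and nits",
        "security-audit": "Audit code for injection, auth, and secrets issues",
        "pr-helper": "Review a pull request and report issues",
    }
    for a, b, score in overlapping_pairs(stack):
        print(a, b, score)
```

A flagged pair is not proof of a routing conflict, only a hint about which two descriptions to read side by side before deciding whether one needs narrowing.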
The other reason to keep it small: context. Every loaded skill's description is in play on every prompt, and when a skill is invoked Claude reads its whole SKILL.md body, not just the description. Large skills (4-8 KB of body content with anti-trigger sections and examples) are good; they make the model behave more consistently. But with a dozen 8 KB skills loaded, a session that chains several of them pulls each of those bodies into context as it fires. That's fine. Loading thirty is wasteful, and the model starts confabulating about skills it loaded but shouldn't have routed to.
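To get a feel for how much instruction text your stack can put in front of the model, a rough size report is enough. This is a sketch under simple assumptions: it measures whole SKILL.md files (frontmatter included) and uses 8 KB as the per-skill ceiling mentioned above. The function names are hypothetical.

```python
from pathlib import Path


def skill_sizes(root: Path) -> dict[str, int]:
    """Map each skill slug under root to the size of its SKILL.md in bytes."""
    return {
        skill_file.parent.name: skill_file.stat().st_size
        for skill_file in sorted(root.glob("*/SKILL.md"))
    }


def summarize(sizes: dict[str, int], per_skill_limit: int = 8 * 1024) -> str:
    """One-line summary: skill count, total size, and any oversized skills."""
    oversized = [slug for slug, size in sizes.items() if size > per_skill_limit]
    total_kb = sum(sizes.values()) / 1024
    summary = f"{len(sizes)} skills, {total_kb:.1f} KB total"
    if oversized:
        summary += f", over limit: {', '.join(oversized)}"
    return summary


if __name__ == "__main__":
    print(summarize(skill_sizes(Path.home() / ".claude" / "skills")))
```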
A practical rule: if you can't, off the top of your head, name what each loaded skill does and when it should fire, you have too many. Prune.
These are ordered roughly by frequency-of-use for a typical full-stack or backend engineer. Adjust for your domain — frontend engineers will swap perf-analysis for an accessibility skill; SREs will lean harder on the runbook side. The install pattern for every entry is the same: cd ~/.claude/skills/ && git clone <repo> <slug> or the ClaudSkills desktop app's one-click install if you have it.
1. Code review. Browse /category/engineering/code-review/ and pick one with explicit severity tiers (blocker / nit / praise) and an anti-trigger section that excludes style-only feedback. Triggers cleanly on "review this PR", "is this code safe to merge", "what would a senior reviewer flag". The dumb version of this skill, "review code for bugs", overlaps with debug and security and refactor. The good ones lock in on PR-shaped input and produce structured output.
2. Debug.
cd ~/.claude/skills/ && git clone <debug-skill-repo> debug
Triggers on "this error", "why is this failing", a pasted stack trace. The body should walk the model through reproduce → isolate → diagnose → fix instead of jumping to the first plausible cause. The signal of a good debug skill: it asks you for the minimum repro before suggesting fixes.
3. Test generation. The bad version writes happy-path tests for every public function. The good version asks what the function is for and writes the boundary-and-failure cases that actually catch regressions. Trigger phrasing: "write tests for this", "what should I test here".
4. Refactor. A skill that refuses to refactor without a stated goal (extract method, reduce complexity, isolate side effects). Without this, asking Claude to "refactor" produces aesthetic rewrites that ship no value. With it, you get diffs anchored to a measurable improvement.
5. Security audit. Reads code for injection, auth, secrets, and dependency CVEs. The model is genuinely strong at the first three. The skill's job is to focus its attention on those instead of letting it fan out into vague "consider rate limiting" suggestions.
6. Doc writing. READMEs, API docs, runbooks. The skill should distinguish between the three; the audience and shape are different. A good one starts every doc draft by asking you who reads this and what action they take after reading.
7. Git hygiene. Commit messages, PR descriptions, branch-naming, rebase walkthroughs. The win here is small but constant. Conventional Commits-style skills are common in the catalog; pick one and stick with it.
8. Performance analysis. Profiler output interpretation, N+1 detection, hot-path identification. Triggers on flamegraph paste or "why is this slow". Most engineers can skip this until they hit a real perf problem, then it becomes critical.
9. Dependency management. Reads package.json / pyproject.toml / go.mod / Cargo.toml and explains what each transitive pull is doing, flags abandoned packages, suggests pin-vs-range strategy. Underused. Pulls its weight on month-end audits.
10. Deployment checklist. Pre-deploy verification: CI status, migrations queued, feature flags, rollback triggers documented. Fires on "about to deploy", "ready to ship", "shipping this in an hour".
11. Standup update. Eleventh on the list because it's almost cheating: it turns rough notes into a yesterday/today/blockers update. Five seconds of typing for ten minutes of polish. Worth the slot.
The point of installing multiple skills isn't to have more triggers fire — it's to chain them into work products no single skill could produce alone. Here are three that earn the stack.
Run before pushing a branch. Invoke the code-review skill on your diff first, address its blockers. Then ask the test-generation skill to identify untested boundaries in the changed code — not full coverage, just the cases your changes enable that didn't exist before. Then run the doc-writing skill on any public-API surface you touched. Total time: 8-12 minutes. What you get: a PR description that already addresses the obvious review feedback, plus tests for the two edge cases your reviewer would have flagged anyway.
# Three prompts, in order:
review this diff as if you were the most paranoid senior on the team
what boundary cases did this change newly enable that aren't tested
update the README section for any public API I changed
You get paged. Stack trace in hand, you fire the debug skill at the trace. It walks reproduce-and-isolate. Once the cause is clear, you switch to the deployment-checklist skill, but pointed backward: "what was in the deploy that landed 35 minutes ago". Then the security-audit skill on the changed files if the incident pattern suggests anything injection-shaped. The chain is shorter than the pre-PR flow but the value density is higher because you're working under time pressure and the skills give you a structured path through panic.
Once a quarter, point the dependency-management skill at your manifest files. It produces a categorized list — abandoned, security-flagged, major-version-stale, fine. Feed the security-flagged list to the security-audit skill scoped to your call sites for those packages — most CVEs don't affect you because you don't hit the vulnerable code path. The skills together give you a one-afternoon cleanup instead of a one-week migration.
The pattern across all three: the skills don't run in parallel, they run in sequence, and each one's output sharpens the next one's input. That's the whole game. If you find yourself chaining the same skills the same way three times in a month, that chain is a candidate for a custom skill of its own — see the next section.
The catalog is large but it is not your team. There are situations where no existing skill fits and the right move is to author one. Signs:
[...] SKILL.md and let the description trigger on the bare verb.
[...] ./scripts/ship or kctl wrapper saves more time than any catalog skill ever will, because Claude can drive it directly instead of asking you to translate.
Writing a skill is less work than you'd guess. The frontmatter is two required fields and a handful of optional ones; the body is markdown. See /learn/writing-a-skill-md-file/ for the structure and the anti-patterns that make skills behave inconsistently.
One piece of advice that matters more than people realize: write a narrow description, not a broad one. A skill that triggers on "review code" will fight every other skill in the stack. A skill that triggers on "review code for our team's microservice conventions" will only fire when you actually want it. The description is the only thing Claude sees before deciding to load the body. Make it specific.
If your skill works internally and you'd be willing to share it, the catalog accepts submissions. The bar for admission is content-derived — anti-trigger discipline, frontmatter completeness, body structure. Skills that capture real working practice from a real team tend to score well because they look nothing like the generic templates that flood every catalog.
These are categories of skill that look like wins on the catalog page and turn out to degrade the rest of the stack once installed. Names omitted because the issue is the shape, not any specific skill.
Descriptions like "comprehensive engineering assistant covering review, debug, refactor, test, doc, and deployment." These exist. They sound efficient. In practice, they fire on almost every engineering prompt and overwrite your purpose-built skills' triggers. If you install one of these, your other engineering skills go quiet — the omnibus catches the prompt first. Skip them unless you're running zero other engineering skills.
A skill that assumes you're doing trunk-based development and writes commits accordingly will create friction every time you're on a feature branch. A skill that assumes you have a separate staging environment will confuse the model on projects that deploy direct from main. The skill is good; it's just for someone else's workflow. Read the body before installing.
Skills that rewrite your code style without asking. Sometimes the description is benign ("clean up code") but the body instructs the model to apply opinionated reformatting on every invocation. These produce noisy diffs that make every code-review skill harder to use.
Skills whose entire body is anti-trigger and risk disclaimers. They look conservative. They mostly produce "I should not give you specific advice on this" responses that you didn't want. Conservatism in a skill is good when it routes you to a more specific tool; it's bad when it's the whole output.
Even if a skill is excellent in isolation, if its description steals routing from a skill you use every day, installing it costs you more than it adds. Test this before committing: load the candidate alongside your existing stack and try ten prompts you typically use. If the new skill picks up prompts that should have routed elsewhere, uninstall.
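The ten-prompt check is easier to repeat if you keep the expected routes in a small table and compare them with what you observe after installing the candidate. The sketch below assumes you record the observed routes by hand; nothing here queries Claude Code, and the function name and sample prompts are illustrative.

```python
def routing_regressions(
    expected: dict[str, str], observed: dict[str, str]
) -> list[tuple[str, str, str]]:
    """Return (prompt, expected_skill, observed_skill) for each mismatch.

    Both arguments map a prompt to the slug of the skill that handled it.
    Prompts with no recorded observation are skipped.
    """
    return [
        (prompt, skill, observed[prompt])
        for prompt, skill in expected.items()
        if prompt in observed and observed[prompt] != skill
    ]


if __name__ == "__main__":
    # Routes you rely on today.
    expected = {
        "review this PR": "code-review",
        "why is this failing": "debug",
    }
    # Routes noted by hand with the candidate skill installed.
    observed = {
        "review this PR": "code-review",
        "why is this failing": "candidate-skill",
    }
    for prompt, want, got in routing_regressions(expected, observed):
        print(f"{prompt!r}: expected {want}, got {got}")
```

Any line of output is a prompt the candidate took from a skill you already depend on, which is the signal to uninstall.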
"Revolutionary AI-powered" anything. The description is the trigger. If the author wrote it for humans to read on a landing page, the model will route it badly because there's no concrete trigger phrase for it to match.
The general filter: read the description with the question "what prompt should fire this and only this." If you can't answer, skip the skill.
It's worth understanding the routing mechanism, because once you do, the install decisions get easier.
When you load Claude Code in a project, every SKILL.md under ~/.claude/skills/ is parsed at startup. The frontmatter's name and description are indexed; the body stays on disk. When you submit a prompt, Claude ranks the loaded skills' descriptions against the prompt's intent and either invokes the strongest match (reading the body into context) or proceeds with the base model if nothing scores high enough.
This means three things for stack design:
The model does not, as of this writing, learn from your routing corrections in-session. If it picks the wrong skill, telling it "use the other one" works for that turn but doesn't change next turn's routing. The fix is at the description level, not the conversation level. If a skill is misrouting, edit its description to be more specific about what it should and shouldn't fire on.
For more on the file format and the routing-friendly description patterns, see /learn/skill-md-frontmatter-reference/.
Skills rot. Repos go stale, descriptions you wrote six months ago no longer match how you actually work, and the catalog grows new entries that supersede ones you installed early. Set a recurring 30-minute maintenance window every quarter.
Open ~/.claude/skills/ and look at the directory listing. For each skill, ask:
[...] git log -1 in the skill directory; if the last commit is over a year old and the skill covers a fast-moving domain (frameworks, security, deployment), check the catalog for a fresher equivalent.
For each skill you kept: cd ~/.claude/skills/<slug> && git pull. Diff the body. If the author changed an anti-trigger section or added new examples, take a minute to read it; those changes affect when the skill fires for you.
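The one-year staleness check is simple to script across the whole directory. This sketch assumes each skill was installed with git clone, so its directory contains a .git entry, and it reads the last commit time with `git log -1 --format=%ct`. The one-year cutoff comes from the text above; the function names are hypothetical.

```python
import subprocess
import time
from pathlib import Path

YEAR_SECONDS = 365 * 24 * 60 * 60


def last_commit_timestamp(repo: Path) -> int:
    """Unix time of the most recent commit in repo, read from git itself."""
    result = subprocess.run(
        ["git", "-C", str(repo), "log", "-1", "--format=%ct"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def is_stale(last_commit: int, now: float, max_age: int = YEAR_SECONDS) -> bool:
    """True when the last commit is older than max_age seconds."""
    return now - last_commit > max_age


if __name__ == "__main__":
    root = Path.home() / ".claude" / "skills"
    # Only directories that are git checkouts can be dated this way.
    for skill_dir in sorted(p for p in root.glob("*") if (p / ".git").exists()):
        if is_stale(last_commit_timestamp(skill_dir), time.time()):
            print(f"stale: {skill_dir.name}")
```

A stale result is a prompt to look, not a verdict: a skill in a slow-moving domain can be a year old and still fine.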
Over the quarter, when Claude routes a prompt to the wrong skill, write it down. At audit time, look at the list. If two skills consistently fight, one of them is wrongly scoped. Either rewrite its description or drop it.
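If you keep that list as pairs of (skill that should have fired, skill that fired), the audit step reduces to a count. A minimal sketch, assuming a hand-maintained log in that shape; the threshold of two occurrences and the function name are arbitrary choices for illustration.

```python
from collections import Counter


def fighting_pairs(
    misroutes: list[tuple[str, str]], min_count: int = 2
) -> list[tuple[tuple[str, ...], int]]:
    """Count misroutes per unordered skill pair and return the pairs seen
    at least min_count times, most frequent first."""
    # Sorting each pair makes (a, b) and (b, a) count as the same conflict.
    counts = Counter(tuple(sorted(pair)) for pair in misroutes)
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    log = [
        ("code-review", "security-audit"),
        ("security-audit", "code-review"),
        ("debug", "perf-analysis"),
    ]
    for pair, n in fighting_pairs(log):
        print(f"{pair[0]} vs {pair[1]}: {n} misroutes")
```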
If you've authored skills for your team, the quarterly window is when you push the edits you've accumulated. The most common one is tightening the description after seeing real triggers in the wild. The second most common is adding an anti-trigger section because the skill fired on something you didn't want it to.
After two or three quarters of this discipline, most engineers settle at 8-14 loaded skills. Some stay at 6, some grow to 20 because they genuinely use a wide surface. There's no correct number — there's only the number where every loaded skill is one you can describe and would miss if it were gone. If you can hit that bar with eight skills, you don't need ten.
The stack above is calibrated for a working engineer on a team. The shape changes meaningfully for adjacent contexts.
Solo and indie developers. Drop the deployment checklist (you know your one deploy target), drop git-hygiene (you're the only one reading your commits), keep everything else. The PR review skill is still useful even when you're reviewing your own work; Claude as a fresh pair of eyes catches things you stopped seeing weeks ago. You'll likely add a product-decision skill or a marketing-copy skill that wouldn't make the engineering stack but matters for indie work.
Tech leads. Add an architecture-decision skill (writes ADRs from a list of trade-offs), a tech-debt categorization skill, and a code-review skill specifically tuned for the kind of feedback you give to mid-level engineers (different from the strict-senior framing). You'll still use the core stack but you'll lean harder on the writing-and-decision side than the implementation side.
SREs. Swap test-generation for a runbook-writing skill. Add incident-response and capacity-planning skills. Performance analysis becomes core, not optional. The git-hygiene skill is replaced or augmented by a change-request skill that produces the kind of structured change notice your org needs for production work.
Open-source maintainers. Triage-shaped skills become primary: a skill that drafts responses to GitHub issues, one that classifies feature requests against your roadmap, one that writes contributor-friendly review feedback (different tone from internal review). You'll still want refactor and test-generation but they move down the priority order.
Pick the stack that matches what you'll spend the next quarter doing. If you change roles, change the stack. Skills are cheap to install and cheap to remove, so there's no reason to hold a stack that fits the work you did six months ago.
If you're just starting and want a default to copy, the eleven in this page's main stack are a good first quarter. After that you'll know what you actually use, and the stack becomes yours. That's the whole point of skills as a system: they let you encode your working practices into the model's behavior, and the model gets better at your work, not generic work.