Data work has a peculiar shape: half of it is writing SQL or Python that anyone could write given an afternoon, and the other half is the judgement calls that decide whether the query answers the right question, whether the migration backfills safely, whether the experiment readout overclaims. That second half is where well-built Claude Code skills earn their keep.
This page is a curated set of skills I reach for across modelling, transformation, quality, analysis, and ML workflows, along with the three end-to-end loops I run them in. I'll also flag which picks matter most for governance, and when you should stop installing other people's skills and write your own.
A Claude Code skill is a Markdown file called SKILL.md, with YAML frontmatter, that lives in its own directory under ~/.claude/skills/ (or under .claude/skills/ inside a project). When Claude Code decides a user's request matches the skill's description, it loads the skill into context and follows the instructions. That's the whole mechanism. There's no daemon, no subprocess, no extra runtime: just a Markdown file that becomes part of the prompt at the right moment.
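To make the file shape concrete, here is a minimal sketch that splits a SKILL.md string into its frontmatter fields and its Markdown body. The skill name, description, and body text are invented for illustration, and the parser only handles flat `key: value` pairs; a real tool would use a YAML library.

```python
# Hypothetical skill content; the name, description, and body are invented.
EXAMPLE_SKILL = """\
---
name: snowflake-sql-writer
description: Writes Snowflake SQL using QUALIFY and other Snowflake idioms. Use when the user asks for a query against a Snowflake table.
---

# Snowflake SQL writer

Prefer QUALIFY over a wrapping subquery when filtering on a window function.
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter fields, Markdown body).

    Handles only flat `key: value` pairs, which is enough for a sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = lines.index("---", 1)  # closing delimiter of the frontmatter
    fields = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


fields, body = split_frontmatter(EXAMPLE_SKILL)
print(fields["name"])
print(fields["description"])
```

The description is the part to scrutinise: it is the text a request gets matched against, so it should name the task and the warehouse rather than a general theme.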
For data teams, this shape is interesting because so much of our craft is procedural knowledge that's tedious to type out but easy to encode. "When the user asks for a window function, prefer QUALIFY over a subquery if the warehouse supports it." "Before any ALTER TABLE on a production table, generate the rollback statement first." "When summarising an A/B test, lead with the practical significance, not the p-value." These are the kinds of rules every senior data person carries in their head and bleeds out into review comments. Skills let you package them once and have Claude apply them consistently.
The mistake I see new teams make is installing a hundred skills and hoping breadth fixes everything. It doesn't. Skills compete for attention, and a poorly scoped skill with a vague description field will fire on requests it shouldn't and miss the ones it should. The picks below have earned their slot: their description is tight enough that Claude triggers them at the right moment, and the body teaches something specific rather than restating common knowledge.
One more framing note: I split picks into five buckets that mirror how data teams actually divide work. Modelling (SQL, schema design), transformation (dbt and similar), quality (tests, contracts, expectations), analysis (ad-hoc queries, experiments, readouts), and ML/MLOps (training pipelines, notebook conversion, monitoring). A skill that tries to be all five usually does none of them well. Look for picks whose scope is narrow enough that a one-sentence description captures it cleanly.
Throughout this page I'll reference patterns by their conceptual shape rather than naming specific repos, because the catalog shifts week to week. Browse the live categories or the editorial best-of selection to find the current strongest implementations of each pattern.
These are the skills I install first on any data-team workstation. They cover the bread-and-butter of writing and reviewing SQL across warehouses with subtly different dialects.
Look for a skill that takes a warehouse target (BigQuery, Snowflake, Redshift, DuckDB, ClickHouse, Postgres) in its frontmatter or accepts it as an argument, and writes SQL using the idioms that warehouse rewards. BigQuery wants QUALIFY and array functions; Snowflake wants MATCH_RECOGNIZE where it fits; ClickHouse wants arrayJoin instead of UNNEST. A good skill calls these out rather than producing lowest-common-denominator ANSI SQL.
The complement to the writer is a reviewer that takes a query and flags performance traps. Look for one that knows about SELECT * in materialised CTEs, missing partition filters on partitioned tables, predicate pushdown failures across UNION ALL branches, and cardinality estimation errors from NOT IN on nullable columns. The good ones also flag OR conditions in joins as candidates for a UNION rewrite.
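NOT IN on a nullable column is worth a closer look because, beyond any planner issue, it is also a correctness trap. The sketch below uses Python's built-in sqlite3 module and two invented tables (customers and refunds) to show the standard SQL three-valued-logic behaviour; your warehouse should behave the same way, but check it rather than take my word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER);
    CREATE TABLE refunds (customer_id INTEGER);  -- nullable on purpose
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO refunds VALUES (1), (NULL);
""")

# NOT IN compares against every value in the list. A single NULL makes the
# predicate UNKNOWN for every non-matching row, so no rows come back.
not_in_rows = conn.execute("""
    SELECT id FROM customers
    WHERE id NOT IN (SELECT customer_id FROM refunds)
    ORDER BY id
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs are harmless.
not_exists_rows = conn.execute("""
    SELECT id FROM customers c
    WHERE NOT EXISTS (
        SELECT 1 FROM refunds r WHERE r.customer_id = c.id
    )
    ORDER BY id
""").fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(2,), (3,)]
```

A reviewer skill that suggests the NOT EXISTS rewrite whenever the subquery column is nullable catches both the wrong answer and the plan problem in one comment.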
For any non-trivial ALTER TABLE on a production table, a planning skill should output: the forward statement, the rollback statement, an estimate of whether the operation is metadata-only or rewrites the table, and a backfill strategy if the change adds a column with computed defaults. The skills I trust the most also force you to specify a lock-acquisition timeout.
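As a rough sketch of what such a plan could look like as data, the snippet below models the four outputs plus the lock timeout. It assumes Postgres syntax (`SET lock_timeout`, `ALTER TABLE ... ADD COLUMN`); the `MigrationPlan` class, the helper function, and the `events.campaign_id` column are all hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MigrationPlan:
    forward: str
    rollback: str
    rewrites_table: bool          # an estimate; confirm against your warehouse docs
    lock_timeout: str
    backfill: Optional[str] = None

    def render(self) -> str:
        """Render the plan as a reviewable SQL script with the rollback in comments."""
        lines = [
            f"SET lock_timeout = '{self.lock_timeout}';",
            self.forward,
            f"-- rollback: {self.rollback}",
            f"-- rewrites table (estimate): {self.rewrites_table}",
        ]
        if self.backfill:
            lines.append(f"-- backfill: {self.backfill}")
        return "\n".join(lines)


def plan_add_nullable_column(table: str, column: str, sql_type: str,
                             lock_timeout: str = "5s") -> MigrationPlan:
    """Plan for adding a nullable column with no default (Postgres syntax assumed)."""
    return MigrationPlan(
        forward=f"ALTER TABLE {table} ADD COLUMN {column} {sql_type};",
        rollback=f"ALTER TABLE {table} DROP COLUMN {column};",
        rewrites_table=False,  # nullable, no default: metadata-only in Postgres
        lock_timeout=lock_timeout,
    )


plan = plan_add_nullable_column("events", "campaign_id", "text")
print(plan.render())
```

Making the rollback a required field is the point of the design: a plan without one cannot be constructed, which is the same discipline the skill should impose in prose.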
Window functions are where most analysts stall. A focused skill that takes a query and explains the partition boundaries, the frame clause behaviour, and what happens at ties is worth its weight. Bonus if it suggests QUALIFY rewrites.
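Tie behaviour is easier to see than to describe. This small sketch uses Python's built-in sqlite3 module (it assumes the bundled SQLite is 3.25 or newer, which is when window functions arrived) and an invented orders table to compare RANK and DENSE_RANK on a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1, 50), (1, 50), (1, 30), (2, 20);
""")

rows = conn.execute("""
    SELECT user_id,
           amount,
           RANK()       OVER (PARTITION BY user_id ORDER BY amount DESC) AS rnk,
           DENSE_RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS dense_rnk
    FROM orders
    ORDER BY user_id, amount DESC
""").fetchall()

for row in rows:
    print(row)
# (1, 50, 1, 1)
# (1, 50, 1, 1)
# (1, 30, 3, 2)   <- RANK skips 2 after the tie, DENSE_RANK does not
# (2, 20, 1, 1)
```

ROW_NUMBER is left out deliberately: on tied rows its ordering is not deterministic, which is exactly the kind of thing an explainer skill should point out. SQLite has no QUALIFY, so a "keep rank 1" filter here would need a wrapping subquery.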
If your team builds dimensional models, look for a skill that reviews fact and dimension tables for surrogate-key discipline, slowly-changing-dimension type, additivity of measures, and grain documentation. The catch with this category is that many available skills are too prescriptive about Kimball orthodoxy. Pick one that names the trade-off rather than just citing the rule.
The way I evaluate any of these before installing: read the SKILL.md body and check whether the examples use realistic table names and warehouse dialects, or whether they're toy foo/bar queries. Toy examples in the body usually predict toy behaviour in production.
dbt is the de facto standard for SQL-based transformation, and the surface area for skill-able tasks is huge. These picks cover the lifecycle.
A model-scaffolding skill takes a target table name, source(s), and a model layer (staging, intermediate, mart) and produces the SQL file, the YAML config block, the source declaration if needed, and the documentation stub. The good ones honour your project's existing naming conventions by reading dbt_project.yml rather than imposing their own. The mediocre ones assume stg_/int_/fct_/dim_ and break down for teams with different prefixes.
The dbt ecosystem has dbt-project-evaluator and sqlfluff, but a skill on top of these is still valuable because it can interpret the findings and prioritise. "You have 47 lint failures; here are the 6 that actually affect run correctness, here are the 31 that are stylistic but worth a bulk PR, and here are the 10 false positives because of your project's macros." That triage is the labour you're saving.
A test-generation skill produces not_null, unique, relationships, and accepted_values tests for a model based on its column types and what it can infer from upstream sources. The skill that earns its slot also writes custom singular tests for the trickier invariants, such as "this column is monotonically increasing within each user_id partition", rather than only generating the four built-ins.
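A minimal sketch of the built-in half of that job, assuming column metadata arrives as plain dicts. The metadata keys (`is_primary_key`, `nullable`, `references`, `allowed_values`) and the `fct_orders` model are hypothetical; the output mirrors the shape of a schema.yml entry and uses the `data_tests` key, which older dbt versions spell `tests`.

```python
def suggest_tests(column: dict) -> list:
    """Map hypothetical column metadata to dbt's four built-in generic tests."""
    tests = []
    if column.get("is_primary_key"):
        tests += ["unique", "not_null"]
    elif not column.get("nullable", True):
        tests.append("not_null")
    if column.get("references"):
        model, field = column["references"]
        tests.append({"relationships": {"to": f"ref('{model}')", "field": field}})
    if column.get("allowed_values"):
        tests.append({"accepted_values": {"values": list(column["allowed_values"])}})
    return tests


def schema_entry(model: str, columns: list[dict]) -> dict:
    """Build a dict shaped like one model entry in schema.yml."""
    return {
        "name": model,
        "columns": [
            {"name": c["name"], "data_tests": suggest_tests(c)} for c in columns
        ],
    }


columns = [
    {"name": "order_id", "is_primary_key": True},
    {"name": "customer_id", "nullable": False,
     "references": ("dim_customers", "customer_id")},
    {"name": "status", "allowed_values": ["placed", "shipped", "returned"]},
]
entry = schema_entry("fct_orders", columns)
for col in entry["columns"]:
    print(col["name"], col["data_tests"])
```

This is the easy part. The invariants that need a singular test cannot be derived from metadata like this, which is why a skill that stops here is not worth the slot.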
Incremental models are the highest-leverage and most-mistake-prone part of dbt. A skill that takes a model definition and recommends a strategy (append, merge, delete+insert, insert_overwrite) based on the warehouse, the natural key, and whether late-arriving data is expected, is the kind of thing every team rebuilds badly themselves. The good versions explain the trade-off, not just the recommendation.
A documentation-audit skill reads the project's schema.yml files and flags columns lacking descriptions, models lacking owners, and exposures lacking links to downstream BI artefacts. Documentation rot is the biggest tax on dbt projects past about 200 models, and a skill that surfaces the gaps weekly is cheap and effective.
For all five of these, browse the engineering and tools categories to find current implementations.
Data quality is where a skill's description field matters most, because the requests come in many shapes ("why are revenue numbers off?", "add a check for X", "validate this CSV") and the skill needs to trigger on the right ones.
If your team uses or is moving toward data contracts (formal schemas published by upstream producers), look for a skill that authors them in your chosen format — protobuf, avro, jsonschema, or one of the contract-specific DSLs like datacontract.com's spec. The skill should also produce the matching dbt source declaration and a freshness expectation.
If you're on Great Expectations or Soda, a skill that takes a profile of a dataset (column names, sample values, observed distributions) and produces a starter expectation suite is significantly faster than hand-writing them. The trap is over-checking — a generated suite with 200 expectations that all fire on noise is worse than 20 hand-picked ones. The skills worth installing tier their suggestions and call out which checks are likely to be flaky.
This is the "why is revenue down 12% today" pattern. A skill that takes a metric definition, a baseline window, and a comparison window, and produces a structured diagnostic walk — segmenting by dimension to find the cohort driving the change, checking upstream data freshness, looking for known issues in the surrounding logs — encodes the playbook every senior analyst runs from memory.
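The segmentation step is the most mechanical part of that walk, so here is a small sketch of it. It assumes you already have the metric summed per segment for both windows; the channel names and revenue figures are made up, chosen so the total drops 12%.

```python
def segment_contributions(baseline: dict[str, float],
                          current: dict[str, float]) -> list[tuple[str, float, float]]:
    """Rank segments by how much of the total metric change each one explains.

    Returns (segment, absolute delta, share of total delta), largest mover first.
    """
    total_delta = sum(current.values()) - sum(baseline.values())
    rows = []
    for segment in sorted(set(baseline) | set(current)):
        delta = current.get(segment, 0.0) - baseline.get(segment, 0.0)
        share = delta / total_delta if total_delta else 0.0
        rows.append((segment, delta, share))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


# Invented daily revenue by channel: 1000 in the baseline window, 880 today.
baseline = {"web": 600.0, "ios": 300.0, "android": 100.0}
current = {"web": 590.0, "ios": 190.0, "android": 100.0}

for segment, delta, share in segment_contributions(baseline, current):
    print(f"{segment:8s} delta={delta:+8.1f} share={share:6.1%}")
```

Here ios explains roughly 92% of the drop, which tells you where to look next. The skill's job is to run this cut across several dimensions and then check freshness before anyone blames the product.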
One specific thing to look for: quality skills should have a description field that names both the failure mode and the warehouse or framework. "Generates Great Expectations suites for Snowflake tables" is a good description because Claude can match a request like "add some checks to this table" to it. "Helps with data quality" is too vague — it'll fire on every quality-adjacent request and you'll get inconsistent behaviour. When you install a quality skill, read its description and if it's vague, either edit it locally to tighten the scope or skip the skill.
Anomaly diagnosis is the one I'd recommend installing even on small teams. The number of times you'll be paged for a metric anomaly far exceeds the number of times you'll author a contract from scratch.
This bucket is where skills shift from code-generation to thinking-companion. The picks here help with the parts of analysis that are mostly judgement.
An experiment-design skill takes a hypothesis, the metric to move, baseline rates, and minimum detectable effect, and produces a power calculation, a sample-size estimate, a randomisation-unit recommendation, and a list of pre-specified analyses. The good ones force you to write the decision rule before launch ("if the lift is greater than X% and the lower bound of the 95% CI is above zero, we ship") rather than letting it emerge from the data later.
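For a sense of what the sample-size step involves, here is a sketch using the standard normal-approximation formula for comparing two proportions with a two-sided test. It assumes equal-sized arms and an absolute minimum detectable effect; the 10% baseline and 2-point effect are example inputs, not recommendations.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-arm sample size to detect an absolute lift of `mde_abs`.

    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / mde_abs ** 2)


n = sample_size_per_arm(baseline_rate=0.10, mde_abs=0.02)
print(n)  # a little over 3,800 users per arm
```

Halving the detectable effect roughly quadruples the required sample, which is usually the conversation the skill needs to force before launch.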
The complement is a readout skill that takes results and produces a readout in the format your team uses. Beyond the headline lift and CI, look for a skill that checks the lift against the pre-specified practical-significance threshold, runs the pre-specified secondary analyses, and flags p-hacking risks like multiple comparisons across many segments.
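The multiple-comparisons flag can be as simple as a Bonferroni correction, sketched below. Bonferroni is conservative and is my choice for illustration, not something any particular skill is known to use; the segment names and p-values are invented.

```python
def bonferroni_flags(p_values: dict[str, float],
                     alpha: float = 0.05) -> dict[str, bool]:
    """Return, per segment, whether its p-value survives Bonferroni correction."""
    if not p_values:
        return {}
    threshold = alpha / len(p_values)  # split alpha across all comparisons
    return {segment: p <= threshold for segment, p in p_values.items()}


# Invented p-values from five segment cuts of the same experiment.
p_values = {
    "all": 0.004,
    "new_users": 0.03,
    "returning": 0.20,
    "mobile": 0.04,
    "desktop": 0.60,
}

naive_hits = [s for s, p in p_values.items() if p < 0.05]
corrected = bonferroni_flags(p_values)

print(naive_hits)                              # ['all', 'new_users', 'mobile']
print([s for s, ok in corrected.items() if ok])  # ['all']
```

Three cuts look significant at 0.05 on their own, but only the overall result survives the corrected threshold of 0.01. A readout that reports the two numbers side by side makes the overclaim hard to miss.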
This is more pattern than skill, but worth calling out: a skill that takes a Jupyter or Quarto notebook and produces a prose readout — what question was asked, what was found, what the caveats are, what to do next — is the highest-leverage analysis skill I install on every analyst machine. The catch is that bad versions will inflate uncertain findings into confident claims. The good versions preserve hedging language and lead with the methods caveats.
For the inverse — taking a vague stakeholder question and turning it into a precise one before any SQL is written — look for a skill that produces a structured framing: the decision the answer informs, the metric definition, the population, the time window, and what the answer would change. This is the cheapest skill on the page in terms of tokens and the highest-leverage in terms of quality of analysis downstream.
If you're not sure where to start in this bucket, the data-analysis use case page bundles the strongest current picks across analysis and reporting.
ML skills cluster around two patterns: helping with the offline modelling loop, and helping move work from notebook to production. Both are useful.
A notebook-conversion skill takes a notebook, identifies the cells that compute features, the cells that train, and the cells that score, and produces a production-ready module structure. The good versions are opinionated about putting feature logic behind named functions, separating training from scoring, and writing the boilerplate to load and save models from your team's model registry. The mediocre versions just concatenate cells into a .py file and call it done.
If your team uses Feast, Tecton, or a homegrown feature store, a skill that takes a SQL or Python feature definition and produces the corresponding feature-store entry with the right entity declarations, TTLs, and online/offline configuration saves real time. Look for one that knows your specific feature store; generic ones produce code you'll spend longer editing than writing from scratch.
A training scaffolder takes a target framework (PyTorch Lightning, Hugging Face Trainer, scikit-learn) and a problem statement, and produces the data loader, model class, training loop with logging, and evaluation harness. The skills worth installing build in MLflow or Weights & Biases logging from the start rather than as an afterthought.
For governance and for your future self, a skill that authors model cards (intended use, limitations, training data summary, evaluation results) and dataset documentation (collection method, known biases, schema, licence) using one of the standard templates is essential at any organisation that's been audited or expects to be. The good ones extract what they can from your training code automatically and only ask you for what genuinely requires judgement.
Once a model is in production, monitoring shifts from "is the pipeline running" to "is the model behaving." A skill that takes a model spec and produces drift checks (input feature drift, prediction drift, label drift when labels eventually arrive) and alerting thresholds is harder to find done well but worth the search. Many available versions encode rules that are too aggressive and produce alert fatigue within a week.
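One common way to express an input-drift check is the population stability index (PSI) over binned feature counts, sketched below. PSI is my choice of example metric; the bin counts are invented, and the 0.1 and 0.25 cut-offs often quoted for it are a rule of thumb that you should tune, since thresholds set too tight are exactly what produces alert fatigue.

```python
from math import log


def psi(expected_counts: list[float], actual_counts: list[float],
        eps: float = 1e-6) -> float:
    """Population stability index between two binned distributions.

    Both lists must use the same bins in the same order. `eps` keeps empty
    bins from producing a division by zero or log(0).
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("bin counts must have the same length")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    score = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        e_pct = max(expected / expected_total, eps)
        a_pct = max(actual / actual_total, eps)
        score += (a_pct - e_pct) * log(a_pct / e_pct)
    return score


training_bins = [25, 25, 25, 25]    # invented: feature histogram at training time
production_bins = [10, 20, 30, 40]  # invented: same bins, last week's traffic

print(round(psi(training_bins, training_bins), 4))    # 0.0
print(round(psi(training_bins, production_bins), 4))  # about 0.23
```

Keeping the threshold as configuration rather than baking it into the check is what lets a team loosen an alert after the first noisy week instead of muting it.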
If you're starting an ML platform from scratch, the documentation and monitoring picks have the highest leverage. The training scaffolders are nice but generally easier to live without than you'd expect — model code is often boilerplate that a one-time setup commit handles.
Skills compose. Here are three loops I run repeatedly, calling out which skill bucket fires at each step. Read these as recipes, not prescriptions — your team's flavour will differ.
A stakeholder asks "are paying customers churning faster this quarter than last?" You hand it to Claude Code.
Marketing wants to track a new event property. The events table needs a column.
The growth team wants to test a checkout-page change.
The discipline of pre-registering the analysis is the biggest single quality lever in experiment work. A skill that enforces it pays for itself the first time someone tries to add a segment cut after seeing the results.
If you work with personal data, regulated data, or anything subject to internal data-handling policy, a few of the picks above matter disproportionately and a few new ones are worth adding.
The data contract authoring and model card picks are the two highest-leverage governance skills. Contracts make breaking schema changes visible at PR time rather than at incident time. Model cards force documentation of intended use and limitations before a model ships, which is the artefact regulators and auditors ask for first.
Add a skill that scans SQL queries, dbt models, and notebooks for references to PII columns (your team's email, phone_number, address, etc.) and flags them when they appear in destinations that shouldn't see them — public-facing dashboards, third-party tool destinations, exports. The good versions read a configurable list of sensitive column patterns from a file in your repo, so the rule set lives in version control alongside the code.
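A naive version of the scanning step fits in a few lines, shown here as a sketch. The fragment list is hard-coded for brevity where a real skill would read it from a file in the repo, and the query and table names are invented. It matches on identifier substrings, so it will also flag SQL comments and cannot tell a source from a destination.

```python
import re

# Hypothetical sensitive-name fragments; in practice, load these from the repo.
SENSITIVE_FRAGMENTS = ("email", "phone_number", "address")
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def find_pii_references(sql: str,
                        fragments: tuple[str, ...] = SENSITIVE_FRAGMENTS
                        ) -> list[tuple[int, str]]:
    """Return (line number, identifier) for identifiers containing a sensitive fragment."""
    hits = []
    for line_no, line in enumerate(sql.splitlines(), start=1):
        for identifier in IDENTIFIER.findall(line):
            if any(fragment in identifier.lower() for fragment in fragments):
                hits.append((line_no, identifier))
    return hits


query = """\
SELECT user_id,
       email,
       billing_address
FROM analytics.customers
"""

for line_no, identifier in find_pii_references(query):
    print(f"line {line_no}: {identifier}")
# line 2: email
# line 3: billing_address
```

The output is a list of findings to review, not a rewrite of the query. Flagging and leaving the decision to a person is the behaviour to look for.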
Also worth adding is a skill that takes a production table schema and produces a synthetic dataset with the same statistical properties but no real values, for use in development and testing. The mediocre ones produce uniform random data that won't catch realistic distribution-sensitive bugs. The good ones preserve marginals and at least some correlations.
For GDPR, CCPA, and similar regimes, look for a skill that takes a user identifier and produces the queries to extract the user's data across your team's tables, formatted for delivery. The skill should not execute the queries automatically, and it should remind the operator to log the request.
For any governance-adjacent skill, read the SKILL.md body before installing. Skills that mention specific compliance frameworks (HIPAA, GDPR, SOC 2) should treat them with appropriate hedging — no skill can make your code "HIPAA compliant", only your processes and controls can. Skills that promise compliance are usually overselling. Skills that help you produce the artefacts compliance reviewers ask for are doing the right thing.
One specific anti-pattern to avoid: skills that automatically redact or transform what they consider sensitive data without telling you. The right default for any privacy-adjacent skill is to flag and ask, not to silently modify.
Every team I've watched go through skills adoption hits the same moment: after installing twenty or thirty, they realise the marginal skill they're considering is shaped slightly wrong for their context. The frontmatter convention doesn't match their dbt project. The naming doesn't match their warehouse. The opinionated examples assume a stack they don't run. At that point, the right move is to write your own.
You know it's time when you find yourself editing the SKILL.md body after installing it, three times in a row, to make it match how your team actually works. At that point you're better off writing a skill from scratch that encodes your team's specific conventions — your table prefixes, your dialect, your review checklist, your readout template, your incident-response playbook.
The highest-leverage in-house skills are usually the ones that capture decisions your team has already made and is enforcing in code review. "Always include a backfill plan for new dbt models that depend on event data." "All A/B test readouts use the team's standard template." "PRs touching production tables require a rollback statement in the description." These are rules your senior team already enforces; a skill makes the enforcement consistent and removes the bottleneck of needing the senior team to review every PR.
A skill that does one thing well, with a tight description field and 200-400 lines of body, is usually better than an ambitious skill that tries to be a framework. The body should read like a senior engineer's handover note, not like a reference manual. Concrete examples beat abstract rules. If you can't think of a real query or migration to put in the example, the skill probably shouldn't exist yet.
If your team's in-house skill captures a pattern other data teams would benefit from — and most do — consider publishing it. The catalog grows by community contribution, and the skills that get the most use are usually the ones written by practitioners solving real problems on real systems. The format for publishing is straightforward and documented at /install/.
If you want a primer on the file format itself before authoring, the writing a SKILL.md file guide covers the frontmatter fields, the body structure, and the conventions that make skills trigger reliably. For a sense of what "good" looks like, the curated best-of selection is the cleanest reference set.