Claude Code Skills for Data Teams: A Practitioner's Picks

Published 1 June 2026 · 13 min read · By a long-time Claude Code practitioner

Data work has a peculiar shape: half of it is writing SQL or Python that anyone could write given an afternoon, and the other half is the judgement calls that decide whether the query answers the right question, whether the migration backfills safely, whether the experiment readout overclaims. That second half is where well-built Claude Code skills earn their keep.

This page is a curated set of skills I reach for across modelling, transformation, quality, analysis, and ML workflows, along with the three end-to-end loops I run them in. I'll also flag which picks matter most for governance, and when you should stop installing other people's skills and write your own.

In this guide

Framing: where skills actually help data work
SQL writing, review, and modelling picks
Transformation and dbt project hygiene
Data quality, contracts, and tests
Analysis, experiments, and readouts
ML ops and notebook-to-production
Three end-to-end workflows
Privacy and governance considerations
When to stop installing and write your own

Framing: where skills actually help data work

A Claude Code skill is a Markdown file named SKILL.md, with YAML frontmatter, that lives in its own folder under ~/.claude/skills/ (or under .claude/skills/ inside a project). When Claude Code decides a user's request matches the skill's description, it loads the skill into context and follows the instructions. That's the whole mechanism. There's no daemon, no subprocess, no extra runtime: just a Markdown file that becomes part of the prompt at the right moment.
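To make the file shape concrete, here is a minimal sketch in Python that holds a hypothetical SKILL.md as a string and pulls out the frontmatter fields. The skill name `qualify-rewriter`, its description, and the tiny parser are all illustrative assumptions; real frontmatter is YAML and may need a proper YAML parser.

```python
# Hypothetical skill file; the name, description, and body are made up for illustration.
SKILL_MD = """\
---
name: qualify-rewriter
description: Rewrites Snowflake window-function subqueries to use QUALIFY. Use when reviewing Snowflake SQL that filters on ROW_NUMBER or RANK.
---

# QUALIFY rewriter

When a query filters on a window function through a subquery, suggest a QUALIFY rewrite.
"""


def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter fields, body).

    Handles only flat `key: value` lines, which is enough for this sketch.
    """
    _, frontmatter, body = text.split("---\n", 2)
    fields = {}
    for line in frontmatter.splitlines():
        if line.strip():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields, body.strip()


fields, body = split_skill(SKILL_MD)
print(fields["name"])
print(len(fields["description"]), "characters of description")
```

The description is the part that decides when the skill triggers, so it is worth reading it on its own, separately from the body, before installing anything.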

For data teams, this shape is interesting because so much of our craft is procedural knowledge that's tedious to type out but easy to encode. "When the user asks for a window function, prefer QUALIFY over a subquery if the warehouse supports it." "Before any ALTER TABLE on a production table, generate the rollback statement first." "When summarising an A/B test, lead with the practical significance, not the p-value." These are the kinds of rules every senior data person carries in their head and bleeds out into review comments. Skills let you package them once and have Claude apply them consistently.

The mistake I see new teams make is installing a hundred skills and hoping breadth fixes everything. It doesn't. Skills compete for attention, and a poorly-scoped skill with a vague description field will fire on requests it shouldn't and miss the ones it should. The picks below are ones I've found earn their slot — meaning their description is tight enough that Claude triggers them at the right moment, and the body teaches something specific rather than restating common knowledge.

One more framing note: I split picks into five buckets that mirror how data teams actually divide work. Modelling (SQL, schema design), transformation (dbt and similar), quality (tests, contracts, expectations), analysis (ad-hoc queries, experiments, readouts), and ML/MLOps (training pipelines, notebook conversion, monitoring). A skill that tries to be all five usually does none of them well. Look for picks whose scope is narrow enough that a one-sentence description captures it cleanly.

Throughout this page I'll reference patterns by their conceptual shape rather than naming specific repos, because the catalog shifts week to week. Browse the live categories or the editorial best-of selection to find the current strongest implementations of each pattern.

SQL writing, review, and modelling picks

These are the skills I install first on any data-team workstation. They cover the bread-and-butter of writing and reviewing SQL across warehouses with subtly different dialects.

1. Dialect-aware SQL writer

Look for a skill that takes a warehouse target (BigQuery, Snowflake, Redshift, DuckDB, ClickHouse, Postgres) in its frontmatter or accepts it as an argument, and writes SQL using the idioms that warehouse rewards. BigQuery wants QUALIFY and array functions; Snowflake wants MATCH_RECOGNIZE where it fits; ClickHouse wants arrayJoin instead of UNNEST. A good skill calls these out rather than producing lowest-common-denominator ANSI SQL.

2. SQL review skill

The complement to the writer: it takes a query and flags performance traps. Look for one that knows about SELECT * in materialised CTEs, missing partition filters on partitioned tables, predicate pushdown failures across UNION ALL branches, and NOT IN against nullable columns, which skews cardinality estimates and silently returns no rows when the subquery yields a NULL. The good ones also flag OR conditions in joins as candidates for a UNION rewrite.

3. Schema migration planner

For any non-trivial ALTER TABLE on a production table, a planning skill should output: the forward statement, the rollback statement, an estimate of whether the operation is metadata-only or rewrites the table, and a backfill strategy if the change adds a column with computed defaults. The skills I trust the most also force you to specify a lock-acquisition timeout.
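As a rough sketch of the output shape described above, the Python below models a plan for the simplest case, adding a nullable column. The class name, fields, and table are hypothetical, and the timeout is only rendered as a comment because the statement for setting it differs by warehouse.

```python
from dataclasses import dataclass


@dataclass
class AddColumnPlan:
    """Plan for adding a nullable column; every field here is illustrative."""
    table: str
    column: str
    column_type: str
    lock_timeout_seconds: int  # no default, so the author must pick a timeout

    def forward(self) -> str:
        return f"ALTER TABLE {self.table} ADD COLUMN {self.column} {self.column_type};"

    def rollback(self) -> str:
        return f"ALTER TABLE {self.table} DROP COLUMN {self.column};"

    def render(self) -> str:
        # Rollback is listed first so it exists before anyone runs the forward step.
        return "\n".join([
            f"-- lock timeout: {self.lock_timeout_seconds}s (set with your warehouse's own syntax)",
            f"-- rollback: {self.rollback()}",
            self.forward(),
        ])


plan = AddColumnPlan("analytics.events", "campaign_id", "VARCHAR", lock_timeout_seconds=5)
print(plan.render())
```

Leaving `lock_timeout_seconds` without a default is the design point: constructing a plan fails unless someone has thought about the timeout.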

4. Window function explainer

Window functions are where most analysts stall. A focused skill that takes a query and explains the partition boundaries, the frame clause behaviour, and what happens at ties is worth its weight in gold. Bonus if it suggests QUALIFY rewrites.
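The tie and frame behaviour is easiest to see on a tiny table. This sketch uses Python's built-in sqlite3 module purely so it runs anywhere (it assumes a SQLite build with window-function support, 3.25 or later); the table and values are made up, and SQLite has no QUALIFY, so only the ranking and frame semantics are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 50), (1, 50), (1, 20)],  # two rows tie on amount
)

rows = conn.execute(
    """
    SELECT
        amount,
        RANK()       OVER (PARTITION BY user_id ORDER BY amount DESC) AS rnk,
        DENSE_RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS dense_rnk,
        -- default frame is RANGE up to CURRENT ROW, so tied rows share one running total
        SUM(amount)  OVER (PARTITION BY user_id ORDER BY amount DESC) AS range_total,
        -- an explicit ROWS frame advances one physical row at a time
        SUM(amount)  OVER (
            PARTITION BY user_id ORDER BY amount DESC
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS rows_total
    FROM orders
    ORDER BY amount DESC, rows_total
    """
).fetchall()

for row in rows:
    print(row)
```

The two tied rows get the same RANK and the same RANGE-framed running total, while the ROWS-framed total differs between them. That difference is the kind of thing an explainer skill should spell out.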

5. Star schema reviewer

If your team builds dimensional models, look for a skill that reviews fact and dimension tables for surrogate-key discipline, slowly-changing-dimension type, additivity of measures, and grain documentation. The catch with this category is that many available skills are too prescriptive about Kimball orthodoxy. Pick one that names the trade-off rather than just citing the rule.

The way I evaluate any of these before installing: read the SKILL.md body and check whether the examples use realistic table names and warehouse dialects, or whether they're toy foo/bar queries. Toy examples in the body usually predict toy behaviour in production.

Transformation and dbt project hygiene

dbt is the de facto standard for SQL-based transformation, and the surface area for skill-able tasks is huge. These picks cover the lifecycle.

6. dbt model scaffolder

Takes a target table name, source(s), and a model layer (staging, intermediate, mart) and produces the SQL file, the YAML config block, the source declaration if needed, and the documentation stub. The good ones honour your project's existing naming conventions by reading dbt_project.yml rather than imposing their own. The mediocre ones assume stg_/int_/fct_/dim_ and lose to teams with different prefixes.

7. dbt project linter

The dbt ecosystem has dbt-project-evaluator and sqlfluff, but a skill on top of these is still valuable because it can interpret the findings and prioritise. "You have 47 lint failures; here are the 6 that actually affect run correctness, here are the 31 that are stylistic but worth a bulk PR, and here are the 10 false positives because of your project's macros." That triage is the labour you're saving.

8. dbt test writer

Generates not_null, unique, relationships, and accepted_values tests for a model based on its column types and what it can infer from upstream sources. The skill that earns its slot also writes custom singular tests for the trickier invariants — things like "this column is monotonically increasing within each user_id partition" — rather than only generating the four built-ins.
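A minimal sketch of the inference step for the four built-ins, written in Python. The metadata keys (`nullable`, `is_key`, `references`, `allowed`) are hypothetical stand-ins for whatever a real skill reads from the warehouse or upstream YAML, and the exact YAML nesting of test arguments depends on your dbt version.

```python
def suggest_tests(column: dict) -> list:
    """Map simple column metadata to dbt generic tests.

    Returns entries shaped like the items of a column's test list in schema.yml:
    bare names for argument-free tests, single-key dicts for tests with arguments.
    """
    tests = []
    if not column.get("nullable", True):
        tests.append("not_null")
    if column.get("is_key"):
        tests.append("unique")
    if column.get("references"):
        model, field = column["references"]
        tests.append({"relationships": {"to": f"ref('{model}')", "field": field}})
    if column.get("allowed"):
        tests.append({"accepted_values": {"values": list(column["allowed"])}})
    return tests


# Made-up columns from an orders model
print(suggest_tests({"name": "order_id", "nullable": False, "is_key": True}))
print(suggest_tests({"name": "customer_id", "nullable": False,
                     "references": ("dim_customers", "customer_id")}))
print(suggest_tests({"name": "status", "allowed": ["placed", "shipped", "returned"]}))
```

None of this covers the custom singular tests mentioned above; those need knowledge of the business invariant, which is why they are the part worth paying for.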

9. dbt incremental strategy advisor

Incremental models are the highest-leverage and most error-prone part of dbt. A skill that takes a model definition and recommends a strategy (append, merge, delete+insert, insert_overwrite) based on the warehouse, the natural key, and whether late-arriving data is expected is the kind of thing every team rebuilds badly themselves. The good versions explain the trade-off, not just the recommendation.

10. dbt docs gap-filler

Reads the project's schema.yml files and flags columns lacking descriptions, models lacking owners, and exposures lacking links to downstream BI artefacts. Documentation rot is the biggest tax on dbt projects past about 200 models, and a skill that surfaces the gaps weekly is cheap and effective.

For all five of these, browse the engineering and tools categories to find current implementations.

Data quality, contracts, and tests

Data quality is where a skill's description field matters most, because the requests come in many shapes ("why are revenue numbers off?", "add a check for X", "validate this CSV") and the skill needs to trigger on the right ones.

11. Data contract authoring

If your team uses or is moving toward data contracts (formal schemas published by upstream producers), look for a skill that authors them in your chosen format — protobuf, avro, jsonschema, or one of the contract-specific DSLs like datacontract.com's spec. The skill should also produce the matching dbt source declaration and a freshness expectation.

12. Great Expectations / Soda check writer

If you're on Great Expectations or Soda, a skill that takes a profile of a dataset (column names, sample values, observed distributions) and produces a starter expectation suite is significantly faster than hand-writing them. The trap is over-checking — a generated suite with 200 expectations that all fire on noise is worse than 20 hand-picked ones. The skills worth installing tier their suggestions and call out which checks are likely to be flaky.

13. Anomaly diagnosis skill

This is the "why is revenue down 12% today" pattern. A skill that takes a metric definition, a baseline window, and a comparison window, and produces a structured diagnostic walk — segmenting by dimension to find the cohort driving the change, checking upstream data freshness, looking for known issues in the surrounding logs — encodes the playbook every senior analyst runs from memory.
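The first step of that walk, segmenting by a dimension to find the cohort driving the change, can be sketched in a few lines of Python. The function name and the revenue figures are made up; a real skill would generate the equivalent SQL against your metric definition.

```python
def segment_contributions(baseline: dict, comparison: dict) -> list:
    """Rank segments by how much of the total metric change each one explains.

    Both inputs map a segment label to the metric total for that window.
    """
    total_delta = sum(comparison.values()) - sum(baseline.values())
    rows = []
    for segment in sorted(set(baseline) | set(comparison)):
        delta = comparison.get(segment, 0.0) - baseline.get(segment, 0.0)
        share = delta / total_delta if total_delta else 0.0
        rows.append((segment, delta, share))
    # Largest absolute movers first
    return sorted(rows, key=lambda r: abs(r[1]), reverse=True)


# Made-up revenue by country for a baseline window and a comparison window
baseline = {"US": 1000.0, "DE": 400.0, "JP": 300.0}
comparison = {"US": 990.0, "DE": 210.0, "JP": 296.0}
ranked = segment_contributions(baseline, comparison)
for segment, delta, share in ranked:
    print(f"{segment}: delta={delta:+.0f}, share of change={share:.0%}")
```

In this toy data one segment accounts for nearly all of the drop, which is the signal to go and check that segment's upstream freshness before anything else.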

Quality skills and the trigger problem

One specific thing to look for: quality skills should have a description field that names both the failure mode and the warehouse or framework. "Generates Great Expectations suites for Snowflake tables" is a good description because Claude can match a request like "add some checks to this table" to it. "Helps with data quality" is too vague — it'll fire on every quality-adjacent request and you'll get inconsistent behaviour. When you install a quality skill, read its description and if it's vague, either edit it locally to tighten the scope or skip the skill.

Anomaly diagnosis is the one I'd recommend installing even on small teams. The number of times you'll be paged for a metric anomaly far exceeds the number of times you'll author a contract from scratch.

Analysis, experiments, and readouts

This bucket is where skills shift from code-generation to thinking-companion. The picks here help with the parts of analysis that are mostly judgement.

14. Experiment design skill

Takes a hypothesis, the metric to move, baseline rates, and minimum detectable effect, and produces a power calculation, a sample-size estimate, a randomisation-unit recommendation, and a list of pre-specified analyses. The good ones force you to write the decision rule before launch — "if the lift is greater than X% and the lower bound of the 95% CI is above zero, we ship" — rather than letting it emerge from the data later.
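For a sense of what the power calculation involves, here is a minimal sketch using only Python's standard library. It assumes a two-sided two-proportion z-test with equal arms and an absolute minimum detectable effect, which is one common approximation and not necessarily the method any particular skill uses; the function names and default values are illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def ship_decision(lift, ci_lower, min_lift):
    """Decision rule fixed before launch: ship only if both conditions hold."""
    return lift > min_lift and ci_lower > 0


n = sample_size_per_arm(baseline_rate=0.10, mde_abs=0.02)
print(f"users needed per arm: {n}")
```

With a 10% baseline and a two-point absolute effect this lands at a few thousand users per arm. Writing `ship_decision` down before launch is the part that matters: the thresholds become arguments someone had to choose in advance.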

15. A/B test analysis skill

The complement: takes results and produces a readout in the format your team uses. Beyond the headline lift and CI, look for a skill that compares the lift against the pre-specified practical-significance threshold, runs the pre-specified secondary analyses, and flags p-hacking risks like multiple comparisons across many segments.
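A minimal sketch of the headline numbers, again in standard-library Python. It assumes a Wald interval on the difference in conversion rates and a Bonferroni correction for segment cuts; both are simple, conservative choices swapped in for illustration, and the counts are made up.

```python
from math import sqrt
from statistics import NormalDist


def lift_with_ci(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Absolute lift (B minus A) with a Wald confidence interval."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    lift = p_b - p_a
    return lift, lift - z * se, lift + z * se


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold when many segments are inspected."""
    return alpha / n_comparisons


lift, low, high = lift_with_ci(conv_a=400, n_a=4000, conv_b=480, n_b=4000)
print(f"lift={lift:.3f}, 95% CI=({low:.4f}, {high:.4f})")
print(f"threshold for each of 10 segment cuts: {bonferroni_alpha(0.05, 10):.3f}")
```

The second function is the guard against segment fishing: every extra cut someone asks for tightens the threshold each cut has to clear.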

Notebook-to-readout writer

This is more pattern than skill, but worth calling out: a skill that takes a Jupyter or Quarto notebook and produces a prose readout — what question was asked, what was found, what the caveats are, what to do next — is the highest-leverage analysis skill I install on every analyst machine. The catch is that bad versions will inflate uncertain findings into confident claims. The good versions preserve hedging language and lead with the methods caveats.

Ad-hoc question framer

For the inverse — taking a vague stakeholder question and turning it into a precise one before any SQL is written — look for a skill that produces a structured framing: the decision the answer informs, the metric definition, the population, the time window, and what the answer would change. This is the cheapest skill on the page in terms of tokens and the highest-leverage in terms of quality of analysis downstream.

If you're not sure where to start in this bucket, the data-analysis use case page bundles the strongest current picks across analysis and reporting.

ML ops and notebook-to-production

ML skills cluster around two patterns: helping with the offline modelling loop, and helping move work from notebook to production. Both are useful.

Notebook-to-production converter

Takes a notebook, identifies the cells that compute features, the cells that train, and the cells that score, and produces a production-ready module structure. The good versions are opinionated about putting feature logic behind named functions, separating training from scoring, and writing the boilerplate to load and save models from your team's model registry. The mediocre versions just concatenate cells into a .py file and call it done.

Feature store entry author

If your team uses Feast, Tecton, or a homegrown feature store, a skill that takes a SQL or Python feature definition and produces the corresponding feature-store entry with the right entity declarations, TTLs, and online/offline configuration saves real time. Look for one that knows your specific feature store; generic ones produce code you'll spend longer editing than writing from scratch.

Training pipeline scaffolder

Takes a target framework (PyTorch Lightning, Hugging Face Trainer, scikit-learn) and a problem statement, and produces the data loader, model class, training loop with logging, and evaluation harness. The skills worth installing build in MLflow or Weights & Biases logging from the start rather than as an afterthought.

Model card and dataset documentation

For governance and for your future self, a skill that authors model cards (intended use, limitations, training data summary, evaluation results) and dataset documentation (collection method, known biases, schema, licence) using one of the standard templates is essential at any organisation that's been audited or expects to be. The good ones extract what they can from your training code automatically and only ask you for what genuinely requires judgement.

ML monitoring rule writer

Once a model is in production, monitoring shifts from "is the pipeline running" to "is the model behaving." A skill that takes a model spec and produces drift checks (input feature drift, prediction drift, label drift when labels eventually arrive) and alerting thresholds is harder to find done well but worth the search. Many available versions encode rules that are too aggressive and produce alert fatigue within a week.
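One way to make an input-drift check concrete is the population stability index, sketched below in plain Python. PSI is a common choice that I am swapping in for illustration, not something the skills above are known to use; the bin edges and sample values are made up, and the alerting threshold is deliberately left to the caller.

```python
from math import log


def psi(expected, actual, bin_edges, floor=1e-6):
    """Population stability index between two samples over shared bins."""
    def proportions(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            idx = sum(v >= edge for edge in bin_edges)  # number of edges at or below v
            counts[idx] += 1
        total = len(values)
        # Floor avoids log(0) when a bin is empty
        return [max(c / total, floor) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * log(ai / ei) for ei, ai in zip(e, a))


# Made-up feature values at training time and at serving time
training = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
serving = [0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 0.9, 0.9]
edges = [0.25, 0.5, 0.75]
print(f"PSI vs itself: {psi(training, training, edges):.3f}")
print(f"PSI vs serving: {psi(training, serving, edges):.3f}")
```

Whatever statistic a skill picks, the threshold it alerts on is where alert fatigue comes from, so check that it is configurable before installing.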

If you're starting an ML platform from scratch, the documentation and monitoring picks have the highest leverage. The training scaffolders are nice but generally easier to live without than you'd expect — model code is often boilerplate that a one-time setup commit handles.

Three end-to-end workflows

Skills compose. Here are three loops I run repeatedly, calling out which skill bucket fires at each step. Read these as recipes, not prescriptions — your team's flavour will differ.

Workflow A: ad-hoc question → SQL → interpretation → write-up

A stakeholder asks "are paying customers churning faster this quarter than last?" You hand it to Claude Code.

Frame. The ad-hoc question framer skill turns the vague question into a precise one: decision (whether to investigate retention features in this quarter's planning), metric (90-day net-revenue retention for the paying segment), population (paying accounts as of the cohort entry date), time window (Q1 vs Q4 prior year), tie-breakers (what counts as "paying").
SQL. The dialect-aware writer produces the query against your warehouse. The reviewer skill catches an issue where the cohort definition is sensitive to a join order.
Interpret. Results come back. You ask Claude to interpret. Without a skill, this is where models tend to overclaim. With an interpretation skill that forces hedging on small effect sizes and small sample cohorts, the readout is calibrated.
Write-up. The notebook-to-readout pattern produces a stakeholder-ready summary with the framing question at the top, the methodology in the middle, and the call-to-action at the bottom.

Workflow B: schema migration → backfill → rollout doc

Marketing wants to track a new event property. The events table needs a column.

Plan. Migration planner outputs forward and rollback statements, identifies the operation as a metadata-only add in your warehouse, and notes that no rewrite is needed.
Backfill design. For historical events lacking the property, you need a default or a recomputation. A backfill skill produces the strategy: chunk size, throttle, idempotency check, dry-run query, and a stop condition.
Rollout doc. A doc-writer skill produces the rollout plan: what changes, when, who's on-call, rollback trigger, validation queries. The data contract authoring skill updates the source-of-truth contract for the events table.
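The backfill design step in Workflow B can be sketched as follows. This uses an in-memory SQLite table so it runs as written; the table, the `campaign` column, and the `'unknown'` default are hypothetical, and a warehouse backfill would chunk by partition or time range instead of by id.

```python
import sqlite3
import time


def backfill(conn, chunk_size=2, throttle_seconds=0.0, dry_run=False):
    """Fill NULL campaign values in id-ordered chunks; safe to re-run."""
    remaining = conn.execute(
        "SELECT COUNT(*) FROM events WHERE campaign IS NULL"
    ).fetchone()[0]
    if dry_run:
        return remaining  # report the work without changing anything
    updated = 0
    while True:
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM events WHERE campaign IS NULL ORDER BY id LIMIT ?",
            (chunk_size,),
        )]
        if not ids:  # stop condition: nothing left to fill
            break
        placeholders = ",".join("?" * len(ids))
        # The IS NULL guard keeps the update idempotent
        cur = conn.execute(
            f"UPDATE events SET campaign = 'unknown' "
            f"WHERE campaign IS NULL AND id IN ({placeholders})",
            ids,
        )
        conn.commit()
        updated += cur.rowcount
        time.sleep(throttle_seconds)  # throttle between chunks
    return updated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, campaign TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, None), (2, "spring"), (3, None), (4, None)])
planned = backfill(conn, dry_run=True)
done = backfill(conn)
print(f"planned={planned}, updated={done}")
```

Running it a second time updates nothing, which is the idempotency property you want to verify before pointing anything similar at production.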

Workflow C: experiment design → analysis plan → readout

The growth team wants to test a checkout-page change.

Design. Experiment design skill computes power, recommends randomisation unit, and forces you to write the decision rule and primary metric before launch.
Analysis plan. A pre-registration skill produces the analysis plan as a document, including primary and secondary metrics, segments to inspect, and what would constitute a stop condition.
Readout. Once results land, the A/B analysis skill produces the readout matching the pre-registered plan. Any analyses not in the original plan are clearly flagged as exploratory.

The discipline of pre-registering the analysis is the biggest single quality lever in experiment work. A skill that enforces it pays for itself the first time someone tries to add a segment cut after seeing the results.

Privacy and governance considerations

If you work with personal data, regulated data, or anything subject to internal data-handling policy, a few of the picks above matter disproportionately and a few new ones are worth adding.

Skills as policy enforcement

The data contract authoring and model card picks are the two highest-leverage governance skills. Contracts make breaking schema changes visible at PR time rather than at incident time. Model cards force documentation of intended use and limitations before a model ships, which is the artefact regulators and auditors ask for first.

PII detection skill

Add a skill that scans SQL queries, dbt models, and notebooks for references to PII columns (your team's email, phone_number, address, etc.) and flags them when they appear in destinations that shouldn't see them — public-facing dashboards, third-party tool destinations, exports. The good versions read a configurable list of sensitive column patterns from a file in your repo, so the rule set lives in version control alongside the code.
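A minimal sketch of the scanning half, in Python. The pattern list is hard-coded here for brevity, where the text above suggests reading it from a file in your repo; the function name and the query are made up, and the check only flags identifiers without modifying anything.

```python
import re

# Hypothetical pattern list; in practice this would be loaded from a file in the repo.
SENSITIVE_PATTERNS = [r"email", r"phone_number", r"address"]


def find_pii_references(sql: str, patterns=SENSITIVE_PATTERNS):
    """Return (line number, identifier) pairs for identifiers matching a sensitive pattern."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    hits = []
    for lineno, line in enumerate(sql.splitlines(), start=1):
        for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", line):
            if any(c.search(identifier) for c in compiled):
                hits.append((lineno, identifier))
    return hits


query = """SELECT user_id,
       billing_email,
       order_total
FROM analytics.orders"""
print(find_pii_references(query))
```

Name matching like this misses renamed or derived columns, so treat it as a first filter. Deciding whether the destination is allowed to see the column is a separate step that needs to know where the query's output goes.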

Synthetic data generator for development

A skill that takes a production table schema and produces a synthetic dataset with the same statistical properties but no real values, for use in development and testing. The mediocre ones produce uniform random data that won't catch realistic distribution-sensitive bugs. The good ones preserve marginals and at least some correlations.

Access-request response helper

For GDPR / CCPA / similar regimes: a skill that takes a user identifier and produces the queries to extract the user's data across your team's tables, formatted for delivery. The skill itself should be careful about not executing the queries automatically and should remind the operator to log the request.

Watch the body, not just the description

For any governance-adjacent skill, read the SKILL.md body before installing. Skills that mention specific compliance frameworks (HIPAA, GDPR, SOC 2) should treat them with appropriate hedging — no skill can make your code "HIPAA compliant", only your processes and controls can. Skills that promise compliance are usually overselling. Skills that help you produce the artefacts compliance reviewers ask for are doing the right thing.

One specific anti-pattern to avoid: skills that automatically redact or transform what they consider sensitive data without telling you. The right default for any privacy-adjacent skill is to flag and ask, not to silently modify.

When to stop installing and write your own

Every team I've watched go through skills adoption hits the same moment: after installing twenty or thirty, they realise the marginal skill they're considering is shaped slightly wrong for their context. The frontmatter convention doesn't match their dbt project. The naming doesn't match their warehouse. The opinionated examples assume a stack they don't run. At that point, the right move is to write your own.

The signal that it's time

You know it's time when you find yourself editing the SKILL.md body after installing it, three times in a row, to make it match how your team actually works. At that point you're better off writing a skill from scratch that encodes your team's specific conventions — your table prefixes, your dialect, your review checklist, your readout template, your incident-response playbook.

What to encode first

The highest-leverage in-house skills are usually the ones that capture decisions your team has already made and is enforcing in code review. "Always include a backfill plan for new dbt models that depend on event data." "All A/B test readouts use the team's standard template." "PRs touching production tables require a rollback statement in the description." These are rules your senior team already enforces; a skill makes the enforcement consistent and removes the bottleneck of needing the senior team to review every PR.

Don't over-engineer the skill itself

A skill that does one thing well, with a tight description field and 200-400 lines of body, is usually better than an ambitious skill that tries to be a framework. The body should read like a senior engineer's handover note, not like a reference manual. Concrete examples beat abstract rules. If you can't think of a real query or migration to put in the example, the skill probably shouldn't exist yet.

Sharing back

If your team's in-house skill captures a pattern other data teams would benefit from — and most do — consider publishing it. The catalog grows by community contribution, and the skills that get the most use are usually the ones written by practitioners solving real problems on real systems. The format for publishing is straightforward and documented at /install/.

If you want a primer on the file format itself before authoring, the guide to writing a SKILL.md file covers the frontmatter fields, the body structure, and the conventions that make skills trigger reliably. For a sense of what "good" looks like, the curated best-of selection is the cleanest reference set.

Frequently asked questions

How many skills should a data engineer install?

Most teams I see plateau between 15 and 30 active skills. Beyond that, you hit diminishing returns and trigger collisions where multiple skills compete for the same request. Start with the picks in this guide that match your actual workflow and add others only when you notice a repeating manual task that none of your installed skills cover.

Will Claude Code skills work with my warehouse (Snowflake, BigQuery, Redshift, ClickHouse)?

Yes, but warehouse-specific skills are far more useful than generic SQL ones. Look for skills whose description names your warehouse explicitly, or that take a warehouse target as an argument. Generic ANSI-SQL skills produce queries that run but don't use the idioms your warehouse rewards.

Can skills run queries against my data, or do they only generate code?

Skills are instructions for Claude; they don't execute anything by themselves. Whether a query runs depends on which tools Claude Code has access to in your environment, which you control via the allowed-tools mechanism. A skill can suggest running a query; you decide whether the agent actually has shell or database access to do it.

Do data quality skills replace tools like Great Expectations or dbt tests?

No, they complement them. A skill helps you author the expectations or tests faster and more consistently, but the underlying framework still runs the checks. Think of skills as authoring assistance, not as a replacement for the data-quality runtime.

What about Jupyter notebooks: can skills help inside a notebook?

Skills fire inside Claude Code sessions, which most analysts run alongside their notebook rather than inside it. The pattern I see work best is using Claude Code in a terminal pane next to the notebook, asking it for SQL or analysis logic, and pasting the result. Notebook-to-readout skills are particularly good at this loop.

How do I keep team-wide skills in sync across analyst machines?

Check your team's skills into a Git repository and have each analyst symlink or rsync the repo into their ~/.claude/skills/ directory. Some teams script this in their onboarding setup. The skills themselves are plain Markdown, so version control handles them cleanly.

Are there skills specifically for ML experiment tracking with MLflow or Weights & Biases?

Yes. Look for training-pipeline scaffolder skills that explicitly mention your tracking tool in the description. Generic training skills will often add logging as an afterthought, while ones that target a specific tracker bake it into the structure from the start, which is materially better.
