Claude Code Skills·Claude Skills·The open SKILL.md registry for Claude

Data Engineering

230 Claude Code skills in the Data Engineering sub-category of Engineering.

230 skills · updated 2026-06-12 · showing 1–60 of 230 by quality score

Db2 on Cloud, IBM Cloud Databases (PostgreSQL/MongoDB/Redis/etc via ibmcloud cdb), Cloud Object Storage (S3-compatible, HMAC, Aspera), Cloudant (CouchDB), Event Streams (Kafka)
Expert-level Apache Kafka, event streaming, Kafka Streams, and distributed messaging
Architecture-level design review evaluating 15 Tier 2 best practices drawn from DDD (Evans), POEAA (Fowler), Release It! (Nygard), DDIA (Kleppmann), Team Topologies…
Design ETL/ELT pipeline architectures with data flow diagrams and transformation specs for Supabase and BigQuery
Import data into the AWS data lake from S3 files, local uploads, JDBC databases (Oracle, SQL Server, PostgreSQL, MySQL, RDS, Aurora), Amazon Redshift, Snowflake, BigQuery,…
Review Kafka Connect connector configurations for common misconfigurations using the Lenses MCP server.
Analyse Kafka consumer group lag using the Lenses MCP server. Diagnoses lag causes (throughput bottlenecks, rebalancing, partition skew, stalled consumers) and suggests…
Review dead letter queue implementations for completeness using the Lenses MCP server. Checks DLQ topic existence, configuration, monitoring, metadata preservation, retry logic,…
Review Kafka producer and consumer performance configurations in both the live cluster (via Lenses MCP) and the codebase.
Scaffold a production-ready Python Kafka producer and consumer using `confluent-kafka-python`, with Schema Registry, graceful shutdown, idempotent producer, tests and a complete…
Review Kafka schema changes (Avro, Protobuf, JSON Schema) for compatibility and evolution best practices using the Lenses MCP server.
Audit all Kafka topic configurations against production best practices using the Lenses MCP server. Checks replication factor, retention, partitions, compaction, naming…
Write optimized SQL for your dialect with best practices. Use when translating a natural-language data need into SQL, building a multi-CTE query with joins and aggregations,…
Designs and optimizes Apache Kafka topics and configurations
Use when working on Acuantia's BigQuery Dataform pipeline (acuantia-gcp-dataform project) - adds Acuantia-specific patterns on top of dataform-engineering-fundamentals: ODS…
Add KafkaFlow consumer handlers for processing Kafka/Redpanda messages (project)
Expert data engineer specializing in building scalable data pipelines, ETL/ELT processes, and data infrastructure.
Data pipelines, feature stores, and embedding generation for AI/ML systems. Use when building RAG pipelines, ML feature serving, or data transformations.
Python DAG workflow orchestration using Apache Airflow for data pipelines, ETL processes, and scheduled task automation
Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment.
Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment.
Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment.
Complete guide for Apache Airflow orchestration including DAGs, operators, sensors, XComs, task dependencies, dynamic workflows, and production deployment
Diagnoses Kafka consumer group lag using the Kafka AdminClient API and JMX metrics exposed via the Confluent Metrics API.
Extracts and transforms Avro, Protobuf, and JSON Schema definitions from Confluent Schema Registry. Generates typed data models and validates schema compatibility using the Schema…
Apache Kafka Stream Processor is built around the Apache Kafka event streaming platform. Its underlying ecosystem is represented by tulios/kafkajs (3,987+ GitHub stars).
Processes real-time event streams using KafkaJS consumer groups and transforms messages with configurable schemas.
Complete guide for Apache Spark data processing including RDDs, DataFrames, Spark SQL, streaming, MLlib, and production deployment
Strategic guidance for designing modern data platforms, covering storage paradigms (data lake, warehouse, lakehouse), modeling approaches (dimensional, normalized, data vault,…
Constructs Kubernetes-native workflow DAGs using Argo Workflows CRDs with configurable retry strategies, artifact passing via S3/MinIO, and template composition through…
Constructs well-structured arguments using the hypothesis-argument-example triad. Covers formulating falsifiable hypotheses, building logical arguments…
Workflow and best practices for writing Apache Airflow DAGs. Use when the user wants to create a new DAG, write pipeline code, or asks about DAG patterns and conventions.
Use when a user asks to generate or improve unit tests with measurable coverage gains using a strict staged workflow (build -> baseline -> plan -> generate -> verify -> coverage…
Use when asking about BigQuery costs, pricing, bytes billed, slot usage, reducing query costs, choosing between on-demand and editions pricing, managing reservations, optimizing…
Audits Python + BigQuery pipelines for cost safety, idempotency, and production readiness. Returns a structured report with exact patch locations.
Analyzes species-environment relationships with constrained ordination (CCA, RDA, db-RDA), variance partitioning, indicator species (indicspecies IndVal.g group-equalized),…
Build and modify EdgeSpark apps. Use when a project has edgespark.toml, the user mentions EdgeSpark, or work involves the edgespark CLI, server SDK types, storage/auth/database…
Use when building distributed microservices with Dapr sidecar architecture. Triggers include Dapr components, service invocation, state management, pub/sub, secrets, bindings,…
Use when building event-driven systems with Apache Kafka on Kubernetes. Triggers include EDA patterns, Kafka producers/consumers, Strimzi operator deployment, Schema Registry,…
Specifies bugs, upgrades, refactors, and behavioral changes as explicit deltas against existing feature specifications, preserving original intent while defining what changes,…
CI/CD pipeline design for any type of platform. Triggers on "CI/CD", "pipeline", "GitHub Actions", "Azure DevOps", "GitLab CI", "automatic deployment — from…
Use when syncing data FROM source databases (PostgreSQL, MySQL, MongoDB) TO ClickHouse. Covers change data capture using Debezium, Airbyte, or custom triggers.
Run ClickHouse integration tests in CI with GitHub Actions and Docker containers. Use when setting up automated testing against a real ClickHouse instance, configuring CI…
Diagnose and fix the top 15 ClickHouse errors — query failures, insert problems, memory limits, and merge issues.
Collect ClickHouse diagnostic data — system tables, query logs, merge status, and server metrics for support tickets and troubleshooting.
Create your first ClickHouse table, insert data, and run analytical queries. Use when starting a new ClickHouse project, learning MergeTree basics, or testing your ClickHouse…
Run ClickHouse locally with Docker, configure test fixtures, and iterate fast. Use when setting up a local ClickHouse dev environment, writing integration tests, or running…
Configure ClickHouse across dev, staging, and production with environment-specific settings, secrets management, and infrastructure-as-code patterns.
Monitor ClickHouse with Prometheus metrics, Grafana dashboards, system table queries, and alerting for query performance, merge health, and resource usage.
Production readiness checklist for ClickHouse — server tuning, backup, monitoring, and deployment verification.
Configure ClickHouse query concurrency, memory quotas, and connection limits. Use when hitting "too many simultaneous queries", managing concurrent users, or tuning server-side…
Production reference architecture for ClickHouse-backed applications — project layout, data flow, multi-tenant patterns, and operational topology.
Use when ingesting continuous data streams from Kafka, RabbitMQ, or Kinesis into ClickHouse. Covers backpressure handling, exactly-once semantics, stream processing patterns, and…
Real-time structural Code Health via CodeScene MCP — review before edits, verify score deltas after changes, gate commits and PRs.
Lets an agent run dbt parity checks, relation diffs, and row or value comparisons so refactors and source swaps can be verified before rollout.
Kafka Connect integration expert. Covers source and sink connectors, JDBC, Elasticsearch, S3, Debezium CDC, SMT (Single Message Transforms), connector configuration, and data…
Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing pipelines, or big da — from…
Synthesize Reflector insights into structured delta proposals for playbook updates, following ACE paper's Curator architecture
Deploys the current project to a live HTTPS URL via Cybrix. Activates on any request to make the current project public, get a URL for it, deploy it, ship it, host it, publish it,…
Orchestrate data pipelines using Dagster, the cloud-native data orchestration platform. Define data assets as Python functions with automatic lineage tracking, scheduling, and…