Data Engineering

ibm-data

Db2 on Cloud, IBM Cloud Databases (PostgreSQL/MongoDB/Redis/etc via ibmcloud cdb), Cloud Object Storage (S3-compatible, HMAC, Aspera), Cloudant (CouchDB), Event Streams (Kafka)

kafka-expert

Expert-level Apache Kafka, event streaming, Kafka Streams, and distributed messaging

async-processing-patterns

Queues, workers, retries, and safe asynchronous execution models. Use when: async processing, message queue design, worker design, retry strategy, job queue, background…

backpressure-patterns

Flow control and system stability under load through backpressure mechanisms. Use when: backpressure, flow control, system stability under load, producer consumer, channel buffer,…

canon-design-review

Architecture-level design review evaluating 15 Tier 2 best practices drawn from DDD (Evans), POEAA (Fowler), Release It! (Nygard), DDIA (Kleppmann), Team Topologies…

data-pipeline-architecture

Design ETL/ELT pipeline architectures with data flow diagrams and transformation specs for Supabase and BigQuery

ingesting-into-data-lake

Import data into the AWS data lake from S3 files, local uploads, JDBC databases (Oracle, SQL Server, PostgreSQL, MySQL, RDS, Aurora), Amazon Redshift, Snowflake, BigQuery,…

kafka-connector-review

Review Kafka Connect connector configurations for common misconfigurations using the Lenses MCP server.

kafka-consumer-lag

Analyse Kafka consumer group lag using the Lenses MCP server. Diagnoses lag causes (throughput bottlenecks, rebalancing, partition skew, stalled consumers) and suggests…

kafka-dlq-review

Review dead letter queue implementations for completeness using the Lenses MCP server. Checks DLQ topic existence, configuration, monitoring, metadata preservation, retry logic,…

kafka-perf-review

Review Kafka producer and consumer performance configurations in both the live cluster (via Lenses MCP) and the codebase.

kafka-python-client

Scaffold a production-ready Python Kafka producer and consumer using `confluent-kafka-python`, with Schema Registry, graceful shutdown, idempotent producer, tests and a complete…

kafka-schema-review

Review Kafka schema changes (Avro, Protobuf, JSON Schema) for compatibility and evolution best practices using the Lenses MCP server.

kafka-shadowtraffic

Generate a ShadowTraffic configuration to populate a Kafka topic with realistic synthetic data. Discovers the target topic, its key and value schemas, and the correct serializers…

kafka-shadowtraffic-java

Generate a TestContainers Java test class that spins up ShadowTraffic in-process to populate a Kafka topic with synthetic data during tests.

kafka-topic-audit

Audit all Kafka topic configurations against production best practices using the Lenses MCP server. Checks replication factor, retention, partitions, compaction, naming…

social-pulse-monitor

Use when the user asks to "monitor brand mentions", "set up social listening", "did anything spike about us this week", or "watch these accounts for buying triggers"; runs…

write-query

Write optimized SQL for your dialect with best practices. Use when translating a natural-language data need into SQL, building a multi-CTE query with joins and aggregations,…

Kafka Topic Designer

Designs and optimizes Apache Kafka topics and configurations

accessing-osprey

Understanding the Osprey moderation infrastructure — system architecture, ClickHouse data access, schema reference, and relationship to Ozone labelling.

acuantia-dataform

Use when working on Acuantia's BigQuery Dataform pipeline (acuantia-gcp-dataform project) - adds Acuantia-specific patterns on top of dataform-engineering-fundamentals: ODS…

add-kafka-consumer

Add KafkaFlow consumer handlers for processing Kafka/Redpanda messages (project)

airflow-dag-patterns

Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment.

agency-data-engineer

Expert data engineer specializing in building reliable data pipelines, lakehouse architectures, and scalable data infrastructure.

agent-data-engineer

Expert data engineer specializing in building scalable data pipelines, ETL/ELT processes, and data infrastructure.

agent-trajectory-evaluator

Evaluate a multi-step AI agent's whole run — tool calls, intermediate steps, and final result — not just final-answer correctness, so you can pinpoint WHERE it went wrong.

ai-data-engineering

Data pipelines, feature stores, and embedding generation for AI/ML systems. Use when building RAG pipelines, ML feature serving, or data transformations.

airflow

Python DAG workflow orchestration using Apache Airflow for data pipelines, ETL processes, and scheduled task automation

apache-airflow-orchestration

Complete guide for Apache Airflow orchestration including DAGs, operators, sensors, XComs, task dependencies, dynamic workflows, and production deployment

apache-kafka-consumer-lag-runbook

Diagnoses Kafka consumer group lag using the Kafka AdminClient API and JMX metrics exposed via the Confluent Metrics API.

apache-kafka-event-streaming-and-schema-registry

Production-grade guidance for Apache Kafka event streaming, partition strategies, consumer group management, transactional messaging, and Confluent Schema Registry integration…

apache-kafka-schema-extractor

Extracts and transforms Avro, Protobuf, and JSON Schema definitions from Confluent Schema Registry. Generates typed data models and validates schema compatibility using the Schema…

apache-kafka-stream-processor

Apache Kafka Stream Processor is built around the Apache Kafka event streaming platform. The underlying ecosystem is represented by tulios/kafkajs (3,987+ GitHub stars).

apache-kafka-stream-transformer-2

Processes real-time event streams using KafkaJS consumer groups and transforms messages with configurable schemas.

apache-spark-data-processing

Complete guide for Apache Spark data processing including RDDs, DataFrames, Spark SQL, streaming, MLlib, and production deployment

architecting-data

Strategic guidance for designing modern data platforms, covering storage paradigms (data lake, warehouse, lakehouse), modeling approaches (dimensional, normalized, data vault,…

argo-workflows-dag-pipeline-builder

Constructs Kubernetes-native workflow DAGs using Argo Workflows CRDs with configurable retry strategies, artifact passing via S3/MinIO, and template composition through…

argumentation

Constructs well-structured arguments using the hypothesis-argument-example triad. Covers formulating falsifiable hypotheses, building logical arguments…

astro-airflow

Inspect and debug Airflow on Astronomer (Astro) deployments — fetch DAG runs, task instance logs, container logs, env vars, and deployment state without installing an MCP plugin.

authoring-dags

Workflow and best practices for writing Apache Airflow DAGs. Use when the user wants to create a new DAG, write pipeline code, or asks about DAG patterns and conventions.

autocoverage

Use when a user asks to generate or improve unit tests with measurable coverage gains using a strict staged workflow (build -> baseline -> plan -> generate -> verify -> coverage…

bfc-run-prod-user-locally

Run BFC (cost basis computation) locally using production user data dumped from BigQuery. Use when: (1) profiling BFC memory/performance for a specific user, (2) debugging a…

bigquery-bigframes

Generates Python code using BigQuery DataFrames (BigFrames), the pandas/scikit-learn-style API over BigQuery.

bigquery-cost-optimization

Use when asking about BigQuery costs, pricing, bytes billed, slot usage, reducing query costs, choosing between on-demand and editions pricing, managing reservations, optimizing…

bigquery-pipeline-audit

Audits Python + BigQuery pipelines for cost safety, idempotency, and production readiness. Returns a structured report with exact patch locations.

bio-ecological-genomics-community-ecology

Analyzes species-environment relationships with constrained ordination (CCA, RDA, db-RDA), variance partitioning, indicator species (indicspecies IndVal.g group-equalized),…

building-edgespark-apps

Build and modify EdgeSpark apps. Use when a project has edgespark.toml, the user mentions EdgeSpark, or work involves the edgespark CLI, server SDK types, storage/auth/database…

building-with-dapr

Use when building distributed microservices with Dapr sidecar architecture. Triggers include Dapr components, service invocation, state management, pub/sub, secrets, bindings,…

building-with-kafka-strimzi

Use when building event-driven systems with Apache Kafka on Kubernetes. Triggers include EDA patterns, Kafka producers/consumers, Strimzi operator deployment, Schema Registry,…

cdc-debezium-rails

Change Data Capture (CDC) from a Rails Postgres database via Debezium into Kafka topics — logical replication setup, wal_level=logical, REPLICA IDENTITY FULL, publication /…

change-maintenance-specification

Specifies bugs, upgrades, refactors, and behavioral changes as explicit deltas against existing feature specifications, preserving original intent while defining what changes,…

ci-topic-selection

Use when deciding whether a project fits Critical Inquiry (CI) and which format to target. CI is the leading interdisciplinary journal of criticism and theory in the arts and…

cicd-pipeline-builder

Designs CI/CD pipelines for any type of platform. Triggers on "CI/CD", "pipeline", "GitHub Actions", "Azure DevOps", "GitLab CI", "automatic deployment - from…

clickhouse-architecture-advisor

MUST USE when designing ClickHouse architectures, selecting between ingestion or modeling patterns, or translating best practices into workload-specific system designs.

clickhouse-cdc

Use when syncing data FROM relational databases (PostgreSQL, MySQL, MongoDB) TO ClickHouse. Covers change data capture using Debezium, Airbyte, or custom triggers.

clickhouse-ci-integration

Run ClickHouse integration tests in CI with GitHub Actions and Docker containers. Use when setting up automated testing against a real ClickHouse instance, configuring CI…
