Claude Code Skills·Claude Skills·The open SKILL.md registry for Claude

ML AI Eng (Page 2 of 3)

144 Claude Code skills in the ML AI Eng sub-category of Engineering.

144 skills · updated 2026-06-12 · showing 61–120 of 144 by quality score


Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ benchmarks with pass@k metrics.
Build face recognition systems with InsightFace, ArcFace, enrollment pipelines, HDBSCAN clustering, and privacy-compliant architecture.
Prepare high-quality datasets for LLM fine-tuning with filtering, deduplication, augmentation, and RLHF data formatting.
Manage Vertex AI Training jobs (GPU/TPU cost governance), Vertex AI Pipelines, Model Registry, Feature Store, Endpoints, and Gemini API integration for production MLOps.
Build RAG systems and semantic search with Gemini embeddings (gemini-embedding-001): 768- to 3072-dimensional vectors, 8 task types, and Cloudflare Vectorize integration.
Use when evaluating GraphRAG system quality across knowledge graph completeness, retrieval relevance, answer correctness, and reasoning verification.
Use when designing complete GraphRAG systems that combine graph retrieval with LLM reasoning. Invoke when the user mentions GraphRAG system, technology stack, Neo4j with LLM,…
Hybrid interview that probes AI-engineering mastery by tip-vocabulary depth — entity referencing, loop closure, observability, harness improvement — not by token usage or LOC.
Hierarchical Task Network planning (provably correct plans via symbolic decomposition + LLM fallback) and AlphaEvolve evolutionary code search (fitness-gated genetic algorithm).
Manage Huawei ModelArts training jobs (GPU and Ascend NPU cost governance), Pangu foundation model deployment, AI Gallery model management, and MLOps pipeline automation for AI/ML…
Build Gradio web UIs and demos in Python. Use when creating or editing Gradio apps, components, event listeners, layouts, or chatbots. — from yanochka11/harness_bro
Deploys models from Hugging Face Hub to Inference Endpoints using the huggingface_hub client and REST API.
Receive and verify Hugging Face webhooks. Use when setting up Hugging Face webhook handlers, debugging X-Webhook-Secret verification, or handling events on models, datasets, and…
Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, o — from…
Design, generate, and fully deploy a stylish, production-ready Retrieval-Augmented Generation (RAG) chatbot embedded directly into any website/project.
Build the internal link graph for a site, run PageRank-style authority distribution, detect orphan pages, and recommend new internal links via embedding-based semantic similarity…
KV-cache optimization patterns for LLM inference. Prefix caching, sliding window attention, cache reuse across turns, static cache for fixed prompts, and TTFT reduction…
Use when implementing a LangChain-based agent runtime from an approved ai-architecture.md agent control-flow design.
LightRAG is a Python-based retrieval-augmented generation framework that builds knowledge graphs from documents for more connected, contextual retrieval.
Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral).
Automatically applies when building LLM applications. Ensures proper async patterns for LLM calls, streaming responses, token management, retry logic, and error handling.
Production-ready patterns for building LLM applications. Covers RAG pipelines, agent architectures, prompt IDEs, and LLMOps monitoring.
Build automated LLM evaluation pipelines with benchmarks, regression tests, RAGAS, and human eval workflows.
Use when the user needs ML model deployment, production serving infrastructure, optimization strategies, and real-time inference systems.
Supervised and unsupervised learning, bias-variance tradeoff, cross-validation, decision trees, ensemble methods, neural network fundamentals, and the practitioner's workflow from…
MemGPT virtual context — OS virtual-memory analogy for LLM context management. Two-tier (main context = RAM, external store = disk), page-in/page-out tools, archival/core memory…
Build production ML systems with PyTorch 2.x, TensorFlow, and modern ML frameworks. Implements model serving, feature engineering, A/B testing, and monitoring.
Expert in building scalable ML systems, from data pipelines and model training to production deployment and monitoring.
Deploying ML models to production (MLOps). Triggers on "déployer un modèle" ("deploy a model"), "ML deployment", "MLOps", "model serving", "inference", "model registry", "ML pip — from…
Complete end-to-end MLOps pipeline orchestration from data preparation through model deployment.
Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment.
Design and implement ML operations — model registry, serving patterns, deployment strategies (shadow/canary/blue-green), drift detection, feature stores, retraining triggers, and…
Expert in Machine Learning Operations bridging data science and DevOps. Use when building ML pipelines, model versioning, feature stores, or production ML serving.
Model Registry Manager: auto-activating skill for ML Deployment. Triggers on: model registry manager. Part of the ML Deployment skill category.
Deploy ML models as production APIs with vLLM, TGI, ONNX Runtime, batching, autoscaling, and GPU optimization.
Train Mixture of Experts (MoE) models using DeepSpeed or HuggingFace. Use when training large-scale models with limited compute (5× cost reduction vs dense models), imple — from…
Use when debugging a NeMo Gym run or reward profiling job. Covers rollout collection failures, empty or partial JSONL outputs, stale materialized inputs, verifier/schema errors,…
Use when building GraphRAG pipelines on Neo4j with the neo4j-graphrag Python package: VectorRetriever, VectorCypherRetriever, HybridRetriever, HybridCypherRetriever,…
Use when implementing retrieval-augmented generation with OpenAI from an approved ai-architecture.md and retrieval design.
Use when implementing an OpenAI-backed AI capability that must return schema-bound JSON, typed objects, classifications, extraction results, or other machine-consumable responses…
Use when implementing OpenAI tool or function calling from an approved ai-architecture.md tool surface.
Fine-tune and serve Physical Intelligence OpenPI models (pi0, pi0-fast, pi0.5) using JAX or PyTorch backends for robot policy inference across ALOHA, DROID, and LIBERO…
High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3.
Fine-tunes and evaluates OpenVLA-OFT and OpenVLA-OFT+ policies for robot action generation with continuous action heads, LoRA adaptation, and FiLM conditioning on LIBERO…
Orchestrate end-to-end machine learning pipelines using Prefect or Airflow with DAG construction, task dependencies, retry logic, scheduling, monitoring, and integration with…
Integrates with the Plaid Transactions API using the plaid Python SDK to pull 90 days of transaction history across linked bank accounts.
Guarantee valid JSON/XML/code structure during generation, use Pydantic models for type-safe outputs, support local models (Transformers, vLLM), and maximize inference sp — from…
Chinese-first: for PyTorch pattern tasks; helps identify, design, implement, or validate the corresponding workflow. English keywords: PyTorch deep learning patterns and best practices for building robust, efficient, and reproducible training pipelines,…
Optimize accuracy for RAG (Retrieval-Augmented Generation) systems. Covers: DB schema design, chunking strategies, retrieval optimization, accuracy testing, and anti-hallucination…
Design LLM applications using the LangChain framework with agents, memory, and tool integration patterns.
Build production document ingestion pipelines with chunking, embedding, and vector DB storage. Activate on: document ingestion, chunking strategy, embedding pipeline, vector DB…
RAG architecture: embeddings, chunking strategies, hybrid search (BM25 + vector), reranking, CRAG/self-correcting, multi-hop reasoning, evaluation metrics.
Design and architect RAG (Retrieval-Augmented Generation) pipelines. Covers vector DB selection, chunking strategies, hybrid retrieval (vector + knowledge graph), semantic…
High-performance vector similarity search engine for RAG and semantic search. Use when building production RAG systems requiring fast nearest neighbor search, hybrid sear — from…
Run repeatable eval suites against prompts, RAG pipelines, and agents so regressions surface before release.
Use promptfoo when an agent needs to evaluate prompt, agent, or RAG behavior against saved assertions before a change goes live.
ReWOO decoupled planning — Planner/Worker/Solver split. 5x fewer tokens than ReAct on HotpotQA, +4% accuracy. Plan-and-Execute generalization, planner distillation to 7B.
Self-Refine iterative improvement (generate/feedback/refine loop, roughly +20% absolute average improvement across 7 tasks) and CRITIC external verification (tool-grounded critique for factual tasks).
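The code-evaluation entry at the top of this page reports pass@k. As a rough sketch of what that metric computes, here is the widely used unbiased estimator from the HumanEval paper, 1 - C(n - c, k) / C(n, k), where n samples are drawn per problem and c of them pass the tests. The function names are hypothetical and are not taken from the skill itself.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples generated, c: samples that passed the tests, k: sample budget.
    Computes 1 - C(n - c, k) / C(n, k) using exact integer binomials.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs (hypothetical helper)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two problems, 10 samples each: 3 passing, then 0 passing.
    print(mean_pass_at_k([(10, 3), (10, 0)], k=1))
```

Using exact binomials avoids sampling k solutions at random, which would give a noisier estimate of the same quantity.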
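The Hugging Face webhook entry mentions X-Webhook-Secret verification. A minimal, framework-agnostic sketch follows. It assumes (per our reading of the Hugging Face webhook docs) that the configured secret is sent verbatim in that header rather than as an HMAC signature. The function name and the plain `dict` of headers are illustrative assumptions.

```python
import hmac


def verify_hf_webhook_secret(headers: dict[str, str], expected_secret: str) -> bool:
    """Return True if the request carries the expected webhook secret.

    Assumes the secret arrives unchanged in the X-Webhook-Secret header.
    Header names are matched case-insensitively, as HTTP requires.
    """
    received = None
    for name, value in headers.items():
        if name.lower() == "x-webhook-secret":
            received = value
            break
    if received is None:
        return False
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(received.encode(), expected_secret.encode())
```

A handler would call this before parsing the JSON body and respond with 401 when it returns False.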
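Several RAG entries above list hybrid search (BM25 + vector). One common way to merge the two ranked lists is reciprocal rank fusion (RRF), which scores each document as the sum of 1 / (k + rank) over the lists that contain it. The sketch below shows RRF only. It is not necessarily the fusion method these skills use, and the constant k = 60 is the conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with RRF.

    Each document's score is the sum of 1 / (k + rank) over every list it
    appears in (ranks start at 1). Ties are broken by document ID.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]    # hypothetical keyword results
    vector_hits = ["b", "c", "d"]  # hypothetical embedding results
    for doc_id, score in reciprocal_rank_fusion([bm25_hits, vector_hits]):
        print(doc_id, round(score, 5))
```

RRF uses only ranks, so it does not require BM25 scores and cosine similarities to be normalized onto a common scale.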
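The internal-link-graph entry mentions PageRank-style authority distribution and orphan-page detection. Below is a small power-iteration sketch of standard PageRank, in which the rank held by dangling pages (pages with no outbound links) is spread uniformly. It also includes a helper that lists pages with no inbound internal links. The graph format, a dict from URL to outbound URLs, is an assumption.

```python
def pagerank(
    links: dict[str, list[str]],
    damping: float = 0.85,
    iters: int = 100,
    tol: float = 1e-10,
) -> dict[str, float]:
    """Power-iteration PageRank over a site's internal link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Rank held by pages without outbound links is redistributed evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


def orphan_pages(links: dict[str, list[str]]) -> list[str]:
    """Pages that no other page links to (self-links do not count)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    inbound = {t for src, targets in links.items() for t in targets if t != src}
    return sorted(pages - inbound)


if __name__ == "__main__":
    site = {"/": ["/a", "/b"], "/a": ["/"], "/b": ["/"], "/c": ["/"]}
    print(pagerank(site))
    print(orphan_pages(site))  # "/c" is linked from nowhere
```

Orphans receive only the baseline (1 - damping) / n share of rank, so they are natural candidates for the link recommendations the entry describes.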
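The document-ingestion and RAG-architecture entries both cite chunking strategies. The simplest baseline is fixed-size chunking with overlap, sketched here over whitespace-separated words. Production pipelines more often count model tokens or split on document structure, and the default sizes below are illustrative assumptions.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word windows of chunk_size, each overlapping the last.

    Consecutive chunks share `overlap` words so that content near a boundary
    appears in both neighbours.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Overlap increases index size by roughly chunk_size / (chunk_size - overlap), which is the usual trade-off against losing context at chunk boundaries.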
More in Engineering: Testing (2,448) · Devops (2,410) · Architecture (1,778) · Backend (1,375) · Frontend (1,035) · Languages (880) · Cloud Platforms (802) · Code Quality (774) · Databases (568) · Performance (517) · Mobile (379) · Observability (272) · Data Engineering (230) · Docs Engineering (197) · Workflow Orchestration (170) · API Tooling (15)