Claude Code Skills·Claude Skills·The open SKILL.md registry for Claude
Home › Tag › Llm

Llm — Claude Code Skills

821 Claude Code skills tagged Llm. Browse all AI provider, model, or runtime-related skills in the open ClaudSkills registry — free to install, one-click via the desktop app.

Showing top 200 of 821 skills, ranked by quality score.

dehumanize-ai-text

Применять при переписывании текстов, сгенерированных LLM-агентами (отчеты, README, доки, письма, посты), в живой человеческий стиль. Триггеры - пользователь пишет «убери AI-стиль»,

general

scan-setup

Use when wiring a repo to maintained DETERMINISTIC scanner gates (SAST, dependency-CVE/SBOM, secret-history, IaC/container, mutation, fuzz) that produce ground-truth observables —

security

42-keyword-discovery

Keyword discovery en ideation vanuit seed keywords. Haalt suggestions, related keywords, zoekvolume en difficulty op via DataForSEO. Classificeert intent, berekent opportunity scor

growth

dev-prompt-engineering

Prompt optimization for LLMs. Trigger when the user wants to improve a prompt, add examples, or structure instructions.

general

huggingface-classifier

Hugging Face transformer model fine-tuning and inference for intent classification

general

inference-aa-workload

Reproduce the Artificial Analysis (AA) language-model performance workload shapes against an OpenAI-compatible chat endpoint using NVIDIA AIPerf. Drives the three AA text shapes (1

general

inference-quantize-calibrate

Produce quantized inference weights from a BF16/FP8 base checkpoint via a post-training-quantization (PTQ) pipeline -- instead of only ever pulling NVFP4 weights pre-quantized. A c

science

lint-markdown

Execute markdown validation with taxonomy-based classification and custom rules. Use when validating markdown compliance with LLM-facing writing standards or when generating struct

general

llm-governance

LLM content governance and compliance standards. Use when llm governance guidance is required.

general

llm-integration

LLM integration patterns for function calling, streaming responses, local inference with Ollama, and fine-tuning customization. Use when implementing tool use, SSE streaming, local

engineering

llm-security

Helpt bij het implementeren van LLM-specifieke beveiligingscontrols voor overheidstoepassingen, gebaseerd op de OWASP LLM Top 10, BIO2, NIS2 en AVG. Biedt prompt injection detectie

security

loom-prompt-engineering

Designs and optimizes prompts for large language models including system prompts, agent signals, and few-shot examples. Use for instruction design, prompt security, chain-of-though

security

netllm-swarm

Configure multi-machine LAN mesh for swarm-llm (netllm). Use when the user asks to set up a swarm, connect multiple machines (macOS, Linux, Windows), enable LAN routing, find peers

general

rag-quality

RAG 시스템 품질 평가 및 개선을 위한 스킬입니다. RAGAS 기반 LLM-as-Judge 평가, 사용자 페르소나 시뮬레이션, 합성 데이터 생성, 평가 결과 저장 및 분석 기능을 제공합니다.

general

st-agent

A multimodal LLM-based AI agent for deep spatial transcriptomics research, capable of dynamic code generation, visual reasoning, and literature retrieval.

content

add-llm-provider

Adds a new LLM provider to the multi-provider rotation system. Use when the user wants to add a new AI provider like OpenAI, Together, Fireworks, etc. Don't use for Groq — Groq is

general

ag-destilar

Comprime documentos grandes para formato LLM-optimal. Mantem toda informacao em menos tokens. Para TOTVS KB, Design Library, SPECs grandes, docs de referencia. Inspirado no BMAD di

general

alphaxiv

Quick single-paper lookup via AlphaXiv LLM-optimized summaries with tiered source fallback. Use when user says "explain this paper", "summarize paper", pastes an arXiv/AlphaXiv URL

general

bmad-distillator

Lossless LLM-optimized compression of source documents. Use when the user requests to 'distill documents' or 'create a distillate'.

general

canon-pr-review

Review a PR against the top 20 Tier 2 LLM-enforceable best practices from 35 seminal software engineering books (Code Complete, Clean Code, A Philosophy of Software Design, Refacto

engineering

ce-optimize

Run metric-driven iterative optimization loops. Define a measurable goal, build measurement scaffolding, then run parallel experiments that try many approaches, measure e — from Ev

science

choose

**DEFAULT for ROUTING AMBIGUITY — interactive picker that surfaces top skill candidates + 2 LLM-rewritten prompt variants via AskUserQuestion with previews, then dispatches the cho

general

corefall-review

Deep Corefall BP-LEVEL closure review (BP0..BP12) with T-CAPTURE evidence, grading.json LLM-graded verdicts, Self-Play Validation Matrix, AI-Agent Self-Test Report, Universal Enhan

general

do-and-judge

Execute a task with sub-agent implementation and LLM-as-a-judge verification with automatic retry loop

general

do-in-parallel

Launch multiple sub-agents in parallel to execute tasks across files or targets with intelligent model selection, quality-focused prompting, and meta-judge → LLM-as-a-judge verific

general

do-in-steps

Execute complex tasks through sequential sub-agent orchestration with intelligent model selection, meta-judge → LLM-as-a-judge verification — from NeoLabHQ/context-engineering-kit

general

dspy-finetune-bootstrap

This skill should be used when the user asks to "fine-tune a DSPy model", "distill a program into weights", "use BootstrapFinetune", "create a student model", "reduce inference cos

general

firecrawl-scrape

Extract clean markdown from any URL, including JavaScript-rendered SPAs. Use this skill whenever the user provides a URL and wants its content, says "scrape", "grab", "fetch", "pul

engineering

gh:optimize

Run metric-driven iterative optimization loops. Define a measurable goal, build measurement scaffolding, then run parallel experiments that try many approaches, measure e — from wa

science

karvey-benchmark-models

Cross-model benchmark for the Karvey method. Side-by-side comparison of models (e.g. Claude vs GPT vs Gemini) on a skill or task — latency, tokens, cost, and optional LLM-judged qu

tools

lets-optimize

Run metric-driven iterative optimization loops -- define a measurable goal, run parallel experiments, measure each against hard gates or LLM-as-judge scores, keep improvements, and

science

llm-eval-design

**DEFAULT for LLM/agent eval design — dispatches evaluator for AI/LLM-specific evaluation design (offline + online metrics, groundedness, hallucination, drift, cost, latency).**

general

llm-evaluate

Evaluate LLM models for cost/performance ratio. Fetches current pricing and recommends optimal model for your use case. Use during project init or when optimizing costs.

general

llm-output-gate

CI hook that refuses to ship if prompt-eval golden set regresses past threshold or prompt-injection-test fails on HIGH severity

general

llm-router

Selects the optimal LLM model and provider for each task based on complexity, cost budget, and capability requirements. Routes cheap tasks to Haiku/GPT-4o-mini and complex tasks to

general

llm-top-10

Reviews LLM-powered applications against the OWASP Top 10 for Large Language Model Applications (2025 edition). Auto-invoked when reviewing code that integrates LLM APIs, builds RA

security

llms-txt

Generate an llms.txt file for any project or website following the llmstxt.org specification. Use when asked to create llms.txt, generate LLM-friendly documentation, make a project

general

mcp-as-agent

Wrap an MCP server as a yakOS agent so tool-side and LLM-side specialists share the same dispatch surface

general

meeting-note-ingestor

Foundation-portable, source-agnostic transcript ingestor. Consumes a transcript file path (Otter VTT, Word, Zoom, generic LLM-export, or Granola JSON) and emits a structured meetin

content

do-in-steps

Execute complex tasks through sequential sub-agent orchestration with intelligent model selection, meta-judge → LLM-as-a-judge verification — from TuYv/ccpm

engineering

prd-ai-feature

**DEFAULT for PRD FOR AN AI/LLM/AGENT FEATURE — model selection rationale, eval plan, safety boundaries, cost envelope, failure-mode map: PRD covering AI-specific sections (model s

product

serverless-modal

Run GPU workloads on Modal — training, fine-tuning, inference, batch processing. Zero-config serverless: no SSH, no Docker, auto scale-to-zero. Use when user says \"modal run\", \"

engineering

sst-llm-judge-ranker

Maintain a ranked list of N artifacts (drafts, designs, code variants, research reports, ...) by comparing each new candidate against the current top and bottom of the list, using

science

sst-wiki-curator

Build and maintain LLM-curated knowledge wikis for prose domains (legal/regulatory tracking, scientific literature, market intelligence, product taxonomies, personal research notes

science

stack-audit

Audit a repo against the golden-stack canon in llm-wiki-research. Reads the Audit Checklist tables in ideal-tech-setup.md, runs each check against the target repo (file existence,

science

stack-update

Update the golden-stack docs in llm-wiki-research when a tech decision is made. Appends rows to the decision tree, audit checklist, or AI/agent layers; replaces tools; opens the ta

science

model-supply-chain

Reviews AI/ML model supply chains for security risks including model provenance verification, training data lineage, fine-tuning pipeline integrity, inference dependency review, an

security

vault-import

Import an existing Obsidian vault, markdown folder, or git repo as an llm-wiki vault. Moves content into vaults/, adds missing structure (index, log, CLAUDE.md, frontmatter). Use w

general

vllm-serving-setup

Design, deploy, and tune vLLM v0.18.2 inference serving on EKS with PagedAttention v2, Multi-LoRA, FP8 KV Cache, Chunked Prefill, and Continuous Batching. Produces Helm values.yaml

general

wiki-enrich

Fill in the per-paper TODO sections (Problem/Method/Key Results/Limitations/Reusable Ingredients/...) of research-wiki/papers/<slug>.md pages that /research-lit, /arxiv, /alphaxiv,

science

cell_agent

LLM-driven multi-agent framework for automated single-cell analysis.

general

dag-typesafe

Analyze a repository's type system and generate type-safe DAG execution pipelines with GraphSentry-style certificate verification. This skill should be used when building LLM-drive

engineering

llm-classifier

LLM-based zero-shot and few-shot classification for flexible intent detection

general

llmstxt

Generate or audit an `/llms.txt` file at the site root that makes the site legible to LLMs and AI answer engines at inference time, following the llms.txt proposal (Jeremy Howard,

general

semantic-code-analyzer

LLM-powered semantic analysis of code diffs to detect business-logic trojans

general

3d-prompt-engineer

扮演 AI 3D 模型生成提示詞工程師,精通 Meshy、TripoSR、Rodin、Luma Genie、CSM、Zoo 等 text-to-3D / image-to-3D 模型,熟悉 PBR 材質、拓撲、UV、LOD,能產出遊戲與 3D 列印可用資產的提示詞。適用於遊戲資產、3D 列印、AR/VR 場景、產品概念。當使用者描述 3D 模型需求時啟動。

general

63-data-engineering-fine-tuning

Create your LLMOps data engineering skill in one prompt, then learn to improve it throughout the chapter — from panaversity/claude-code-skills-lab

engineering

64-supervised-fine-tuning

Create your llmops-fine-tuner skill from Unsloth documentation before learning fine-tuning theory

general

69-evaluation-quality-gates

Create a reusable skill for evaluating fine-tuned models, benchmarking performance, and detecting quality regressions

general

evolving-ai-agents

Provides guidance for automatically evolving and optimizing AI agents across any domain using LLM-driven evolution algorithms. Use when building self-improving agents, optimizing a

general

a2a-protocol

LiteLLM-RS A2A Protocol Architecture. Covers Agent-to-Agent communication, JSON-RPC 2.0 messaging, multi-provider orchestration, agent registry, and task state management.

engineering

ad-add-fusion-transformation

Claude Code skill (trtllm-agent-toolkit): implement or extend TensorRT-LLM AutoDeploy fusion transforms under transform/library/ in a TensorRT-LLM checkout. Prefer existing kernels

general

add-karpathy-llm-wiki

Add a persistent wiki knowledge base to a NanoClaw group. Based on Karpathy's LLM Wiki pattern. Triggers on "add wiki", "wiki", "knowledge base", "llm wiki", "karpathy wiki".

general

data-addon-langfuse

Langfuse OSS LLM-observability conventions — production traces graduate to the next eval dataset, cross-family LLM judges, versioned reproducible datasets, the MCP at /api/public/m

general

adhx

Fetch any X/Twitter post as clean LLM-friendly JSON. Converts x.com, twitter.com, or adhx.com links into structured data with full article content, author info, and engagement metr

content

advanced-evaluation

Master LLM-as-a-Judge evaluation techniques including direct scoring, pairwise comparison, rubric generation, and bias mitigation. Use when building evaluation systems, comparing m

general

agency-image-prompt-engineer

Expert photography prompt engineer specializing in crafting detailed, evocative prompts for AI image generation. Masters the art of translating visual concepts into preci — from mk

general

agent-architecture-analysis

Perform 12-Factor Agents compliance analysis on any codebase. Use when evaluating agent architecture, reviewing LLM-powered systems, or auditing agentic applications against the 12

engineering

agent-architecture-audit

Full-stack diagnostic for agent and LLM applications. Audits the 12-layer agent stack for wrapper regression, memory pollution, tool discipline failures, hidden repair loops, and r

engineering

agent-evolution

Exposes Hermes self-learning architecture to allow CEO Kit agents to autonomously build new scripts (SKILL.md) and fine-tune their base model weights.

engineering

external-research-prompt-engineer

Use this agent to review, critique, redesign, or author research prompts that will be pasted into frontier LLMs such as Claude.ai Deep Research, Gemini Advanced Deep Research, Perp

science

agent-llm-architect

Expert LLM architect specializing in large language model architecture, deployment, and optimization. Masters LLM system design, fine-tuning strategies, and production se — from ma

engineering

agent-llm-stability

Специализированный скилл для диагностики и исправления зависаний, деградации контекста и нестабильности LLM в агентском режиме (dialogue_node.py + MCP tools). Используй когда: робо

general

agent-ops-context-map

Analyze the codebase to create a concise, LLM-optimized structured overview in .agent/map.md.

general

agent-ops-git-story

Generate narrative summaries from git history for onboarding, retrospectives, changelogs, and exploration. LLM-enhanced when available, works without LLM too.

content

agent-platform-eval-flywheel

Measure and improve the quality of AI models and agents on Google Cloud using the Eval Quality Flywheel methodology. Use when evaluating an agent or model, building an eval dataset

engineering

agent-platform-tuning

Agent Platform Model Tuning. Use when you need to fine-tune open models or Gemini models using Agent Platform infrastructure. Don't use for model training outside Agent Platform, m

engineering

agent-platform-tuning-management

Manages GenAI tuning jobs in Agent Platform. Use this to list, get, or cancel ongoing model tuning jobs. Don't use for fine-tuning models (use `agent-platform-tuning`), deploying m

engineering

agent-prompt-engineer

Expert prompt engineer specializing in designing, optimizing, and managing prompts for large language models. Masters prompt architecture, evaluation frameworks, and production pro

engineering

agent-prompts

Ready-to-use prompt templates for specialized agents. Use when building n8n workflows, AI integrations, or sales materials. Contains structured prompts for automation-architect, ll

sales

agent-r1-end-to-end-rl

Train LLM-based agents with end-to-end RL by extending MDPs to handle tool invocation and environmental stochasticity—enable dense process rewards for intermediate steps and masked

general

agentforce-prompt-versioning

Version Prompt Templates and agent topic prompts: source-control shape, change review, model-version pinning, A/B, and rollback. Trigger keywords: prompt template versioning, promp

general

agentic-eval

Patterns for evaluating and improving AI agent outputs through iterative refinement loops. Use when implementing self-critique, building evaluator-optimizer pipelines, creating rub

general

agentic-grafana-llm-platform-overview

Grafana Labs LLM plugin, Assistant ve HTTP API ayrımını Sentinel CLI bağlamında açıklarken kullan.

engineering

agentic-llm-openai-compatible-local

Ollama, LM Studio, vLLM gibi yerel OpenAI uyumlu /v1 uçları için port, model ve tool desteği fallback’ini yapılandırırken kullan.

general

agentic-llm-openai-compatible-remote

Uzak OpenAI uyumlu API ile base_url, api_key, model ve proxy kullanımını yapılandırırken kullan.

general

agentic-workflow-audit

Use when reviewing or auditing an existing agent / LLM-pipeline architecture — e.g. 'is my workflow actually decomposed or secretly a mega-agent?', 'are my task boundaries and succ

engineering

agents

Patterns and architectures for building AI agents and workflows with LLMs. Use when designing systems that involve tool use, multi-step reasoning, autonomous decision-making, or or

engineering

langchain

Framework for building LLM-powered applications with agents, chains, and RAG. Supports multiple providers (OpenAI, Anthropic, Google), 500+ integrations, ReAct agents, to — from la

engineering

agentv-eval-writer

Write, edit, review, and validate AgentV EVAL.yaml / .eval.yaml evaluation files. Use when asked to create new eval files, update or fix existing ones, add or remove test cases, co

content

agrosat-llm-finetuning

Fine-tune Gemma 4 26B-MoE and Qwen3-VL-30B-A3B with LoRA rank 16 BF16, deploy Qwen3.5-35B-A3B with vLLM on Azure H100 NVL 96GB for AgroSatCopilot. Use when fine-tuning VLMs with Lo

general

ai-agents-meta-orchestrator

Route an AI-agent engineering task to the right skill among 14 meta specialists — planning a multi-session build, decomposing a plan into an agent chain, orchestrating a squad, run

engineering

ai-build-buy-partner

Evaluate AI capability sourcing options across build, buy, fine-tune, and partner archetypes using a structured decision matrix. Use when deciding whether to build a custom model,

general

ai-check

Detect AI/LLM-generated text patterns in research writing. Use when: (1) Reviewing manuscript drafts before submission, (2) Pre-commit validation of documentation, (3) Quality assu

science

ai-eval-design-and-iteration

Develop "quizzes" (evals) to measure model performance on specific tasks. Use these benchmarks to guide fine-tuning, determine product UX patterns, and track performance improvemen

general

ai-expertise-engine

Comprehensive AI/ML expertise covering prompt engineering, LLM architecture, AI agent design, RAG systems, fine-tuning, AI safety, and cutting-edge AI research for building and lev

science

ai-llm

Production LLM engineering skill. Covers strategy selection (prompting vs RAG vs fine-tuning), dataset design, PEFT/LoRA, evaluation workflows, deployment handoff to inference serv

engineering

ai-llm-engineering

Operational skill hub for LLM system architecture, evaluation, deployment, and optimization (modern production standards). Links to specialized skills for prompts, RAG, agents, and

security

ai-llm-inference

Operational patterns for LLM inference: latency budgeting, tail-latency control, caching, batching/scheduling, quantization/compression, parallelism, and reliable serving at scale.

engineering

ai-llm-safety

Enforces safe AI usage practices, prevents prompt injection, and ensures model safety

general

ai-llm-security-review

Use for AI/LLM security assessments, prompt injection, RAG security, agent/tool permissioning, model supply chain, LLM red teaming, AI governance, eval design, data leakage, jailbr

security

ai-llm-skills-guide

Guide for AI Agents and LLM development skills including RAG, multi-agent systems, prompt engineering, memory systems, and context engineering.

general

ai-ml-development

AI and machine learning development with PyTorch, TensorFlow, and LLM integration. Use when building ML models, training pipelines, fine-tuning LLMs, or implementing AI features.

general

ai-ml-model-fine-tuner

Guide pour le fine-tuning de modèles ML/LLM (LoRA, QLoRA, PEFT, datasets, hyperparamètres) — from general/general-misc

general

ai-model-evaluation

Evaluate and compare LLMs, ML APIs, and fine-tuned models for product fit across quality, latency, cost, compliance, and vendor risk dimensions. Use when selecting an AI model or v

general

ai-personalization

Building AI-powered personalization systems: recommendation engines, collaborative filtering, content-based filtering, user preference learning, cold-start solutions, and LLM-enhan

content

ai-portable-setup

Erstellt eine portable KI-Arbeitsumgebung auf einem USB-Stick oder beliebigem Laufwerk. RAG-Pipeline mit lokalen LLM-Modellen (Ollama), Vektordatenbank (ChromaDB) und vorkonfigurie

general

ai-prompt-engineer

AI engineering skill for prompt optimization, context inference, and intelligent command routing across different models and use cases

general

ai-prompt-engineering

Operational prompt engineering for production LLM apps: structured outputs (JSON/schema), deterministic extractors, RAG grounding/citations, tool/agent workflows, prompt safety (in

engineering

ai-prompt-engineering-safety-review

Comprehensive AI prompt engineering safety review and improvement prompt. Analyzes prompts for safety, bias, security vulnerabilities, and effectiveness while providing detailed im

engineering

ai-resume-detector

Pattern recognition for LLM-generated resume text — sentence length variance, em-dash density, and generic accomplishment phrasing

general

ai-systems-architect

Use this when: design an AI system, RAG vs fine-tuning, my agent keeps looping, architect a multi-agent system, which LLM should I use, context window keeps overflowing, add guardr

engineering

aichat-llm-cli-shell-assistant-rag

AIChat is a comprehensive LLM command-line tool written in Rust that combines chat-REPL, shell command generation, RAG, AI tools, and multi-provider support into a single binary. I

tools

aidd-methodology

AI-Driven Development — методология и принципы написания документации для проектов с LLM-агентом. Используй когда: AIDD, AI-driven, планирование проекта, idea.md, vision.md, workfl

general

website-audit

Website Audit mit 230+ Rules für SEO, Performance, Security, Technical und Content Issues. LLM-optimierte Reports mit Health Scores und Handlungsempfehlungen.

security

alrm-agentic-robotic-manipulation

Build agentic LLM-driven robotic manipulation pipelines using the ALRM framework pattern: a ReAct-style reasoning loop with dual execution modes (Code-as-Policy for direct code gen

engineering

analyze-candidates

Ranks candidate skills/agents by task fit using Sonnet LLM-as-judge AND classifies task complexity (model + effort) in same call. Input is union of cheatsheet + FTS5 candidates wit

general

analyze-project-architecture

LLM-based architectural analysis that transforms raw project data into meaningful structure

engineering

prompt-engineer-analyze-prompt

\"Analyze prompts for clarity, effectiveness, and optimization opportunities. 分析提示之清晰度、有效性及優化機會。 Use when: reviewing existing prompts, identifying issues before deployment, generat

engineering

analyzing-codebases

Generates LLM-optimized code context with function call graphs, side effect detection, and incremental updates. Processes JavaScript/TypeScript codebases to create compact semantic

engineering

annotate

Create flexible annotation workflows for AI applications. Contains common tools to explore raw ai agent logs/transcripts, extract out relevant evaluation data, and llm-as-a-judge c

content

anthropic-prompt-engineer

Master Anthropic's prompt engineering techniques to generate new prompts or improve existing ones using best practices for Claude AI models.

general

llm-app-patterns

Production-ready patterns for building LLM applications. Covers RAG pipelines, agent architectures, prompt IDEs, and LLMOps monitoring. Use when designing AI applications — from la

engineering

llm-evaluation

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring — from eng

engineering

prompt-engineering

Expert guide on prompt engineering patterns, best practices, and optimization techniques. Use when user wants to improve prompts, learn prompting strategies, or debug age — from si

general

api-integrator

Integrate external REST and GraphQL APIs with proper authentication (Bearer, Basic, OAuth), error handling, retry logic, and JSON schema validation. Use when making API calls, data

engineering

arize

Instrument agentic LLM apps built on the Claude Agent SDK (claude-agent-sdk) and/or LangGraph with Arize Phoenix and OpenInference — tracing, evaluation, annotations, experiments,

science

aspirations-execute

Phase 4 of the aspirations loop: executes a selected goal end-to-end with precondition checks, LLM-driven intelligent retrieval, memory deliberation, subagent delegation, primary e

general

assemblyai-core-workflow-b

Execute AssemblyAI streaming transcription and LeMUR workflows. Use when implementing real-time speech-to-text, live captions, voice agents, or LLM-powered audio analysis with LeMU

content

audit-website

Audit websites for SEO, technical, content, and security issues using squirrelscan CLI. Returns LLM-optimized reports with health scores, broken links, meta tag analysis, and actio

security

auth-architecture

LiteLLM-RS Authentication Architecture. Covers JWT + API Key + RBAC multi-method auth, rate limiting with DashMap, middleware pipeline, and secure credential management.

engineering

auto-optimize-prompt

Iteratively auto-optimize a prompt until no issues remain. Uses prompt-reviewer in a loop, asks user for ambiguities, applies fixes via prompt-engineering skill. Runs until converg

general

auto-review-loop-llm

Autonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with \"auto review loop llm\" or \"llm revi

science

autocontext

Iterative strategy generation and evaluation system. Use when the user wants to evaluate agent output quality, run improvement loops, queue tasks for background evaluation, check r

general

Automated Updates

How the devbox automatically updates llm-agents (claude-code) via GitHub Actions and systemd timers. Use when debugging update failures or understanding the update flow.

engineering

prompt-engineer-automatic-optimization

Automatic prompt optimization via DSPy, OPRO, and evaluation-driven methods. 自動提示優化:DSPy、OPRO 及評估驅動法。 Use when: iterating prompts programmatically, defining optimization metrics, r

general

automem-search

Search, filter, and retrieve Claude/Codex history indexed by the automem CLI. Use when the user wants to index history, run lexical/semantic/hybrid search, fetch full transcripts,

content

axolotl

Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support — from axolotl-ai-cloud/diff-transformer

general

axolotl

Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support — from axolotl-ai-cloud/diff-transformer

science

crucible-investigation-methodology

Use when actively investigating a hypothesis — running a sweep, dispatching multi-agent analysis, designing serial adversarial gates, enriching per-trade data for loss postmortem,

science

base-model-selector

Use when starting a fine-tuning project to determine if fine-tuning is needed, or when evaluating whether a base model meets quality thresholds for a specific domain task

general

benchmark-prompt-injection-attacks-defenses-and-recovery-pipelin

Run structured prompt-injection attack and defense experiments against an LLM-integrated app before production by measuring attack success and testing detection or recovery pipelin

security

bmad-checkpoint-preview

LLM-assisted human-in-the-loop review. Make sense of a change, focus attention where it matters, test. Use when the user says "checkpoint", "human review", or "walk me th — from Al

general

bootstrap-llm-synthesis

Construct the LLM synthesis prompt from project surface scan + optional tree-sitter context + optional Q&A answers. Call the LLM. Parse and validate the response into 6-8 structure

general

bootstrap-memory-ingest-cloud

Write structured memory entries to the Cloud workspace via gaai_memory.store MCP tool with source='bootstrap'. Loops over entries from bootstrap-llm-synthesis, calls the tool per e

general

rx9070-vllm-rocm-turboquant

Build latest vLLM from source on this user's WSL + ROCm + RX 9070 setup, then test local GPTQ Qwen/Qwopus models with baseline KV cache and TurboQuant KV cache presets.

general

Build graph RAG context with Neo4j LLM Graph Builder

Convert a bounded document set into a Neo4j knowledge graph, inspect extracted nodes and relationships, and use it for graph-backed RAG.

general

byob

Create custom LLM evaluation benchmarks using the BYOB decorator framework. Use when the user wants to (1) create a new benchmark from a dataset, (2) pick or write a scorer, (3) co

general

caching-architecture

LiteLLM-RS Caching Architecture. Covers Redis caching, vector database semantic caching, multi-tier cache strategy, TTL management, and cache invalidation patterns.

engineering

capx-agentic-robotics

Agentic robotics with CaP-X — LLM-driven robot manipulation via code generation. Use when: (1) Setting up CaP-X / CaP-Gym environments for robot manipulation benchmarks, (2) Runnin

general

cheahjs--free-llm-api

Danh sách free LLM API providers có rate limits cụ thể — OpenRouter, Groq, Cerebras, Google AI Studio, GitHub Models, Fireworks... Reference khi cần backup API.

general

bmad-checkpoint-preview

LLM-assisted human-in-the-loop review. Make sense of a change, focus attention where it matters, test. Use when the user says "checkpoint", "human review", or "walk me th — from va

general

chief-ai-officer-advisor

Chief AI Officer advisory for startups: model build-vs-buy decisions (API vs fine-tune vs in-house), AI risk classification under EU AI Act + US state patchwork, AI cost economics

general

circadian-evolution

Implements a circadian self-improvement cycle for agents: day full agentic grind → evening analysis (detect >60% redundant/saturated data) → night targeted fine-tuning (LoRA/Unslot

general

clip

OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. U — from op

general

clip

OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. U — from op

general

cloudflare-kv

Cloudflare Workers KV key-value storage playbook: namespaces, bindings, Workers API (get/put/delete/list), metadata, expiration TTL, bulk operations, REST API, consistency model, c

engineering

cm-dockit

Knowledge systematization engine — analyze codebases, generate Personas, JTBD, Process Flows, technical docs, SOP user guides, API references. Output as Markdown or VitePress Premi

growth

cns-tinker

Apply Chiral Narrative Synthesis (CNS) framework for contradiction detection and multi-source analysis using Tinker API for model training. Use when implementing CNS with Tinker fo

content

prompt-engineering

Universal prompt engineering techniques for any LLM. Use when crafting, optimizing, or reviewing prompts for AI models. Triggers on requests like "improve this prompt", "write a sy

engineering

coderabbit-core-workflow-b

Tune CodeRabbit review configuration: learnings, code guidelines, and noise reduction. Use when fine-tuning review quality, training CodeRabbit with team preferences, adding code g

general

codex-delegate

Delegiert Code-Review, Research-Synthesis, Adversarial-Cross-Check an Codex CLI (`codex exec`, `codex review`). Triggers "delegate to codex", "codex review", "gpt-5.4 check", autom

science

codex-write

Delegate heavy code generation to Codex CLI while Claude orchestrates, reviews, and surgically fixes. Use for any task involving 50+ lines of new code, boilerplate generation, test

tools

cogniforge-llm-model-repair

Diagnose and fix broken LLM model configurations in CogniForge microservices. Use when: AI responses are empty, garbage, or missing LaTeX; reasoning-agent returns empty answers or

science

Mistral AI Automation

Automate Mistral AI operations -- manage files and libraries, upload documents for fine-tuning, batch processing, and OCR, track fine-tuning jobs, and build RAG pipelines — from ph

general

prompt-engineer-compress-context

\"Compress context to fit within token limits while preserving signal. 壓縮語境以適應令牌限制同時保留關鍵信號。 Use when: context approaching window limit, compressing conversation history or RAG docu

general

config-architecture

LiteLLM-RS Configuration Architecture. Covers YAML loading, environment variable override, validation patterns, type-safe config models, and hot reloading.

engineering

configure-hugo

Install and configure llm-wiki-hugo-cms in a wiki repository so it renders as a static Hugo site. Requires Hugo extended ≥ 0.147.0.

general

connect-openrouter

Step-by-step guide to connect OpenRouter API so you can use LLM-powered skills (SERP clustering with cluster naming, PAA question clustering, semantic clustering). Use when user sa

general

content-analysis

Analyze text content using both traditional NLP and LLM-enhanced methods. Extract sentiment, topics, keywords, and insights from various content types including social media posts,

content

content-moderation-patterns

Loaded when user builds content moderation, safety filters, or policy enforcement with Claude. Covers pre-filter vs LLM-classify, category design, confidence thresholds, and human-

content

context7-mcp-documentation-server-llm-code-editors

Context7 by Upstash injects up-to-date, version-specific library documentation and code examples directly into AI prompts. Eliminates hallucinated APIs and outdated code generation

content

contextual-chunking

Contextual Retrieval implementation for RAG - chunks clinical notes with LLM-generated context prepended to each chunk before embedding. Improves citation accuracy by 49% per Anthr

science

convert-dense-pdfs-into-llm-ready-text-and-page-aligned-markdown

Use olmOCR when an agent needs to turn scanned or layout-heavy documents into clean markdown or text before chunking, search, extraction, or citation workflows.

general

coreweave-core-workflow-b

Run distributed GPU training jobs on CoreWeave with multi-node PyTorch. Use when training models across multiple GPUs, setting up distributed training, or running fine-tuning jobs

general

cost-aware-llm-pipeline

Cost optimization patterns for LLM API usage — model routing by task complexity, budget tracking, retry logic, and prompt caching.

engineering

cost-trend

Read every docs/benchmarks/runs/*.json and surface drift in win rate, latency, escalation rate, and LLM-baseline cost over time

general

council-review-subagents

Run a multi-persona LLM-council review of an idea using Claude Code subagents — six personas (Operator, Financier, Skeptic, Visionary, Customer Advocate, Strategist) each review th

general

cpbox-llm-context

USE FOR RAG/LLM grounding. Returns pre-extracted web content (text, tables, code) optimized for LLMs. GET + POST. Adjust max_tokens/count based on complexity... — from Lord1Egypt/R

general

crawl4ai-llm-friendly-web-crawler

Run web crawling and scraping workflows with Crawl4AI, an open-source crawler built to produce LLM-ready markdown and structured extraction output. It supports async crawling, brow

engineering

crawl4ai-llm-web-crawler-scraper

Crawl4AI is an open-source web crawler that converts any website into clean, LLM-ready Markdown for RAG pipelines, AI agents, and data extraction workflows. With 50k+ GitHub stars

general

crawl4ai-open-source-web-crawling-and-markdown-extraction

Crawl4AI is an open source crawler and scraper built for LLM-ready web extraction, with structured markdown output, browser support, and Python package distribution. It has strong

engineering

prompt-engineer-create-system-prompt

\"Guide through creating effective system prompt from scratch using 2026 best practices. 引導以 2026 最佳實踐從零創建高效系統提示。 Use when: starting new AI application, building chatbot or agent s

general

cross-lingual-stability-judges-under

Detect and fix cross-lingual evaluation instabilities in LLM-as-a-judge pipelines. Use when: 'audit my multilingual eval pipeline', 'check if my LLM judge is stable across language

general

cross-vault-link-audit

Audit, fix, and maintain cross-vault links across all vaults in the llm-wiki repo. Use when user wants to check for broken cross-vault links, migrate legacy `[[vault:page]]` wikili

general

fine-tuning-expert

Use when fine-tuning LLMs, training custom models, or adapting foundation models for specific tasks. Invoke for configuring LoRA/QLoRA adapters, preparing JSONL training — from ank

general

prompt-engineer

Writes, refactors, and evaluates prompts for LLMs — generating optimized prompt templates, structured output schemas, evaluation rubrics, and test suites. Use when designing prompt

engineering

ctf-ai-ml

Provides AI and machine learning techniques for CTF challenges. Use when attacking ML models, crafting adversarial examples, performing model extraction, prompt injection, membersh

security

dall-e-prompt-engineering-kit

Structured prompt generation for OpenAI's DALL-E 3 API (images/generations endpoint) with style modifiers, aspect ratio control, and batch variation generation. Includes negative p

engineering

data-llm-app

LLM-app discipline — the three-tier assertion/judge/human eval ladder with no higher tier before the lower, cross-family judges, a single pinned model-version env var, prompt-regre

general

dataset-engineering

Create, clean, and optimize datasets for LLM fine-tuning. Covers formats (Alpaca, ShareGPT, ChatML), synthetic data generation, quality assessment, and augmentation. Use when prepa

general

dataset-evaluation

Validates dataset formatting and quality for SageMaker model fine-tuning (SFT, DPO, or RLVR). Use when the user says "is my dataset okay", "evaluate my data", "check my training da

general

llm-council

Run any question, idea, or decision through a council of 5 AI advisors who independently analyze it, peer-review each other anonymously, and synthesize a final verdict. M — from ni

general

build-deck

Build professional, themed PowerPoint decks from Markdown via LLM-generated pptxgenjs JavaScript. Uses modular per-slide architecture for all decks. Supports data-driven decks with

engineering

defend-llm-prompt-injection

Hardens an LLM feature against prompt injection, jailbreaks, and unsafe output — isolating untrusted content as data, adding input/output guardrails, an injection classifier, PII/s

general

deidentify

De-identify clinical research data before LLM-assisted analysis. Standalone Python CLI detects PHI via regex + heuristics with 10 country locale packs (kr, us, jp, cn, de, uk, fr,

science

delivery-ascii-dashboard

Render data dashboards as pure ASCII art in monospace text -- the cheapest, most portable delivery method. No rendering engine, no SVG, no browser. LLM-native output with predictab

general

design-ai-benchmarking

Design and validity review for studies that benchmark one or more AI systems against a human-expert panel as the reference. Covers the evaluation question and arm definition, decou

general

design-experiment

Plan LLM fine-tuning and evaluation experiments. Use when the user wants to design a new experiment, plan training runs, or create an experiment_summary.yaml file.

science

Image Prompt Engineer

Expert photography prompt engineer specializing in crafting detailed, evocative prompts for AI image generation. Masters the art of translating visual concepts into preci — from st

general