World-class computer vision skill for image/video processing, object detection, segmentation, and visual AI systems.
World-class ML engineering skill for productionizing ML models, MLOps, and building scalable ML systems.
Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, o — from…
SGLang RadixAttention: KV cache stored in a radix tree and reused across requests that share common prefixes. Cache-aware scheduling (depth-first, LRU eviction at branch level).
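The prefix-reuse and LRU ideas in the RadixAttention entry can be illustrated with a toy cache. This is a minimal sketch under stated assumptions, not SGLang's implementation. It uses an uncompressed token trie as a stand-in for the radix tree (a real radix tree merges runs of tokens into one edge). Each node stands for one cached token's KV entry, and eviction removes least-recently-used leaves so that shared prefixes survive longest. `PrefixCache`, `match_prefix`, and `insert` are hypothetical names. Reference-count pinning of nodes used by in-flight requests and depth-first scheduling are not modeled.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity equality: nodes form a cyclic parent/child graph
class Node:
    token: Optional[int] = None
    parent: Optional["Node"] = None
    children: Dict[int, "Node"] = field(default_factory=dict)
    last_access: int = 0  # logical timestamp used for LRU eviction


class PrefixCache:
    """Hypothetical toy prefix cache: one trie node per cached token."""

    def __init__(self, capacity: int):
        self.root = Node()
        self.capacity = capacity  # max number of cached tokens (root excluded)
        self.size = 0
        self._clock = itertools.count(1)

    def match_prefix(self, tokens: List[int]) -> int:
        """Length of the longest cached prefix of `tokens`; refreshes its LRU time."""
        now = next(self._clock)
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            child.last_access = now
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens: List[int]) -> None:
        """Cache a request's tokens, sharing any existing prefix, then evict if over capacity."""
        now = next(self._clock)
        node = self.root
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                child = Node(token=tok, parent=node)
                node.children[tok] = child
                self.size += 1
            child.last_access = now
            node = child
        self._evict()

    def _leaves(self) -> List[Node]:
        out, stack = [], list(self.root.children.values())
        while stack:
            n = stack.pop()
            if n.children:
                stack.extend(n.children.values())
            else:
                out.append(n)
        return out

    def _evict(self) -> None:
        # Only leaves are evicted, so inner (shared-prefix) nodes outlive their branches.
        while self.size > self.capacity:
            victim = min(self._leaves(), key=lambda n: n.last_access)
            del victim.parent.children[victim.token]
            self.size -= 1
```

Two requests that share a prefix store it once. Evicting whole leaves rather than arbitrary nodes keeps every remaining path a valid, reusable prefix.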
Skill library patterns from Voyager — executable code as skills, semantic retrieval, composition, failure-driven refinement.
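The Voyager-style pattern (code as skills, retrieval, composition, verification) can be sketched in a few lines. This is a hypothetical illustration rather than Voyager's code. Skills are stored as Python source with a description. A skill is admitted only if it executes and passes a caller-supplied check, which is the point where failure feedback would drive a refinement loop (not shown). Later skills can call earlier ones through a shared namespace. Retrieval uses bag-of-words cosine similarity as a stand-in for the embedding search Voyager uses. `SkillLibrary`, `add`, and `retrieve` are assumed names.

```python
import math
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple


def _vec(text: str) -> Counter:
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SkillLibrary:
    def __init__(self) -> None:
        self.skills: Dict[str, Tuple[str, str]] = {}  # name -> (description, source)
        self.namespace: Dict[str, Any] = {}  # verified skills, callable by later skills

    def add(self, name: str, description: str, source: str,
            check: Callable[[Callable], bool]) -> bool:
        """Admit a skill only if its source runs and the verification check passes."""
        ns = dict(self.namespace)  # composition: earlier skills are visible here
        try:
            exec(source, ns)
            ok = bool(check(ns[name]))
        except Exception:  # syntax/runtime errors count as failed verification
            ok = False
        if ok:
            self.skills[name] = (description, source)
            self.namespace[name] = ns[name]
        return ok

    def retrieve(self, query: str, k: int = 1) -> List[str]:
        """Return the k skill names whose descriptions best match the query."""
        q = _vec(query)
        ranked = sorted(self.skills,
                        key=lambda n: _cosine(q, _vec(self.skills[n][0])),
                        reverse=True)
        return ranked[:k]
```

For example, a `quadruple` skill whose source calls an already-verified `double` skill is accepted. A buggy `triple` that fails its check is rejected and never becomes retrievable.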
Build lightweight AI agents with HuggingFace Smolagents — use CodeAgent (writes Python to act) or ToolCallingAgent (JSON tool calls), add built-in or custom Tools, orchestrate…
TensorFlow SavedModel Creator - Auto-activating skill for ML Deployment. Triggers on: tensorflow savedmodel creator. Part of the ML Deployment skill…
TensorFlow Serving Setup - Auto-activating skill for ML Deployment. Triggers on: tensorflow serving setup. Part of the ML Deployment skill category.
Install Together AI SDK and configure API key for inference and fine-tuning. Use when setting up Together AI, configuring the OpenAI-compatible API, or initializing the together…
Migrate vector data between Pinecone, Qdrant, Weaviate, pgvector with re-embedding and schema mapping.
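The schema-mapping and re-embedding steps of a vector migration can be sketched independently of any vendor client. This is a minimal, hypothetical sketch. The source record shape (`id`, `values`, `metadata`) and target shape (`id`, `vector`, `payload`) are assumptions chosen for illustration, not the exact schemas of Pinecone, Qdrant, Weaviate, or pgvector. `embed` is any caller-supplied embedding function. The old vectors are discarded because re-embedding replaces them, and a dimension check catches a mismatched target index before anything is written.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def map_record(src: Record, field_map: Dict[str, str],
               embed: Callable[[str], List[float]], text_key: str, dim: int) -> Record:
    """Convert one (assumed-shape) source record to the target schema, re-embedding its text."""
    meta = dict(src.get("metadata", {}))
    vector = list(embed(meta[text_key]))  # src["values"] is dropped: new model, new vectors
    if len(vector) != dim:
        raise ValueError(f"record {src['id']}: expected dim {dim}, got {len(vector)}")
    payload = {field_map.get(k, k): v for k, v in meta.items()}  # rename mapped keys
    return {"id": str(src["id"]), "vector": vector, "payload": payload}


def migrate(records: Iterable[Record], field_map: Dict[str, str],
            embed: Callable[[str], List[float]], text_key: str, dim: int,
            batch_size: int = 100) -> Iterator[List[Record]]:
    """Yield target-schema records in batches, ready for a bulk upsert."""
    batch: List[Record] = []
    for src in records:
        batch.append(map_record(src, field_map, embed, text_key, dim))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Streaming batches from a generator keeps memory flat for large collections, and each yielded batch maps naturally onto one bulk-upsert call in whichever target client is used.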
Server-side extension that completes the full analysis pipeline for image classification after vera-ai-image-testing has run.
vLLM tiered KV cache configuration for production H100/H200 clusters. Native CPU offload, LMCache (CPU+NVMe+GDS), NixlConnector (disaggregated prefill), MooncakeConnector (RDMA),…
Configure vLLM completely — YAML config file format, CLI arg precedence, full VLLM_*/HF_*/TRANSFORMERS_* env-var catalog, end-to-end recipe for air-gapped environments (internal…
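The CLI-versus-config-file precedence can be modeled with the standard library. This sketch assumes the order command line over YAML config file over built-in defaults, with config keys written in flag form (for example `gpu-memory-utilization`). It is worth confirming this order against the vLLM docs for the installed version. The YAML is represented as an already-parsed dict so the example needs no third-party parser. `resolve_settings` and the `DEFAULTS` values are illustrative only, and environment variables are not modeled.

```python
import argparse
from typing import Any, Dict, List

# Illustrative defaults for this sketch only, not an authoritative list of vLLM defaults.
DEFAULTS: Dict[str, Any] = {"port": 8000, "max_model_len": None, "gpu_memory_utilization": 0.9}


def resolve_settings(argv: List[str], config: Dict[str, Any]) -> Dict[str, Any]:
    """Merge settings: defaults < config file < explicitly passed CLI flags."""
    # SUPPRESS means unset flags are absent from the namespace, not filled with a default,
    # so an explicit flag can be told apart from an omitted one.
    parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
    parser.add_argument("--port", type=int)
    parser.add_argument("--max-model-len", type=int)
    parser.add_argument("--gpu-memory-utilization", type=float)
    cli = vars(parser.parse_args(argv))
    file_values = {k.replace("-", "_"): v for k, v in config.items()}  # flag-style keys -> dests
    return {**DEFAULTS, **file_values, **cli}
```

Suppressing argparse defaults is the key design choice. Otherwise every unset flag would carry a default value that silently overrides the config file.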
vLLM is a fast and memory-efficient inference and serving engine for large language models. It uses PagedAttention for efficient memory management, supports continuous batching,…
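PagedAttention's memory management works like virtual memory. The KV cache is split into fixed-size physical blocks, and each sequence keeps a block table that maps its logical blocks to physical ones. Blocks are allocated only when a sequence fills its current block and are returned to the pool when it finishes. This is a simplified, hypothetical model of that bookkeeping only. It covers no attention kernels and none of the block sharing or copy-on-write vLLM uses for parallel sampling, and `BlockAllocator` is an assumed name.

```python
from typing import Dict, List, Tuple


class BlockAllocator:
    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size                   # tokens per KV block
        self.free = list(range(num_blocks - 1, -1, -1))  # pop() hands out lowest id first
        self.tables: Dict[str, List[int]] = {}         # seq_id -> physical block ids
        self.lengths: Dict[str, int] = {}              # seq_id -> tokens stored

    def append_token(self, seq_id: str) -> None:
        """Reserve a KV slot for one new token, allocating a block only when needed."""
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # no block yet, or the last block is full
            if not self.free:
                raise MemoryError("out of KV cache blocks")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def slot(self, seq_id: str, pos: int) -> Tuple[int, int]:
        """Physical (block id, offset) holding the KV entry for token position `pos`."""
        return self.tables[seq_id][pos // self.block_size], pos % self.block_size

    def release(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(reversed(self.tables.pop(seq_id, [])))
        self.lengths.pop(seq_id, None)
```

Because blocks are allocated on demand, waste is bounded by at most one partially filled block per sequence, and physical blocks need not be contiguous.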
This SOP provides a systematic workflow for training and deploying neural networks using the Flow Nexus platform with distributed E2B sandboxes.
This skill optimizes prompts for Large Language Models (LLMs) to reduce token usage, lower costs, and improve performance.
This skill optimizes deep learning models using various techniques. It is triggered when the user requests improvements to model performance, such as increasing accuracy, reducing…
Use this skill when building production LLM applications, implementing guardrails, evaluating model outputs, or deciding between prompting and fine-tuning.
This skill allows Claude to evaluate machine learning models using a comprehensive suite of metrics. It should be used when the user requests model performance analysis,…
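As a concrete example of the kind of metrics such an evaluation reports, the core binary-classification scores can be computed directly from the confusion counts. This is a minimal sketch covering only accuracy, precision, recall, and F1. `binary_metrics` is a hypothetical helper, not part of any plugin named here. Undefined ratios (for example, precision with no positive predictions) are reported as 0.0 by convention.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs) if pairs else 0.0,
            "precision": precision, "recall": recall, "f1": f1}
```

With `y_true = [1, 0, 1, 1, 0]` and `y_pred = [1, 1, 0, 1, 0]`, there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 are each 2/3 and accuracy is 0.6.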
This skill allows Claude to construct and configure neural network architectures using the neural-network-builder plugin.
Retell AI architecture variants: AI voice agent and phone call
TensorFlow best practices for tf.function, GPU memory, and deployment