---
name: clade-prod-checklist
description: |
  Production readiness checklist for Claude-powered applications —
  Use when working with prod-checklist patterns.
  error handling, monitoring, fallbacks, cost controls, and security.
  Trigger with "anthropic production", "claude production ready",
  "anthropic launch checklist", "go live with claude".
allowed-tools: Read, Write, Edit
version: 1.0.0
license: MIT
author: Jeremy Longshore <jeremy@intentsolutions.io>
compatible-with: claude-code
tags: [saas, anthropic, claude, production, checklist]
---

# Anthropic Production Checklist

## Overview
Before going live with a Claude-powered app, verify every item below.

## Authentication & Security
- [ ] API key stored in secrets manager (not in code or env file on disk)
- [ ] Key rotated — not the same one used during development
- [ ] Server-side only — no key exposed to client/browser
- [ ] Per-user rate limiting in place
- [ ] Input validation: max length, content filtering
- [ ] System prompt includes injection guardrails

## Output
- All checklist items verified (authentication, error handling, streaming, cost, monitoring, reliability, content, performance)
- Production API key configured with appropriate spending limits
- Monitoring and alerting in place
- Fallback behavior tested for API outages

## Error Handling
- [ ] All Anthropic API calls wrapped in try/catch
- [ ] `RateLimitError` (429) → backoff and retry
- [ ] `OverloadedError` (529) → fallback model or queue
- [ ] `AuthenticationError` (401) → alert team, don't retry
- [ ] `InvalidRequestError` (400) → log and fix, don't retry
- [ ] Network errors → retry with backoff
- [ ] Request IDs logged for every error (for support tickets)

## Streaming
- [ ] Using `client.messages.stream()` for user-facing responses
- [ ] Stream errors handled (connection drops, incomplete responses)
- [ ] `stop_reason` checked: `end_turn` vs `max_tokens` (incomplete)

## Cost Controls
- [ ] `max_tokens` set to realistic values (not 4096 for short answers)
- [ ] Correct model for each task (Haiku for simple, Sonnet for balanced)
- [ ] Prompt caching enabled for repeated system prompts
- [ ] Usage logging in place — tracking tokens and cost per request
- [ ] Spending alerts set in Anthropic console

## Monitoring
- [ ] Response latency tracked (TTFT and total)
- [ ] Token usage tracked (input/output per request)
- [ ] Error rates dashboarded (by error type)
- [ ] Anthropic status page monitored ([status.anthropic.com](https://status.anthropic.com))

## Reliability
- [ ] SDK `maxRetries` set (default 2 is fine for most)
- [ ] Timeout configured for your use case (`timeout` option)
- [ ] Single client instance reused (not created per request)
- [ ] Graceful degradation if Claude is down (cached responses, fallback)

## Content & Compliance
- [ ] System prompt tested against edge cases and adversarial inputs
- [ ] Output validated before showing to users (JSON parsing, length)
- [ ] Data retention settings configured in Anthropic console
- [ ] No unnecessary PII in prompts
- [ ] Usage policy compliance (Anthropic's Acceptable Use Policy)

## Performance
- [ ] p95 latency acceptable for your UX
- [ ] Prompt caching for latency-sensitive paths
- [ ] Parallel requests where possible (`Promise.all`)
- [ ] Client-side streaming UI implemented

## Examples
Each section above is a verifiable checklist. Work through Authentication & Security, Error Handling, Streaming, Cost Controls, Monitoring, Reliability, Content & Compliance, and Performance sections.

## Resources
- [API Best Practices](https://docs.anthropic.com/en/docs/build-with-claude)
- [Error Handling](https://docs.anthropic.com/en/api/errors)
- [Rate Limits](https://docs.anthropic.com/en/api/rate-limits)
- [Acceptable Use Policy](https://www.anthropic.com/policies/aup)

## Next Steps
See `clade-observability` for monitoring setup.

## Prerequisites
- All other anthropic skills reviewed
- Application feature-complete and tested locally
- Production API key created (separate from dev)
- Deployment platform selected

## Instructions

### Step 1: Review the patterns below
Each section contains production-ready code examples. Copy and adapt them to your use case.

### Step 2: Apply to your codebase
Integrate the patterns that match your requirements. Test each change individually.

### Step 3: Verify
Run your test suite to confirm the integration works correctly.
