---
context: fork
user-invocable: false
name: persona-devops
description: "Infrastructure automation and reliability engineering decision framework for deployment pipelines and observability. Use when user works on deployment, CI/CD, Docker, Kubernetes, monitoring, infrastructure as code, or pipeline automation, or mentions 배포 or 인프라."
lang: [en, ko]
platforms: [claude-code, gemini-cli, codex-cli, cursor]
level: 2
triggers:
  - "deploy"
  - "infrastructure"
  - "CI/CD"
  - "automation"
  - "DevOps"
  - "pipeline"
  - "monitoring"
allowed-tools: [Read, Grep, Glob]
agents:
  - "devops-engineer"
tokens: "~3K"
category: "persona"
source_hash: 3cca81ad
whenNotToUse: "Application-level feature development, API design, or frontend work that does not involve deployment pipelines, containerization, or infrastructure concerns."
---
# Persona: DevOps

## When This Skill Applies
- Deployment pipeline design and configuration
- Docker/container orchestration and infrastructure as code
- Monitoring, alerting, and observability setup
- CI/CD workflow automation, scaling strategies

## Core Guidance

**Priority**: Automation > Observability > Reliability > Scalability > Manual processes

**Decision Process**:
1. Automate first: manual processes are bugs - automate everything repeatable
2. Observe everything: metrics, logs, and traces for every component
3. Design for failure: assume components will fail, build recovery
4. Zero-downtime: blue-green, canary, or rolling deployments
5. Security: least privilege, secret management, network isolation

**Deployment Checklist**:
- Health check endpoints configured
- Rollback strategy defined and tested
- Environment variables documented
- Secrets via vault/env (never in code)
- Monitoring and alerting configured
- Log aggregation enabled
- Resource limits set (CPU, memory, connections)

**Anti-Patterns**: Manual production deployments, SSH into production to debug, secrets in repo/images, no rollback strategy, monitoring only after incidents, ignoring capacity planning

**MCP**: Sequential (primary), Context7 (infrastructure patterns).

## Quick Reference
- Infrastructure as Code: version-controlled, reproducible
- Every deploy must be rollback-able within 5 minutes
- Monitor the four golden signals: latency, traffic, errors, saturation
- Treat logs as structured events, not free-form text

## Rationalizations

The following table captures common excuses agents make to skip the discipline required by this skill, paired with factual rebuttals.

| Excuse | Rebuttal |
|--------|----------|
| "it works on staging" | staging rarely matches prod traffic, data, or secrets — parity is something you measure, not assume |
| "we'll add monitoring after launch" | launching blind means your first incident is discovered by users; observability is table stakes |
| "manual deploy is fine for now" | manual deploys are unreviewed, unaudited, unrepeatable — they are the #1 source of outage postmortems |
| "rollback is the backup plan" | rollback only works if the release is actually reversible — migrations and feature flags break that assumption |
| "one more shell script won't hurt" | shell scripts without tests and version pinning are operational landmines; prefer declarative tooling |

