---
name: dynamo-sglang-bump
description: Bump Dynamo's SGLang backend to a new SGLang version. Sets up a clean dynamo main checkout, target SGLang tag, fresh venv, then walks every launch script and fixes API breakage as it surfaces. Use when the user asks to upgrade/bump SGLang in Dynamo (e.g. "update dynamo for sglang 0.5.X").
user-invocable: true
---

# Dynamo SGLang Version Bump

End-to-end recipe for upgrading the `dynamo.sglang` backend to a new SGLang release. The upgrade is **bottom-up empirical**: spin up the canonical environment, then run each launch script in `examples/backends/sglang/launch/` and fix breakage as it appears. Don't try to predict the diff from release notes; let the scripts surface real failures.

## Canonical Paths

| Path | Purpose |
|------|---------|
| `/ephemeral/dynamo` | Dynamo checkout. **Always start on `main` and pull.** |
| `/ephemeral/sglang` | SGLang upstream checkout. **Checkout the target tag (`vX.Y.Z`).** |
| `~/aiperf` | Benchmark client (rarely needed for a bump). |

If these paths don't exist on the box, ask the user before cloning to a new location; they may have a different layout.

## Step 1: Confirm Inputs

Before touching anything, confirm with the user:

- **Target SGLang version** (e.g. `0.5.9`, mapped to tag `v0.5.9`).
- **Runtime container tags.** Always ask: these don't always match the pip version 1:1 (`.post1` suffixes, RC tags, CUDA variants). Need:
  - The CUDA-default runtime tag, e.g. `v0.5.11-runtime`
  - The CU130 runtime tag, e.g. `v0.5.11-cu130-runtime`

  Both go into `container/context.yaml`, `container/rendered.Dockerfile`, `container/compliance/README.md`, and any `container/templates/sglang_runtime.Dockerfile` references. **Never guess.** Ask explicitly: "What `lmsysorg/sglang` runtime image tags should I bump to for the default and cu130 variants?"

  **Verify the tags exist on Docker Hub before applying.** `docker buildx imagetools inspect lmsysorg/sglang:v-runtime` is the cheapest check.
  SGLang ships *base* images and *runtime* images via two **separate** GitHub Actions workflows ("Release Docker Images" and "Release Docker Runtime Images"), and the runtime workflow has historically failed or been delayed for some releases (e.g. v0.5.11's runtime build initially died on an apt-mirror flake during `apt-get update`). If the runtime tag doesn't exist yet:

  - Ask the user whether to (a) wait for upstream to re-run the runtime workflow (`gh run list --repo sgl-project/sglang --workflow "Release Docker Runtime Images"`), (b) escalate to a maintainer with permission to dispatch it (`workflow_dispatch` accepts a `version` input matching `X.Y.Z`), or (c) **defer the container tag bump to a follow-up PR** and ship the pip bump alone. Option (c) matches the precedent at commit `26af597bf85` ("chore: SGLang base image refresh ...") and is the right call when there's no ETA on the runtime images.
  - Whichever path the user picks, mention it explicitly in the bump PR body so reviewers don't think the container files were forgotten.
- **Branch name**. Default convention: `idhanani/sgl-to--and-cleanups` (matches the `0.5.9` precedent on `ishan/sgl-to-0.5.9-and-cleanups`).
- **Linear ticket** if there is one, for the PR description.

## Step 2: Reset Dynamo to main

```bash
cd /ephemeral/dynamo
git status                      # must be clean; if not, ask the user
git checkout main
git pull --ff-only origin main
git checkout -b
```

Refuse to proceed if the working tree is dirty; those are the user's in-progress changes.

## Step 3: Checkout SGLang at Target Tag

```bash
cd /ephemeral/sglang
git fetch origin --tags
git checkout v                  # e.g. v0.5.9
git status                      # verify detached HEAD at the tag
```

If the tag is missing, run `git fetch origin --tags --force` and retry. Don't silently fall back to a different ref.

## Step 4: Fresh venv

A fresh venv is **not optional** for this skill: old SGLang artifacts in site-packages cause confusing import-time errors that look like Dynamo bugs.
```bash
cd /ephemeral/dynamo
deactivate 2>/dev/null || true
rm -rf .venv-sgl-               # name it after the target version
uv venv .venv-sgl- --python 3.12
source .venv-sgl-/bin/activate
```

Verify isolation:

```bash
which python
# The exception name is on the last traceback line, so use tail rather than head.
python -c "import sglang" 2>&1 | tail -1   # expected: ModuleNotFoundError
```

## Step 5: Install SGLang (Local Editable)

Install from the `/ephemeral/sglang` checkout, not from PyPI; this lets you grep into the live SGLang source while debugging.

**Always include `[diffusion]`**: image/video/dllm launch scripts pull in `diffusers`, `imageio`, `imageio-ffmpeg`, `moviepy` etc. via that extra. Without it you'll bounce off `ModuleNotFoundError: imageio` / `diffusers` two scripts in.

```bash
cd /ephemeral/sglang && uv pip install -e "python[diffusion]"
python -c "import sglang; print(sglang.__version__)"
```

The reported version must match the target. If pip resolves a different one, look for a `requirements*.txt` constraint pinning it elsewhere.

## Step 6: Build Dynamo Bindings + Install

A fresh venv has neither `maturin` nor the `nixl` Python bindings. Install both first, or `maturin develop` will fail with `command not found` and dynamo workers will fail at import with `ImportError: NIXL Python bindings must be installed`.

```bash
uv pip install maturin nixl
cd /ephemeral/dynamo/lib/bindings/python && maturin develop --uv
cd /ephemeral/dynamo && uv pip install -e .
```

Sanity check the rebuilt Rust exports: the `kvstats` symbol on `dynamo.prometheus_names` was missing during the 0.5.9 bump until the bindings were rebuilt.

```bash
python -c "from dynamo.prometheus_names import kvstats; print('ok')"
```

## Step 7: Known Environment Gotchas

Encode these as preflight env vars for the test session:

- **cuDNN mismatch** (`SGLANG_DISABLE_CUDNN_CHECK=1`): PyTorch ships an older cuDNN than newer SGLang requires for Conv3d (vision/multimodal). Set this before launching `agg_vision.sh` and any multimodal script. Required for 0.5.9; still required at 0.5.11.
- **Local model cache**: confirm `HF_HOME` / `HF_HUB_CACHE` point to a fast disk (`/ephemeral/cache` on this box) so tests don't redownload weights.
- **Verify HF_TOKEN before launch.** Anonymous HF requests get 429-rate-limited fast, and gated models (`black-forest-labs/FLUX.1-dev`, anything with a license click-through) refuse outright. `hf auth whoami` must succeed; if it errors with "Invalid user token" the env's `HF_TOKEN` is stale and the user has to provide a fresh one before image_diffusion / multimodal scripts will work.
- **Pre-download the heavy / gated models.** Letting the launch script trigger the download is fragile under HF rate limits: a half-completed download will error mid-init with a confusing 429 traceback. Pre-fetch with `hf download --token "$HF_TOKEN"` first. Big offenders: FLUX.1-dev (~25 GB, gated), LLaDA2.0-mini-preview (~35 GB), Wan2.1-T2V-1.3B-Diffusers.

## Step 8: Walk Every Launch Script

**Read first** before launching anything:

- `components/src/dynamo/sglang/CLAUDE.md`: SGLang Backwards Compatibility policy and component layout.
- `components/src/dynamo/sglang/_compat.py`: the current shim. The existing fallback comments tell you what N-1 is *today*; that's the version about to age out.

Path: `/ephemeral/dynamo/examples/backends/sglang/launch/`

Run them in roughly this order, simpler first and multi-modal/diffusion last:

```text
agg.sh
agg_embed.sh
agg_router.sh
agg_vision.sh
disagg.sh
disagg_router.sh            # needs >=4 GPUs; SKIP otherwise
disagg_same_gpu.sh          # ask the user; often skip
diffusion_llada.sh
image_diffusion.sh
text-to-video-diffusion.sh
multimodal_epd.sh
multimodal_disagg.sh        # needs >=3 GPUs; SKIP otherwise
```

For each script:

1. Note `nvidia-smi` GPU count vs. script's GPU need. Skip with a recorded reason if short.
2. `pkill -9 -f sglang; pkill -9 -f dynamo; pkill -9 -f sgl_diffusion; sleep 3` before launch.
   Diffusion workers spawn an `sgl_diffusion::scheduler` child process that survives `pkill -f sglang`; explicitly grep for it. After diffusion runs, `nvidia-smi` may still show ~30 GB used by an orphan; kill it by PID.
3. Tee output: `bash launch/