---
name: gemini-live-bootstrap
description: >
  Use this skill to bootstrap an ADK 2.0 streaming agent using the Gemini
  Live API. Triggers on: "ADK Gemini Live", "streaming agent ADK",
  "real-time ADK agent", "ADK live API", "voice agent ADK", "WebSocket
  agent ADK", "ADK streaming bootstrap". Generates a streaming agent with
  the Live model variant, a runner that yields tokens as they arrive, and
  a minimal client wired over WebSocket / SSE.
---

# gemini-live-bootstrap

Set up an ADK 2.0 agent on the Gemini Live API for real-time, bidirectional streaming: text, audio, and video input, with text or audio responses.

## When to use

- Voice assistants where users expect immediate audio response
- Live transcription / translation flows
- Real-time multimodal experiences (camera + voice)
- Replacing turn-based chat with continuous interaction

## Server-side template

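```python
# server.py
from google.adk.agents import LlmAgent
from google.adk.agents.run_config import RunConfig, StreamingMode
from google.adk.runners import InMemoryRunner

# Import paths and signatures follow the published google-adk Python package;
# check them against the version you have installed.
APP_NAME = "live_assistant_app"

root_agent = LlmAgent(
    name="live_assistant",
    model="gemini-2.5-flash-live",   # Live variant; confirm the exact model ID
    instruction="Respond conversationally. Be concise; the user is speaking with you.",
)

# Streaming mode is a per-run setting, so it lives on RunConfig, not on the agent.
run_config = RunConfig(streaming_mode=StreamingMode.BIDI)

# InMemoryRunner supplies an in-memory SessionService, enough for local testing.
runner = InMemoryRunner(agent=root_agent, app_name=APP_NAME)
```

## FastAPI WebSocket endpoint

```python
import asyncio
import base64

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from google.adk.agents.live_request_queue import LiveRequestQueue
from google.genai import types

app = FastAPI()

@app.websocket("/live")
async def live_endpoint(ws: WebSocket):
    await ws.accept()
    session = await runner.session_service.create_session(app_name=APP_NAME, user_id="u1")
    queue = LiveRequestQueue()  # carries client input into the live run

    async def producer():
        # Forward model output to the browser in the {"type", "data"} envelope.
        async for event in runner.run_live(
            user_id="u1", session_id=session.id,
            live_request_queue=queue, run_config=run_config,
        ):
            parts = (event.content.parts or []) if event.content else []
            for part in parts:
                if part.text and event.partial:  # partial events carry the incremental text
                    await ws.send_json({"type": "text_delta", "data": {"text": part.text}})
                elif part.inline_data and part.inline_data.data:
                    audio = base64.b64encode(part.inline_data.data).decode("ascii")
                    await ws.send_json({"type": "audio_chunk", "data": {"bytes": audio}})

    async def consumer():
        # Forward each {"type": "user_text", "text": ...} message as a user turn.
        try:
            while True:
                msg = await ws.receive_json()
                content = types.Content(role="user", parts=[types.Part(text=msg["text"])])
                queue.send_content(content)
        except WebSocketDisconnect:
            queue.close()  # signal the live run to stop so producer() can return

    await asyncio.gather(producer(), consumer())
```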

## Client (browser)

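```javascript
const ws = new WebSocket("ws://localhost:8000/live");
ws.onmessage = (e) => {
  const evt = JSON.parse(e.data);
  // appendText and playAudio are app-specific helpers you provide.
  if (evt.type === "text_delta") appendText(evt.data.text);
  // Audio bytes must be text-encoded (e.g. base64) to travel inside JSON.
  if (evt.type === "audio_chunk") playAudio(evt.data.bytes);
};
// Send only after the connection is open; calling send() earlier throws.
ws.onopen = () => {
  ws.send(JSON.stringify({ type: "user_text", text: "Hello!" }));
};
```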
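The client above reads `evt.data.bytes`, but raw bytes cannot travel inside a JSON message. A common convention is to base64-encode each chunk on the server and decode it in the browser (for instance with `atob`) before playback. The sketch below shows the server side of that convention; the helper names and the exact envelope shape are assumptions inferred from the client snippet, not part of ADK.

```python
import base64
import json


def audio_envelope(chunk: bytes) -> str:
    """Wrap raw audio bytes in the {"type", "data"} JSON envelope the client reads."""
    encoded = base64.b64encode(chunk).decode("ascii")
    return json.dumps({"type": "audio_chunk", "data": {"bytes": encoded}})


def audio_from_envelope(message: str) -> bytes:
    """Inverse of audio_envelope; handy for round-trip tests."""
    evt = json.loads(message)
    if evt["type"] != "audio_chunk":
        raise ValueError(f"unexpected event type: {evt['type']}")
    return base64.b64decode(evt["data"]["bytes"])


if __name__ == "__main__":
    pcm = bytes(range(8))  # stand-in for a chunk of PCM audio
    assert audio_from_envelope(audio_envelope(pcm)) == pcm
```

Base64 inflates the payload by about a third; if that matters, sending audio as binary WebSocket frames is an alternative worth considering.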

## Response modalities

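```python
from google.adk.agents.run_config import RunConfig, StreamingMode
from google.genai import types

# Modalities and voice are per-run settings; pass this RunConfig to run_live().
run_config = RunConfig(
    streaming_mode=StreamingMode.BIDI,
    response_modalities=["AUDIO"],   # one modality per Live session: "TEXT" or "AUDIO"
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Aoede")  # Gemini Live voice options
        )
    ),
)
```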

## Validation

- WebSocket connection upgrades cleanly
- First text delta arrives within 500 ms of user input
- Audio chunks decode without artifacts (test in browser)
- Session persists across reconnects (use a persistent SessionService)

## See also

- `audio-streaming-agent` for voice-first patterns
- `vision-streaming-agent` for camera input
- `bidirectional-tool-streaming` for tools that run mid-stream