---
name: gemini-api-guides
description: |
  Comprehensive reference for Google's Gemini API. Use when building applications with:
  (1) Gemini models (Gemini 3 Pro, 2.5 Flash/Pro/Flash-Lite) for text and multimodal generation,
  (2) Image generation (Imagen, Nano Banana), video (Veo 3.1), music (Lyria),
  (3) Function calling, structured outputs, and agentic workflows,
  (4) Built-in tools: Google Search, Maps, Code Execution, URL Context, Computer Use, File Search,
  (5) Live API for real-time voice/video streaming,
  (6) Long context (1M+ tokens), embeddings, document/audio/video understanding,
  (7) Batch API, context caching, safety settings.
  Triggers: "gemini api", "google ai", "genai sdk", "gemini model", "veo", "imagen", "nano banana", "lyria", "live api", "vertex ai"
---

# Gemini API Skill

Build AI applications with Google's Gemini models and tools.

## Quick Start

### Installation

```bash
# Python
pip install google-genai

# JavaScript/Node.js
npm install @google/genai

# Go
go get google.golang.org/genai
```

### Environment Setup

```bash
export GEMINI_API_KEY="your-api-key"
```

### Basic Usage

**Python:**
```python
from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Your prompt here"
)
print(response.text)
```

**JavaScript:**
```javascript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});
const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: "Your prompt here"
});
console.log(response.text);
```

**REST:**
```bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"contents": [{"parts": [{"text": "Your prompt here"}]}]}'
```

## Model Selection

| Model | Best For | Context Window |
|-------|----------|----------------|
| **Gemini 3 Pro** | Most intelligent tasks, multimodal reasoning, agentic | See models-overview |
| **Gemini 2.5 Pro** | Complex reasoning, coding, extended thinking | 1M tokens |
| **Gemini 2.5 Flash** | Balanced performance, general tasks | 1M tokens |
| **Gemini 2.5 Flash-Lite** | High-volume, cost-sensitive, fastest | See models-overview |
| **Imagen** | High-fidelity image generation | N/A |
| **Veo 3.1** | Video generation (8s, 720p/1080p with audio) | N/A |
| **Nano Banana** | Native image gen with Gemini 2.5 Flash | N/A |
| **Nano Banana Pro** | Native image gen with Gemini 3 Pro | N/A |

## Reference Documentation Index

### Getting Started
| Topic | File | Description |
|-------|------|-------------|
| Setup & Libraries | [getting-started.md](references/getting-started.md) | API keys, SDK installation, OpenAI compatibility |

### Models & Pricing
| Topic | File | Description |
|-------|------|-------------|
| Model Overview | [models-overview.md](references/models-overview.md) | All models, capabilities, context windows |
| Pricing | [api-pricing.md](references/api-pricing.md) | Token costs, tool pricing |
| Rate Limits | [rate-limits.md](references/rate-limits.md) | RPM/TPM limits, quotas |
| Gemini 3 Guide | [gemini-3.md](references/gemini-3.md) | Gemini 3 specific features and best practices |
| Imagen | [imagen.md](references/imagen.md) | Image generation with Imagen model |
| Embeddings | [embeddings.md](references/embeddings.md) | Text embeddings for search/RAG |
| Veo | [veo.md](references/veo.md) | Video generation with Veo 3.1 (69K) |
| Lyria | [lyria.md](references/lyria.md) | Music generation with Lyria RealTime |
| Robotics | [robotics.md](references/robotics.md) | Gemini Robotics-ER 1.5 (42K) |

### Core Capabilities
| Topic | File | Description |
|-------|------|-------------|
| Text Generation | [text-generation.md](references/text-generation.md) | Text generation, system instructions (38K) |
| Image Gen (Nano Banana) | [image-generation-gemini.md](references/image-generation-gemini.md) | Native image generation with Gemini (LARGE: 174K) |
| Image Understanding | [image-understanding.md](references/image-understanding.md) | Vision, image analysis |
| Video Understanding | [video-understanding.md](references/video-understanding.md) | Video analysis, timestamps |
| Document Understanding | [document-understanding.md](references/document-understanding.md) | PDF and document processing |
| Speech Generation | [speech-generation.md](references/speech-generation.md) | Text-to-speech (TTS) |
| Audio Understanding | [audio-understanding.md](references/audio-understanding.md) | Audio analysis, transcription |

### Advanced Features
| Topic | File | Description |
|-------|------|-------------|
| Thinking Mode | [thinking.md](references/thinking.md) | Extended reasoning capabilities |
| Thought Signatures | [thought-signatures.md](references/thought-signatures.md) | **EDGE CASE ONLY**: Manual signature handling when NOT using official SDKs |
| Structured Outputs | [structured-outputs.md](references/structured-outputs.md) | JSON schema responses |
| Function Calling | [function-calling.md](references/function-calling.md) | Custom tool integration (54K) |
| Long Context | [long-context.md](references/long-context.md) | 1M+ token handling, context caching |

### Tools
| Topic | File | Description |
|-------|------|-------------|
| Tools Overview | [tools-overview.md](references/tools-overview.md) | Built-in tools summary, agent frameworks |
| Google Search | [google-search.md](references/google-search.md) | Web search grounding |
| Google Maps | [google-maps.md](references/google-maps.md) | Location-aware grounding |
| Code Execution | [code-execution.md](references/code-execution.md) | Python code execution tool |
| URL Context | [url-context.md](references/url-context.md) | URL content extraction |
| Computer Use | [computer-use.md](references/computer-use.md) | Browser automation (preview) (44K) |
| File Search | [file-search.md](references/file-search.md) | RAG with document indexing |

### Live API (Real-time Streaming)
| Topic | File | Description |
|-------|------|-------------|
| Getting Started | [live-api-getting-started.md](references/live-api-getting-started.md) | Low-latency voice/video interactions |
| Capabilities Guide | [live-api-capabilities.md](references/live-api-capabilities.md) | Full capabilities and configurations (32K) |
| Tool Use | [live-api-tools.md](references/live-api-tools.md) | Function calling & Search in Live API |
| Session Management | [live-api-sessions.md](references/live-api-sessions.md) | Session handling, time limits |
| Ephemeral Tokens | [ephemeral-tokens.md](references/ephemeral-tokens.md) | Short-lived auth for client-side WebSockets |

### Guides
| Topic | File | Description |
|-------|------|-------------|
| Batch API | [batch-api.md](references/batch-api.md) | Async processing at 50% cost (47K) |
| Files API | [files-api.md](references/files-api.md) | Upload and manage media files (49K) |
| Context Caching | [context-caching.md](references/context-caching.md) | Implicit & explicit caching for cost savings |
| Media Resolution | [media-resolution.md](references/media-resolution.md) | Control token allocation for media |
| Tokens | [tokens.md](references/tokens.md) | Understand and count tokens |
| Prompt Design | [prompt-design.md](references/prompt-design.md) | Prompt strategies and best practices (47K) |
| Logs & Datasets | [logs-datasets.md](references/logs-datasets.md) | Enable logging, view in AI Studio |
| Data Logging & Sharing | [data-logging-sharing.md](references/data-logging-sharing.md) | Storage and management of API logs |
| Safety Settings | [safety-settings.md](references/safety-settings.md) | Adjust safety filters |
| Safety Guidance | [safety-guidance.md](references/safety-guidance.md) | Best practices for safe AI use |

### Troubleshooting & Migration
| Topic | File | Description |
|-------|------|-------------|
| Troubleshooting | [troubleshooting.md](references/troubleshooting.md) | Diagnose and resolve common API issues (25K) |
| Vertex AI Comparison | [vertex-ai-comparison.md](references/vertex-ai-comparison.md) | **READ ONLY IF USER MENTIONS "VERTEX AI"**: Gemini Developer API vs Vertex AI differences |

## Large Files - Search Patterns

For large reference files (>30K), use grep to find specific sections:

**image-generation-gemini.md (174K):**
```bash
grep -n "## " references/image-generation-gemini.md  # List sections
grep -n "edit" references/image-generation-gemini.md  # Find editing info
grep -n "style" references/image-generation-gemini.md  # Find style transfer
```

**veo.md (69K):**
```bash
grep -n "## " references/veo.md  # List sections
grep -n "audio" references/veo.md  # Find audio generation info
```

**models-overview.md (67K):**
```bash
grep -n "gemini-3" references/models-overview.md
grep -n "context" references/models-overview.md
```

**function-calling.md (54K):**
```bash
grep -n "## " references/function-calling.md
grep -n "parallel" references/function-calling.md  # Parallel function calls
```

## Common Patterns

### Multimodal Input (Image + Text)
```python
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_image(image_path),
        types.Part.from_text("Describe this image")
    ]
)
```

### Function Calling
```python
tools = [
    types.Tool(function_declarations=[{
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }])
]

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's the weather in Paris?",
    config=types.GenerateContentConfig(tools=tools)
)
```

### Google Search Grounding
```python
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What are the latest AI developments?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]
    )
)
```

### Thinking Mode
```python
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Solve this complex problem...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget_tokens=10000)
    )
)
```

### Streaming
```python
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Write a story"
):
    print(chunk.text, end="")
```

## Key Concepts

### Tool Execution Flow

**Built-in tools** (Google Search, Code Execution): Executed by Google
1. Send prompt with tool config → Model executes tool → Response with grounded results

**Custom tools** (Function Calling): You execute
1. Send prompt with function declarations → Model returns function call JSON
2. You execute function, send result back → Model generates final response

### Thought Signatures (Important)
- **If using official SDKs with chat feature**: Thought signatures are handled automatically. No action needed.
- **If manually managing conversation history**: Read [thought-signatures.md](references/thought-signatures.md) for Gemini 3 Pro function calling requirements.

## API Endpoints

| Endpoint | Purpose |
|----------|---------|
| `/v1beta/models/{model}:generateContent` | Standard generation |
| `/v1beta/models/{model}:streamGenerateContent` | Streaming |
| `/v1beta/models/{model}:embedContent` | Embeddings |
| `/v1beta/models/{model}:countTokens` | Token counting |

Base URL: `https://generativelanguage.googleapis.com`