---
name: omni-inference
description: "The core OpenAI-compatible inference endpoints: chat completions, embeddings, images, audio (TTS/STT), moderations, rerank, and the Responses API. The primary integration surface for AI agents."
---
<!-- generated by src/lib/agentSkills/generator.ts; manual edits will be overwritten -->

## Overview

The core OpenAI-compatible inference endpoints: chat completions, embeddings, images, audio (TTS/STT), moderations, rerank, and the Responses API. The primary integration surface for AI agents.

## Authentication

All requests require a valid Bearer token or session cookie. Obtain a token via `POST /api/auth/login` or configure `REQUIRE_API_KEY=false` for local development.

## Endpoints

### POST /api/v1/chat/completions

Create chat completion

OpenAI-compatible chat completions endpoint. Routes to configured providers.

```bash
curl -X POST https://localhost:20128/api/v1/chat/completions \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
  -H "Content-Type: application/json" \
  -d '{}'
```

### POST /api/v1/providers/{provider}/chat/completions

Create chat completion (provider-specific)

Routes to a specific provider by name.

```bash
curl -X POST https://localhost:20128/api/v1/providers/{provider}/chat/completions \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
  -H "Content-Type: application/json" \
  -d '{}'
```

### POST /api/v1/api/chat

Ollama-compatible chat endpoint

Provides compatibility with Ollama's /api/chat format.

```bash
curl -X POST https://localhost:20128/api/v1/api/chat \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
  -H "Content-Type: application/json" \
  -d '{}'
```

### POST /api/v1/messages

Create message (Anthropic-compatible)

Anthropic Messages API endpoint. Routes to Claude providers.

```bash
curl -X POST https://localhost:20128/api/v1/messages \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
  -H "Content-Type: application/json" \
  -d '{}'
```

### POST /api/v1/messages/count_tokens

Count tokens for a message

```bash
curl -X POST https://localhost:20128/api/v1/messages/count_tokens \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
  -H "Content-Type: application/json" \
  -d '{}'
```

### POST /api/v1/responses

Create response (OpenAI Responses API)

OpenAI Responses API endpoint.

```bash
curl -X POST https://localhost:20128/api/v1/responses \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
  -H "Content-Type: application/json" \
  -d '{}'
```

### POST /api/v1/embeddings

Create embeddings

```bash
curl -X POST https://localhost:20128/api/v1/embeddings \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
  -H "Content-Type: application/json" \
  -d '{}'
```

### POST /api/v1/providers/{provider}/embeddings

Create embeddings (provider-specific)

```bash
curl -X POST https://localhost:20128/api/v1/providers/{provider}/embeddings \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
  -H "Content-Type: application/json" \
  -d '{}'
```

### POST /api/v1/images/generations

Generate images

```bash
curl -X POST https://localhost:20128/api/v1/images/generations \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
  -H "Content-Type: application/json" \
  -d '{}'
```

### POST /api/v1/providers/{provider}/images/generations

Generate images (provider-specific)

```bash
curl -X POST https://localhost:20128/api/v1/providers/{provider}/images/generations \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
  -H "Content-Type: application/json" \
  -d '{}'
```

### POST /api/v1/audio/speech

Generate speech audio

Text-to-speech endpoint. Routes to configured TTS providers.

```bash
curl -X POST https://localhost:20128/api/v1/audio/speech \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
  -H "Content-Type: application/json" \
  -d '{}'
```

### POST /api/v1/audio/transcriptions

Transcribe audio

Audio-to-text transcription endpoint.

```bash
curl -X POST https://localhost:20128/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
  -H "Content-Type: application/json" \
  -d '{}'
```

### POST /api/v1/moderations

Create moderation

Content moderation endpoint. Routes to configured moderation providers.

```bash
curl -X POST https://localhost:20128/api/v1/moderations \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
  -H "Content-Type: application/json" \
  -d '{}'
```

### POST /api/v1/rerank

Rerank documents

Document reranking endpoint.

```bash
curl -X POST https://localhost:20128/api/v1/rerank \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
  -H "Content-Type: application/json" \
  -d '{}'
```

### GET /api/v1

API v1 root endpoint

Returns basic API info and status.

```bash
curl https://localhost:20128/api/v1 \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
```

### GET /api/v1/providers/{provider}/models

List models for a specific provider

Returns only models for the selected provider with provider prefix removed from each model id.

```bash
curl https://localhost:20128/api/v1/providers/{provider}/models \
  -H "Authorization: Bearer $OMNIROUTE_TOKEN"
```

## Payloads

See the full OpenAPI specification at `GET /api/openapi/spec` or `docs/reference/openapi.yaml` for detailed request/response schemas.

<!-- skill:custom-start -->
<!-- Aggregated from: omniroute-chat, omniroute-image, omniroute-tts, omniroute-stt, omniroute-embeddings, omniroute-web-search, omniroute-web-fetch -->

## Chat completions

Requires `OMNIROUTE_URL` and `OMNIROUTE_KEY`. See [entry-point SKILL](https://raw.githubusercontent.com/diegosouzapw/OmniRoute/main/skills/omniroute/SKILL.md) for setup.

### Endpoints

- `POST $OMNIROUTE_URL/v1/chat/completions` — OpenAI format
- `POST $OMNIROUTE_URL/v1/messages` — Anthropic Messages format
- `POST $OMNIROUTE_URL/v1/responses` — OpenAI Responses API

### Discover

```bash
curl $OMNIROUTE_URL/v1/models | jq '.data[].id'
```

Combos (e.g. `auto`, `cost-optimized`, `subscription`) auto-fallback through multiple providers.

### OpenAI format example

```bash
curl -X POST $OMNIROUTE_URL/v1/chat/completions \
  -H "Authorization: Bearer $OMNIROUTE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-7",
    "messages": [{"role": "user", "content": "Refactor this function"}],
    "stream": true
  }'
```

### Anthropic format example

```bash
curl -X POST $OMNIROUTE_URL/v1/messages \
  -H "Authorization: Bearer $OMNIROUTE_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    "messages": [{"role": "user", "content": "Hi"}]
  }'
```

### Tool use

Supports OpenAI `tools` array and Anthropic `tools` block. Tool results
auto-compressed via RTK (47 filters: git-diff, grep, test-jest, terraform-plan,
docker-logs, etc.) — 20-40% token savings. Disable per-request with
`X-Omniroute-Rtk: off` header.

### Reasoning / thinking

Anthropic extended thinking and OpenAI Responses reasoning blocks are forwarded
verbatim. Cached automatically via reasoning cache.

### Errors

- `401` → invalid API key
- `400 invalid_model` → model not in registry; check `/v1/models`
- `503 circuit_open` → provider circuit breaker tripped; retry later or use combo
- `429 rate_limited` → honor `Retry-After`; consider using a combo for auto-fallback

## Image generation

Requires `OMNIROUTE_URL` and `OMNIROUTE_KEY`. See [entry-point SKILL](https://raw.githubusercontent.com/diegosouzapw/OmniRoute/main/skills/omniroute/SKILL.md) for setup.

### Endpoints

- `POST $OMNIROUTE_URL/v1/images/generations` — Text-to-image
- `POST $OMNIROUTE_URL/v1/images/edits` — Image edit (mask)
- `POST $OMNIROUTE_URL/v1/images/variations` — Variations

### Discover

```bash
curl $OMNIROUTE_URL/v1/models/image | jq '.data[]'
```

Returns `{ id, owned_by, sizes:[...], capabilities:[...] }` per model.

### Generate example

```bash
curl -X POST $OMNIROUTE_URL/v1/images/generations \
  -H "Authorization: Bearer $OMNIROUTE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "a red bicycle on a wet street, photoreal",
    "n": 1,
    "size": "1024x1024",
    "response_format": "b64_json"
  }'
```

Response: `{ created, data: [{ url? or b64_json, revised_prompt }] }`

### Errors

- `400 invalid_size` → not supported by this model; check `/v1/models/image`
- `400 content_policy_violation` → blocked by provider safety
- `503` → provider unavailable; try another model in `/v1/models/image`

## Text-to-speech

Requires `OMNIROUTE_URL` and `OMNIROUTE_KEY`. See [entry-point SKILL](https://raw.githubusercontent.com/diegosouzapw/OmniRoute/main/skills/omniroute/SKILL.md) for setup.

### Endpoint

- `POST $OMNIROUTE_URL/v1/audio/speech` — returns binary audio (mp3/opus/wav/flac)

### Discover

```bash
curl $OMNIROUTE_URL/v1/models/tts | jq '.data[]'
```

Each entry includes `voices:[...]` for the available voice names per provider.

### Example

```bash
curl -X POST $OMNIROUTE_URL/v1/audio/speech \
  -H "Authorization: Bearer $OMNIROUTE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello from OmniRoute.",
    "voice": "alloy",
    "response_format": "mp3"
  }' --output speech.mp3
```

### Voices

Voice names vary by provider. Check `/v1/models/tts` — each entry has `voices:[...]`.
Common OpenAI voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`.

### Errors

- `400 invalid_voice` → voice not supported by this model
- `400 input_too_long` → input exceeds model character limit
- `503` → provider unavailable; try another model in `/v1/models/tts`

## Speech-to-text

Requires `OMNIROUTE_URL` and `OMNIROUTE_KEY`. See [entry-point SKILL](https://raw.githubusercontent.com/diegosouzapw/OmniRoute/main/skills/omniroute/SKILL.md) for setup.

### Endpoints

- `POST $OMNIROUTE_URL/v1/audio/transcriptions` — multipart upload, returns text
- `POST $OMNIROUTE_URL/v1/audio/translations` — transcribe + translate to English

### Discover

```bash
curl $OMNIROUTE_URL/v1/models/stt | jq '.data[]'
```

### Example

```bash
curl -X POST $OMNIROUTE_URL/v1/audio/transcriptions \
  -H "Authorization: Bearer $OMNIROUTE_KEY" \
  -F "file=@audio.mp3" \
  -F "model=whisper-1" \
  -F "response_format=verbose_json"
```

Response: `{ text, language, duration, segments?:[{ start, end, text }] }`

### Supported formats

Audio: `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`.
Response formats: `json`, `text`, `srt`, `verbose_json`, `vtt`.

### Errors

- `400 invalid_file_format` → unsupported audio format
- `400 file_too_large` → exceeds provider limit (usually 25MB)
- `503` → provider unavailable; try another model in `/v1/models/stt`

## Embeddings

Requires `OMNIROUTE_URL` and `OMNIROUTE_KEY`. See [entry-point SKILL](https://raw.githubusercontent.com/diegosouzapw/OmniRoute/main/skills/omniroute/SKILL.md) for setup.

### Endpoint

- `POST $OMNIROUTE_URL/v1/embeddings`

### Discover

```bash
curl $OMNIROUTE_URL/v1/models/embedding | jq '.data[]'
```

Each entry: `{ id, owned_by, dimensions, max_input_tokens }`.

### Example

```bash
curl -X POST $OMNIROUTE_URL/v1/embeddings \
  -H "Authorization: Bearer $OMNIROUTE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-large",
    "input": ["first text", "second text"],
    "encoding_format": "float"
  }'
```

Response: `{ data:[{ embedding:[...], index }], usage:{ prompt_tokens, total_tokens } }`

### Batch input

`input` accepts a string or array of strings (up to provider batch limit, typically 2048 items).

### Errors

- `400 input_too_long` → input exceeds `max_input_tokens` for this model
- `400 invalid_encoding_format` → use `float` or `base64`
- `503` → provider unavailable; try another model in `/v1/models/embedding`

## Web search

Requires `OMNIROUTE_URL` and `OMNIROUTE_KEY`. See [entry-point SKILL](https://raw.githubusercontent.com/diegosouzapw/OmniRoute/main/skills/omniroute/SKILL.md) for setup.

### Endpoint

- `POST $OMNIROUTE_URL/v1/web/search` — unified search format

### Discover

```bash
curl $OMNIROUTE_URL/v1/models/web | jq '.data[] | select(.kind == "webSearch")'
```

### Example

```bash
curl -X POST $OMNIROUTE_URL/v1/web/search \
  -H "Authorization: Bearer $OMNIROUTE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tavily/search",
    "query": "OmniRoute github latest release",
    "max_results": 5,
    "include_answer": true
  }'
```

Response: `{ answer?, results:[{ url, title, content, score }] }`

### Parameters

| Field            | Type    | Description                          |
| ---------------- | ------- | ------------------------------------ |
| `model`          | string  | Provider model from `/v1/models/web` |
| `query`          | string  | Search query                         |
| `max_results`    | number  | Max results (default: 5)             |
| `include_answer` | boolean | Include AI-synthesized answer        |
| `search_depth`   | string  | `basic` or `advanced` (Tavily)       |

### Errors

- `400 query_too_long` → shorten the search query
- `503` → provider unavailable; try another model in `/v1/models/web`

## Web fetch

Requires `OMNIROUTE_URL` and `OMNIROUTE_KEY`. See [entry-point SKILL](https://raw.githubusercontent.com/diegosouzapw/OmniRoute/main/skills/omniroute/SKILL.md) for setup.

### Endpoint

- `POST $OMNIROUTE_URL/v1/web/fetch`

### Discover

```bash
curl $OMNIROUTE_URL/v1/models/web | jq '.data[] | select(.kind == "webFetch")'
```

### Example

```bash
curl -X POST $OMNIROUTE_URL/v1/web/fetch \
  -H "Authorization: Bearer $OMNIROUTE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jina/reader",
    "url": "https://anthropic.com",
    "format": "markdown"
  }'
```

Response: `{ url, title, markdown, links?:[...], images?:[...] }`

### Parameters

| Field    | Type   | Description                                                             |
| -------- | ------ | ----------------------------------------------------------------------- |
| `model`  | string | Provider from `/v1/models/web` (e.g. `jina/reader`, `firecrawl/scrape`) |
| `url`    | string | URL to fetch                                                            |
| `format` | string | `markdown` (default), `html`, `text`                                    |

### Errors

- `400 invalid_url` → URL must be http/https
- `403 blocked` → provider blocked by target site; try a different model
- `503` → provider unavailable; try another model in `/v1/models/web`
<!-- skill:custom-end -->