---
name: databricks-job-runner
description: Run Databricks jobs and notebooks using dbx.py -- submit, wait for completion, get output, and manage runs. Covers both one-off runs and existing job triggers.
---

# Databricks Job Runner

Submit notebook runs, trigger existing jobs, wait for completion, and retrieve outputs.

## When to Use

- Running notebooks that take > 5 minutes
- Submitting parameterized notebook runs
- Triggering existing scheduled jobs
- Any work needing an audit trail (run history)

For quick interactive work (< 5 min), use `databricks-interactive-repl` instead.

## Prerequisites

Requires `databricks-sdk` and `tenacity`. Run `python3 -c "import databricks.sdk; import tenacity"` to verify. If it fails, stop and ask the user to set up a virtual environment (see `databricks-sdk-foundation` skill). Do NOT install packages on behalf of the user.
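
The one-line import check above can also be scripted so that it reports which package is missing. This is a minimal sketch that assumes only the two module names from this section; `missing_modules` is a hypothetical helper, not part of `dbx.py`.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when the parent package (e.g. "databricks") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(["databricks.sdk", "tenacity"])
    if absent:
        # Report only; installing packages is left to the user.
        print("Missing: " + ", ".join(absent))
    else:
        print("All prerequisites are importable.")
```

The script only reports; it deliberately does not attempt an install, in line with the rule above.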

For auth and full CLI reference, see `databricks-sdk-foundation`.

## List Existing Jobs

```bash
python3 dbx.py jobs list
```

### Filter by Name

```bash
python3 dbx.py jobs list --name "etl"
```

## Run an Existing Job

Triggers the job with the given ID; `--wait` blocks until the run completes:

```bash
python3 dbx.py jobs run <JOB_ID> --wait
```

### With Parameter Overrides

```bash
python3 dbx.py jobs run <JOB_ID> --params key=value,key2=value2 --wait
```
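
When parameters come from a script rather than being typed by hand, the `key=value,key2=value2` string can be generated from a dict. The sketch below uses a hypothetical helper, `format_params`. Because this document does not say how `dbx.py` escapes `,` or `=`, the helper rejects those characters instead of guessing.

```python
def format_params(params):
    """Render a dict as the key=value,key2=value2 string used by --params."""
    parts = []
    for key, value in params.items():
        key, value = str(key), str(value)
        # Escaping rules for "," and "=" are not documented here, so this
        # sketch refuses them rather than producing an ambiguous string.
        if any(ch in text for text in (key, value) for ch in ",="):
            raise ValueError(f"unsupported character in parameter {key!r}")
        parts.append(f"{key}={value}")
    return ",".join(parts)
```

For example, `format_params({"key": "value", "key2": "value2"})` yields the string used in the command above.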

## Submit a One-Off Notebook Run

Creates a one-time run without defining a persistent job; `--wait` blocks until the run completes:

```bash
python3 dbx.py jobs submit --cluster <CLUSTER_ID> --notebook "/Users/<EMAIL>/<path>" --wait
```

### With Parameters and Timeout

```bash
python3 dbx.py jobs submit --cluster <CLUSTER_ID> --notebook "/Users/<EMAIL>/<path>" --params key=value --wait --timeout 3600
```
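
To call the submit command from another Python script, one option is to build the argument list explicitly and pass it to `subprocess`. The sketch below uses only the flags shown above. `build_submit_command` and `submit` are hypothetical helper names, and the unit of `--timeout` is not stated in this document, so the value is passed through unchanged.

```python
import subprocess


def build_submit_command(cluster_id, notebook_path, params=None, timeout=None, wait=True):
    """Build the argv list for `dbx.py jobs submit` from the flags shown above."""
    cmd = ["python3", "dbx.py", "jobs", "submit",
           "--cluster", cluster_id, "--notebook", notebook_path]
    if params:
        cmd += ["--params", ",".join(f"{k}={v}" for k, v in params.items())]
    if wait:
        cmd.append("--wait")
    if timeout is not None:
        # Passed through as-is; the unit is whatever dbx.py expects.
        cmd += ["--timeout", str(timeout)]
    return cmd


def submit(cluster_id, notebook_path, **kwargs):
    """Run the command and return its stdout.

    subprocess raises CalledProcessError if the process exits non-zero.
    """
    cmd = build_submit_command(cluster_id, notebook_path, **kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing a list rather than a shell string avoids quoting problems with notebook paths that contain spaces.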

## Check Run Status

```bash
python3 dbx.py jobs status <RUN_ID>
```

Returns: `{"run_id": ..., "state": "...", "result_state": "..."}`

Terminal states (the `life_cycle_state` column is assumed to be what the `state` field above reports):

| life_cycle_state | result_state | Meaning |
|-----------------|--------------|---------|
| TERMINATED | SUCCESS | Completed successfully |
| TERMINATED | FAILED | Failed -- check output for errors |
| TERMINATED | CANCELED | Canceled by user or timeout |
| SKIPPED | -- | Skipped (for example, a previous run of the same job was still active) |
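
A script that polls `jobs status` itself, instead of relying on `--wait`, needs to tell when a run is finished. The sketch below interprets the status dict shown above. It assumes the `state` key carries the life-cycle state, and all names (`is_terminal`, `succeeded`, `wait_for_run`, `fetch_status`) are hypothetical. `fetch_status` stands for any callable that returns the parsed output of `jobs status <RUN_ID>`.

```python
import time

# TERMINATED and SKIPPED come from the table above. INTERNAL_ERROR is a
# Jobs API life-cycle state; including it is an assumption about what
# dbx.py may report.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def is_terminal(status):
    """True once the run can no longer change state."""
    return status.get("state") in TERMINAL_STATES


def succeeded(status):
    """True only for TERMINATED runs whose result_state is SUCCESS."""
    return status.get("state") == "TERMINATED" and status.get("result_state") == "SUCCESS"


def wait_for_run(fetch_status, poll_seconds=30.0, timeout_seconds=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the run is terminal or the deadline passes."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if is_terminal(status):
            return status
        if clock() >= deadline:
            raise TimeoutError("run did not reach a terminal state in time")
        sleep(poll_seconds)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. Anything other than `succeeded(...)` should be followed up with `jobs output <RUN_ID>`.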

## Get Run Output

```bash
python3 dbx.py jobs output <RUN_ID>
```

## Cancel a Run

```bash
python3 dbx.py jobs cancel <RUN_ID>
```

## Cluster Management

Before submitting to a cluster, ensure it is running:

```bash
python3 dbx.py clusters ensure <CLUSTER_ID>
```

See `databricks-cluster-manager` for discovery and lifecycle management.

## REPL vs Job Submission

| Criteria | Interactive REPL | Job Submission |
|----------|-----------------|----------------|
| Expected runtime | < 5 min | > 5 min |
| Audit trail needed | No | Yes |
| Parameterized | No | Yes (--params) |
| Has markdown docs | No | Yes (notebook cells) |
| Iterative exploration | Yes | No |
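
The table can be read as a simple rule: any job-side requirement tips the choice toward job submission. The sketch below encodes that reading. `choose_mode` is a hypothetical helper, and sending a run of exactly 5 minutes to the REPL is an assumption, since the table leaves that boundary open.

```python
def choose_mode(expected_minutes, needs_audit_trail=False, parameterized=False):
    """Return "job" or "repl" following the comparison table above."""
    # Any single job-side criterion is enough to prefer job submission.
    if needs_audit_trail or parameterized or expected_minutes > 5:
        return "job"
    return "repl"
```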
