---
name: databricks-table-explorer
description: Explore Databricks Unity Catalog tables -- list, describe, preview data, and profile columns. Uses dbx.py for metadata and the interactive REPL for data queries.
---

# Databricks Table Explorer

Discover and inspect tables in Unity Catalog. Uses `dbx.py` for metadata (no cluster needed) and the interactive REPL for data queries.

## When to Use

- Finding tables by name or pattern
- Checking table schema and metadata
- Previewing data or computing statistics
- Profiling columns (distinct counts, nulls, distributions)

## Prerequisites

Requires `databricks-sdk` and `tenacity`. Run `python3 -c "import databricks.sdk; import tenacity"` to verify. If it fails, stop and ask the user to set up a virtual environment (see `databricks-sdk-foundation` skill). Do NOT install packages on behalf of the user.
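The one-liner above fails on the first missing import; a small Python sketch (illustrative only, not part of `dbx.py`) can report every missing package at once:

```python
# Sketch: verify the prerequisites import cleanly before calling dbx.py.
# Collects all missing packages rather than stopping at the first failure.
missing = []
for mod in ("databricks.sdk", "tenacity"):
    try:
        __import__(mod)
    except ImportError:
        missing.append(mod)

if missing:
    print(f"Missing packages: {', '.join(missing)} - "
          "stop and ask the user to set up a virtual environment")
else:
    print("Prerequisites OK")
```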

For auth and full CLI reference, see `databricks-sdk-foundation`.

## Metadata (No Cluster Needed)

### List Catalogs

```bash
python3 dbx.py catalog list-catalogs
```

### List Schemas

```bash
python3 dbx.py catalog list-schemas <CATALOG>
```

### List Tables

```bash
python3 dbx.py catalog list-tables <CATALOG> <SCHEMA>
```

### Describe a Table

```bash
python3 dbx.py catalog describe <CATALOG>.<SCHEMA>.<TABLE>
```

Returns: full_name, table_type, storage_location, created_at, updated_at, and column metadata.

### Search for Tables by Pattern

Server-side pattern matching using SQL `LIKE` syntax: `%` matches any sequence of characters, `_` matches exactly one character.

```bash
python3 dbx.py catalog search <CATALOG> --pattern "%<PATTERN>%"
python3 dbx.py catalog search <CATALOG> --pattern "%<PATTERN>%" --schema <SCHEMA>
```
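As a rough illustration of the `LIKE` wildcard semantics in plain Python (this is not how `dbx.py` implements it; matching happens server-side):

```python
import re

def like_match(name: str, pattern: str) -> bool:
    """Approximate SQL LIKE matching: % = any run of chars, _ = one char."""
    # re.escape leaves % and _ untouched, so substituting the wildcards
    # after escaping is safe even if the pattern contains regex metacharacters.
    regex = re.escape(pattern).replace("%", ".*").replace("_", ".")
    return re.fullmatch(regex, name) is not None

print(like_match("daily_sales_v2", "%sales%"))  # True
print(like_match("dim_customer", "%sales%"))    # False
```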

## Data Queries (Needs Running Cluster)

For data preview and profiling, create an execution context via the `databricks-interactive-repl` workflow, then pass the cluster ID (`<CID>`) and the context ID (`<CTX>`) it returns to the `repl exec` commands below:

```bash
python3 dbx.py repl create <CLUSTER_ID>
```

### Preview Data

```bash
python3 dbx.py repl exec <CID> <CTX> 'print(spark.sql("SELECT * FROM <TABLE> LIMIT 20").toPandas().to_markdown())'
```

### Row Count

```bash
python3 dbx.py repl exec <CID> <CTX> 'print(spark.sql("SELECT COUNT(*) FROM <TABLE>").collect()[0][0])'
```

### Column Statistics

```bash
python3 dbx.py repl exec <CID> <CTX> 'print(spark.table("<TABLE>").describe().toPandas().to_markdown())'
```

### Distinct Value Counts

```bash
python3 dbx.py repl exec <CID> <CTX> 'from pyspark.sql import functions as F; df=spark.table("<TABLE>"); print(df.select([F.countDistinct(c).alias(c) for c in df.columns]).toPandas().to_markdown())'
```

### Null Counts

```bash
python3 dbx.py repl exec <CID> <CTX> 'from pyspark.sql import functions as F; df=spark.table("<TABLE>"); print(df.select([F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df.columns]).toPandas().to_markdown())'
```

### Value Distribution

```bash
python3 dbx.py repl exec <CID> <CTX> 'print(spark.sql("SELECT <COLUMN>, COUNT(*) as cnt FROM <TABLE> GROUP BY <COLUMN> ORDER BY cnt DESC LIMIT 20").toPandas().to_markdown())'
```

## When to Use dbx.py vs REPL

| Operation | Method | Reason |
|-----------|--------|--------|
| List catalogs/schemas/tables | `dbx.py catalog` | Fast, no cluster needed |
| Describe table schema | `dbx.py catalog describe` | Metadata only |
| Search tables by pattern | `dbx.py catalog search` | Server-side filtering |
| Preview rows | `dbx.py repl exec` | Reads actual data |
| Row counts | `dbx.py repl exec` | Requires query execution |
| Column profiling | `dbx.py repl exec` | Requires computation |
| Aggregations / joins | `dbx.py repl exec` | Requires Spark |
