---
name: databricks-table-documenter
description: Add and update column descriptions and table comments in Unity Catalog. Use when documenting tables, onboarding new datasets, or enforcing metadata standards.
---

# Databricks Table Documenter

Add, update, and verify column-level descriptions and table comments in Unity Catalog.

## When to Use

- Documenting columns on an existing table
- Bulk-describing all columns from a data dictionary
- Adding a table-level comment explaining purpose and ownership
- Auditing tables for missing descriptions

## Prerequisites

Requires `databricks-sdk` and `tenacity`. Run `python3 -c "import databricks.sdk; import tenacity"` to verify. If it fails, stop and ask the user to set up a virtual environment (see `databricks-sdk-foundation` skill). Do NOT install packages on behalf of the user.
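
A minimal sketch of that check (assuming `python3` resolves to the environment described in `databricks-sdk-foundation`):

```bash
# Verify required packages are importable; do not install anything here
python3 -c "import databricks.sdk, tenacity; print('dependencies OK')" \
  || echo "Missing databricks-sdk or tenacity -- stop and ask the user to set up the environment"
```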

For auth and full CLI reference, see `databricks-sdk-foundation`.

## Inspect Current State

Check which columns already have descriptions:

```bash
python3 dbx.py catalog describe <CATALOG>.<SCHEMA>.<TABLE>
```

## Add Table Comment

```bash
python3 dbx.py repl exec <CID> <CTX> 'spark.sql("COMMENT ON TABLE <CATALOG>.<SCHEMA>.<TABLE> IS '\''Purpose: <description>. Owner: <team>. Source: <source>.'\''")' 
```
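
For readability, here is the same command with hypothetical values filled in (the catalog, schema, table, and comment text are placeholders, not a real dataset):

```bash
python3 dbx.py repl exec <CID> <CTX> 'spark.sql("COMMENT ON TABLE sales.core.orders IS '\''Purpose: One row per customer order. Owner: revenue-analytics. Source: orders_raw, ingested nightly.'\''")'
```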

## Add Column Descriptions

### Single Column

```bash
python3 dbx.py repl exec <CID> <CTX> 'spark.sql("ALTER TABLE <CATALOG>.<SCHEMA>.<TABLE> ALTER COLUMN <col> COMMENT '\''<description>'\''")' 
```

### Bulk Describe (Multiple Columns)

Pass a dict mapping column names to descriptions:

```bash
python3 dbx.py repl exec <CID> <CTX> '
columns = {
    "user_id": "Unique identifier for the user",
    "created_at": "Timestamp when the record was created (UTC)",
    "revenue": "Total revenue in USD, rounded to 2 decimal places",
    "status": "Current status: active, inactive, or pending",
}
table = "<CATALOG>.<SCHEMA>.<TABLE>"
for col, desc in columns.items():
    # Escape backslashes and double quotes so the description stays a valid SQL string literal
    safe_desc = desc.replace("\\", "\\\\").replace("\"", "\\\"")
    spark.sql(f"ALTER TABLE {table} ALTER COLUMN `{col}` COMMENT \"{safe_desc}\"")
    print(f"  {col}: done")
print("All columns documented")
'
```
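
When the descriptions come from a data dictionary (see "When to Use"), the dict can be loaded from a file instead of hardcoded. A sketch, assuming the dictionary has been uploaded as JSON to a Unity Catalog volume the cluster can read (the volume path and file name are hypothetical):

```bash
python3 dbx.py repl exec <CID> <CTX> '
import json
# Hypothetical path: a JSON object mapping column names to descriptions,
# e.g. {"user_id": "Unique identifier for the user", ...}
with open("/Volumes/<CATALOG>/<SCHEMA>/<VOLUME>/data_dictionary.json") as f:
    columns = json.load(f)
table = "<CATALOG>.<SCHEMA>.<TABLE>"
for col, desc in columns.items():
    safe_desc = desc.replace("\\", "\\\\").replace("\"", "\\\"")
    spark.sql(f"ALTER TABLE {table} ALTER COLUMN `{col}` COMMENT \"{safe_desc}\"")
    print(f"  {col}: done")
print(f"Documented {len(columns)} columns from the data dictionary")
'
```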

### Auto-Generate Descriptions from Data Profile

Profile the table first, then use the column statistics to draft descriptions for review before applying:

```bash
python3 dbx.py repl exec <CID> <CTX> '
import json
table = "<CATALOG>.<SCHEMA>.<TABLE>"
df = spark.table(table)
total = df.count()  # count rows once rather than once per column
for col in df.columns:
    dtype = str(df.schema[col].dataType)
    nullable = df.schema[col].nullable
    distinct = df.select(col).distinct().count()
    nulls = df.filter(df[col].isNull()).count()
    print(json.dumps({
        "column": col,
        "type": dtype,
        "nullable": nullable,
        "distinct_values": distinct,
        "null_count": nulls,
        "total_rows": total,
    }))
'
```

Use the output to draft descriptions, then apply with the bulk describe pattern above.
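
If it helps to have a starting point, mechanical drafts can be generated from the same statistics and printed in a copy-pasteable form. A sketch (the draft wording is a crude heuristic and should be rewritten by hand before applying):

```bash
python3 dbx.py repl exec <CID> <CTX> '
table = "<CATALOG>.<SCHEMA>.<TABLE>"
df = spark.table(table)
drafts = {}
for field in df.schema.fields:
    if field.metadata.get("comment"):
        continue  # keep existing descriptions untouched
    distinct = df.select(field.name).distinct().count()
    if 0 < distinct <= 10:
        # Low-cardinality column: enumerate the observed values
        values = sorted(str(r[0]) for r in df.select(field.name).distinct().collect())
        drafts[field.name] = f"One of: {', '.join(values)}"
    else:
        drafts[field.name] = f"{field.dataType.simpleString()} column with {distinct} distinct values"
# Print in dict-literal form for pasting into the bulk describe block
for col, desc in drafts.items():
    print(f"    \"{col}\": \"{desc}\",")
'
```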

## Audit for Missing Descriptions

```bash
python3 dbx.py repl exec <CID> <CTX> '
table = "<CATALOG>.<SCHEMA>.<TABLE>"
for field in spark.table(table).schema.fields:
    # Column comments are stored in the field metadata under the "comment" key
    status = "documented" if field.metadata.get("comment") else "MISSING"
    print(f"  {field.name}: {status}")
'
```

## Verify

After documenting, confirm descriptions are set:

```bash
python3 dbx.py catalog describe <CATALOG>.<SCHEMA>.<TABLE>
```

## Conventions

1. Every table should have a table comment (set via `COMMENT ON TABLE`) stating purpose, owner, and source
2. Every column should have a description covering what it contains, its units, and its valid values
3. Use consistent language: start with a noun, include data type context (e.g., "Timestamp in UTC", "Amount in USD")
4. Keep descriptions under 200 characters
5. Re-run the audit after schema changes to catch new undocumented columns (a combined convention check is sketched below)
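
A combined check of these conventions can be run with the same REPL pattern. A sketch (the table-comment lookup assumes the standard `DESCRIBE TABLE EXTENDED` output layout; adjust the length threshold if your standard differs):

```bash
python3 dbx.py repl exec <CID> <CTX> '
table = "<CATALOG>.<SCHEMA>.<TABLE>"
# Convention 1: table-level comment is present
rows = spark.sql(f"DESCRIBE TABLE EXTENDED {table}").collect()
table_comment = next((r.data_type for r in rows if r.col_name == "Comment"), None)
print("table comment: " + ("present" if table_comment else "MISSING"))
# Conventions 2 and 4: every column described, under 200 characters
for field in spark.table(table).schema.fields:
    comment = field.metadata.get("comment", "")
    if not comment:
        print(f"  {field.name}: MISSING description")
    elif len(comment) > 200:
        print(f"  {field.name}: description exceeds 200 characters ({len(comment)})")
'
```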
