---
name: mega-djinn
description: >
  Acts as a data intelligence expert to answer complex data questions using Databricks Unity Catalog and Alation metadata.
  Trigger this skill whenever the user asks for data discovery, table schemas, data lineage, or business definitions.
  Use it specifically to:
  1. Locate tables, views, or notebooks within the Databricks Lakehouse.
  2. Retrieve business context, descriptions, or data quality status from the Alation catalog.
  3. Explain data transformations and source-to-destination lineage across systems.
  4. Clarify technical metadata (e.g., column types, primary keys) or business glossary terms.
  Trigger even if the user only provides a vague query about "what data we have" or asks "how is this metric calculated?"
  5. Present a plan to the user with all the tables and fields used and the actual SQL that will run. Wait for the user to accept before executing the SQL query.
  6. Answer a user's natural language query by generating the SQL, executing it, and returning the results.
---

# Mega Djinn

**`CLAUDE.md`** in this repo states short **project invariants** for Claude Code (confirm-before-run, MCP vs SDK, read-only). This file (**`SKILL.md`**) is the **single source of truth** for the full workflow, tools, limits, safety, setup, and domain knowledge.

mega-djinn is a natural language → SQL → results path for company data analysts.

For every **data question**, follow these four steps:

### Step 1+2 — Parallel context fetch

Run `.venv/bin/python scripts/execute.py --query ""`, which fetches all three sources **in parallel**:

1. **Databricks Unity Catalog** — table schemas and column definitions (source of truth for structure)
2. **Alation** (catalog search API) — glossary terms, table/column descriptions, lineage snippets
3. **Alation** (published queries API) — approved SQL patterns from the data team

**Unity Catalog is only pulled in through `--query`** (and through Databricks MCP tools such as `get_table_details` / SQL against UC).
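The parallel fetch described above can be pictured with a minimal sketch. It assumes one thread per source and uses hypothetical stand-in fetchers (`fetch_uc_schemas`, `fetch_alation_catalog`, `fetch_alation_queries`); it is not the actual contents of `scripts/execute.py`, whose internals are not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the three context sources. A real version
# would call Unity Catalog and the two Alation APIs instead.
def fetch_uc_schemas(question: str) -> dict:
    return {"source": "unity_catalog", "question": question}


def fetch_alation_catalog(question: str) -> dict:
    return {"source": "alation_catalog", "question": question}


def fetch_alation_queries(question: str) -> dict:
    return {"source": "alation_queries", "question": question}


def fetch_context(question: str) -> dict:
    """Run all three fetchers concurrently and collect results by name."""
    fetchers = {
        "unity_catalog": fetch_uc_schemas,
        "alation_catalog": fetch_alation_catalog,
        "alation_queries": fetch_alation_queries,
    }
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        # Submit every fetcher before waiting on any result.
        futures = {name: pool.submit(fn, question) for name, fn in fetchers.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    context = fetch_context("how is this metric calculated?")
    for name, payload in context.items():
        print(name, payload)
```

Threads suit this kind of work because each fetch is network-bound, so total latency is roughly that of the slowest source rather than the sum of all three.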
**`.venv/bin/python scripts/execute.py --search` does not call UC** — it hits Alation’s full-text search only (see Tools below).

**Alation is optional.** If `ALATION_BASE_URL` or `ALATION_TOKEN` is empty, `scripts/execute.py` will skip all Alation calls and only fetch Unity Catalog context. When this happens, the script prints a warning to stderr. **Tell the user:** "Alation is not configured — I can still query Databricks Unity Catalog, but glossary definitions and approved SQL patterns from Alation won’t be available. Set `ALATION_BASE_URL` and `ALATION_TOKEN` in your `.env` file to enable it."

### Step 3 — Synthesize

Combine insights from all three sources:

- UC schemas → confirm table names, columns, partition keys
- Alation glossary/catalog → validate metric definitions (UPV formula, consent filter, etc.)
- Alation approved queries → SQL pattern references

### Step 4 — Generate & Confirm

- Generate SQL from the combined context
- **Before executing**, clearly explain to the user:
  - Which tables will be queried
  - Which fields will be returned
  - Any filters, date ranges, or joins applied
  - Show the exact SQL that will be executed
- **Ask the user to confirm** they want to execute — do not run the query until confirmed
- **After confirmation**, run the query:
  - **Preferred:** **`mcp__databricks__execute_sql`** (Databricks MCP).
  - **Fallback:** **`.venv/bin/python scripts/execute.py --sql`** (Databricks SDK) when MCP is unavailable
- Summarize results in plain language
- **Save every analysis** as an HTML report in `reports/`. Filename: `YYYY-MM-DD_short-slug.html` (e.g. `reports/2026-03-25_YTD-revenue-metrics.html`). The report should include the question, the SQL, and the results table.
- **Report layout:** New reports should follow the structure and styling in `reports/templates/index.html` and load `reports/style.css`. Use that file as the default reference for section order, headings, cards, notes, and SQL presentation.
  Alternate variants remain available in `reports/templates/` for comparison. **Stylesheet path:** dated reports in `reports/` use `` (same folder). The template uses `../style.css` because it sits in a subdirectory — do not copy that path into new reports.
- **Note placement:** The Results section contains only the headline interpretation and the data table. Explanatory notes about query mechanics — deduplication logic, what a column means, how aggregation works, why a table was chosen, filter rationale — belong in section 3 (SQL Query), under Query Logic or SQL Planner. Ask: does this explain *what the data means to the reader* (Results) or *how the SQL produced it* (SQL section)?
- **Table column widths:** Always use `table-layout: auto` (override the global `fixed`). Before rendering a table, assess each column's data type and expected content range, then size accordingly:
  - **Fixed-format short values** (dates, IDs, booleans): pin a narrow explicit width + `white-space: nowrap` (e.g. `96px` for dates, `72px` for short numerics).
  - **Bounded numerics** (counts, percentages, currency): pin to fit the widest expected value + `white-space: nowrap`.
  - **Free-text / variable-length** (titles, descriptions, names): no fixed width — let the column expand naturally.
  - **URLs / paths**: no fixed width, but add `overflow: hidden; text-overflow: ellipsis; white-space: nowrap` with a generous `max-width` so very long values truncate gracefully.
  - Apply overrides via semantic classes in the report's inline `