--- name: dump-channel description: "Use when user wants to archive, dump, or back up an entire Telegram channel or chat history to NDJSON with all media files downloaded. Full history extraction with resume support." allowed-tools: Bash, Read, Grep, Glob --- # Dump Telegram Channel History Archive a complete Telegram channel/group/chat to NDJSON + downloaded media files. > **Self-Evolving Skill**: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues. ## Preflight 1. Session must exist: `~/.local/share/telethon/.session` - If missing, run `/tlg:setup` first 2. User must be subscribed to (or a member of) the target channel/chat ## Usage ```bash /usr/bin/env bash << 'EOF' SCRIPT="${CLAUDE_PLUGIN_ROOT:-$HOME/.claude/plugins/marketplaces/cc-skills/plugins/tlg}/scripts/tg-cli.py" # Full dump: NDJSON + all media (photos, videos, documents) uv run --python 3.13 "$SCRIPT" dump @ChannelName ./output/ChannelName # NDJSON only (skip media downloads — much faster) uv run --python 3.13 "$SCRIPT" dump @ChannelName ./output/ChannelName --no-media # Dump by numeric chat ID uv run --python 3.13 "$SCRIPT" dump -1001234567890 ./output/MyChannel # Use a different profile uv run --python 3.13 "$SCRIPT" -p missterryli dump @ChannelName ./output/ChannelName EOF ``` ## Parameters | Parameter | Type | Description | | ------------ | ---------- | ---------------------------------------------------------- | | chat | string/int | Channel username (@name) or numeric chat ID | | output | path | Output directory (messages.ndjson + media/ created inside) | | `--no-media` | flag | Skip media downloads, produce NDJSON only | ## Output Structure ``` output/ChannelName/ ├── messages.ndjson ← one JSON object per line, chronological (oldest first) └── media/ ├── 6.jpg ← named by message ID for cross-referencing ├── 12.png ├── 45.mp4 └── ... ``` ## NDJSON Record Schema Each line is a JSON object with these fields: | Field | Type | Description | | ----------------- | ----------- | ----------------------------------------------- | | `id` | int | Telegram message ID | | `date` | string | ISO 8601 timestamp with timezone | | `text` | string/null | Full message text (no truncation) | | `has_media` | bool | Whether message contains media | | `media_type` | string/null | Telethon class name (MessageMediaPhoto, etc.) | | `media_file` | string/null | Filename in media/ dir (e.g., "6.jpg") | | `views` | int/null | View count (channels only) | | `forwards` | int/null | Forward count | | `reply_to_msg_id` | int/null | Parent message ID if reply | | `grouped_id` | int/null | Album group ID (shared across album messages) | | `edit_date` | string/null | ISO 8601 timestamp of last edit | | `sender.id` | int | Sender's Telegram user/channel ID | | `sender.name` | string | Display name (channel title or user first name) | | `sender.username` | string/null | @username if set | ## Resume Support Re-running the same command skips already-downloaded media files (checks `dest.exists()`). The NDJSON is fully rewritten each run. This makes it safe to resume interrupted downloads. ## Querying the Output ```bash # jq: find all GOLD BUY signals with chart screenshots jq 'select(.text != null and (.text | test("GOLD.*BUY")) and .media_file != null)' messages.ndjson # DuckDB: aggregate by date duckdb -c "SELECT date::DATE as day, count(*) FROM read_ndjson('messages.ndjson') GROUP BY day ORDER BY day" # Python/Polars import polars as pl df = pl.read_ndjson("messages.ndjson") ``` ## Performance Notes - ~3000 messages + 1700 media files takes ~3-5 minutes - Telegram may briefly disconnect mid-download (`Server closed the connection`) — Telethon auto-reconnects - For very large channels (10k+ messages), expect 10-15 minutes with media ## Recommended Storage Pattern For git-tracked projects, gitignore the media folder: ```gitignore # data/telegram/.gitignore */media/ ``` This keeps the NDJSON (metadata) in version control while keeping large media files local-only. ## Anti-Patterns - **Don't dump channels you're not subscribed to** — Telethon needs access via your account - **Don't run multiple dumps concurrently on the same profile** — session file contention ## Post-Execution Reflection After this skill completes, check before closing: 1. **Did the command succeed?** — If not, fix the instruction or error table that caused the failure. 2. **Did parameters or output change?** — If tg-cli.py's interface drifted, update Usage examples and Parameters table to match. 3. **Was a workaround needed?** — If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround. Only update if the issue is real and reproducible — not speculative.