---
namespace: aiwg
name: acquire
platforms: [all]
description: Download media from discovered sources with format selection and progress tracking
commandHint:
  argumentHint: --plan <sources.yaml> | --url <URL> [--format audio|video|best] [--output <dir>] [--parallel N]
  allowedTools: Bash, Read, Write, Glob, Grep
  model: sonnet
  category: media-curator
---

# /acquire

Download media from discovered sources with intelligent format selection, parallel execution, and comprehensive progress tracking.

## Purpose

The `/acquire` command orchestrates media downloads from various sources (YouTube, Internet Archive, Bandcamp, direct links) using the Acquisition Manager agent. It handles:

- Single URL or batch downloads from source plans
- Automatic format selection based on content type
- Parallel download execution with resource limits
- Real-time progress tracking and status reporting
- Error recovery with retry strategies
- Quality verification and metadata recording
- Network mount awareness for optimal performance

## Parameters

### Required (one of)

**`--plan <sources.yaml>`**
- Source plan file generated by `/find-sources` command
- YAML format containing URLs, metadata, and acquisition strategy
- Example: `.curator/sources/plan-001.yaml`

**`--url <URL>`**
- Single URL for one-off downloads
- Supports: YouTube, Internet Archive, Bandcamp, SoundCloud, Vimeo, direct links
- Example: `https://youtube.com/watch?v=abc123`

### Optional

**`--format <audio|video|best>`**
- Format preference for downloads
- `audio`: Extract audio only (Opus 128K default)
- `video`: Best video up to 1080p with audio
- `best`: Let agent decide based on content type (default)
- Example: `--format audio`
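Internally the preference has to be translated into a yt-dlp format selector before it reaches the downloader. A minimal sketch of that mapping (`resolve_format` is a hypothetical helper; the selector strings are one reasonable choice, not mandated by yt-dlp):

```bash
# Map the user-facing --format preference to a yt-dlp format selector.
# Selector strings here are an implementation choice, not fixed by yt-dlp.
resolve_format() {
  case "$1" in
    audio) echo "bestaudio" ;;
    video) echo "bestvideo[height<=1080]+bestaudio/best[height<=1080]" ;;
    *)     echo "bestvideo*+bestaudio/best" ;;  # "best": let yt-dlp decide
  esac
}
```

The result feeds the `-f` flag in the download step later in this document.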

**`--output <directory>`**
- Base output directory for downloads
- Default: `./downloads/<timestamp>`
- Structure: `<output>/<artist>/<era>/audio/` and `/video/`
- Example: `--output /mnt/archive/music`

**`--parallel <N>`**
- Maximum concurrent downloads
- Range: 1-5 (default: 3)
- Lower for slow networks, higher for fast connections
- Example: `--parallel 5`

**`--local-work <directory>`**
- Local working directory (required for network mount outputs)
- Downloads land here first, then are copied to the output directory
- Default: `/tmp/curator-work-$$`
- Example: `--local-work /fast-local-disk/tmp`

**`--verify-after`**
- Enable post-download verification
- Checks file integrity, size, media validity
- Adds ~5% overhead but prevents corruption
- Example: `--verify-after`

**`--extract-audio`**
- Automatically extract audio from downloaded videos
- Creates parallel audio files in audio/ directory
- Uses Opus 128K by default
- Example: `--extract-audio`

**`--resume <session-id>`**
- Resume interrupted download session
- Loads state from `.curator/sessions/<session-id>/state.json`
- Skips completed downloads, retries failed
- Example: `--resume 20260214-143022`
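Resumption only needs the entries whose status is not `completed`. A small illustrative helper (`pending_urls` is not part of the command itself; it assumes the `state.json` schema shown under Workflow):

```bash
# List the URLs a resumed session still has to process.
# Assumes the state.json schema shown in the Workflow section.
pending_urls() {
  jq -r '.downloads[] | select(.status != "completed") | .url' "$1"
}
# e.g. pending_urls .curator/sessions/20260214-143022/state.json
```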

## Usage Examples

### Single URL Download

```bash
# Download single YouTube video (audio only)
/acquire --url "https://youtube.com/watch?v=abc123" --format audio

# Download single video with metadata
/acquire --url "https://youtube.com/watch?v=abc123" \
  --format video \
  --output /mnt/archive/concerts \
  --verify-after
```

### Batch Download from Plan

```bash
# Download entire source plan (generated by /find-sources)
/acquire --plan .curator/sources/plan-001.yaml

# Download to network mount (uses local working dir)
/acquire --plan .curator/sources/plan-001.yaml \
  --output /mnt/network-storage/archive \
  --local-work /fast-ssd/curator-work \
  --parallel 3
```

### Audio Extraction Workflow

```bash
# Download videos and auto-extract audio
/acquire --plan .curator/sources/plan-001.yaml \
  --format video \
  --extract-audio \
  --verify-after

# Result structure:
# downloads/artist/era/video/concert.mkv
# downloads/artist/era/audio/concert.opus
```

### Resume Interrupted Session

```bash
# Resume after network failure
/acquire --resume 20260214-143022

# Resume with different output (move completed files)
/acquire --resume 20260214-143022 \
  --output /new/location
```

## Workflow

### 1. Initialization

```bash
# Validate parameters:
# - Check that --plan exists OR --url is provided
# - Verify the output directory is writable
# - Check required tools (yt-dlp, wget, curl, ffmpeg)
# - Test network mount performance if applicable
# - Create the session directory structure

# Session setup
SESSION_ID=$(date +%Y%m%d-%H%M%S)
SESSION_DIR=".curator/sessions/$SESSION_ID"
mkdir -p "$SESSION_DIR"/{logs,metadata}

# Initialize state file
cat > "$SESSION_DIR/state.json" <<EOF
{
  "session_id": "$SESSION_ID",
  "started_at": "$(date -Iseconds)",
  "status": "initializing",
  "parameters": {
    "plan": "$PLAN_FILE",
    "format": "$FORMAT",
    "output": "$OUTPUT_DIR",
    "parallel": $PARALLEL
  },
  "downloads": []
}
EOF
```

### 2. Source Validation

```bash
# If --plan provided, parse YAML
if [[ -n "$PLAN_FILE" ]]; then
  # Extract URLs and metadata
  SOURCES=$(yq eval '.sources[] | .url' "$PLAN_FILE")
  TOTAL_SOURCES=$(echo "$SOURCES" | wc -l)

  # Validate each URL is accessible
  while IFS= read -r url; do
    if yt-dlp --dump-json "$url" >/dev/null 2>&1; then
      echo "VALID: $url"
    else
      echo "WARNING: Cannot access $url"
    fi
  done <<< "$SOURCES"
fi

# If --url provided, validate single URL
if [[ -n "$SINGLE_URL" ]]; then
  if ! yt-dlp --dump-json "$SINGLE_URL" >/dev/null 2>&1; then
    echo "ERROR: Cannot access URL: $SINGLE_URL"
    exit 1
  fi
fi
```

### 3. Directory Structure Creation

```bash
create_acquisition_structure() {
  local output_base="$1"
  local artist="$2"
  local era="$3"

  # Sanitize directory names
  local safe_artist=$(echo "$artist" | sed 's/[^a-zA-Z0-9_-]/_/g')
  local safe_era=$(echo "$era" | sed 's/[^a-zA-Z0-9_-]/_/g')

  # Create directory tree
  local audio_dir="$output_base/$safe_artist/$safe_era/audio"
  local video_dir="$output_base/$safe_artist/$safe_era/video"

  mkdir -p "$audio_dir/.curator"
  mkdir -p "$video_dir/.curator"
  mkdir -p "$output_base/$safe_artist/.curator"

  # Write artist metadata
  cat > "$output_base/$safe_artist/.curator/artist-info.json" <<EOF
{
  "name": "$artist",
  "era": "$era",
  "created_at": "$(date -Iseconds)",
  "session_id": "$SESSION_ID"
}
EOF

  echo "$audio_dir:$video_dir"
}
```

### 4. Launch Downloads

```bash
# Download orchestration with concurrency control
ACTIVE_DOWNLOADS=()
MAX_CONCURRENT=${PARALLEL:-3}

launch_download() {
  local url="$1"
  local output_dir="$2"
  local format="$3"
  local download_id="dl-$SESSION_ID-$(date +%s)-$RANDOM"  # $RANDOM avoids same-second collisions

  # Wait for slot availability
  while [[ ${#ACTIVE_DOWNLOADS[@]} -ge $MAX_CONCURRENT ]]; do
    check_completed_downloads
    sleep 2
  done

  # Determine target directory (audio vs video)
  local target_dir
  if [[ "$format" == "audio" || "$format" == "bestaudio" ]]; then
    target_dir="$output_dir/audio"
  else
    target_dir="$output_dir/video"
  fi

  # Launch download in background
  (
    download_with_retry "$url" "$target_dir" "$format" "$download_id"
    echo "COMPLETED:$download_id:$?" >> "$SESSION_DIR/completion.log"
  ) &

  local pid=$!
  ACTIVE_DOWNLOADS+=("$download_id:$pid")

  # Update state file
  update_state_file "add" "$download_id" "$url" "in_progress"
}

download_with_retry() {
  local url="$1"
  local target_dir="$2"
  local format="$3"
  local download_id="$4"

  local attempt=0
  local max_attempts=3

  while [[ $attempt -lt $max_attempts ]]; do
    attempt=$((attempt + 1))

    echo "[$download_id] Attempt $attempt/$max_attempts"

    # "$format" must be a yt-dlp format selector (e.g. bestaudio),
    # not the raw --format flag value
    yt-dlp -f "$format" \
        --output "$target_dir/%(title)s.%(ext)s" \
        --write-info-json \
        --write-thumbnail \
        --no-playlist \
        "$url" \
        2>&1 | tee "$SESSION_DIR/logs/$download_id.log"

    # Test yt-dlp's exit status, not tee's
    if [[ ${PIPESTATUS[0]} -eq 0 ]]; then
      update_state_file "complete" "$download_id" "" "completed"
      return 0
    fi

    # Check error type
    local error_type
    error_type=$(classify_error "$SESSION_DIR/logs/$download_id.log")

    if [[ "$error_type" == "disk_full" || "$error_type" == "permission_denied" ]]; then
      echo "FATAL: $error_type - aborting all downloads"
      pkill -P $$  # Kill all child processes of this script
      exit 2
    fi

    if [[ $attempt -lt $max_attempts ]]; then
      # Exponential backoff: 5s, 10s, 20s...
      local wait_time=$((5 * (2 ** (attempt - 1))))
      echo "Waiting ${wait_time}s before retry..."
      sleep "$wait_time"
    fi
  done

  update_state_file "fail" "$download_id" "" "failed"
  return 1
}

check_completed_downloads() {
  local new_active=()

  for entry in "${ACTIVE_DOWNLOADS[@]}"; do
    local pid="${entry#*:}"
    if kill -0 "$pid" 2>/dev/null; then
      new_active+=("$entry")
    fi
  done

  ACTIVE_DOWNLOADS=("${new_active[@]}")
}
```
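`update_state_file` is referenced above but never defined. A minimal jq-based sketch (tracking only `id`, `url`, and `status`; the full helper would also record `filename`, `progress_percent`, and `filesize_bytes` as used by the progress monitor and report):

```bash
# Minimal sketch of the update_state_file helper referenced above.
# Tracks only id/url/status; a fuller version would record more fields.
update_state_file() {
  local action="$1" download_id="$2" url="$3" status="$4"
  local state_file="$SESSION_DIR/state.json"

  case "$action" in
    add)
      # Append a new download entry
      jq --arg id "$download_id" --arg url "$url" --arg st "$status" \
        '.downloads += [{id: $id, url: $url, status: $st}]' \
        "$state_file" > "$state_file.tmp"
      ;;
    complete|fail)
      # Update the status of an existing entry
      jq --arg id "$download_id" --arg st "$status" \
        '(.downloads[] | select(.id == $id)).status = $st' \
        "$state_file" > "$state_file.tmp"
      ;;
    *) return 1 ;;
  esac
  mv "$state_file.tmp" "$state_file"
}
```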

### 5. Track Progress

```bash
# Real-time progress monitoring
monitor_progress() {
  local session_dir="$1"

  while true; do
    # Load current state
    local state_file="$session_dir/state.json"

    local total=$(jq '.downloads | length' "$state_file")
    local completed=$(jq '[.downloads[] | select(.status == "completed")] | length' "$state_file")
    local in_progress=$(jq '[.downloads[] | select(.status == "in_progress")] | length' "$state_file")
    local failed=$(jq '[.downloads[] | select(.status == "failed")] | length' "$state_file")

    # Clear screen and display status
    clear
    cat <<EOF
ACQUISITION PROGRESS
====================
Session: $SESSION_ID
Started: $(jq -r '.started_at' "$state_file")

Status: $completed/$total completed ($failed failed, $in_progress in progress)

Active Downloads:
EOF

    # Show active download details
    jq -r '.downloads[] | select(.status == "in_progress") | "  [\(.progress_percent)%] \(.filename) @ \(.speed_mbps)MB/s (ETA: \(.eta_seconds)s)"' "$state_file"

    # Exit once all queued downloads have finished (guard against total=0 at startup)
    if [[ $total -gt 0 && $((completed + failed)) -eq $total ]]; then
      echo ""
      echo "All downloads completed."
      break
    fi

    sleep 5
  done
}

# Run in background
monitor_progress "$SESSION_DIR" &
MONITOR_PID=$!
```

### 6. Extract Audio (if requested)

```bash
if [[ "$EXTRACT_AUDIO" == "true" ]]; then
  echo "Extracting audio from video files..."

  # Note: this block runs at top level (not in a function), so no `local`
  find "$OUTPUT_DIR" -type d -name "video" | while read -r video_dir; do
    audio_dir="${video_dir%/video}/audio"
    mkdir -p "$audio_dir"

    find "$video_dir" -type f \( -name "*.mkv" -o -name "*.mp4" -o -name "*.webm" \) | while read -r video_file; do
      base=$(basename "$video_file" | sed 's/\.[^.]*$//')
      audio_file="$audio_dir/${base}.opus"

      echo "  $video_file -> $audio_file"

      ffmpeg -i "$video_file" \
        -vn \
        -c:a libopus \
        -b:a 128k \
        "$audio_file" \
        2>&1 | tee -a "$SESSION_DIR/logs/audio-extraction.log"

      if [[ ${PIPESTATUS[0]} -eq 0 ]]; then
        echo "SUCCESS: $audio_file"
      else
        echo "FAILED: $video_file"
      fi
    done
  done
fi
```

### 7. Verify Downloads (if requested)

```bash
if [[ "$VERIFY_AFTER" == "true" ]]; then
  echo "Verifying downloaded files..."

  # Note: this block runs at top level (not in a function), so no `local`
  verification_results="$SESSION_DIR/verification.json"
  echo '{"verified": [], "failed": []}' > "$verification_results"

  find "$OUTPUT_DIR" -type f \( -name "*.opus" -o -name "*.mkv" -o -name "*.mp4" -o -name "*.flac" \) | while read -r file; do
    echo "Verifying: $file"

    # Check file size (GNU stat, with BSD stat fallback)
    size=$(stat -c%s "$file" 2>/dev/null || stat -f%z "$file")
    if [[ $size -eq 0 ]]; then
      echo "FAILED: Zero-byte file"
      jq --arg f "$file" '.failed += [$f]' "$verification_results" > "$verification_results.tmp"
      mv "$verification_results.tmp" "$verification_results"
      continue
    fi

    # Check media integrity: with -v error, ffmpeg is silent on healthy files
    if [[ -n $(ffmpeg -v error -i "$file" -f null - 2>&1) ]]; then
      echo "FAILED: Corrupted media file"
      jq --arg f "$file" '.failed += [$f]' "$verification_results" > "$verification_results.tmp"
      mv "$verification_results.tmp" "$verification_results"
      continue
    fi

    echo "VERIFIED: $file ($size bytes)"
    jq --arg f "$file" '.verified += [$f]' "$verification_results" > "$verification_results.tmp"
    mv "$verification_results.tmp" "$verification_results"
  done

  verified_count=$(jq '.verified | length' "$verification_results")
  failed_count=$(jq '.failed | length' "$verification_results")

  echo ""
  echo "Verification complete: $verified_count verified, $failed_count failed"
fi
```

### 8. Copy to Network Mount (if applicable)

```bash
if [[ -n "$LOCAL_WORK" && "$OUTPUT_DIR" != "$LOCAL_WORK"* ]]; then
  echo "Copying from local work directory to network mount..."

  # Batch copy with progress
  rsync -av --progress \
    --exclude='.DS_Store' \
    --exclude='Thumbs.db' \
    "$LOCAL_WORK/" "$OUTPUT_DIR/"

  # Verify checksums
  echo "Verifying network copy..."
  (cd "$LOCAL_WORK" && find . -type f -exec sha256sum {} \; | sort) > /tmp/local-checksums
  (cd "$OUTPUT_DIR" && find . -type f -exec sha256sum {} \; | sort) > /tmp/remote-checksums

  if diff /tmp/local-checksums /tmp/remote-checksums; then
    echo "Verification SUCCESS - removing local working copy"
    rm -rf "$LOCAL_WORK"
  else
    echo "WARNING: Checksum mismatch - keeping local copy at $LOCAL_WORK"
  fi
fi
```

## Progress Display Format

Real-time console output during acquisition:

```
ACQUISITION PROGRESS
====================
Session: 20260214-143022
Started: 2026-02-14T14:30:22Z

Status: 5/12 completed (1 failed, 3 in progress)

Active Downloads:
  [67%] concert-1.mkv @ 12.5MB/s (ETA: 245s)
  [23%] album.flac @ 8.2MB/s (ETA: 680s)
  [91%] podcast-ep5.opus @ 2.1MB/s (ETA: 45s)

Recent Completions:
  [✓] documentary.mp4 (1.2GB, 8m 34s)
  [✓] live-set.opus (85MB, 1m 12s)

Recent Failures:
  [✗] unavailable-video.mkv (Video unavailable - marked for alternate source)

Total Downloaded: 4.8GB / ~12.5GB estimated
Average Speed: 8.9MB/s
Estimated Completion: 14:58 (28 minutes remaining)
```

## Error Handling

### Common Errors and Responses

| Error | Detection | Response | User Action Required |
|-------|-----------|----------|---------------------|
| **URL inaccessible** | Pre-download validation fails | Skip URL, mark for manual review | Check URL, update source plan |
| **Network timeout** | Download stalls for 60s | Retry with exponential backoff (3x) | Check network connection |
| **Rate limited** | HTTP 429 response | Wait 60s, retry (2x) | Reduce parallel count |
| **Disk full** | Write fails with ENOSPC | STOP all downloads, escalate | Free disk space |
| **Format unavailable** | yt-dlp format error | Try fallback formats | Accept lower quality or skip |
| **Corrupted download** | ffmpeg verification fails | Delete and retry (2x) | Report to source |
| **Permission denied** | Write fails with EACCES | STOP, escalate | Fix permissions |
| **Mount failure** | Network mount timeout | Fall back to local-only mode | Check mount health |
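The `classify_error` helper used by `download_with_retry` can be sketched as a pattern match over the download log. The patterns below are illustrative only, not an exhaustive catalogue of yt-dlp/ffmpeg messages:

```bash
# Illustrative sketch of classify_error: map log contents to an error class.
# Extend the patterns for other yt-dlp/ffmpeg messages as encountered.
classify_error() {
  local log_file="$1"
  if grep -qiE "429|Too Many Requests" "$log_file"; then
    echo "rate_limited"
  elif grep -qiE "No space left on device" "$log_file"; then
    echo "disk_full"
  elif grep -qiE "Permission denied" "$log_file"; then
    echo "permission_denied"
  elif grep -qiE "timed out|Connection reset" "$log_file"; then
    echo "network_timeout"
  elif grep -qiE "Requested format is not available" "$log_file"; then
    echo "format_unavailable"
  else
    echo "unknown"
  fi
}
```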

### Error Log Structure

```bash
# Per-download error log: .curator/sessions/<session-id>/logs/<download-id>.log
[2026-02-14 14:35:22] Starting download: https://youtube.com/watch?v=abc123
[2026-02-14 14:35:23] Format selected: bestvideo[height<=1080]+bestaudio
[2026-02-14 14:37:45] ERROR: HTTP Error 429: Too Many Requests
[2026-02-14 14:37:45] Classified as: rate_limited
[2026-02-14 14:37:45] Applying retry strategy: wait 60s (attempt 1/2)
[2026-02-14 14:38:45] Retrying download...
[2026-02-14 14:42:10] Download completed: concert-1.mkv (1.2GB)
```

### Session Failure Recovery

```bash
# Detect stale sessions and offer recovery
detect_incomplete_sessions() {
  find .curator/sessions -name "state.json" -mtime -7 | while read -r state_file; do
    local status=$(jq -r '.status' "$state_file")
    local session_id=$(jq -r '.session_id' "$state_file")

    if [[ "$status" == "in_progress" ]]; then
      local completed=$(jq '[.downloads[] | select(.status == "completed")] | length' "$state_file")
      local total=$(jq '.downloads | length' "$state_file")

      echo "Incomplete session detected: $session_id ($completed/$total completed)"
      echo "Resume with: /acquire --resume $session_id"
    fi
  done
}
```

## Report Generation

### Final Session Report

```bash
generate_acquisition_report() {
  local session_dir="$1"
  local state_file="$session_dir/state.json"
  local report_file="$session_dir/report.md"

  cat > "$report_file" <<EOF
# Acquisition Session Report

**Session ID**: $(jq -r '.session_id' "$state_file")
**Started**: $(jq -r '.started_at' "$state_file")
**Completed**: $(date -Iseconds)
**Duration**: $(calculate_duration "$(jq -r '.started_at' "$state_file")" "$(date -Iseconds)")

## Summary

- **Total Downloads**: $(jq '.downloads | length' "$state_file")
- **Completed**: $(jq '[.downloads[] | select(.status == "completed")] | length' "$state_file")
- **Failed**: $(jq '[.downloads[] | select(.status == "failed")] | length' "$state_file")
- **Total Size**: $(jq '[.downloads[] | select(.status == "completed") | .filesize_bytes] | add // 0 | . / 1073741824' "$state_file") GB

## Successful Downloads

$(jq -r '.downloads[] | select(.status == "completed") | "- \(.filename) (\(.filesize_bytes | tonumber / 1048576 | floor)MB)"' "$state_file")

## Failed Downloads

$(jq -r '.downloads[] | select(.status == "failed") | "- \(.url)\n  Error: \(.error)"' "$state_file")

## Parameters

- **Source Plan**: $(jq -r '.parameters.plan // "N/A"' "$state_file")
- **Format Preference**: $(jq -r '.parameters.format' "$state_file")
- **Output Directory**: $(jq -r '.parameters.output' "$state_file")
- **Parallel Downloads**: $(jq -r '.parameters.parallel' "$state_file")

## File Locations

- **Session Directory**: $session_dir
- **Download Logs**: $session_dir/logs/
- **Metadata**: $session_dir/metadata/
- **Verification Results**: $session_dir/verification.json

EOF

  echo "Report generated: $report_file"
  cat "$report_file"
}
```
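`calculate_duration` is used in the report template but not defined above. A sketch relying on GNU date's `-d` parsing (on BSD/macOS, substitute `date -j -f`):

```bash
# Sketch of the calculate_duration helper used in the report template.
# Assumes GNU date; BSD/macOS would need `date -j -f` instead of -d.
calculate_duration() {
  local start_epoch end_epoch diff
  start_epoch=$(date -d "$1" +%s)
  end_epoch=$(date -d "$2" +%s)
  diff=$((end_epoch - start_epoch))
  printf '%dh %dm %ds\n' $((diff / 3600)) $((diff % 3600 / 60)) $((diff % 60))
}
```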

### Metadata Export

```bash
# Export metadata for external tools
export_metadata() {
  local session_dir="$1"
  local export_file="$session_dir/metadata-export.json"

  jq '{
    session_id: .session_id,
    started_at: .started_at,
    downloads: [
      .downloads[] | select(.status == "completed") | {
        url: .url,
        filename: .filename,
        format: .format,
        filesize_bytes: .filesize_bytes,
        duration_seconds: .duration_seconds,
        checksum_sha256: .checksum_sha256
      }
    ]
  }' "$session_dir/state.json" > "$export_file"

  echo "Metadata exported: $export_file"
}
```

## Integration with Other Commands

### Typical Workflow

```bash
# 1. Discover sources
/find-sources --artist "Pink Floyd" --era "1970s" --sources youtube,archive

# 2. Acquire media from discovered sources
/acquire --plan .curator/sources/plan-001.yaml \
  --format video \
  --extract-audio \
  --verify-after \
  --output /mnt/archive/pink-floyd

# 3. Extract metadata (runs automatically or manually)
/extract-metadata --source /mnt/archive/pink-floyd/The_Wall/audio

# 4. Organize final collection
/organize --source /mnt/archive/pink-floyd
```

## Performance Considerations

### Network Mount Optimization

- **Download to local first**: Always use `--local-work` when output is network mount
- **Batch copy**: Single rsync after all downloads complete
- **Verify checksums**: Before deleting local copy
- **Concurrent limit**: Reduce `--parallel` for network mounts (recommend 2)

### Disk Space Management

- **Pre-flight check**: Estimate total size from source plan
- **Monitor during**: Watch for disk space warnings
- **Cleanup strategy**: Remove verified local copies after network copy
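The pre-flight check can lean on yt-dlp's metadata dump. A sketch under stated assumptions: `estimate_plan_size` is a hypothetical helper, and `filesize_approx` is absent for some extractors, so treat the sum as a lower bound.

```bash
# Hedged sketch: estimate total download size for a source plan.
# filesize_approx/filesize may be missing per extractor -> lower bound only.
estimate_plan_size() {
  local plan="$1" url size total=0
  while IFS= read -r url; do
    size=$(yt-dlp --dump-json --no-playlist "$url" 2>/dev/null \
      | jq -r '.filesize_approx // .filesize // 0')
    total=$((total + ${size%.*}))  # strip any decimal part
  done < <(yq eval '.sources[] | .url' "$plan")
  echo "$total"
}

# Compare against free bytes on the target filesystem, e.g.:
#   df --output=avail -B1 "$OUTPUT_DIR" | tail -1
```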

### Network Bandwidth

- **Parallel tuning**: More concurrent downloads for high bandwidth
- **Rate limiting**: Add `--limit-rate` to yt-dlp for shared networks
- **Peak hours**: Schedule large downloads for off-peak times

## Security Considerations

- **URL validation**: Never execute arbitrary commands from URLs
- **Directory traversal**: Sanitize all path components
- **Disk quota**: Respect user/system quotas
- **Network access**: Only connect to explicitly allowed domains
- **Credential handling**: Never log or expose API keys/tokens

## References

- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/human-authorization.md — Seek explicit authorization before irreversible actions (overwrites, deletions)
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Research sources before deciding on acquisition strategy
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/find-sources/SKILL.md — Source discovery skill used before acquisition
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/integrity-verification/SKILL.md — Verify downloaded files after acquisition
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md — Track origin and derivation of acquired media
