--- name: mlflow-python description: MLflow experiment tracking via Python API. TRIGGERS - MLflow metrics, log backtest, experiment tracking, search runs. allowed-tools: Read, Bash, Grep, Glob --- # MLflow Python Skill Unified read/write MLflow operations via Python API with QuantStats integration for comprehensive trading metrics. **ADR**: [2025-12-12-mlflow-python-skill](/docs/adr/2025-12-12-mlflow-python-skill.md) > **Note**: This skill uses Pandas (MLflow API requires it). The `mlflow-python` path is auto-skipped by the Polars preference hook. > **Self-Evolving Skill**: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues. ## When to Use This Skill **CAN Do**: - Log backtest metrics (Sharpe, max_drawdown, total_return, etc.) - Log experiment parameters (strategy config, timeframes) - Create and manage experiments - Query runs with SQL-like filtering - Calculate 70+ trading metrics via QuantStats - Retrieve metric history (time-series data) **CANNOT Do**: - Direct database access to MLflow backend - Artifact storage management (S3/GCS configuration) - MLflow server administration ## Prerequisites ### Authentication Setup MLflow uses separate environment variables for credentials (NOT embedded in URI): ```bash # Option 1: mise + .env.local (recommended) # Create .env.local in skill directory with: MLFLOW_TRACKING_URI=http://mlflow.eonlabs.com:5000 MLFLOW_TRACKING_USERNAME=eonlabs MLFLOW_TRACKING_PASSWORD= # Option 2: Direct environment variables export MLFLOW_TRACKING_URI="http://mlflow.eonlabs.com:5000" export MLFLOW_TRACKING_USERNAME="eonlabs" export MLFLOW_TRACKING_PASSWORD="" ``` ### Verify Connection ```bash /usr/bin/env bash << 'SKILL_SCRIPT_EOF' cd ${CLAUDE_PLUGIN_ROOT}/skills/mlflow-python uv run scripts/query_experiments.py experiments SKILL_SCRIPT_EOF ``` ## Quick Start Workflows ### A. Log Backtest Results (Primary Use Case) ```bash /usr/bin/env bash << 'SKILL_SCRIPT_EOF_2' cd ${CLAUDE_PLUGIN_ROOT}/skills/mlflow-python uv run scripts/log_backtest.py \ --experiment "crypto-backtests" \ --run-name "btc_momentum_v2" \ --returns path/to/returns.csv \ --params '{"strategy": "momentum", "timeframe": "1h"}' SKILL_SCRIPT_EOF_2 ``` ### B. Search Experiments ```bash uv run scripts/query_experiments.py experiments ``` ### C. Query Runs with Filter ```bash uv run scripts/query_experiments.py runs \ --experiment "crypto-backtests" \ --filter "metrics.sharpe_ratio > 1.5" \ --order-by "metrics.sharpe_ratio DESC" ``` ### D. Create New Experiment ```bash uv run scripts/create_experiment.py \ --name "crypto-backtests-2025" \ --description "Q1 2025 cryptocurrency trading strategy backtests" ``` ### E. Get Metric History ```bash uv run scripts/get_metric_history.py \ --run-id abc123 \ --metrics sharpe_ratio,cumulative_return ``` ## QuantStats Metrics Available The `log_backtest.py` script calculates 70+ metrics via QuantStats, including: | Category | Metrics | | ------------ | ----------------------------------------------------------------- | | **Ratios** | sharpe, sortino, calmar, omega, treynor | | **Returns** | cagr, total_return, avg_return, best, worst | | **Drawdown** | max_drawdown, avg_drawdown, drawdown_days | | **Trade** | win_rate, profit_factor, payoff_ratio, consecutive_wins/losses | | **Risk** | volatility, var, cvar, ulcer_index, serenity_index | | **Advanced** | kelly_criterion, recovery_factor, risk_of_ruin, information_ratio | See [quantstats-metrics.md](./references/quantstats-metrics.md) for full list. ## Bundled Scripts | Script | Purpose | | ----------------------- | -------------------------------------------- | | `log_backtest.py` | Log backtest returns with QuantStats metrics | | `query_experiments.py` | Search experiments and runs (replaces CLI) | | `create_experiment.py` | Create new experiment with metadata | | `get_metric_history.py` | Retrieve metric time-series data | ## Configuration The skill uses mise `[env]` pattern for configuration. See `.mise.toml` for defaults. Create `.env.local` (gitignored) for credentials: ```bash MLFLOW_TRACKING_URI=http://mlflow.eonlabs.com:5000 MLFLOW_TRACKING_USERNAME=eonlabs MLFLOW_TRACKING_PASSWORD= ``` ## Reference Documentation - [Authentication Patterns](./references/authentication.md) - Idiomatic MLflow auth - [QuantStats Metrics](./references/quantstats-metrics.md) - Full list of 70+ metrics - [Query Patterns](./references/query-patterns.md) - DataFrame operations - [Migration from CLI](./references/migration-from-cli.md) - CLI to Python API mapping ## Migration from mlflow-query This skill replaces the CLI-based `mlflow-query` skill. Key differences: | Feature | mlflow-query (old) | mlflow-python (new) | | -------------- | ------------------ | ---------------------- | | Log metrics | Not supported | `mlflow.log_metrics()` | | Log params | Not supported | `mlflow.log_params()` | | Query runs | CLI text parsing | DataFrame output | | Metric history | Workaround only | Native support | | Auth pattern | Embedded in URI | Separate env vars | See [migration-from-cli.md](./references/migration-from-cli.md) for detailed mapping. --- ## Troubleshooting | Issue | Cause | Solution | | ----------------------- | ---------------------------- | --------------------------------------------------- | | Connection refused | MLflow server not running | Verify MLFLOW_TRACKING_URI and server status | | Authentication failed | Wrong credentials | Check MLFLOW_TRACKING_USERNAME and PASSWORD in .env | | Experiment not found | Experiment name typo | Run `query_experiments.py experiments` to list all | | QuantStats import error | Missing dependency | `uv add quantstats` in skill directory | | Pandas import warning | Expected for this skill | Ignore - MLflow requires Pandas (hook-excluded) | | Run creation fails | Experiment doesn't exist | Use `create_experiment.py` to create first | | Metric history empty | Wrong run_id or metric name | Verify run_id with `query_experiments.py runs` | | Returns CSV parse error | Wrong date format or columns | Check CSV has date index and returns column | ## Post-Execution Reflection After this skill completes, check before closing: 1. **Did the command succeed?** — If not, fix the instruction or error table that caused the failure. 2. **Did parameters or output change?** — If the underlying tool's interface drifted, update Usage examples and Parameters table to match. 3. **Was a workaround needed?** — If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround. Only update if the issue is real and reproducible — not speculative.