--- name: bio-virtual-screening description: Performs structure-based virtual screening using AutoDock Vina, SMINA, GNINA (CNN scoring), and DiffDock-L hybrid workflows with explicit choice rules across rigid vs flexible docking, cross-docking vs self-docking, binding-site detection (P2Rank, fpocket), receptor preparation (PDB2PQR, PROPKA), ligand preparation (meeko, OpenBabel), and ultralarge-library screening (ZINC22, Enamine REAL). Use when screening chemical libraries against a protein target to find candidate binders, ranking docking poses, or selecting a docking workflow for a specific scenario. tool_type: python primary_tool: AutoDock Vina --- ## Version Compatibility Reference examples tested with: AutoDock Vina 1.2.5+, SMINA 2020-12+, GNINA 1.1+, RDKit 2024.09+, meeko 0.5+, P2Rank 2.4+, ProDy 2.4+, pdb2pqr 3.6+. Before using code patterns, verify installed versions match. If versions differ: - Python: `pip show ` then `help(module.function)` to check signatures - CLI: `vina --version`; `gnina --version`; `smina --version` If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying. # Virtual Screening Screen chemical libraries against protein targets via molecular docking. Vina is the de-facto default, SMINA adds flexibility (Vinardo scoring, custom scoring), and GNINA adds CNN-based pose scoring (Top-1 redock 58%->73% over Vina, cross-dock 27%->37%). Deep-learning docking (DiffDock-L, EquiBind, NeuralPLexer) competes in pose accuracy but fails physical-plausibility tests (PoseBusters) ~50% of the time; the postdoc-grade workflow combines ML pose sampling with classical scoring and physical validation. For ultralarge libraries (>1M), library preparation, hierarchical filtering, and HPC orchestration become the limiting steps. For pose physical-validity QC, see `chemoinformatics/pose-validation`. For ML-driven docking + rescoring, see `chemoinformatics/ml-docking-rescoring`. For covalent docking, see `chemoinformatics/covalent-design`. For affinity calculations (FEP), see `chemoinformatics/free-energy-calculations`. ## Docking Tool Taxonomy | Tool | Scoring | Speed (sec/lig) | Best at | Fails when | |------|---------|-----------------|---------|------------| | AutoDock Vina 1.2 | Vina (empirical) | 5-30 | Default, well-validated | Cross-dock; cryptic pockets; metal centers | | SMINA | Vina + flexible + custom | 5-30 | Custom scoring; flexible side chains | Same Vina-scoring caveats | | Vinardo | Vinardo (improved Vina) | 5-30 | Better scoring than Vina on DUD-E | Limited adoption | | GNINA 1.1 | CNN (default) or Vina | 30-120 | Pose ranking, redocking | Slower; GPU recommended | | AutoDock 4 | AD4 + grid maps | 30-60 | Legacy reference | Slower than Vina | | DOCK 6/7 | DOCK + Amber | 60-300 | UCSF DOCK ecosystem | Steep learning curve | | Glide (Schrodinger) | GlideScore | 5-30 | Commercial SOTA | License cost | | GOLD (CCDC) | GOLDScore / ChemScore | 60-180 | Commercial; metal coordination | License cost | | FlexX (BioSolveIT) | FlexX | 30-60 | Fragment-based | License cost | | rDock | rDock | 30-60 | Open-source alternative | Slower than Vina | | DiffDock-L | Diffusion-generative | 5 (GPU) | Pose sampling for cross-docking | High PB-invalid rate (~50%); see ml-docking-rescoring | | EquiBind | Equivariant NN | <1 (GPU) | Single-shot pose | Lowest accuracy in PoseBuster benchmarks | | Boltz-2 + GNINA rescore | Foundation model + CNN | 10 (GPU) | Modern SOTA hybrid | High GPU; not all proteins | **Decision:** For most screens, **GNINA 1.1** with CNN scoring is the modern default (better than Vina on every benchmark; 30s/ligand on GPU). For >1M library scale, hierarchical Vina -> GNINA -> MM/GBSA. For cross-docking (predicted target structure or apo-holo), GNINA's CNN scoring transfers better than Vina. ## Decision Tree by Scenario | Scenario | Recommended workflow | |----------|---------------------| | Self-dock against known ligand pocket | GNINA `gnina --cnn_scoring rescore` | | Cross-dock to apo or related-target structure | DiffDock-L pose + GNINA rescore + PoseBusters | | Ultralarge library (10M+) | Vina hierarchical: pre-filter (Lipinski, PAINS) -> Vina dock -> top 1% GNINA rescore -> top 0.1% MM/GBSA | | Cryptic pocket / induced fit | Ensemble docking (10 conformer snapshots) + AlphaFold3 or Boltz-1 holo prediction | | Allosteric / undefined site | P2Rank for pocket detection -> ensemble dock all pockets | | Metal-coordinated ligand | GOLD (commercial) or manually parameterize Vina metal scoring | | Covalent inhibitor | See `chemoinformatics/covalent-design`: DOCKovalent, HCovDock | | Fragment screen (<300 Da) | rDock or constrained Vina with seed atoms | | Hit-to-lead refinement | Use co-crystal structure if available; MD-relaxed receptor; FEP for affinity | ## Receptor Preparation **Goal:** Convert a protein PDB into a docking-ready format with correct protonation, missing atoms, and removed waters. **Approach:** Strip ligands and waters -> fill missing atoms (PROPKA or pdbfixer) -> add hydrogens at pH 7.4 (PDB2PQR / Reduce) -> assign partial charges -> convert to PDBQT (Open Babel). ```python import subprocess def prepare_receptor(pdb_in, pdbqt_out, pH=7.4, remove_het=True): base = pdb_in.replace('.pdb', '') fixed = f'{base}_fixed.pdb' propka_pdb = f'{base}_propka.pdb' if remove_het: with open(pdb_in) as fin, open(f'{base}_clean.pdb', 'w') as fout: for line in fin: if line.startswith('HETATM') and 'HOH' in line: continue if line.startswith('HETATM'): continue fout.write(line) subprocess.run(['pdb2pqr', '--ff=AMBER', f'--with-ph={pH}', f'{base}_clean.pdb', propka_pdb], check=True) subprocess.run(['obabel', propka_pdb, '-O', pdbqt_out, '-xr', '--partialcharge', 'gasteiger'], check=True) return pdbqt_out ``` **Common pitfall:** Forgetting to add hydrogens at protein pH (7.4) but using pH 7.0 ligand charges. Hist mistakenly protonated. Use PROPKA + manual review of catalytic residues. ## Ligand Preparation **Goal:** Generate a 3D, docking-ready ligand file from SMILES with appropriate protonation and conformation. **Approach:** Protonate at pH 7.4 with `Chem.MolFromSmiles` then `rdMolStandardize.Uncharger` -> embed 3D with ETKDGv3 -> minimize with MMFF94 -> assign Gasteiger charges -> write PDBQT with meeko. ```python from rdkit import Chem from rdkit.Chem import AllChem from rdkit.Chem.MolStandardize import rdMolStandardize from meeko import MoleculePreparation, PDBQTWriterLegacy def prepare_ligand(smiles, pdbqt_out): mol = Chem.MolFromSmiles(smiles) uncharger = rdMolStandardize.Uncharger() mol = uncharger.uncharge(mol) mol = Chem.AddHs(mol) AllChem.EmbedMolecule(mol, AllChem.ETKDGv3()) AllChem.MMFFOptimizeMolecule(mol) # meeko 0.5+ API: prepare() returns a list of MoleculeSetup objects; # use PDBQTWriterLegacy.write_string() to materialize the PDBQT block. mk_prep = MoleculePreparation() setups = mk_prep.prepare(mol) pdbqt_text, is_ok, err = PDBQTWriterLegacy.write_string(setups[0]) if not is_ok: raise RuntimeError(f'meeko PDBQT export failed: {err}') with open(pdbqt_out, 'w') as f: f.write(pdbqt_text) return pdbqt_out ``` `meeko` (AutoDock developers' tool) handles torsion tree creation, rotamer flagging, and PDBQT writing -- preferred over Open Babel's PDBQT writer. Note: meeko 0.5+ separated the writer (`PDBQTWriterLegacy`) from `MoleculePreparation`; older code using `prep.write_pdbqt_file()` is deprecated. ## Binding Site Detection When the binding pocket is not known (apo target, novel allosteric site): | Tool | Approach | Output | |------|----------|--------| | P2Rank (Krivak 2018) | ML on protein surface descriptors | Ranked pocket list with center coords | | fpocket (Le Guilloux 2009) | Voronoi tessellation | Pocket descriptor list | | DoGSiteScorer | Geometric + drugability | Pocket list with score | | AutoSite (Vina) | Affinity map clustering | Pocket centers | | AlphaFill | Co-fold known ligand | Plausible binding site | ```bash prank predict -f receptor.pdb -o pockets/ ``` P2Rank output `_predictions.csv` lists pocket centers with scores. Pocket 1 (highest score) is typically the orthosteric site for known protein families. ## Vina Docking (Single Ligand) ```python # AutoDock Vina Python API requires Vina 1.2+; for Vina 1.1 use subprocess CLI: # subprocess.run(['vina', '--receptor', ..., '--ligand', ..., '--center_x', ...], check=True) from vina import Vina def dock_single(receptor_pdbqt, ligand_pdbqt, center, box_size, exhaustiveness=8, n_poses=10): v = Vina(sf_name='vina') v.set_receptor(receptor_pdbqt) v.set_ligand_from_file(ligand_pdbqt) v.compute_vina_maps(center=center, box_size=box_size) v.dock(exhaustiveness=exhaustiveness, n_poses=n_poses) return v.energies(), v.poses() ``` **Exhaustiveness:** - 8: default, ~30s/ligand, acceptable for screening - 16: ~60s/ligand, lead-like prioritization - 32: ~2 min/ligand, top-hit re-docking - 64: ~4 min/ligand, production / co-crystal comparison Vina pose RMSD lower bound `rmsd_lb` and upper bound `rmsd_ub` are NOT pose-vs-reference RMSDs; they are bounds on the conformational space sampled. Use external RMSD to a reference pose for accuracy QC. ## GNINA with CNN Scoring (modern default) ```bash gnina -r receptor.pdb -l ligand.sdf \ --autobox_ligand reference_ligand.sdf \ --cnn_scoring rescore \ -o poses.sdf.gz \ --num_modes 9 --exhaustiveness 8 ``` `--cnn_scoring`: - `rescore` (default): Vina sampling + CNN scoring (best validated) - `refinement`: CNN refines top Vina pose - `metrorefine`: Metropolis-style CNN refinement (slower, ~10% better pose accuracy) - `none`: Vina-only (use SMINA equivalently) `--autobox_ligand`: define box from reference ligand SDF/PDB. Otherwise specify `--center_x/y/z` + `--size_x/y/z`. **Critical:** GNINA's CNN was trained on PDBbind 2019; performance on novel chemotypes outside this distribution is reduced. Validate with a known co-crystal redock if possible. ## Virtual Screening Pipeline (Hierarchical) **Goal:** Screen 10M-compound library down to top-1k candidates for follow-up. **Approach:** Three-stage filter: Pseudo-code skeleton (orchestrator). Each helper function delegates to a dedicated skill: drug-likeness filter to `chemoinformatics/admet-prediction`, single-ligand Vina/GNINA to `dock_single` defined earlier in this skill, PoseBusters QC to `chemoinformatics/pose-validation`. ```python import pandas as pd from concurrent.futures import ProcessPoolExecutor # Stub helpers to be implemented per project; see the cross-referenced skills. def drug_like_filter(df): raise NotImplementedError('Implement via chemoinformatics/admet-prediction (Lipinski+Veber+PAINS)') def vina_dock(smi, receptor_pdbqt, center, box): raise NotImplementedError('Wrap dock_single() above; return best affinity') def gnina_rescore(smi, receptor_pdbqt, center, box): raise NotImplementedError('Wrap gnina --cnn_scoring rescore subprocess call') def pose_validate(df): raise NotImplementedError('Implement via chemoinformatics/pose-validation (PoseBusters)') def vs_pipeline(library_smi, receptor_pdbqt, center, box, output_dir, n_workers=16): df = pd.read_csv(library_smi) df_stage1 = drug_like_filter(df) # ~10x reduction with ProcessPoolExecutor(max_workers=n_workers) as ex: affinities = list(ex.map( lambda smi: vina_dock(smi, receptor_pdbqt, center, box), df_stage1['smiles'])) df_stage1['vina_affinity'] = affinities df_stage2 = df_stage1.nsmallest(int(len(df_stage1) * 0.01), 'vina_affinity') df_stage2['gnina_affinity'] = df_stage2['smiles'].apply( lambda smi: gnina_rescore(smi, receptor_pdbqt, center, box)) df_stage3 = df_stage2.nsmallest(1000, 'gnina_affinity') return pose_validate(df_stage3) ``` For 10M-compound libraries (ZINC22, Enamine REAL), use slurm-orchestrated parallel docking. Single-node GPU GNINA: ~3000 ligands/day. SLURM cluster (50 nodes): ~150k ligands/day. ## Ultralarge Library Screening (ZINC22, Enamine REAL) | Library | Size (2024) | Format | Notes | |---------|-------------|--------|-------| | ZINC22 | 4.1B | 3D SMILES + xyz | Pre-built; 37GB tranches | | Enamine REAL | 29B | SMILES | Make-on-demand; via Enamine | | Enamine HTS | 4M | SMILES + SDF | Stock for HTS | | Mcule | 25M | SMILES | Real, in-stock | | ChEMBL | 2.5M | SMILES + bioactivity | Activity-labeled | For ultralarge VS: 1. Pre-filter to drug-like (Lipinski + Ro5 + PAINS) -> typically 10-20x reduction 2. Pre-filter by 2D similarity to known actives (Tanimoto >= 0.4 via ECFP4) -> 100x reduction 3. Vina dock the filtered subset 4. Rescore top 1% with GNINA 5. Rescore top 0.1% with MM/GBSA or FEP Lyu et al. 2019 ZINC15 ultralarge VS gold standard: 138M docked, top 96 hits picked, 13 nM-µM actives. ## Per-Tool Failure Modes ### Vina -- cross-dock failure **Trigger:** Receptor structure not the holo (co-crystal with ligand from another binder). **Mechanism:** Vina pose accuracy degrades 2-3x from self-dock (RMSD<=2A 70%) to cross-dock (RMSD<=2A 30%). **Symptom:** Top-ranked pose makes no geometric sense; key contacts missing. **Fix:** GNINA CNN scoring or ensemble docking. For genuine apo, predict holo with AlphaFold3 / Boltz-1 then dock. ### GNINA CNN -- novel chemotype out-of-distribution **Trigger:** Ligand chemotype not in PDBbind training. **Mechanism:** CNN scoring overfits to PDBbind chemotypes; novel macrocycle / peptide / PROTAC scores poorly. **Symptom:** Affinity prediction far worse than Vina alone. **Fix:** Use `--cnn_scoring rescore` (sampling still by Vina) rather than CNN sampling. Validate against co-crystal of close analog. ### Box too small **Trigger:** Binding box defined tightly around small ligand reference. **Mechanism:** Vina explores only within the box; large analogs cannot fit. **Symptom:** Many ligands report "no valid pose"; chemotype-biased hits. **Fix:** Add padding 5-10 A beyond reference ligand bounding box. For unknown ligand size, use 25x25x25 A. ### Multi-pocket protein -- wrong site **Trigger:** Protein has multiple binding sites (orthosteric + allosteric). **Mechanism:** P2Rank or AutoBox picks the most "drugable" pocket; not always the desired one. **Symptom:** Hits dock in wrong pocket; SAR confusing. **Fix:** Verify pocket from co-crystal data; explicitly set `center_x/y/z` from known ligand centroid. ### DiffDock-L -- PoseBusters invalid **Trigger:** Default DiffDock-L output for any receptor. **Mechanism:** Diffusion generates poses without physical-validity loss; ~50% fail planarity / stereochemistry / vdW overlap tests. **Symptom:** Poses look reasonable but fail PoseBusters checks. **Fix:** Filter to PB-valid (PoseBusters); rescore with GNINA. See `chemoinformatics/pose-validation`. ### Wrong ionization state **Trigger:** Ligand or receptor residues protonated incorrectly at pH 7.4. **Mechanism:** Aspartate/glutamate/histidine protonation depends on local environment; default protonation may be wrong. **Symptom:** Salt bridges missing; poses misranked. **Fix:** Run PROPKA on receptor (cabinet residue pKas); for catalytic His, manually inspect rotamer state. ## Reconciliation: Vina vs GNINA Disagreement | Vina top pose | GNINA top pose | Action | |---------------|----------------|--------| | Same pose, similar score | Same pose, similar score | Trust pose; use GNINA score for ranking | | Vina top pose ≠ GNINA top pose | Same pocket, different orientation | Use GNINA (better trained for pose ranking) | | Vina excellent, GNINA mediocre | Different pose, very different score | GNINA found pose Vina missed; trust GNINA | | Both poor scores | Many ligands score similarly poor | Wrong pocket / protein conformation; reconsider receptor | ## Common Errors | Symptom | Cause | Fix | |---------|-------|-----| | Vina segfault | PDBQT corrupted (atom names) | Re-prep with meeko | | GNINA hangs | GPU OOM | Reduce batch / `-num_modes 5` | | All affinities very poor (-3 to -5) | Wrong protonation; ligand too large for box | Re-check pKa; expand box | | Identical affinity across ligands | Receptor grid not computed | Call `v.compute_vina_maps()` before dock | | Pose poses make no sense | Receptor and ligand in different frames | Ensure same coordinate origin | | AutoDock 4 needed | Vina cannot handle Zn / Fe metals | Use AutoDock 4 with metal scoring or GOLD | | GPU mode slow | Vina is CPU-only; only GNINA is GPU | Use GNINA for GPU; Vina is multi-core CPU | ## References - Trott & Olson, *J. Comput. Chem.* 31:455 (2010) -- AutoDock Vina. - Eberhardt et al., *J. Chem. Inf. Model.* 61:3891 (2021) -- Vina 1.2 features. - Quiroga & Villarreal, *PLoS ONE* 11:e0155183 (2016) -- Vinardo scoring. - McNutt et al., *J. Cheminformatics* 13:43 (2021) -- GNINA 1.0 CNN docking. - Buttenschoen et al., *Chem. Sci.* 15:3130 (2024) -- PoseBusters (DL methods fail). - Lyu et al., *Nature* 566:224 (2019) -- ultralarge VS proof-of-concept. - Krivak & Hoksza, *J. Cheminformatics* 10:39 (2018) -- P2Rank. - Forli et al., *Nat. Protoc.* 11:905 (2016) -- AutoDockTools / meeko. ## Related Skills - chemoinformatics/molecular-io - Parse ligands - chemoinformatics/conformer-generation - Generate 3D for ligand prep - chemoinformatics/molecular-standardization - Canonicalize before docking - chemoinformatics/pose-validation - PoseBusters physical-validity QC - chemoinformatics/ml-docking-rescoring - DiffDock-L + GNINA hybrid - chemoinformatics/covalent-design - Covalent docking - chemoinformatics/free-energy-calculations - FEP for refined affinity - chemoinformatics/admet-prediction - Filter library before docking - structural-biology/structure-io - PDB / mmCIF handling - structural-biology/modern-structure-prediction - AlphaFold3 / Boltz-1 for apo receptors