---
name: cp2k
description: Use when writing, reviewing, or debugging CP2K input files, choosing DFT methods (GPW, GAPW, functionals, basis sets), setting up calculations (geometry optimization, AIMD, cell optimization, NEB, single-point energy), validating parameters (cutoff, SCF convergence, basis/pseudopotential compatibility), parsing CP2K output files, or encountering CP2K errors (SCF not converged, geometry optimization failed, OT issues). Also use when user mentions ab initio MD, DFT calculations, electronic structure, periodic DFT, Quickstep, or quantum chemistry simulations.
---

# CP2K Input Generation Skill

Reference for generating, validating, and debugging CP2K input files with DFT-aware checks.

## Quick Start

### Validate an Input File

```bash
# Input file validation (section structure, keyword checks, parameter sanity)
python scripts/validate_input.py path/to/input.inp

# Parse CP2K output for energies, SCF convergence, timings
python scripts/parse_output.py path/to/output.out
```

All script paths are relative to the skill directory (`.claude/skills/cp2k/`). Run commands from that directory, or use full paths from the repo root (e.g., `python .claude/skills/cp2k/scripts/validate_input.py input.inp`).

## Important Constraints

**DO NOT** generate files in the `examples/` directory. Always output to the user's working directory or a user-specified path.

Environment-agnostic: this skill assumes `cp2k` (or `cp2k.psmp` / `cp2k.ssmp`) is available on `$PATH` but does not assume a specific installation path.

`CP2K_DATA_DIR` must point to the CP2K data directory containing `BASIS_MOLOPT`, `GTH_POTENTIALS`, etc. Alternatively, place copies of the required basis set and pseudopotential files in the working directory and reference them by relative path in the input.

---

## Input File Structure

CP2K uses a hierarchical, section-based input format. Sections open with `&SECTION_NAME` and close with `&END SECTION_NAME`. Keywords are set inside sections. The key nesting:

```
&GLOBAL                          # Project name, run type, print level
&FORCE_EVAL                      # Method selection (Quickstep, FIST, etc.)
  &DFT                           # All electronic structure settings
    &MGRID                       #   Planewave grid: CUTOFF, REL_CUTOFF, NGRIDS
    &QS                          #   Quickstep method: EPS_DEFAULT, METHOD (GPW/GAPW)
    &SCF                         #   Self-consistent field convergence
      &OT                        #     Orbital transformation (insulators)
      &DIAGONALIZATION            #     Standard diag (metals)
      &OUTER_SCF                 #     Outer loop for tight convergence
      &MIXING                    #     Density mixing for diagonalization
    &XC                          #   Exchange-correlation functional
      &XC_FUNCTIONAL             #     PBE, BLYP, B3LYP, PBE0, etc.
      &VDW_POTENTIAL             #     Dispersion corrections (DFT-D3, rVV10)
  &SUBSYS                        # Atomic structure definition
    &CELL                        #   ABC vectors, PERIODIC XYZ/XY/NONE
    &COORD                       #   Atomic positions (Cartesian or scaled)
    &KIND <element>              #   Per-element basis set and pseudopotential
  &MM                            # Classical force field (FIST method only)
&MOTION                          # Geometry opt, MD, cell opt, NEB
  &GEO_OPT                      #   Geometry optimizer settings
  &CELL_OPT                     #   Cell vector optimization
  &MD                            #   Molecular dynamics settings
  &BAND                          #   NEB settings
  &PRINT                         #   Trajectory, restart, energy output
```

### Minimal Example: Water Single-Point Energy

```
&GLOBAL
  PROJECT water
  RUN_TYPE ENERGY
  PRINT_LEVEL MEDIUM
&END GLOBAL

&FORCE_EVAL
  METHOD Quickstep
  &DFT
    BASIS_SET_FILE_NAME BASIS_MOLOPT
    POTENTIAL_FILE_NAME GTH_POTENTIALS
    &MGRID
      CUTOFF 400
      REL_CUTOFF 60
    &END MGRID
    &QS
      EPS_DEFAULT 1.0E-12
    &END QS
    &SCF
      EPS_SCF 1.0E-6
      MAX_SCF 50
      &OT
        MINIMIZER DIIS
        PRECONDITIONER FULL_ALL
      &END OT
    &END SCF
    &XC
      &XC_FUNCTIONAL PBE
      &END XC_FUNCTIONAL
    &END XC
  &END DFT
  &SUBSYS
    &CELL
      ABC 10.0 10.0 10.0
      PERIODIC NONE
    &END CELL
    &COORD
      O   0.000  0.000  0.000
      H   0.757  0.586  0.000
      H  -0.757  0.586  0.000
    &END COORD
    &KIND O
      BASIS_SET DZVP-MOLOPT-GTH
      POTENTIAL GTH-PBE-q6
    &END KIND
    &KIND H
      BASIS_SET DZVP-MOLOPT-GTH
      POTENTIAL GTH-PBE-q1
    &END KIND
  &END SUBSYS
&END FORCE_EVAL
```

---

## Method Selection Decision Tree

```dot
digraph method_selection {
    rankdir=TB;
    node [shape=diamond];

    calc [label="What calculation?" shape=ellipse];
    sys [label="System type?"];
    metal [label="Metallic?"];

    node [shape=box];
    sp [label="RUN_TYPE ENERGY\n(single point)"];
    geo [label="RUN_TYPE GEO_OPT\n&MOTION > &GEO_OPT"];
    md [label="RUN_TYPE MD\n&MOTION > &MD"];
    cell [label="RUN_TYPE CELL_OPT\n&MOTION > &CELL_OPT\nSTRESS_TENSOR ANALYTICAL"];
    neb [label="RUN_TYPE BAND\n&MOTION > &BAND"];
    vib [label="RUN_TYPE VIBRATIONAL_ANALYSIS\n&VIBRATIONAL_ANALYSIS"];

    diag [label="Diagonalization\n+ SMEARING\n+ MIXING (Broyden)"];
    ot [label="OT minimizer\n+ FULL_ALL preconditioner\n+ OUTER_SCF"];

    calc -> sp [label="energy"];
    calc -> geo [label="geometry opt"];
    calc -> md [label="AIMD"];
    calc -> cell [label="cell opt"];
    calc -> neb [label="NEB"];
    calc -> vib [label="frequencies"];

    sp -> sys; geo -> sys; md -> sys; cell -> sys; neb -> sys; vib -> sys;
    sys -> metal;
    metal -> diag [label="yes (metal/\nsmall gap)"];
    metal -> ot [label="no (insulator/\nmolecular)"];
}
```

**Rule of thumb:** OT is faster and more robust for systems with a band gap. Diagonalization is required when fractional occupations exist (metals, radicals with near-degenerate states).

---

## Basis Set & Pseudopotential Quick Reference

| Quality | Basis Set | Use Case |
|---------|-----------|----------|
| Minimum | SZV-MOLOPT-GTH | Quick tests only |
| Standard | DZVP-MOLOPT-GTH | Production (most cases) |
| Standard (condensed) | DZVP-MOLOPT-SR-GTH | Condensed phase, saves cost |
| High | TZVP-MOLOPT-GTH | Benchmark, high accuracy |
| Very High | TZV2P-MOLOPT-GTH | Reference calculations |

### Functional / Pseudopotential Matching Rules

The pseudopotential family **must** match the XC functional used to generate it:

| XC Functional | Pseudopotential | Notes |
|---------------|-----------------|-------|
| PBE | GTH-PBE | Most common choice |
| BLYP | GTH-BLYP | Also used for B3LYP |
| B3LYP | GTH-BLYP | No GTH-B3LYP exists |
| PBE0 | GTH-PBE | PBE-derived hybrid |
| SCAN | GTH-PBE | Use PBE pseudo (standard practice) |

**Valence electron count** must be consistent between basis set and pseudopotential. The `-qN` suffix on the pseudopotential denotes N valence electrons. Verify: oxygen uses `-q6`, carbon `-q4`, hydrogen `-q1`, nitrogen `-q5`, silicon `-q4`, etc.

### Data Files

| File | Contents |
|------|----------|
| `BASIS_MOLOPT` | MOLOPT basis sets for standard DFT |
| `BASIS_MOLOPT_UZH` | Extended basis sets, needed for hybrid functionals (HFX) |
| `GTH_POTENTIALS` | GTH pseudopotentials for all functional families |
| `BASIS_ADMM` | Auxiliary basis sets for ADMM (hybrid functional acceleration) |
| `POTENTIAL` | All-electron potentials (rarely needed) |

---

## Cutoff Selection

`CUTOFF` controls the planewave expansion of the electron density. `REL_CUTOFF` controls the finest grid level.

| Parameter | Typical Range | Default Recommendation |
|-----------|--------------|----------------------|
| CUTOFF | 300-600 Ry | 400 Ry (safe starting point) |
| REL_CUTOFF | 50-60 Ry | 60 Ry |
| NGRIDS | 4-5 | 5 (default, rarely changed) |

### Convergence Testing

Run single-point calculations at increasing CUTOFF (200, 300, 400, 500, 600 Ry). Plot total energy vs cutoff. Converged when energy change < 1 meV/atom between successive values.

**Quick check in output:** search for "Total energy of the electronic density on regular grids." The distribution of density across grids is printed in the output. The "Electronic density on regular grids" section should show the second number (density mapped to finest grid) < 1E-8. If it is larger, increase REL_CUTOFF.

---

## SCF Settings

### OT (Orbital Transformation) -- Insulators and Molecules

Preferred for systems with a band gap. Converges in fewer SCF steps than diagonalization.

```
&SCF
  EPS_SCF 1.0E-6
  MAX_SCF 50
  &OT
    MINIMIZER DIIS        # DIIS (faster) or CG (more robust)
    PRECONDITIONER FULL_ALL  # FULL_ALL (small/difficult), FULL_KINETIC (large systems)
  &END OT
  &OUTER_SCF              # Required for tight convergence with OT
    EPS_SCF 1.0E-6
    MAX_SCF 20
  &END OUTER_SCF
&END SCF
```

| Preconditioner | Cost | Use When |
|----------------|------|----------|
| FULL_ALL | O(N^3) | Small systems (< 500 atoms), difficult convergence |
| FULL_KINETIC | O(N) | Large systems (> 500 atoms) |
| FULL_SINGLE_INVERSE | Medium | Compromise between FULL_ALL and FULL_KINETIC |

### Diagonalization -- Metals and Small-Gap Systems

Required when fractional occupations are needed.

```
&SCF
  EPS_SCF 1.0E-6
  MAX_SCF 100
  ADDED_MOS 20            # Extra empty states (increase for metals)
  &DIAGONALIZATION
  &END DIAGONALIZATION
  &SMEARING
    METHOD FERMI_DIRAC
    ELECTRONIC_TEMPERATURE [K] 300
  &END SMEARING
  &MIXING
    METHOD BROYDEN_MIXING
    ALPHA 0.2              # Mixing parameter (lower = more stable, slower)
  &END MIXING
&END SCF
```

### EPS_SCF Guidelines

| Calculation Type | EPS_SCF | Rationale |
|-----------------|---------|-----------|
| MD (AIMD) | 1E-6 | Sufficient for forces |
| Geometry optimization | 1E-8 | Tight convergence needed for gradients |
| Cell optimization | 1E-8 | Stress tensor requires tight SCF |
| Vibrational analysis | 1E-8 | Numerical derivatives amplify noise |

**Rule:** `sqrt(EPS_DEFAULT)` should be smaller than `EPS_SCF`. Default `EPS_DEFAULT` of 1E-12 pairs with `EPS_SCF` of 1E-6.

---

## Simulation Workflow

### 0. Structure Preparation

Obtain coordinates from `.xyz`, `.cif`, databases (Materials Project, COD), or build manually. Set up `&CELL` (lattice vectors or ABC + angles) and `&COORD` (Cartesian or SCALED).

For isolated molecules: use `PERIODIC NONE` in `&CELL` and `POISSON_SOLVER MT` (or `WAVELET`) in `&POISSON`.

For periodic systems: ensure cell vectors are consistent with crystal structure. Use `PERIODIC XYZ` (3D), `PERIODIC XY` (slab), or `PERIODIC X` (wire).

### 1. Geometry Optimization

```
&GLOBAL
  RUN_TYPE GEO_OPT
&END GLOBAL
&MOTION
  &GEO_OPT
    OPTIMIZER BFGS          # BFGS (default, fast) or LBFGS (large systems) or CG
    MAX_ITER 500
    MAX_DR 3.0E-3           # Max displacement [bohr]
    MAX_FORCE 4.5E-4        # Max force [bohr/hartree]
    RMS_DR 1.5E-3
    RMS_FORCE 3.0E-4
  &END GEO_OPT
&END MOTION
```

Use tight `EPS_SCF` (1E-8). For cells under pressure or with soft modes, also optimize cell vectors (use `RUN_TYPE CELL_OPT` instead).

### 2. Equilibration (AIMD)

Short run to thermalize the system. Use a robust thermostat (CSVR recommended for equilibration).

```
&GLOBAL
  RUN_TYPE MD
&END GLOBAL
&MOTION
  &MD
    ENSEMBLE NVT
    STEPS 5000
    TIMESTEP 0.5            # fs
    TEMPERATURE 300
    &THERMOSTAT
      TYPE CSVR
      &CSVR
        TIMECON 50           # fs, coupling constant
      &END CSVR
    &END THERMOSTAT
  &END MD
&END MOTION
```

Add wavefunction extrapolation to reduce SCF iterations per MD step:

```
&DFT
  WFN_RESTART_FILE_NAME RESTART.wfn   # restart from previous wavefunction
  &SCF
    SCF_GUESS RESTART
  &END SCF
  &XC
    ...
  &END XC
&END DFT
```

In `&FORCE_EVAL > &DFT`:
```
  EXTRAPOLATION PS
  EXTRAPOLATION_ORDER 3
```

### 3. Production (AIMD)

Continue from equilibrated state. Longer run with trajectory output.

```
&MOTION
  &MD
    ENSEMBLE NVT
    STEPS 50000
    TIMESTEP 0.5
    TEMPERATURE 300
    &THERMOSTAT
      TYPE NOSE
      REGION MASSIVE
      &NOSE
        LENGTH 3
        TIMECON 100          # fs
      &END NOSE
    &END THERMOSTAT
  &END MD
  &PRINT
    &TRAJECTORY
      &EACH
        MD 10
      &END EACH
    &END TRAJECTORY
    &VELOCITIES
      &EACH
        MD 10
      &END EACH
    &END VELOCITIES
    &FORCES
      &EACH
        MD 100
      &END EACH
    &END FORCES
    &RESTART
      &EACH
        MD 500
      &END EACH
    &END RESTART
    &RESTART_HISTORY
      &EACH
        MD 5000
      &END EACH
    &END RESTART_HISTORY
  &END PRINT
&END MOTION
```

### Thermostat Comparison

| Thermostat | CP2K Keyword | Use For | Notes |
|------------|-------------|---------|-------|
| Nose-Hoover | NOSE | General NVT production | Correct canonical; use REGION MASSIVE |
| CSVR | CSVR | Equilibration, robust | Fast thermalization, correct canonical ensemble |
| GLE | GLE | Path integrals, NQE | Generalized Langevin, advanced use |

### Ensemble Selection

| Ensemble | CP2K Keyword | Use For |
|----------|-------------|---------|
| NVE | NVE | Microcanonical, transport properties, energy conservation tests |
| NVT | NVT | Most simulations, thermalized sampling |
| NPT_I | NPT_I | Isotropic pressure (liquids, amorphous) |
| NPT_F | NPT_F | Anisotropic pressure (crystals, anisotropic cells) |

### AIMD Timestep

- **0.5 fs** -- safe default for systems with hydrogen
- **1.0 fs** -- acceptable for heavy-element-only systems (no H, no Li)
- Reduce to **0.25 fs** for high-temperature simulations (> 1000 K) or light-element-rich systems

### Wavefunction Extrapolation

Always enable for MD. Extrapolation predicts the initial wavefunction guess from previous steps, reducing SCF iterations by 30-50%.

```
EXTRAPOLATION PS
EXTRAPOLATION_ORDER 3
```

Use `PS` (polynomial scheme). Order 3 is the recommended default. Higher orders can introduce instabilities.

---

## Output Files

| File | Content |
|------|---------|
| `PROJECT.out` | Main log: SCF convergence, energies, forces, timings |
| `PROJECT-pos-1.xyz` | Trajectory (atomic positions) |
| `PROJECT-vel-1.xyz` | Velocities |
| `PROJECT-frc-1.xyz` | Forces on atoms |
| `PROJECT-1.ener` | Step, time, kinetic E, temperature, potential E, conserved quantity |
| `PROJECT-1.cell` | Cell parameters per step (NPT/CELL_OPT) |
| `PROJECT-1.restart` | Human-readable restart file |
| `PROJECT-RESTART.wfn` | Binary wavefunction restart (for SCF_GUESS RESTART) |
| `PROJECT-RESTART.wfn.bak-1` | Previous wavefunction backup |

### Key Output Patterns to Check

- **SCF convergence:** grep for `SCF run converged` or `SCF run NOT converged`
- **Total energy:** grep for `ENERGY| Total FORCE_EVAL`
- **Forces:** grep for `SUM OF ATOMIC FORCES` (should be near zero for equilibrium)
- **Geometry opt:** grep for `OPTIMIZATION COMPLETED` or `MAXIMUM NUMBER OF OPTIMIZATION STEPS REACHED`

---

## Validation

### validate_input.py Checks

| Check | What It Catches |
|-------|-----------------|
| Section nesting | Unclosed sections, mismatched `&END` tags |
| Required sections | Missing `&GLOBAL`, `&FORCE_EVAL`, `&SUBSYS` |
| Basis/pseudo consistency | Functional-pseudopotential family mismatch |
| Valence electrons | `-qN` suffix inconsistency with element |
| CUTOFF range | Values below 200 Ry or above 1200 Ry flagged |
| REL_CUTOFF range | Values below 40 Ry flagged |
| EPS_SCF vs run type | Loose EPS_SCF with geometry/cell optimization |
| OT on metals | OT without band gap warning |
| CELL_OPT stress | Missing STRESS_TENSOR ANALYTICAL |
| Periodicity/Poisson | PERIODIC NONE without appropriate Poisson solver |
| Wavefunction extrapolation | MD without EXTRAPOLATION flagged |
| Thermostat region | NVT without REGION MASSIVE flagged |

---

## Common Mistakes

1. **Basis/pseudopotential functional mismatch.** GTH-PBE pseudopotentials with BLYP functional. Match the family: PBE functional uses GTH-PBE, BLYP uses GTH-BLYP.

2. **Valence electron count mismatch.** Using `-q4` pseudopotential where `-q6` is needed (e.g., oxygen). Check element-specific valence counts in `GTH_POTENTIALS`.

3. **CUTOFF too low.** Values below 200 Ry produce inaccurate forces and energies. Start at 400 Ry and converge systematically.

4. **OT on metallic systems.** OT assumes integer occupations. Metals require diagonalization with SMEARING. Symptom: SCF oscillates and never converges.

5. **Missing STRESS_TENSOR ANALYTICAL for CELL_OPT.** Cell optimization needs stress tensors. Without this keyword, CP2K uses numerical stress (slow and less accurate) or fails.

6. **EPS_SCF too loose for geometry/cell optimization.** MD tolerates 1E-6; geometry and cell optimization need 1E-8. Loose SCF produces noisy gradients that prevent convergence.

7. **No wavefunction extrapolation for MD.** Without `EXTRAPOLATION PS`, each MD step starts SCF from atomic guess, wasting 2-5x SCF iterations.

8. **Missing OUTER_SCF with OT.** A single OT SCF loop may not converge tightly enough. OUTER_SCF provides an additional convergence layer that re-evaluates the Kohn-Sham matrix.

9. **Wrong periodicity for isolated molecules.** Use `PERIODIC NONE` in `&CELL` and set `&POISSON` solver to `MT` (Martyna-Tuckerman) or `WAVELET`. Default periodic boundary conditions introduce spurious interactions with images.

10. **Forgetting REGION MASSIVE for NVT thermostat.** Without `REGION MASSIVE`, Nose-Hoover couples to global kinetic energy only, leading to poor temperature equipartition. Always set `REGION MASSIVE` for efficient NVT sampling.

---

## Best Practices

1. **Converge CUTOFF before production.** Run single-point calculations at 200, 300, 400, 500, 600 Ry. Plot energy vs cutoff. Use the smallest value where energy change < 1 meV/atom.

2. **Always restart wavefunction in MD.** Set `SCF_GUESS RESTART` and `EXTRAPOLATION PS` with `EXTRAPOLATION_ORDER 3`. This cuts SCF cost by 30-50%.

3. **Use CSVR thermostat for equilibration, Nose-Hoover for production.** CSVR thermalizes faster; Nose-Hoover gives correct canonical sampling with REGION MASSIVE.

4. **Optimize geometry before MD.** Run `GEO_OPT` to remove large forces, then start AIMD. Skipping this wastes equilibration time and risks instabilities.

5. **Save restart files frequently.** AIMD is expensive. Write `RESTART` every 500 steps and `RESTART_HISTORY` periodically to enable recovery from crashes.

6. **Use ADMM for hybrid functionals.** Auxiliary Density Matrix Method (ADMM) accelerates HFX evaluation by 5-10x with minimal accuracy loss. Requires `BASIS_ADMM` file.

7. **Check SCF convergence in output.** Grep for `SCF run NOT converged` after every calculation. Unconverged SCF invalidates all downstream results (forces, energies, stresses).

8. **Use DZVP-MOLOPT-SR-GTH for condensed phase.** The short-range variant reduces basis set superposition error in dense periodic systems while maintaining accuracy.

---

## References

- [basis-sets.md](references/basis-sets.md) -- Basis set families, selection criteria, MOLOPT vs standard
- [functionals.md](references/functionals.md) -- XC functional selection, hybrid setup, dispersion corrections
- [scf-convergence.md](references/scf-convergence.md) -- SCF troubleshooting, OT vs diag, preconditioner selection
- [best-practices.md](references/best-practices.md) -- Production workflow, convergence testing, performance tuning
- [troubleshooting.md](references/troubleshooting.md) -- Common errors, SCF failures, geometry opt stalls, MD instabilities
