---
name: pyhepmc
description: >-
  Use when reading or writing HepMC3 event records in Python: opening HepMC2,
  HepMC3, LHEF, or HEPEVT files from a Monte Carlo generator, iterating over
  events, accessing truth particles and vertices, filtering by status code or
  PDG ID, traversing decay trees via particle parents/children, using the NumPy
  API for fast vectorized processing, visualizing events in Jupyter notebooks,
  or writing modified events back to a HepMC file.
---

# pyhepmc

## Overview

pyhepmc provides Python bindings for the HepMC3 event record library. It reads
HepMC2, HepMC3, LHEF, and HEPEVT ASCII files produced by Monte Carlo generators
(Pythia8, Sherpa, MadGraph, EvtGen, EPOS-LHC, SIBYLL) and exposes particles,
vertices, and event metadata as Python objects. It is the standard tool for
truth-level generator studies in the Scikit-HEP ecosystem.

Unique features beyond the official HepMC3 Python bindings:

- Simple `pyhepmc.open` for format auto-detection and compressed-file support
- NumPy array views into particle/vertex data (up to 70× faster than the
  standard API for large events)
- Event graph rendering in Jupyter notebooks (via graphviz)
- Pythonic properties, context managers, and `repr` strings

## When to Use

- Reading generator-level truth events for particle-level analysis
- Extracting final-state particles for truth jet clustering with fastjet
- Inspecting decay chains or particle status codes
- Writing test events or modified events for generator validation
- Fast vectorized analysis with the NumPy API

## Key Concepts

| Concept        | Notes                                                         |
| -------------- | ------------------------------------------------------------- |
| `GenEvent`     | One event: particles + vertices + weights + run info          |
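| `GenParticle`  | `FourVector` (px, py, pz, e; units per event, usually GeV) + PDG ID + status code |
| Status 1       | Final-state (undecayed) particles                             |
| Status 2       | Decayed particles (intermediate)                              |
| Status 3       | Documentation entries (e.g. parton-level hard process)        |
| Status 4       | Incoming beam particles                                       |
| Status 11–200  | Generator-specific codes                                      |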
| `GenVertex`    | Connects incoming and outgoing particles                      |
| `GenRunInfo`   | Per-run metadata: generator tool, weight names, attributes    |
| `GenEventData` | Flat in-memory representation; supports NumPy array views     |
| Weights        | Named weight list (e.g. scale/PDF variations); access by name |

## Canonical Patterns

### Read a HepMC file

```python
import pyhepmc

# auto-detects HepMC3, HepMC2, LHEF, HEPEVT; handles .gz/.bz2/.xz/.zst
with pyhepmc.open("events.hepmc3") as f:
    for event in f:
        particles = event.particles      # list of GenParticle
        vertices  = event.vertices       # list of GenVertex
```

### Filter final-state particles

```python
with pyhepmc.open("events.hepmc3") as f:
    for event in f:
        final_state = [p for p in event.particles if p.status == 1]
        # FourVector components: .px .py .pz .e (in event.momentum_unit, usually GeV)
        for p in final_state:
            pt = (p.momentum.px**2 + p.momentum.py**2) ** 0.5
            print(p.pid, pt)
```
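Only the Cartesian components are needed to derive the usual collider variables. The following is a minimal plain-Python sketch. The helper name `kinematics` is hypothetical. pyhepmc's `FourVector` may provide equivalent methods of its own, but this version works on any `(px, py, pz, e)` values.

```python
import math


def kinematics(px, py, pz, e):
    """Return (pt, eta, phi, m) from Cartesian four-vector components."""
    pt = math.hypot(px, py)
    # pseudorapidity eta = asinh(pz / pt); +/-inf along the beam axis (pt == 0)
    eta = math.copysign(math.inf, pz) if pt == 0.0 else math.asinh(pz / pt)
    phi = math.atan2(py, px)  # azimuth in [-pi, pi]
    # invariant mass; clip small negative m^2 caused by floating-point rounding
    m2 = e * e - (px * px + py * py + pz * pz)
    m = math.sqrt(m2) if m2 > 0.0 else 0.0
    return pt, eta, phi, m
```

With a pyhepmc particle `p`, call it as `kinematics(p.momentum.px, p.momentum.py, p.momentum.pz, p.momentum.e)`.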

### NumPy API — fast vectorized processing

Prefer this over Python loops for large events; up to 70× faster on events with
thousands of particles.

```python
import numpy as np

with pyhepmc.open("events.hepmc3") as f:
    for event in f:
        p = event.numpy.particles        # read-only array view
        mask = (np.abs(p.pid) == 2212) & (p.status == 1)
        esum = np.sum(p.e[mask])         # summed energy of stable (anti)protons
        pt = np.sqrt(p.px[mask]**2 + p.py[mask]**2)
```

Available columns on `event.numpy.particles`: `pid`, `status`, `px`, `py`, `pz`,
`e`, `m`. Available columns on `event.numpy.vertices`: `x`, `y`, `z`, `t`,
`status`.
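The same kinematic quantities vectorize directly over these column arrays. The following is a minimal sketch that assumes only the `pid`, `status`, `px`, `py`, and `pz` columns listed above. The helper name `visible_final_state` and its default cuts are hypothetical choices for illustration.

```python
import numpy as np

NEUTRINO_IDS = (12, 14, 16)


def visible_final_state(pid, status, px, py, pz, pt_min=0.5, abs_eta_max=2.5):
    """Return (mask, pt, eta) selecting stable, non-neutrino particles inside cuts.

    All inputs are equal-length 1D arrays, e.g. columns of event.numpy.particles.
    """
    pid, status = np.asarray(pid), np.asarray(status)
    px, py, pz = (np.asarray(a, dtype=float) for a in (px, py, pz))
    pt = np.hypot(px, py)
    # eta = asinh(pz / pt); suppress warnings for pt == 0 (gives inf or nan)
    with np.errstate(divide="ignore", invalid="ignore"):
        eta = np.arcsinh(pz / pt)
    visible = (status == 1) & ~np.isin(np.abs(pid), NEUTRINO_IDS)
    mask = visible & (pt > pt_min) & (np.abs(eta) < abs_eta_max)
    return mask, pt, eta
```

With `p = event.numpy.particles`, call `mask, pt, eta = visible_final_state(p.pid, p.status, p.px, p.py, p.pz)`. Particles with `pt == 0` get an infinite or NaN `eta` and therefore fail the cut, without a Python-level branch.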

### Traverse decay tree (parents / children)

```python
with pyhepmc.open("events.hepmc3") as f:
    for event in f:
        for p in event.particles:
            if abs(p.pid) == 23:                   # Z boson
                leptons = [c for c in p.children
                           if abs(c.pid) in (11, 13)]
                for lep in leptons:
                    print("Z->lepton:", lep.pid, lep.parents)
```
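Generators such as Pythia8 often store several copies of the same resonance (for example after recoil), so `p.children` of the first Z may just be another Z. The following minimal sketch follows such copies to the last one and then collects all stable descendants. It relies only on the `pid`, `status`, and `children` attributes used above. It also assumes each particle exposes a unique integer `id` (the HepMC3 particle id), so that a descendant reachable through several parents is counted once. The helper names are hypothetical.

```python
def last_copy(particle):
    """Follow children with the same PDG ID (recoil copies) to the last copy."""
    while True:
        copies = [c for c in particle.children if c.pid == particle.pid]
        if not copies:
            return particle
        particle = copies[0]


def final_state_descendants(particle):
    """Collect every status-1 particle reachable from `particle` via children."""
    found, seen = [], set()
    stack = list(particle.children)  # iterative depth-first traversal
    while stack:
        p = stack.pop()
        if p.id in seen:  # assumed unique per-event particle id
            continue
        seen.add(p.id)
        if p.status == 1:
            found.append(p)
        stack.extend(p.children)
    return found
```

For example, `[c for c in last_copy(z).children if abs(c.pid) in (11, 13)]` picks the decay leptons of the decaying Z copy.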

### Identify particles by PDG ID

```python
with pyhepmc.open("events.hepmc3") as f:
    for event in f:
        b_hadrons = [p for p in event.particles
                     if abs(p.pid) in (511, 521, 531, 5122)]
```
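The explicit list above covers only B0, B+, Bs, and Λb. The following is a more general sketch based on the standard PDG numbering scheme, in which the tens, hundreds, and thousands digits of a hadron ID encode its quark content. Under that scheme, a hadron contains a b quark if any of those digits is 5. Nuclei and BSM codes are deliberately not handled. The `particle` package (see Interop) offers a maintained `has_bottom` check on `PDGID` if you prefer a library. `has_bottom_quark` and `weakly_decaying_b_hadrons` are hypothetical helper names.

```python
def has_bottom_quark(pid):
    """True if the PDG ID denotes a meson or baryon containing a b or anti-b quark."""
    a = abs(pid)
    if a >= 1_000_000_000:  # nuclei (10-digit codes) are out of scope here
        return False
    n_q3 = (a // 10) % 10
    n_q2 = (a // 100) % 10
    n_q1 = (a // 1000) % 10
    # mesons and baryons have non-zero n_q2 and n_q3; this rejects quarks,
    # leptons, gauge bosons and diquarks
    if n_q3 == 0 or n_q2 == 0:
        return False
    return 5 in (n_q1, n_q2, n_q3)


def weakly_decaying_b_hadrons(particles):
    """b hadrons with no b-hadron child (i.e. not an excited state or a mixing step)."""
    return [
        p for p in particles
        if has_bottom_quark(p.pid)
        and not any(has_bottom_quark(c.pid) for c in p.children)
    ]
```

Inside the event loop, `weakly_decaying_b_hadrons(event.particles)` replaces the hard-coded ID list and also catches Bc, other b baryons, and hadrons reached through excited-state decays.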

### Extract four-vectors for fastjet clustering

```python
import pyhepmc, fastjet as fj

with pyhepmc.open("events.hepmc3") as f:
    for event in f:
        pjs = [
            fj.PseudoJet(p.momentum.px, p.momentum.py,
                         p.momentum.pz, p.momentum.e)
            for p in event.particles
            if p.status == 1 and abs(p.pid) not in (12, 14, 16)
        ]
        cs = fj.ClusterSequence(pjs, fj.JetDefinition(fj.antikt_algorithm, 0.4))
        jets = fj.sorted_by_pt(cs.inclusive_jets(20.0))  # ptmin in the event's momentum unit
```

### Write HepMC3 events

```python
# default: HepMC3 ASCII; use format="hepmc2" for legacy output
with pyhepmc.open("output.hepmc3", "w", precision=6) as out:
    with pyhepmc.open("input.hepmc3") as f:
        for event in f:
            out.write(event)
```

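`precision` sets the number of significant digits written for floating-point
values (by default full double precision is kept). Lower values (e.g.
`precision=4`) make ASCII files noticeably smaller, at the cost of rounding
momenta and vertex positions.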

### Access named weights (scale / PDF variations)

```python
with pyhepmc.open("events.hepmc3") as f:
    for event in f:
        # event.weight(name) raises RuntimeError for unknown names; names depend on the generator
        nominal = event.weight("Default")
        mur_up  = event.weight("MUR2_MUF1")
        # or build the dict yourself:
        w = dict(zip(event.weight_names, event.weights))
```
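`event.weight_names` raises `RuntimeError` when `run_info` is missing, so loops over many files are easier to keep robust with a small guard. The following minimal sketch accumulates the sum of each named weight over a run, for example to normalize variation histograms. It uses only `run_info`, `weight_names`, and `weights`. When no run info is attached, it falls back to hypothetical positional labels (`w0`, `w1`, ...). The helper names are also hypothetical.

```python
from collections import defaultdict


def named_weights(event):
    """Map weight name -> value; use positional labels if run_info is missing."""
    values = list(event.weights)
    if event.run_info is None:
        names = [f"w{i}" for i in range(len(values))]
    else:
        names = list(event.weight_names)
    return dict(zip(names, values))


def sum_of_weights(events):
    """Accumulate the per-name sum of weights over an iterable of events."""
    totals = defaultdict(float)
    for event in events:
        for name, value in named_weights(event).items():
            totals[name] += value
    return dict(totals)
```

Typical use: `with pyhepmc.open("events.hepmc3") as f: totals = sum_of_weights(f)`.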

### Set generator metadata and write

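```python
import pyhepmc
run_info = pyhepmc.GenRunInfo()
tool = pyhepmc.GenRunInfo.ToolInfo("Pythia", "8.311", "pp collisions at 13 TeV")
run_info.tools = [tool]
run_info.weight_names = ["Default", "MUR2_MUF1"]

evt = pyhepmc.GenEvent()
evt.run_info = run_info
evt.weights = [1.0, 1.05]  # one value per entry in weight_names
# ... add vertices and particles to evt as usual ...
with pyhepmc.open("output.hepmc3", "w") as out:
    out.write(evt)
```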

### Visualize an event (Jupyter / graphviz)

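```python
from pyhepmc.view import to_dot, savefig

# `event` is a GenEvent, e.g. obtained with pyhepmc.open(...).read()
# to_dot returns a graphviz.Digraph; make `g` (or `event` itself, via its
# _repr_html_) the last expression of a Jupyter cell to render it inline
g = to_dot(event, size=(6, 6))

# save to file; the format follows the extension (.svg, .png, .pdf)
savefig(event, "event.svg")
```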

### Debug: print event listing

```python
import pyhepmc

with pyhepmc.open("events.hepmc3") as f:
    event = f.read()

print(pyhepmc.listing(event))   # tabular particle/vertex listing
print(pyhepmc.content(event))   # raw HepMC3 ASCII dump
```

## Gotchas

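- **Check momentum units**: each event records its units in
  `event.momentum_unit` (`pyhepmc.Units.GEV` or `pyhepmc.Units.MEV`). Most
  generators write GeV, but files produced inside experiment frameworks may use
  MeV (the ATLAS convention, also used in ATLAS ntuple branches).
  `event.set_units(pyhepmc.Units.GEV, pyhepmc.Units.MM)` converts an event in place.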
- **Status code conventions vary by generator**: status=1 for final-state is
  standard; beyond that, Pythia8 and Sherpa use generator-specific codes.
- **Neutrinos are final-state particles too**: they carry status 1 but are
  invisible to a detector, so exclude their PDG IDs (12, 14, 16) from
  visible-particle sums and jet inputs.
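- **HepMC2 vs HepMC3**: `pyhepmc.open` auto-detects the format from the file
  header. Both formats start with a `HepMC::Version` line. HepMC2 files then
  contain `HepMC::IO_GenEvent-START_EVENT_LISTING`, while HepMC3 files contain
  `HepMC::Asciiv3-START_EVENT_LISTING`. Pass `format=` to override detection.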
- **Attributes read from a file come back as `UnparsedAttribute`**: HepMC3 does
  not store an attribute's type in the file, so the value cannot be restored
  automatically. Call `.astype(bool)`, `.astype(int)`, etc. to convert; this
  also replaces the stored entry with the converted value.
- **NumPy API is read-only**: `event.numpy.particles` is a view; modifying those
  arrays does not change the `GenEvent`. Use `GenEventData` if you need to
  mutate event content via arrays.
- **`event.weight_names` requires `run_info`**: if `event.run_info` is `None`
  this raises `RuntimeError`. Check `event.run_info is not None` first.

## Interop

- **fastjet**: extract final-state particles → build `PseudoJet` list → cluster.
- **particle**: identify PDG IDs with `Particle.from_pdgid(p.pid)` or use
  `from particle import literals as lp; lp.proton.pdgid`.
- **pylhe**: for parton-level LHE files; pyhepmc handles showered / hadronised
  events. `pyhepmc.open` can also read LHEF directly (`format="lhef"`).
- **awkward**: convert final-state particle lists to awkward arrays for
  vectorised columnar analysis.
- **vector**: `import vector; v = vector.obj(px=p.momentum.px, ...)` for
  Lorentz-boost utilities.
- **numpy**: use `event.numpy.particles` or `GenEventData` for batch array
  operations across many particles.

## Docs

https://scikit-hep.org/pyhepmc/
