Introducing Peven
Good environments = good evaluations = good rewards.
Peven is a Python package for building those environments with Petri nets as the underlying formalism. I'm using it to explore dense rewards and evals for long-horizon and multi-turn tasks where terminal scores aren't a strong enough signal.
DAGs
DAGs were an obvious starting point in my early iterations, but they got awkward in a few areas.
First, guards compose poorly. If you want three guards on one branch, say a safety check, a format check, and a budget check, you either have to draw three new branches or hide the guard logic inside a node. The first option blows up the topology; the second hides the very interaction you are trying to evaluate.
Cycles are another problem. DAGs are acyclic by definition, but retry and refinement loops, both common agentic workflows, are inherently cyclic. To represent a cycle in a DAG, you have to unroll the loop for a fixed number of steps, which works when the iteration bound is known up front and breaks down when it isn't.
Petri nets
Petri nets are a mathematical model for concurrent systems. The pieces:
- Places hold tokens.
- Transitions consume tokens from input places and produce tokens in output places.
- Arcs are directed edges from places to transitions (inputs) and from transitions to places (outputs). The net is always bipartite: places only connect to transitions, never directly to each other. Each arc has a weight, default 1. An input arc with weight w means the transition needs w tokens from that place to fire; an output arc with weight w means the transition deposits w tokens into that place.
- Firing rule: a transition t is enabled when every input place p has at least w(p, t) tokens. Firing consumes those tokens and deposits outputs.
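As a concrete sketch, this structure fits in plain Python dicts. This is an illustrative encoding, not Peven's actual API; the names `net` and `marking` are my own:

```python
# A tiny two-place, one-transition net encoded as plain dicts.
# Missing arcs simply have no entry (weight 0); explicit weights otherwise.
net = {
    "places": {"p1", "p2"},
    "transitions": {"t1"},
    "in_arcs": {("p1", "t1"): 2},   # w(p1, t1) = 2: t1 needs 2 tokens from p1
    "out_arcs": {("t1", "p2"): 1},  # w(t1, p2) = 1: t1 deposits 1 token in p2
}

marking = {"p1": 2, "p2": 0}  # tokens currently at each place
```

The bipartite constraint falls out of the keying: input arcs always pair a place with a transition, output arcs a transition with a place, so a place-to-place edge is unrepresentable.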
A marking M maps each place to its token count. Transition t is enabled when every input place has enough tokens:
M(p) ≥ w(p, t) for all p ∈ •t
- •t: the set of input places for transition t
- w(p, t): the arc weight from place p to transition t, zero if no arc
When t fires, every place updates:
M'(p) = M(p) − w(p, t) + w(t, p)
- M(p): tokens at place p before firing
- M'(p): tokens at place p after firing
- w(p, t): tokens consumed from p by transition t
- w(t, p): tokens produced into p by transition t
Say drafts holds three tokens and a judge transition consumes all three to produce one score in scores. The arc from drafts to the judge has weight 3. The arc from the judge to scores has weight 1. Nothing points from the judge back into drafts, so w(judge, drafts) = 0.
Before firing, M(drafts) = 3 and M(scores) = 0. After firing:
M'(drafts) = 3 − 3 + 0 = 0
M'(scores) = 0 − 0 + 1 = 1
Three drafts consumed, one score produced.
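The update rule is mechanical enough to sketch directly. Here is a minimal implementation of enabling and firing over dict-based markings that reproduces the numbers above (illustrative encoding, not Peven's actual API):

```python
# Sketch of the Petri net firing rule over plain dicts.
# Arc weights absent from the dicts are treated as 0 (no arc).

def enabled(marking, in_arcs, t):
    """t is enabled when M(p) >= w(p, t) for every input place p."""
    return all(marking[p] >= w for (p, t2), w in in_arcs.items() if t2 == t)

def fire(marking, in_arcs, out_arcs, t):
    """Return M' with M'(p) = M(p) - w(p, t) + w(t, p)."""
    m = dict(marking)
    for (p, t2), w in in_arcs.items():
        if t2 == t:
            m[p] -= w
    for (t2, p), w in out_arcs.items():
        if t2 == t:
            m[p] += w
    return m

in_arcs = {("drafts", "judge"): 3}   # the judge consumes three drafts
out_arcs = {("judge", "scores"): 1}  # and produces one score
marking = {"drafts": 3, "scores": 0}

assert enabled(marking, in_arcs, "judge")
print(fire(marking, in_arcs, out_arcs, "judge"))
# -> {'drafts': 0, 'scores': 1}
```

Because nothing points from the judge back into drafts, `out_arcs` simply has no `("judge", "drafts")` entry, which is exactly the w(judge, drafts) = 0 case.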
Colored tokens
Colored tokens are what let many evaluations run through a single net without cross-contamination. Each token carries a run_id. Transitions only fire on tokens with matching colors, so independent runs share the same topology but never share tokens. You define the net once and push N concurrent runs through it.
This matters for evals because you want to compare agents, seeds, prompts, or checkpoints through the same environment while maintaining separate state.
The marking generalizes to a map from (place, color) to token count, and the firing rule applies per-color. A transition t with input place p is enabled for color c when M(p, c) ≥ w(p, t). Firing only updates the entries for c.
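The per-color rule can be sketched the same way, with the marking keyed by (place, color) pairs. Again a hypothetical encoding, not Peven's actual API; the run ids `run-a` and `run-b` are made up for the example:

```python
# Per-color firing: the marking maps (place, color) to a token count,
# so concurrent runs share topology but never share tokens.
from collections import defaultdict

in_arcs = {("drafts", "judge"): 3}
out_arcs = {("judge", "scores"): 1}

marking = defaultdict(int)            # absent entries count as 0 tokens
marking[("drafts", "run-a")] = 3      # run-a has enough drafts to judge
marking[("drafts", "run-b")] = 1      # run-b does not

def enabled(marking, in_arcs, t, color):
    """t is enabled for color c when M(p, c) >= w(p, t) for all inputs p."""
    return all(marking[(p, color)] >= w
               for (p, t2), w in in_arcs.items() if t2 == t)

def fire(marking, in_arcs, out_arcs, t, color):
    """Fire t for one color; entries for other colors are untouched."""
    for (p, t2), w in in_arcs.items():
        if t2 == t:
            marking[(p, color)] -= w
    for (t2, p), w in out_arcs.items():
        if t2 == t:
            marking[(p, color)] += w

assert enabled(marking, in_arcs, "judge", "run-a")
assert not enabled(marking, in_arcs, "judge", "run-b")
fire(marking, in_arcs, out_arcs, "judge", "run-a")
# run-a: drafts 3 -> 0, scores 0 -> 1; run-b's single draft is untouched
```

The topology dicts (`in_arcs`, `out_arcs`) carry no color at all, which is the point: one net definition, N isolated markings threaded through it.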