Introducing Peven
Good environments = good evaluations = good rewards.
Peven is a Python package for building those environments with Petri nets as the underlying formalism. I'm using it to explore dense rewards and evals for long-horizon and multi-turn tasks where terminal scores aren't a strong enough signal.
DAGs
DAGs were an obvious starting point in my early iterations, but they got awkward in a few areas.
First, guards compose poorly. If you want three guards on one branch, say a safety check, a format check, and a budget check, you either have to draw three new branches or hide the guard logic inside a node. The first option blows up the topology; the second hides the very interaction you are trying to evaluate.
Cycles are another problem. DAGs are acyclic by definition, but retry and refinement loops, both common agentic workflows, are inherently cyclic. To represent a cycle in a DAG, you have to unroll the loop for a fixed number of steps, which works when the iteration bound is known up front and breaks down when it isn't.
Petri nets
Petri nets are a mathematical model for concurrent systems. The pieces:
- Places hold tokens.
- Transitions consume tokens from input places and produce tokens in output places.
- Arcs are directed edges from places to transitions (inputs) and from transitions to places (outputs). The net is always bipartite: places only connect to transitions, never directly to each other. Each arc has a weight, default 1. An input arc with weight w means the transition needs w tokens from that place to fire; an output arc with weight w means the transition deposits w tokens into that place.
- Firing rule: a transition t is enabled when every input place p has at least w(p, t) tokens. Firing consumes those tokens and deposits outputs.
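As a concrete sketch, this structure fits in plain Python dicts. This is an illustrative encoding, not Peven's actual API; the names `net` and `marking` are my own:

```python
# A tiny two-place, one-transition net encoded as plain dicts.
# Missing arcs simply have no entry (weight 0); explicit weights otherwise.
net = {
    "places": {"p1", "p2"},
    "transitions": {"t1"},
    "in_arcs": {("p1", "t1"): 2},   # w(p1, t1) = 2: t1 needs 2 tokens from p1
    "out_arcs": {("t1", "p2"): 1},  # w(t1, p2) = 1: t1 deposits 1 token in p2
}

marking = {"p1": 2, "p2": 0}  # tokens currently at each place
```

The bipartite constraint falls out of the keying: input arcs always pair a place with a transition, output arcs a transition with a place, so a place-to-place edge is unrepresentable.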
A marking M maps each place to its token count. Transition t is enabled when every input place has enough tokens:
M(p) ≥ w(p, t) for all p ∈ •t
- •t: the set of input places for transition t
- w(p, t): the arc weight from place p to transition t, zero if no arc
When t fires, every place updates:
M'(p) = M(p) − w(p, t) + w(t, p)
- M(p): tokens at place p before firing
- M'(p): tokens at place p after firing
- w(p, t): tokens consumed from p by transition t
- w(t, p): tokens produced into p by transition t
Say drafts holds three tokens and a judge transition consumes all three to produce one score in scores. The arc from drafts to the judge has weight 3. The arc from the judge to scores has weight 1. Nothing points from the judge back into drafts, so w(judge, drafts) = 0.
Before firing, M(drafts) = 3 and M(scores) = 0. After firing:
M'(drafts) = 3 − 3 + 0 = 0
M'(scores) = 0 − 0 + 1 = 1
Three drafts consumed, one score produced.
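The update rule is mechanical enough to sketch directly. Here is a minimal implementation of enabling and firing over dict-based markings that reproduces the numbers above (illustrative encoding, not Peven's actual API):

```python
# Sketch of the Petri net firing rule over plain dicts.
# Arc weights absent from the dicts are treated as 0 (no arc).

def enabled(marking, in_arcs, t):
    """t is enabled when M(p) >= w(p, t) for every input place p."""
    return all(marking[p] >= w for (p, t2), w in in_arcs.items() if t2 == t)

def fire(marking, in_arcs, out_arcs, t):
    """Return M' with M'(p) = M(p) - w(p, t) + w(t, p)."""
    m = dict(marking)
    for (p, t2), w in in_arcs.items():
        if t2 == t:
            m[p] -= w
    for (t2, p), w in out_arcs.items():
        if t2 == t:
            m[p] += w
    return m

in_arcs = {("drafts", "judge"): 3}   # the judge consumes three drafts
out_arcs = {("judge", "scores"): 1}  # and produces one score
marking = {"drafts": 3, "scores": 0}

assert enabled(marking, in_arcs, "judge")
print(fire(marking, in_arcs, out_arcs, "judge"))
# -> {'drafts': 0, 'scores': 1}
```

Because nothing points from the judge back into drafts, `out_arcs` simply has no `("judge", "drafts")` entry, which is exactly the w(judge, drafts) = 0 case.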
Colored tokens
Colored tokens are what let many evaluations run through a single net without cross-contamination. Each token carries a run_id. Transitions only fire on tokens with matching colors, so independent runs share the same topology but never share tokens. You define the net once and push N concurrent runs through it.
This matters for evals because you want to compare agents, seeds, prompts, or checkpoints through the same environment while maintaining separate state.
The marking generalizes to a map from (place, color) to token count, and the firing rule applies per-color. A transition t with input place p is enabled for color c when M(p, c) ≥ w(p, t). Firing only updates the entries for c.
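The per-color rule can be sketched the same way, with the marking keyed by (place, color) pairs. Again a hypothetical encoding, not Peven's actual API; the run ids `run-a` and `run-b` are made up for the example:

```python
# Per-color firing: the marking maps (place, color) to a token count,
# so concurrent runs share topology but never share tokens.
from collections import defaultdict

in_arcs = {("drafts", "judge"): 3}
out_arcs = {("judge", "scores"): 1}

marking = defaultdict(int)            # absent entries count as 0 tokens
marking[("drafts", "run-a")] = 3      # run-a has enough drafts to judge
marking[("drafts", "run-b")] = 1      # run-b does not

def enabled(marking, in_arcs, t, color):
    """t is enabled for color c when M(p, c) >= w(p, t) for all inputs p."""
    return all(marking[(p, color)] >= w
               for (p, t2), w in in_arcs.items() if t2 == t)

def fire(marking, in_arcs, out_arcs, t, color):
    """Fire t for one color; entries for other colors are untouched."""
    for (p, t2), w in in_arcs.items():
        if t2 == t:
            marking[(p, color)] -= w
    for (t2, p), w in out_arcs.items():
        if t2 == t:
            marking[(p, color)] += w

assert enabled(marking, in_arcs, "judge", "run-a")
assert not enabled(marking, in_arcs, "judge", "run-b")
fire(marking, in_arcs, out_arcs, "judge", "run-a")
# run-a: drafts 3 -> 0, scores 0 -> 1; run-b's single draft is untouched
```

The topology dicts (`in_arcs`, `out_arcs`) carry no color at all, which is the point: one net definition, N isolated markings threaded through it.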