Peven
I am interested in exploring new ways to derive rewards for multi-turn tasks, that is, training agents using signals from non-terminal states. Policy optimization algorithms like GRPO, when trained on sparse terminal rewards, can struggle to generate sufficient signal for long-horizon tasks. Peven is a Python/Julia package, spread across three repositories, for formalizing RL environments as modified colored Petri nets: tokens carrying data are held in places and moved by transitions, and a transition can call an LLM or a tool and alter environment state. Because the topology is explicit, state or rewards can be surfaced at any point during a run.
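To make the idea concrete, here is a minimal sketch of a colored Petri net in plain Python, with a reward hook that scores every transition firing rather than only the final state. It is not Peven's API. `Transition`, `Net`, `run` and `dense_reward` are hypothetical names, and the LLM and tool calls are replaced by deterministic stubs. The firing rule is simplified: one token per input arc, no guards or arc expressions, and transitions fire sequentially rather than concurrently.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical toy model, not Peven's API. Tokens are arbitrary Python
# values ("colors"); each place holds a FIFO list of tokens.


@dataclass
class Transition:
    name: str
    inputs: list[str]   # places that each contribute one consumed token
    outputs: list[str]  # places that each receive the produced token
    action: Callable[[list[Any]], Any]  # stand-in for an LLM or tool call


@dataclass
class Net:
    marking: dict[str, list[Any]]  # place name -> tokens currently held
    transitions: list[Transition]
    trace: list[tuple[str, Any]] = field(default_factory=list)

    def enabled(self, t: Transition) -> bool:
        # Enabled when every input place holds at least one token.
        return all(self.marking.get(p) for p in t.inputs)

    def fire(self, t: Transition) -> None:
        # Consume one token per input place, run the action, emit its result.
        consumed = [self.marking[p].pop(0) for p in t.inputs]
        produced = t.action(consumed)
        for p in t.outputs:
            self.marking.setdefault(p, []).append(produced)
        self.trace.append((t.name, produced))

    def step(self) -> bool:
        # Fire the first enabled transition; return False when none is enabled.
        for t in self.transitions:
            if self.enabled(t):
                self.fire(t)
                return True
        return False


def run(net: Net, reward_fn: Callable[[str, Any], float], max_steps: int = 10):
    # Score every firing (a non-terminal state), not just the final one.
    rewards = []
    for _ in range(max_steps):
        if not net.step():
            break
        name, token = net.trace[-1]
        rewards.append((name, reward_fn(name, token)))
    return rewards


def dense_reward(name: str, token: Any) -> float:
    # Hypothetical shaping: partial credit for intermediate milestones.
    checks = {
        "plan": (0.25, lambda tok: tok == (2, 3)),
        "call_tool": (0.25, lambda tok: tok == 5),
        "answer": (0.5, lambda tok: tok == "answer: 5"),
    }
    weight, ok = checks.get(name, (0.0, lambda _: False))
    return weight if ok(token) else 0.0


net = Net(
    marking={"prompt": ["2+3"]},
    transitions=[
        Transition("plan", ["prompt"], ["plan"],
                   lambda toks: tuple(int(x) for x in toks[0].split("+"))),
        Transition("call_tool", ["plan"], ["observation"],
                   lambda toks: sum(toks[0])),
        Transition("answer", ["observation"], ["done"],
                   lambda toks: f"answer: {toks[0]}"),
    ],
)
rewards = run(net, dense_reward)
print(rewards)  # [('plan', 0.25), ('call_tool', 0.25), ('answer', 0.5)]
```

Because every firing is recorded in the trace, the reward function can credit a well-formed plan or a correct tool result even when the final answer is wrong; the weights here are arbitrary.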
The net is authored in Python to leverage the large Python RL environment ecosystem, then lowered into Julia, where the engine coordinates and schedules concurrent rollouts.
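How the Python-to-Julia lowering works is not described here, so the following is only an assumption about one plausible shape for that boundary. The net's topology is serialized to plain data (JSON in this sketch), and each transition's behavior is referenced by name instead of being shipped as a Python callable. `NetSpec`, `TransitionSpec`, `lower` and the `action_ref` strings are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical lowering format, not Peven's actual interface. Only
# plain data crosses the language boundary: actions are referenced by
# name (action_ref), which the receiving engine would resolve itself.


@dataclass
class TransitionSpec:
    name: str
    inputs: list[str]
    outputs: list[str]
    action_ref: str


@dataclass
class NetSpec:
    places: list[str]
    transitions: list[TransitionSpec]
    initial_marking: dict[str, list]  # JSON-serializable token payloads


def lower(spec: NetSpec) -> str:
    # Reject arcs that point at undeclared places before serializing.
    declared = set(spec.places)
    for t in spec.transitions:
        missing = [p for p in t.inputs + t.outputs if p not in declared]
        if missing:
            raise ValueError(
                f"transition {t.name!r} references undeclared places {missing}"
            )
    # asdict recurses into the nested TransitionSpec dataclasses.
    return json.dumps(asdict(spec), sort_keys=True)


spec = NetSpec(
    places=["prompt", "plan", "observation", "done"],
    transitions=[
        TransitionSpec("plan", ["prompt"], ["plan"], "llm.plan"),
        TransitionSpec("call_tool", ["plan"], ["observation"], "tool.calculator"),
        TransitionSpec("answer", ["observation"], ["done"], "llm.answer"),
    ],
    initial_marking={"prompt": ["2+3"]},
)
payload = lower(spec)
print(json.loads(payload)["transitions"][1]["action_ref"])  # tool.calculator
```

A data-only format like this is one way to let a separate engine own scheduling while Python owns authoring; Peven's actual lowering may differ.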