The Best of Both Worlds

Why the split

The engine and the authoring surface want different languages. The engine is pure math and Julia is the right home for it. The previous post covers why. But nobody designs evals in Julia. The ecosystem for serious environment work — agent SDKs, tool libraries, model clients, dataset tooling, scoring code — is all in Python. The Julia engine is gorgeous, and if the authoring API were also Julia nobody would touch it. So, author in Python, execute in Julia.

Python as the authoring layer

The Python package, peven, is the API for evals. Call it gymnasium-esque in the loose sense: a small, well-shaped interface that slots around whatever environment you already have. You keep your agents, your executors, your tools, your clients. Peven is the topological runtime that drives interaction. I don't want people to have to port an environment to a Peven framework. I just want Peven to be the thing that people run their environments through.

The seam

PevenPy.jl is the adapter between the two. It has four jobs: decode environment and run-start payloads coming from Python, lower them into Peven.jl nets, markings, guards, and join selectors, execute runs by calling Python-backed executors through callback requests, and stream engine lifecycle events back out. The transport is framed MessagePack over a socket. It's pretty naive but IPC is not my domain, and I'm actively thinking of ways to make this better, especially as it comes to leveraging Julia's data concurrency.

Why a socket and not an in-process bridge

Process isolation. Python and Julia don't need to share a heap to do their jobs. Keeping them in separate processes means each runtime manages it's own state. A failure in one layer doesn't take the other down with it.

Cold start is paid once per run. Julia's startup is the usual complaint, but in this setup it's amortized. peven-install provisions the runtime ahead of time; a run boots the engine once and then fires thousands of transitions through it. Per-firing overhead is just a socket roundtrip.

Concurrency models are incompatible. This is the one that actually forced the decision. Julia gives you real task-parallel execution. Python has the GIL. When you run Julia inside the Python process, every callback from Julia back into Python has to take the GIL, and that serializes the concurrent firings you went to Julia for. A socket boundary lets each side run its own scheduler and rendezvous at message boundaries. Julia fires transitions in parallel, Python handles callbacks on its own asyncio loop, and neither waits on the other's lock.

Example

The toy example below is a MiniGrid DoorKey rollout run using Peven. The environment is an 8x8 room where the agent has to find a key, open the door, and reach the goal. The tool surface is turn left, turn right, move forward, pick up the key, and open the door.

The mover sees the current observation and memory, then emits the next concrete action. The animation loops through the net.