Browser tests are non-deterministic by design. The same code, same browser, and same page can produce different results between runs. A setTimeout race causes a toast notification to fire before a confirmation div renders, and the agent observes stale state. A network response arrives in a different order. An animation callback shifts a layout check by a single frame. The page looks identical. The timing is not.
This makes agent sessions unreliable and unreplayable. If you cannot reproduce what a browser agent did, you cannot debug it, audit it, or trust it. The root problem is always the same: browsers are non-deterministic by design, and we keep trying to make deterministic claims about what happened inside one.
So I built DBAR (Deterministic Browser Agent Runtime). It eliminates non-determinism at the source and makes browser sessions replayable, down to the byte.
Three sources of chaos
Browser non-determinism comes from exactly three places:
- Time. Date.now(), setTimeout, requestAnimationFrame, all tied to wall-clock time, which varies between runs. A 200ms timeout that fires before a network response locally might fire after it in CI.
- Network. HTTP responses arrive in different orders, at different speeds, sometimes with different content. CDN cache hits, cold starts, and regional routing all introduce variance.
- State. DOM mutations race with script execution. A MutationObserver callback and a fetch resolution can interleave differently on every run. The final DOM looks the same to a human. It does not look the same to a hash function.
Every flaky browser test traces back to one of these three. Every "it works on my machine" claim about a browser agent does too. DBAR controls all three at the Chrome DevTools Protocol (CDP) level, not by patching symptoms but by making the entire execution environment deterministic.
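To make the time case concrete, here is a toy model (not DBAR code) of a 200ms timer racing a network response. The event names and latencies are made up for illustration. The only point is that the order the page observes depends on a wall-clock value the test does not control.

```typescript
// Toy model of a timer racing a network response.
// Event names and latencies are illustrative assumptions.
type PageEvent = { name: string; atMs: number };

// Returns the order in which the page would observe the two events.
function observedOrder(networkLatencyMs: number): string[] {
  const events: PageEvent[] = [
    { name: "timeout-200ms", atMs: 200 },
    { name: "network-response", atMs: networkLatencyMs },
  ];
  // Sort by arrival time, earliest first.
  return events.sort((a, b) => a.atMs - b.atMs).map((e) => e.name);
}

const local = observedOrder(350); // slow response: the timer fires first
const ci = observedOrder(120); // fast response: the response arrives first

console.log("local:", local.join(" -> "));
console.log("ci:   ", ci.join(" -> "));
```

Same code, same page, two different event orders. Pinning the clock and the response timing removes the only input that differs between the two calls.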
The approach
DBAR uses the Chrome DevTools Protocol to intercept each source of non-determinism before it reaches the page:
- Virtual time via Emulation.setVirtualTimePolicy. The browser's internal clock advances only when DBAR says so. Date.now(), setTimeout, and requestAnimationFrame all run on synthetic time. No wall-clock variance.
- Network record/replay via the CDP Fetch domain. During capture, every request and response is recorded, including headers, body, and timing. During replay, recorded responses are served back. No network hits, no variance.
- Full state snapshots at each step boundary. DOM snapshot, accessibility tree, screenshot. Everything hashed with SHA-256. If the state diverges, you know exactly when and by how much.
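For readers who have not driven CDP directly, the first two mechanisms can be sketched as follows. This is a rough illustration of the protocol calls involved, not DBAR's internals. The CDP method names (Emulation.setVirtualTimePolicy, Fetch.enable, Fetch.fulfillRequest, Fetch.failRequest) are real, while CdpSessionLike, RecordedResponse, the helper names, and the choice to key recordings by URL are assumptions made for the example.

```typescript
// Minimal shape of a CDP session (assumption: anything with a send() like this works).
interface CdpSessionLike {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>;
}

// Hypothetical record of one captured response, not DBAR's actual format.
interface RecordedResponse {
  status: number;
  headers: Record<string, string>;
  bodyBase64: string;
}

// Pause the page clock and route every request through the Fetch domain.
async function enableControl(session: CdpSessionLike): Promise<void> {
  await session.send("Emulation.setVirtualTimePolicy", { policy: "pause" });
  await session.send("Fetch.enable", { patterns: [{ urlPattern: "*" }] });
}

// Let virtual time run for a fixed budget of virtual milliseconds.
async function advanceVirtualTime(session: CdpSessionLike, budgetMs: number): Promise<void> {
  await session.send("Emulation.setVirtualTimePolicy", { policy: "advance", budget: budgetMs });
}

// Answer a paused request (a Fetch.requestPaused event) from the recording.
// Keying by URL alone is a simplification; a real transcript needs a richer key.
async function replayRequest(
  session: CdpSessionLike,
  paused: { requestId: string; request: { url: string } },
  recording: Map<string, RecordedResponse>,
): Promise<boolean> {
  const hit = recording.get(paused.request.url);
  if (!hit) {
    // Unknown request: fail it rather than let it reach the network.
    await session.send("Fetch.failRequest", {
      requestId: paused.requestId,
      errorReason: "BlockedByClient",
    });
    return false;
  }
  const responseHeaders = Object.keys(hit.headers).map((name) => ({
    name,
    value: hit.headers[name],
  }));
  await session.send("Fetch.fulfillRequest", {
    requestId: paused.requestId,
    responseCode: hit.status,
    responseHeaders,
    body: hit.bodyBase64, // CDP expects the body base64-encoded
  });
  return true;
}
```

In a Playwright setup, a session can be obtained with page.context().newCDPSession(page), and replayRequest would be called from a listener on the Fetch.requestPaused event. A thin adapter may be needed to satisfy the interface above.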
The API
The API has three operations: capture records a session with snapshots and hashes at each step, replay serves recorded responses on a fresh page and compares state hashes, and validate checks archive integrity offline without a browser.
What's in a capsule
A DBAR archive, what I call a "determinism capsule," is a self-contained bundle:
- Manifest. Metadata, step list, hash chain, timestamps (virtual, not wall-clock).
- Network transcript. Every request/response pair recorded, with response bodies deduplicated and stored by content hash.
- Per-step snapshots. DOM tree, accessibility tree, screenshot, and their SHA-256 hashes.
No external dependencies. You can take a capsule from one machine, replay it on another, and get identical hashes. If an agent says it filled a form and clicked submit, the capsule proves it, or proves it didn't.
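One way to picture the capsule is as a content-addressed store plus a hash chain over the steps. The sketch below is an assumption about how such a bundle could be modeled, not DBAR's real schema: the Capsule, TranscriptEntry, and StepRecord types, the helper names, and the example.test URLs are all hypothetical. It uses Node's built-in crypto module for SHA-256.

```typescript
import { createHash } from "node:crypto";

const sha256 = (data: string | Uint8Array): string =>
  createHash("sha256").update(data).digest("hex");

// Hypothetical capsule layout; field names are illustrative only.
interface TranscriptEntry { url: string; status: number; bodyHash: string }
interface StepRecord {
  index: number;
  domHash: string;
  a11yHash: string;
  screenshotHash: string;
  chainHash: string;
}
interface Capsule {
  bodies: Map<string, string>; // content hash -> body, stored once
  transcript: TranscriptEntry[];
  steps: StepRecord[];
}

// Store a response; identical bodies share one entry keyed by their hash.
function recordResponse(c: Capsule, url: string, status: number, body: string): void {
  const bodyHash = sha256(body);
  if (!c.bodies.has(bodyHash)) c.bodies.set(bodyHash, body);
  c.transcript.push({ url, status, bodyHash });
}

// Hash one step's snapshots and link the result to the previous step.
function recordStep(c: Capsule, dom: string, a11y: string, screenshot: Uint8Array): void {
  const prev = c.steps.length > 0 ? c.steps[c.steps.length - 1].chainHash : "";
  const domHash = sha256(dom);
  const a11yHash = sha256(a11y);
  const screenshotHash = sha256(screenshot);
  const chainHash = sha256(prev + domHash + a11yHash + screenshotHash);
  c.steps.push({ index: c.steps.length, domHash, a11yHash, screenshotHash, chainHash });
}

// Offline integrity check: recompute the chain and body hashes, no browser needed.
function validateCapsule(c: Capsule): boolean {
  let prev = "";
  for (const s of c.steps) {
    if (sha256(prev + s.domHash + s.a11yHash + s.screenshotHash) !== s.chainHash) return false;
    prev = s.chainHash;
  }
  let bodiesOk = true;
  c.bodies.forEach((body, hash) => {
    if (sha256(body) !== hash) bodiesOk = false;
  });
  return bodiesOk && c.transcript.every((t) => c.bodies.has(t.bodyHash));
}

const capsule: Capsule = { bodies: new Map(), transcript: [], steps: [] };
recordResponse(capsule, "https://example.test/a.js", 200, "console.log('hi')");
recordResponse(capsule, "https://example.test/b.js", 200, "console.log('hi')");
recordStep(capsule, "<html></html>", "[]", new Uint8Array([1, 2, 3]));
console.log(capsule.bodies.size, validateCapsule(capsule));
```

Because every step's chain hash commits to the one before it, editing any earlier snapshot hash invalidates the link at that step.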
Observables and metrics
Not all hashes are equal. DBAR distinguishes between strict and advisory observables:
- Strict: DOM hash, accessibility tree hash, network digest. These must match between capture and replay. A mismatch means the session diverged.
- Advisory: Screenshot hash. Pixel-level differences can come from font rendering, GPU compositing, subpixel antialiasing. Useful signal, but not a hard gate.
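The strict/advisory split can be sketched as a per-step comparison in which only the strict hashes gate the result. The Observables shape and the compareStep helper are hypothetical names chosen for this example, not part of DBAR's documented API.

```typescript
// Hypothetical per-step observables; field names are assumptions.
interface Observables {
  domHash: string;
  a11yHash: string;
  networkDigest: string;
  screenshotHash: string;
}

interface StepComparison {
  strictMatch: boolean; // hard gate: DOM, accessibility tree, network
  advisoryMatch: boolean; // signal only: screenshot
  mismatched: string[]; // which strict observables differed
}

const STRICT_KEYS = ["domHash", "a11yHash", "networkDigest"] as const;

function compareStep(captured: Observables, replayed: Observables): StepComparison {
  const mismatched: string[] = STRICT_KEYS.filter((k) => captured[k] !== replayed[k]);
  return {
    strictMatch: mismatched.length === 0,
    advisoryMatch: captured.screenshotHash === replayed.screenshotHash,
    mismatched,
  };
}

const captured: Observables = { domHash: "d1", a11yHash: "a1", networkDigest: "n1", screenshotHash: "s1" };
// Same page state, but the screenshot rendered differently on the replay machine.
const replayed: Observables = { ...captured, screenshotHash: "s2" };
console.log(compareStep(captured, replayed));
```

Here the step still counts as a match, since only the advisory screenshot hash differs.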
Replay produces three metrics:
- RSR (Replay Success Rate), the fraction of steps where all strict hashes match. 1.0 means perfect replay.
- DVR (Determinism Violation Rate), equal to 1 - RSR. 0.0 is what you want.
- TTD (Time to Divergence), which step first diverged, if any.
An RSR below 1.0 means determinism broke down somewhere, and TTD tells you exactly which step diverged first.
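The three metrics follow directly from a list of per-step strict results. A minimal sketch, assuming each step has already been reduced to a boolean; the computeMetrics name, the ReplayMetrics shape, and the treatment of an empty session as perfect replay are assumptions of this example.

```typescript
interface ReplayMetrics {
  rsr: number; // Replay Success Rate
  dvr: number; // Determinism Violation Rate, 1 - RSR
  ttd: number | null; // index of the first diverging step, or null if none
}

// strictMatches[i] is true when every strict hash matched at step i.
function computeMetrics(strictMatches: boolean[]): ReplayMetrics {
  const total = strictMatches.length;
  const matched = strictMatches.filter((ok) => ok).length;
  // Assumption: a session with no steps counts as a perfect replay.
  const rsr = total === 0 ? 1 : matched / total;
  const first = strictMatches.indexOf(false);
  return { rsr, dvr: 1 - rsr, ttd: first === -1 ? null : first };
}

// Four steps, with the third one (index 2) diverging.
const metrics = computeMetrics([true, true, false, true]);
console.log(metrics); // rsr 0.75, dvr 0.25, ttd 2
```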
Get started
Links: GitHub (private for now) / npm
What's next
This is v0.1.0. It handles single-page sessions in Chromium. The roadmap includes multi-page support, Firefox and WebKit backends, and tighter CI integration for automated regression detection.
DBAR is independent of BAP. It works with any Playwright setup, and you don't need the Browser Agent Protocol to use it. But if you are building browser agents, the combination is where it gets interesting: BAP gives agents structured access to the browser, and DBAR proves what they did with it.
The repo is private for now while I stabilize the API, but the package is live on npm. If you've been fighting flaky browser tests or trying to make agent sessions auditable, give it a try and tell me what breaks.