Semantic selectors solved half the problem. A browser agent can find role=button, name="Sign In" on any page, regardless of DOM structure. But what happens when the agent comes back tomorrow and the button's label changed to "Log In"? Or when the page has three buttons named "Submit" and the agent needs the second one?
Session-scoped refs die with the session. CSS selectors die with the next deploy. The element is still there. The agent just can't prove it.
I ran into this while building BAP. The protocol gives agents stable refs within a session (like @e1, @e2), but those refs are ephemeral. Close the browser, open a new one, and every ref resets. Any agent that needs to remember elements across runs (monitoring workflows, regression checks, multi-session tasks) needs identity that persists beyond a single page load.
uSEID is what I built to solve that. It's a portable element signature that encodes what an element is, where it sits in the DOM, and where it appears on screen, then resolves that signature against a live page with a confidence score and a safety gate.
[Figure: the CSS selector div.auth > button:first-child matches "Sign In" (✓), then matches "Sign Up" (✗). The uSEID signature for button "Sign In" resolves (✓) with confidence 0.951.]
The three signals
People find elements using multiple cues at once. You don't find the "Sign In" button purely by its label, or purely by its position, or purely by its context. You triangulate. uSEID does the same thing with three weighted signals:
| Signal | Weight | What it captures |
|---|---|---|
| Semantic | 50% | ARIA role + accessible name. What the element is |
| Structural | 30% | Ancestor roles, sibling labels, depth. Where it sits in the DOM |
| Spatial | 20% | Bounding box position. Where it appears on screen |
Semantic carries the most weight because role and name are the most stable properties of an element. A button named "Sign In" remains a button named "Sign In" through most refactors. But when the name does change, or when multiple elements share the same role and name, the structural and spatial signals disambiguate.
The structural signal captures context: what landmark region is this element in? What are its siblings labeled? How deep is it in the tree? The spatial signal captures position: is this element in the header or the footer, left column or right? Together, the three signals form a fingerprint that's resilient to the kinds of changes real UIs go through.
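The weighting can be sketched in a few lines. This is a minimal illustration, not uSEID's actual scoring code: the weights come from the table above, while the function name and the assumption that each signal has already been reduced to a similarity in [0, 1] are mine.

```python
# Weights from the signal table; everything else in this sketch is hypothetical.
WEIGHTS = {"semantic": 0.5, "structural": 0.3, "spatial": 0.2}


def combined_score(semantic: float, structural: float, spatial: float) -> float:
    """Blend three per-signal similarities (each in [0, 1]) into one confidence."""
    parts = {"semantic": semantic, "structural": structural, "spatial": spatial}
    for signal, value in parts.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{signal} similarity must be in [0, 1], got {value}")
    return sum(WEIGHTS[signal] * value for signal, value in parts.items())


# Illustrative case: the role still matches but the label changed, so the
# semantic similarity is only partial (0.5 is a made-up value), while the
# element kept its DOM context and screen position.
print(round(combined_score(semantic=0.5, structural=1.0, spatial=1.0), 3))  # 0.75
```

In this made-up case the structural and spatial signals keep the candidate in the running even though the name no longer matches exactly.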
How it works
uSEID operates in two phases. First, you capture a signature from a snapshot of the page. Later (minutes, hours, or several deploys later) you resolve that signature against the current page.
Building a signature:
You pass DOM and accessibility snapshots plus an element index, and get back a JSON signature containing the semantic identity (role, name), structural context (ancestor roles, siblings), and spatial position (bounding box). The signature is portable: store it in a database, write it to a file, pass it between services.
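To make the shape concrete, here is a hypothetical signature and a JSON round-trip. The field names, the normalized bounding box, and the example values are assumptions for illustration, not the package's real schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Signature:
    # Binding: the page the signature was captured on.
    origin: str
    path: str
    # Semantic identity.
    role: str
    name: str
    # Structural context.
    ancestor_roles: List[str]
    sibling_labels: List[str]
    depth: int
    # Spatial position as [x, y, width, height], here assumed to be
    # fractions of the viewport.
    bbox: List[float]


sig = Signature(
    origin="https://example.com",
    path="/login",
    role="button",
    name="Sign In",
    ancestor_roles=["main", "form"],
    sibling_labels=["Sign Up"],
    depth=6,
    bbox=[0.42, 0.12, 0.16, 0.05],
)

# Portable: serialize to JSON, store it anywhere, and rebuild it later.
stored = json.dumps(asdict(sig))
restored = Signature(**json.loads(stored))
```

Because the signature is plain data, the same JSON string can sit in a database row, a file, or a message between services.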
Resolving against a live page:
To resolve, pass the stored signature plus current page snapshots. The resolver returns a confidence score and a selector hint. When it can't resolve, it returns ranked candidates with an explanation of why.
Safety-first resolution
This is the design decision I care about most. uSEID would rather abstain than act on the wrong element.
The safety gate enforces four constraints:
Binding check
Signatures are locked to their origin and page path. A signature captured on example.com/login will not resolve against example.com/dashboard. Cross-page resolution is blocked, not degraded.
Role gate
If no candidates match the expected ARIA role, resolution stops. A button signature will never match a link, even if every other signal aligns perfectly.
Confidence threshold
The default threshold is 0.85 out of 1.0. Below that, the resolver abstains and returns ranked candidates instead of a potentially wrong match.
Ambiguity margin
If the top two candidates score within 0.1 of each other, the resolver flags the result as ambiguous rather than guessing. When two elements look equally like the target, neither can be trusted.
The result is a system that fails explicitly. When the UI has changed too much, when the element has moved to a different page, or when two elements are too similar, the agent gets a clear abstention with a reason, not a silent misidentification.
In agent workflows, clicking the wrong element is almost always worse than not clicking at all.
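The four constraints can be sketched as a single gate. This is a simplified illustration under stated assumptions, not the real resolver: candidates arrive already scored, the Candidate and Resolution types and the reason strings are hypothetical, URLs are compared without any normalization, and "within 0.1" is read as a strict less-than.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlsplit

CONFIDENCE_THRESHOLD = 0.85  # default threshold from the text
AMBIGUITY_MARGIN = 0.1       # minimum gap between the top two candidates


@dataclass
class Candidate:
    role: str
    name: str
    score: float  # combined confidence in [0, 1], computed elsewhere


@dataclass
class Resolution:
    resolved: bool
    reason: str
    match: Optional[Candidate]
    candidates: List[Candidate]  # ranked, for the caller to inspect


def resolve(sig_url: str, sig_role: str, page_url: str,
            candidates: List[Candidate]) -> Resolution:
    # 1. Binding check: same origin (scheme + host) and same path, or stop.
    a, b = urlsplit(sig_url), urlsplit(page_url)
    if (a.scheme, a.netloc, a.path) != (b.scheme, b.netloc, b.path):
        return Resolution(False, "binding_mismatch", None, [])

    # 2. Role gate: only candidates with the expected role are considered.
    ranked = sorted((c for c in candidates if c.role == sig_role),
                    key=lambda c: c.score, reverse=True)
    if not ranked:
        return Resolution(False, "no_role_match", None, [])

    # 3. Confidence threshold: abstain and hand back the ranking.
    best = ranked[0]
    if best.score < CONFIDENCE_THRESHOLD:
        return Resolution(False, "below_threshold", None, ranked)

    # 4. Ambiguity margin: a close runner-up means neither can be trusted.
    if len(ranked) > 1 and best.score - ranked[1].score < AMBIGUITY_MARGIN:
        return Resolution(False, "ambiguous", None, ranked)

    return Resolution(True, "ok", best, ranked)
```

Every abstention path returns a reason and, where it is meaningful, the ranked candidates, so the calling agent can log the failure or escalate instead of clicking.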
What this enables
Session-scoped refs work fine for single-run tasks: fill a form, extract data, navigate a flow. But a growing class of agent tasks requires cross-run memory:
- Monitoring: "Check if this price changed since yesterday." Needs to find the same element across page versions
- Regression testing: "Verify this button still exists after the deploy." Needs identity that survives code changes
- Multi-session workflows: "Resume where I left off." Needs to re-anchor to elements from a prior run
- Agent memory: "I learned that clicking this element leads to the checkout flow." Needs a portable reference that other agents can resolve
uSEID gives agents a way to say "this element" and mean it across time. Not "the element at this CSS path" or "the element at these coordinates" or "the third button on the page." The element that is a button, named Sign In, inside a form, inside the main landmark, near the top-center of the viewport.
Get started
The missing layer
uSEID started as a module inside BAP. The protocol needed a way to identify elements that outlived a browser session: deterministic replay, regression workflows, agent memory. The identity system turned out to be useful enough on its own to extract as a standalone package.
The web has had stable identity for documents (URLs) and for users (cookies, tokens) for decades. It has never had stable identity for individual elements. data-testid is the closest thing, but it requires developers to opt in. uSEID derives identity from what the browser already knows (the accessibility tree, the DOM structure, the visual layout) without requiring any changes to the page being observed.
Selectors tell you how to find an element. Signatures tell you which element you're looking for. That distinction is the foundation for agents that can remember.