A browser action is usually fast. The slow part is the back-and-forth around it: tell the browser to do one thing, wait, observe, decide, send the next thing. In agent workflows, those round trips add up quickly.

Figure: hand-drawn comparison of a sequential browser agent making many trips and a composite action making one combined trip. Composite actions move obvious sequences into one browser-side operation.

A simple login flow can become nine messages:

go to login
observe
fill email
observe
fill password
observe
click sign in
wait
observe dashboard

BAP's composite action lets the agent send the predictable part in one call:

bap act \
  fill:label:"Email"="me@example.com" \
  fill:label:"Password"="..." \
  click:role:button:"Sign In" \
  --observe

The daemon executes the actions in order, waits for stability between them, and returns the next page state in the same response. The agent still reasons. It just does not pay a protocol boundary for every obvious click and fill.
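
As a rough illustration of that execution model, here is a minimal Python sketch. It is not BAP's implementation: `FakePage`, `run_composite`, and the response fields are hypothetical names invented for this example, and the stability wait is only a stub. The point is the shape of the loop: many actions in, one response out.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page; it only records events."""
    log: list = field(default_factory=list)

    def apply(self, action: str) -> None:
        self.log.append(action)

    def wait_until_stable(self) -> None:
        # A real daemon would wait for the page to settle here.
        self.log.append("wait-stable")

    def observe(self) -> dict:
        # Return a trivial "page state": the actions applied so far.
        return {"actions_applied": [e for e in self.log if e != "wait-stable"]}


def run_composite(page: FakePage, actions: list, observe: bool = True) -> dict:
    """Apply actions in order, settle after each, return one combined response."""
    for action in actions:
        page.apply(action)
        page.wait_until_stable()
    response = {"completed": len(actions)}
    if observe:
        # The observation rides along in the same response (assumed shape).
        response["observation"] = page.observe()
    return response


page = FakePage()
result = run_composite(page, ["fill email", "fill password", "click sign in"])
```

The caller crosses the protocol boundary once, and the ordering and settling happen on the browser side.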

In my local BAP tests, this reduced round trips by roughly half or more on common flows like login, form submission, and paginated extraction. The exact wall-time gain depends on the page and model loop. The reason it helps does not change from page to page: fewer scheduling boundaries, fewer serialized messages, fewer redundant observations.
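
To make that reasoning concrete, here is a back-of-the-envelope model in Python. Every number in it is an illustrative assumption, not a measurement: the per-call overhead, the browser work time, and the guess that the composite version of the login flow needs three calls (navigate, observe, one batched act). `wall_time` is a hypothetical helper, not part of BAP.

```python
def wall_time(calls: int, per_call_overhead_s: float, browser_work_s: float) -> float:
    """Toy model: each call pays a fixed overhead; browser work is unchanged."""
    return calls * per_call_overhead_s + browser_work_s


# Assumed: one call per step in the login list above.
sequential = wall_time(calls=9, per_call_overhead_s=1.0, browser_work_s=2.0)

# Assumed: navigate, observe, then one batched act that also observes.
composite = wall_time(calls=3, per_call_overhead_s=1.0, browser_work_s=2.0)

saved = sequential - composite  # 6.0 seconds under these assumptions
```

The browser work term is the same in both cases. Only the per-call term shrinks, which is why the gain grows with the cost of each round trip.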

This is especially useful in three cases:

  • Races: scarce bookings where a second matters.
  • High volume: repeated page work where one saved round trip becomes many.
  • Chat UX: a human waiting for a browser task to finish.

The important design constraint is not to hide uncertainty. Composite actions should be used for steps the agent already knows it wants to take. If the page state is ambiguous, observe first. Batching the wrong action only makes the wrong action faster.
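
One way to encode that constraint is to batch only the leading run of steps whose targets the agent has already seen, and stop at the first one it has not. The sketch below assumes a simple plan format; `Step`, `split_at_uncertainty`, and the idea of a `known_targets` set taken from the latest observation are all hypothetical, not BAP features.

```python
from typing import NamedTuple


class Step(NamedTuple):
    verb: str    # e.g. "fill" or "click"
    target: str  # label of the element the step acts on


def split_at_uncertainty(plan: list, known_targets: set) -> tuple:
    """Return (send_now, hold_back), cutting at the first unseen target."""
    send_now = []
    for i, step in enumerate(plan):
        if step.target not in known_targets:
            # Everything from here on waits for a fresh observation.
            return send_now, plan[i:]
        send_now.append(step)
    return send_now, []


plan = [Step("fill", "Email"), Step("fill", "Password"), Step("click", "Sign In")]
send_now, hold_back = split_at_uncertainty(plan, {"Email", "Password"})
# send_now holds the two fills; hold_back holds the click.
```

An empty `send_now` means nothing is safe to batch yet, so the agent should observe first.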

That is the balance I want from browser-agent infrastructure: faster when the path is clear, more cautious when it is not.