
As a DIY researcher, I’ve spent countless hours trying to get LLM agents to navigate websites. It’s usually a mess. You feed the agent a massive DOM tree or a high-res screenshot, and the model struggles to “see” the button it needs to click. That’s because the web was built for eyes and fingers—not for neural networks.
I recently implemented the principles from the paper “Build the web for agents, not agents for the web” in my Istanbul lab. The authors argue for a paradigm shift: instead of making agents smarter at using human UIs, we should build Agentic Web Interfaces (AWIs). Here is how I reproduced this new way of thinking on my rig.
The Core Concept: The AWI Paradigm
Currently, an agent has to parse HTML, deal with pop-ups, and guess button functions. An AWI is a parallel, semantic version of a site designed for machine consumption. Think of it like an API on steroids—standardized, efficient, and direct.
To test this, I built a local mock-up of a Turkish e-commerce site and created an AWI layer. On my dual RTX 4080setup, I compared how an agent performs on the “Visual UI” vs. the “Agentic UI.”
The Implementation: Standardizing the Action Space
On my Ubuntu workstation, I used one GPU to run the “Site Environment” and the other to run the “Agent.” By serving the agent a simplified, JSON-based semantic map of the page (the AWI) instead of raw HTML, I drastically reduced the input token count.
Python
# Traditional Approach (Human UI)
# Input: 50,000 tokens of messy HTML/CSS
# Output: "I think the 'Buy' button is at (x,y)..."
# Agentic Web Interface (AWI) Approach
# Input: 400 tokens of structured semantic data
# {
# "actionable_elements": [
# {"id": "purchase_btn", "type": "button", "purpose": "add_to_cart"},
# {"id": "qty_input", "type": "number", "default": 1}
# ]
# }
# On my rig, this reduced inference latency by 70%
Challenges: The Safety-Efficiency Balance
The paper lists Safety as a guiding principle. When agents interact with AWIs, they are fast. Too fast. In my local tests, an agent could accidentally place 100 orders in seconds if the interface didn’t have “Human-in-the-Loop” guardrails.
My Fix: I implemented a “Commitment Layer” where the AWI requires a manual signature from my phone for any transaction over 50 TL. This mirrors the paper’s call for Human-Centric AI where the user stays in control of the agent’s agency.
Lab Results: Efficiency Gains
By moving from a “Human-designed Browser” to an “Agent-designed Interface,” the performance metrics on my local hardware were night and day:
| Metric | Human UI (Baseline) | Agentic Web Interface (AWI) |
| Token Usage/Task | ~120,000 | ~4,500 |
| Task Success Rate | 62% | 98% |
| Compute Cost (VRAM) | 14.2 GB | 4.8 GB |
Export to Sheets
AGI: A Web of Machines
If we want AGI to be truly useful, it needs a “digital world” it can actually inhabit. The current web is like a forest with no trails; AWIs are the highways. By reproducing this paper, I’ve seen that the future of the internet isn’t just better websites for us—it’s a secondary, invisible layer where our agents can collaborate, trade, and navigate with perfect precision.