Tag: Build the web for agents

  • Building the Web for Agents, Not Agents for the Web: A New Paradigm for AI Web Interaction

    Build the web for agents, not agents for the web

    The rise of Large Language Models (LLMs) and their multimodal counterparts has sparked a surge of interest in web agents—AI systems capable of autonomously navigating websites and completing complex tasks like booking flights, shopping, or managing emails. While this technology promises to revolutionize how we interact with the web, current approaches face fundamental challenges. Why? Because the web was designed for humans, not AI agents.

    In this blog post, we explore a recent position paper that argues for a paradigm shift: instead of forcing AI agents to adapt to human-centric web interfaces, we should build the web specifically for agents. This new concept, called the Agentic Web Interface (AWI), aims to create safer, more efficient, and standardized environments tailored to AI capabilities.

    The Current Landscape: Web Agents Struggle with Human-Centric Interfaces

    Web agents today are designed to operate within the existing web ecosystem, which means interacting with:

    • Browser UIs: Agents process screenshots, Document Object Model (DOM) trees, or accessibility trees to understand web pages.
    • Web APIs: Some agents bypass the UI by calling APIs designed for developers rather than agents.

    Challenges Faced by Browser-Based Agents

    • Complex and Inefficient Representations:
      • Screenshots are visually rich but incomplete: content inside collapsed menus or loaded dynamically may not appear in the capture.
      • DOM trees contain detailed page structure but are massive and noisy, sometimes running to millions of tokens, which makes processing expensive and slow.
    • Resource Strain and Defensive Measures:
      • Automated browsing at scale can overload websites, leading to performance degradation for human users.
      • Websites respond with defenses like CAPTCHAs, which sometimes block legitimate agent use and create accessibility issues.
    • Safety and Privacy Risks:
      • Agents operating within browsers may access sensitive user data (passwords, payment info), raising concerns over misuse or accidental harm.
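
    To make the representation problem concrete, here is a minimal sketch of the kind of pruning browser agents do today before handing a page to a model. It walks raw HTML with Python's standard-library `html.parser` and keeps only interactive elements plus a short label for each. The `InteractiveExtractor` class, the `prune` helper, and the sample page are illustrative assumptions, not code from the paper; real agent pipelines are more elaborate and often work from accessibility trees instead.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Elements an agent can act on; everything else is treated as layout noise here.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Container elements whose inner text serves as their label.
TEXT_CONTAINERS = {"a", "button"}


class InteractiveExtractor(HTMLParser):
    """Illustrative pruner: keeps interactive elements and a short label for each."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[dict] = []
        self._open: dict | None = None  # element whose inner text is being collected

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        a = dict(attrs)
        label = a.get("aria-label") or a.get("placeholder") or ""
        element = {"tag": tag, "id": a.get("id"), "text": label}
        self.elements.append(element)
        if tag in TEXT_CONTAINERS:
            self._open = element

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            self._open = None

    def handle_data(self, data):
        text = data.strip()
        if self._open is not None and text:
            self._open["text"] = f"{self._open['text']} {text}".strip()


def prune(html: str) -> str:
    """Reduce raw HTML to one line per interactive element."""
    parser = InteractiveExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(f"[{e['tag']}#{e['id']}] {e['text']}" for e in parser.elements)


SAMPLE_PAGE = """<html><head>
<style>.hero { color: red; }</style><script>trackVisit();</script>
</head><body>
  <div class="hero"><div class="wrap"><h1>Flights</h1></div></div>
  <form>
    <input id="from" placeholder="From">
    <input id="to" placeholder="To">
    <button id="search" type="submit">Search <span>flights</span></button>
  </form>
  <a id="help" href="/help">Help</a>
</body></html>"""

compact = prune(SAMPLE_PAGE)
print(compact)
# [input#from] From
# [input#to] To
# [button#search] Search flights
# [a#help] Help
print(f"{len(SAMPLE_PAGE)} characters of HTML -> {len(compact)} characters for the agent")
```

    The catch is that the heuristic decides what matters: a menu built from plain `<div>` elements with JavaScript click handlers would vanish from this output entirely, which is the brittleness described above.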

    Limitations of API-Based Agents

    • Narrow Action Space:
      APIs offer limited functionality compared to full UI interactions, often lacking stateful controls like sorting or filtering.
    • Developer-Centric Design:
      APIs are built for human developers, not autonomous agents, and may throttle or deny excessive requests.
    • Fallback to UI:
      When APIs cannot fulfill a task, agents must revert to interacting with the browser UI, inheriting its limitations.
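
    The fallback problem can be sketched as a small dispatcher that tries the structured API first and drops back to the browser when the API has no matching endpoint. All names here (`AgentRun`, `call_api`, `drive_browser_ui`, `SUPPORTED_API_TASKS`, and the task strings) are hypothetical placeholders rather than a real framework, and both paths are stubbed so that only the control flow remains.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical: the only operations this site's developer API exposes.
SUPPORTED_API_TASKS = {"get_order_status", "list_orders"}


@dataclass
class AgentRun:
    log: list[str] = field(default_factory=list)

    def call_api(self, task: str, **params) -> dict:
        # Stand-in for an HTTP call to a developer-facing endpoint.
        if task not in SUPPORTED_API_TASKS:
            raise NotImplementedError(f"API has no endpoint for {task!r}")
        self.log.append(f"api:{task}")
        return {"task": task, "via": "api", "params": params}

    def drive_browser_ui(self, task: str, **params) -> dict:
        # Stand-in for screenshot- or DOM-based browsing, with all its costs.
        self.log.append(f"ui:{task}")
        return {"task": task, "via": "ui", "params": params}

    def run(self, task: str, **params) -> dict:
        try:
            return self.call_api(task, **params)
        except NotImplementedError:
            # Fallback: the agent now inherits every limitation of the browser path.
            return self.drive_browser_ui(task, **params)


agent = AgentRun()
print(agent.run("list_orders")["via"])               # api
print(agent.run("sort_orders", key="price")["via"])  # ui
print(agent.log)                                     # ['api:list_orders', 'ui:sort_orders']
```

    Here `sort_orders`, a stateful control the API lacks, is exactly the kind of request that gets pushed onto the slower, noisier UI path.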

    The Core Insight: The Web Is Built for Humans, Not Agents

    The fundamental problem is that web interfaces were designed for human users, with visual layouts, interactive elements, and workflows optimized for human cognition and behavior. AI agents, however, perceive pages as screenshots or serialized DOM and accessibility trees and act through programmatic commands, so they need interfaces that reflect how they actually operate.

    Trying to force agents to operate within human-centric environments leads to inefficiency, high computational costs, and safety vulnerabilities.

    Introducing the Agentic Web Interface (AWI)

    The research proposes a bold new concept: designing web interfaces specifically for AI agents. The AWI would be a new layer or paradigm where websites expose information and controls in a way that is:

    • Efficient: Minimal and relevant information, avoiding the noise and overhead of full DOM trees or screenshots.
    • Safe: Built-in safeguards to protect user data and prevent malicious actions.
    • Standardized: Consistent formats and protocols to allow agents to generalize across different sites.
    • Transparent: Clear and auditable agent actions to build trust.
    • Expressive: Rich enough to support complex tasks and stateful interactions.
    • Collaborative: Designed with input from AI researchers, developers, and stakeholders to balance usability and security.
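
    One way to read the "Safe" and "Transparent" properties together is as a gate that every agent action passes through: the site declares which actions an agent may take, sensitive actions additionally require explicit user confirmation, and every attempt, allowed or not, is appended to an audit log. This is a minimal sketch under our own assumptions; `Policy`, `ActionGate`, and the action names are hypothetical and not defined by the paper, and a real AWI would also need authentication and tamper-resistant logging.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Policy:
    # Hypothetical per-agent policy published by the site.
    allowed: frozenset[str]
    needs_confirmation: frozenset[str] = frozenset()


@dataclass
class ActionGate:
    policy: Policy
    audit_log: list[dict] = field(default_factory=list)

    def attempt(self, agent_id: str, action: str, user_confirmed: bool = False) -> bool:
        if action not in self.policy.allowed:
            decision = "denied"
        elif action in self.policy.needs_confirmation and not user_confirmed:
            decision = "pending_confirmation"
        else:
            decision = "allowed"
        # Record every attempt so users and site operators can review agent behavior.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": agent_id,
            "action": action,
            "decision": decision,
        })
        return decision == "allowed"


gate = ActionGate(Policy(
    allowed=frozenset({"search_flights", "add_to_cart", "checkout"}),
    needs_confirmation=frozenset({"checkout"}),
))
gate.attempt("agent-42", "search_flights")        # returns True
gate.attempt("agent-42", "checkout")              # returns False: needs user confirmation
gate.attempt("agent-42", "read_saved_passwords")  # returns False: not in the policy
for entry in gate.audit_log:
    print(entry["action"], entry["decision"])
```

    Keeping the decision and the log entry in one method means no action can take effect without leaving a trace, which is the property an auditor would care about most.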

    Why AWI Matters: Benefits for All Stakeholders

    • For AI Agents:
      Agents can navigate and interact with websites more reliably and efficiently, reducing computational overhead and improving task success rates.
    • For Website Operators:
      Reduced server load and better control over agent behavior, minimizing the need for aggressive defenses like CAPTCHAs.
    • For Users:
      Safer interactions with AI agents that respect privacy and security, enabling trustworthy automation of web tasks.
    • For the AI Community:
      A standardized platform to innovate and build more capable, generalizable web agents.

    What Would AWI Look Like?

    While the paper does not prescribe a specific implementation, it envisions an interface that:

    • Provides structured, concise representations of page content tailored for agent consumption.
    • Supports declarative actions that agents can perform, such as clicking buttons, filling forms, or navigating pages, in a way that is unambiguous and verifiable.
    • Includes mechanisms for permissioning and auditing to ensure agents act within authorized boundaries.
    • Enables incremental updates to the interface as the page state changes, allowing agents to maintain situational awareness without reprocessing entire pages.
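
    The structured-representation, declarative-action, and incremental-update ideas in this list can be illustrated with a toy in-memory flight search page. It exposes its state as a small dictionary, accepts only actions declared in a schema (checked before anything changes), and returns just the keys that changed. `ToySearchPage`, `FLIGHTS`, and the `set_filter`/`sort` actions are hypothetical illustrations under our own assumptions, not an interface proposed in the paper.

```python
from __future__ import annotations

import copy
from typing import Any

# Hypothetical inventory behind the page.
FLIGHTS = [
    {"id": "AC101", "airline": "AC", "price": 320},
    {"id": "DL202", "airline": "DL", "price": 280},
    {"id": "UA303", "airline": "UA", "price": 410},
]


class ToySearchPage:
    """Hypothetical AWI-style endpoint for a flight search results page."""

    # Declared action schema: action name -> required parameter names and types.
    ACTIONS: dict[str, dict[str, type]] = {
        "set_filter": {"field": str, "value": str},
        "sort": {"by": str},
    }

    def __init__(self) -> None:
        self.state: dict[str, Any] = {"filters": {}, "sort_by": "price", "results": []}
        self._recompute()

    def _recompute(self) -> None:
        rows = [f for f in FLIGHTS
                if all(f.get(k) == v for k, v in self.state["filters"].items())]
        rows.sort(key=lambda f: f[self.state["sort_by"]])
        self.state["results"] = [f["id"] for f in rows]

    def describe(self) -> dict[str, Any]:
        # Compact, agent-oriented view: current state plus the actions on offer.
        return {"state": copy.deepcopy(self.state), "actions": sorted(self.ACTIONS)}

    def apply(self, action: str, params: dict[str, Any]) -> dict[str, Any]:
        """Validate a declarative action, apply it, and return only what changed."""
        schema = self.ACTIONS.get(action)
        if schema is None:
            raise ValueError(f"unknown action {action!r}")
        for name, expected in schema.items():
            if not isinstance(params.get(name), expected):
                raise ValueError(f"{action}: {name!r} must be a {expected.__name__}")
        if action == "sort" and params["by"] not in FLIGHTS[0]:
            raise ValueError(f"cannot sort by {params['by']!r}")
        before = copy.deepcopy(self.state)
        if action == "set_filter":
            self.state["filters"][params["field"]] = params["value"]
        else:  # action == "sort"
            self.state["sort_by"] = params["by"]
        self._recompute()
        # Incremental update: send back only the top-level keys whose values changed.
        return {k: copy.deepcopy(v) for k, v in self.state.items() if before[k] != v}


page = ToySearchPage()
print(page.describe())
print(page.apply("set_filter", {"field": "airline", "value": "AC"}))
# {'filters': {'airline': 'AC'}, 'results': ['AC101']}
print(page.apply("sort", {"by": "id"}))
# {'sort_by': 'id'}
```

    The second `apply` call shows the incremental-update idea: changing the sort order leaves the single remaining result untouched, so the agent receives one changed key instead of a fresh copy of the whole page.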

    The Road Ahead: Collaborative Effort Needed

    Designing and deploying AWIs will require:

    • Interdisciplinary collaboration: Web developers, AI researchers, security experts, and regulators must work together.
    • Community standards: Similar to how HTML and HTTP standardized web content and communication, AWI standards must emerge to enable broad adoption.
    • Iterative design and evaluation: Prototypes and experiments will be essential to balance agent needs with user safety and privacy.

    Conclusion: Building the Web for the Future of AI Agents

    The vision of the Agentic Web Interface challenges the status quo by asking us to rethink how web interactions are designed—not just for humans, but for intelligent agents that will increasingly automate our digital lives.

    By building the web for agents, we can unlock safer, more efficient, and more powerful AI-driven automation, benefiting users, developers, and the broader AI ecosystem.

    This paradigm shift calls for collective action from the machine learning community and beyond to create the next generation of web interfaces—ones that truly empower AI agents to thrive.

    Paper: https://arxiv.org/pdf/2506.10953

    If you’re interested in the future of AI and web interaction, stay tuned for more insights as researchers and developers explore this exciting frontier.