Foundry

Foundry is an enterprise platform that provides simulation, evaluation, data, and reinforcement learning infrastructure for building and improving AI web agents. Backed by Y Combinator (F24) and founded by former Scale AI engineers Manil Lakabi and Pranav Raja, Foundry enables organizations to develop browser-based AI agents that can navigate real enterprise software platforms, handle complex multi-step workflows, and operate reliably without the usual challenges of web drift, IP bans, or rate limits.

Key Capabilities

Simulation: Pixel-perfect, reproducible browser environments that eliminate drift, noise, and rate limits, allowing agents to be tested consistently across thousands of scenarios without depending on live external services. This makes it possible to run agents repeatedly in identical conditions, which is essential for reliable evaluation and regression testing.
Evaluation: Every agent action is tracked, classified, and tagged. Click failures, layout shifts, and misfires are surfaced immediately, giving teams visibility into exactly where and why agents fail. Evaluation covers multiple dimensions including factuality, tool use, tone, creativity, safety, and relevance.
Data: Expert annotators generate custom, long-horizon datasets for supervised fine-tuning of browser agents on real enterprise platforms such as Gmail, Salesforce, and LinkedIn. These datasets capture realistic multi-step workflows that are difficult to source through automated means.
Reinforcement Learning: Safely sample and evaluate unlimited trajectories, enabling teams to train browser agents at scale without anti-bot constraints or production risks. This RL infrastructure allows agents to learn from exploration in simulated environments before deployment.

How It Works

Foundry provides a Python SDK that integrates into existing agent workflows. Teams define tasks, spin up reproducible browser environments via Foundry's Agent Web Engine (AWE), run their agents against those environments, and compare results against ground truth. The platform records every event, state mutation, and agent action for detailed analysis. According to Foundry, setup takes approximately five minutes, and evaluations can run up to ten times faster than manual testing.
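The "compare results against ground truth" step can be illustrated with a small, self-contained sketch. It does not use Foundry's SDK; the names `TaskResult` and `score_against_ground_truth`, and the state keys in the demo, are hypothetical and chosen only for illustration. It assumes the final environment state can be represented as a flat dictionary.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TaskResult:
    """Outcome of comparing one run's final state to ground truth (hypothetical type)."""
    task_id: str
    matched: list[str] = field(default_factory=list)
    # Maps a key to (expected, actual) for every field that differs.
    mismatched: dict[str, tuple[Any, Any]] = field(default_factory=dict)

    @property
    def score(self) -> float:
        """Fraction of ground-truth fields the agent got right."""
        total = len(self.matched) + len(self.mismatched)
        return len(self.matched) / total if total else 0.0


def score_against_ground_truth(
    task_id: str, final_state: dict[str, Any], ground_truth: dict[str, Any]
) -> TaskResult:
    """Check each ground-truth field against the captured final state."""
    result = TaskResult(task_id=task_id)
    for key, expected in ground_truth.items():
        actual = final_state.get(key)  # a missing key counts as a mismatch
        if actual == expected:
            result.matched.append(key)
        else:
            result.mismatched[key] = (expected, actual)
    return result


if __name__ == "__main__":
    # Illustrative data only; not taken from any real benchmark.
    truth = {"lead_status": "Qualified", "owner": "alice", "notes_count": 2}
    final = {"lead_status": "Qualified", "owner": "bob", "notes_count": 2}
    result = score_against_ground_truth("demo-task", final, truth)
    print(result.score, result.mismatched)
```

Reporting mismatches per field, rather than a single pass/fail flag, keeps the "where and why" of a failure visible when a long workflow is only partly completed.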

Use Cases

Enterprise SaaS Automation: Build and test agents that navigate complex enterprise software like Salesforce, Google Sheets, and CRM platforms for tasks such as lead enrichment, data entry, and report generation. Agents can handle login credentials, payment details, and multi-step processes across different SaaS applications.
Agent Benchmarking and Research: SDRBench provides 50 deterministic tasks that simulate realistic sales development workflows. It supports reproducible evaluation and leaderboard-ready baselines for browser agent research, with a focus on cross-application state persistence and error recovery.
Pre-deployment Quality Assurance: Run agents through thousands of simulated scenarios ahead of production deployment, so that regressions in factuality, tool use, tone, and task completion accuracy are caught before they reach end users.

Integration

Teams integrate Foundry through its Python SDK, which provides environment management, task definition, and CDP (Chrome DevTools Protocol) URL access for running agents. The platform works with existing agent frameworks and requires only minimal changes to agent codebases. The evaluation loop follows a straightforward pattern: initialize an environment for each task, run the agent, capture the final state and events, and score results against ground truth.
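The four-step loop described above can be sketched as follows. This is not Foundry's SDK: `Task`, `StubEnvironment`, `evaluate`, and `toy_agent` are hypothetical stand-ins, and the `cdp_url` value is a placeholder. A real environment would hand the agent a CDP URL for its browser driver; here the "browser" is a plain dictionary so the loop can run locally.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    """Hypothetical task record: an id, an instruction, and the expected end state."""
    task_id: str
    instruction: str
    ground_truth: dict


@dataclass
class StubEnvironment:
    """Stand-in for a reproducible browser environment (assumption, not a real API)."""
    task: Task
    cdp_url: str = "ws://localhost:9222/devtools/browser/stub"  # placeholder value
    state: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def apply(self, key: str, value) -> None:
        # Record the action as an event, then mutate the state.
        self.events.append({"type": "set", "key": key, "value": value})
        self.state[key] = value


def evaluate(
    tasks: list[Task],
    make_env: Callable[[Task], StubEnvironment],
    agent: Callable[[Task, StubEnvironment], None],
) -> dict:
    results = {}
    for task in tasks:
        env = make_env(task)            # 1. initialize an environment for the task
        agent(task, env)                # 2. run the agent
        final_state = dict(env.state)   # 3. capture the final state and events
        events = list(env.events)
        passed = final_state == task.ground_truth  # 4. score against ground truth
        results[task.task_id] = {"passed": passed, "num_events": len(events)}
    return results


def toy_agent(task: Task, env: StubEnvironment) -> None:
    """Toy agent that only knows how to complete one of the demo tasks."""
    if task.task_id == "enrich-lead":
        env.apply("company", "Acme")
        env.apply("status", "Enriched")
    else:
        env.apply("status", "Unknown")


if __name__ == "__main__":
    demo_tasks = [
        Task("enrich-lead", "Fill in the lead's company.", {"company": "Acme", "status": "Enriched"}),
        Task("log-call", "Log a call on the lead.", {"status": "Logged"}),
    ]
    print(evaluate(demo_tasks, StubEnvironment, toy_agent))
```

Creating a fresh environment inside the loop means each task starts from the same initial state, which is what makes repeated runs comparable.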

Getting Started

Foundry is currently in private beta. Teams can apply for access through the Foundry website to unlock all platform features. The platform also offers a benchmark leaderboard for comparing agent performance and will provide sample evaluation datasets for local testing. Organizations interested in enterprise-grade agent infrastructure can contact the Foundry team directly through the website.

Similar AI Tools

Stability AI Developer Platform

ChatGPT Code Interpreter

TeamPal

Automix

Remalt