
Patronus AI has announced Generative Simulators, which are simulation environments that can create new tasks and scenarios, update the rules of the world over time, and evaluate an agent’s actions as it learns.
According to the company, as AI systems move from answering single questions to executing multi-step workflows, the static tests and training data that have been used are no longer dynamic enough to reflect real-world systems. “Agents that look strong on static benchmarks can stumble when requirements change mid-task, when they must use tools correctly, or when they need to stay on track over longer periods of time,” the company explained in an announcement.
Generative Simulators address this by generating the assignment, the surrounding conditions, and the checking process, and then adapting them as the agent works.
“In other words, instead of a fixed set of test questions, it’s a living practice world that can keep producing new, relevant challenges and feedback,” the company explained.
Task generation, world tooling, and reward modeling can each be made more difficult, individually or in combination, so that difficulty can be scaled up in the areas where a model struggles. Domain specificity can also be adjusted by adding, removing, or swapping toolsets; for example, a browser-use toolset can be added to an SWE-Bench task to extend it into a frontend development task in which the agent must debug visually in the browser.
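To make that idea concrete, the sketch below shows one way such a configuration could look in code. It is only an illustration under assumed names (SimulatorConfig, harden, and the toolset strings are hypothetical, not Patronus AI's actual API): three independent difficulty knobs plus a swappable list of toolsets.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatorConfig:
    """Hypothetical configuration for a generative-simulator run.

    Each difficulty axis can be scaled on its own, and toolsets can be
    added or removed to shift the domain of the generated tasks.
    """
    task_difficulty: float = 0.5      # how hard the generated assignments are
    world_difficulty: float = 0.5     # how often rules and tools change mid-task
    reward_strictness: float = 0.5    # how strictly the evaluator scores actions
    toolsets: list[str] = field(
        default_factory=lambda: ["code_editor", "test_runner"]
    )

    def harden(self, axis: str, amount: float = 0.1) -> None:
        """Raise difficulty on one axis, e.g. where the agent keeps failing."""
        setattr(self, axis, min(1.0, getattr(self, axis) + amount))


# Example: extend an SWE-Bench-style coding task toward frontend debugging
# by swapping in a browser toolset, then raising world difficulty.
config = SimulatorConfig()
config.toolsets.append("browser_use")
config.harden("world_difficulty", 0.2)
print(config)
```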
These simulators are at the heart of the company’s RL Environments, which are training environments where agents learn through trial and error in settings that mimic human workflows. Each environment includes domain-specific rules, best practices, and verifiable rewards that guide agents while also exposing them to realistic interruptions and challenges.
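The "verifiable rewards" idea can be illustrated with a minimal coding environment in which the reward comes from an objective check (a passing test suite) rather than a model's judgment. This is a generic sketch, not Patronus AI's implementation; it assumes a local repository with a pytest test suite, and the action is simply a shell command the agent wants run against that repository.

```python
import subprocess


class CodingEnv:
    """Minimal sketch of an RL environment with a verifiable reward:
    the agent modifies a repository and is scored by whether the test
    suite passes, an objective and repeatable check."""

    def __init__(self, repo_dir: str):
        self.repo_dir = repo_dir

    def step(self, action_cmd: list[str]) -> tuple[str, float, bool]:
        # Apply the agent's action (e.g. a `git apply` invocation)
        # inside the working repository.
        subprocess.run(action_cmd, cwd=self.repo_dir, check=False)

        # Verifiable reward: run the project's tests and score pass/fail.
        result = subprocess.run(
            ["pytest", "-q"], cwd=self.repo_dir, capture_output=True, text=True
        )
        passed = result.returncode == 0
        reward = 1.0 if passed else 0.0
        done = passed
        # Return the tail of the test output as the next observation.
        return result.stdout[-2000:], reward, done
```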
The company also announced a new training method called Open Recursive Self-Improvement (ORSI) that allows agents to improve through interaction and feedback without requiring a full retraining cycle between attempts.
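Patronus AI has not detailed how ORSI works internally. As a rough illustration of the general pattern it describes, improvement without a retraining cycle can be sketched as a loop in which evaluator feedback is carried into the agent's next attempt rather than into a weight update; the agent and env objects below are hypothetical stand-ins, not the company's method.

```python
def improve_through_feedback(agent, env, max_attempts: int = 5) -> float:
    """Generic feedback loop (an illustration, not ORSI itself): the agent
    retries a task, carrying forward evaluator feedback between attempts
    instead of being retrained."""
    feedback_memory: list[str] = []
    for _ in range(max_attempts):
        # The agent conditions its next action on the accumulated feedback.
        action = agent.act(task=env.task, feedback=feedback_memory)
        observation, reward, done = env.step(action)
        if done:
            return reward
        # No weight update between attempts; improvement comes from the
        # feedback that shapes the next try.
        feedback_memory.append(observation)
    return 0.0
```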
“Traditional benchmarks measure isolated capabilities, but they miss the interruptions, context switches, and multi-layered decision-making that define actual work,” said Anand Kannappan, CEO and co-founder of Patronus AI. “For agents to perform tasks at human-comparable levels, they need to learn the way humans do – through dynamic, feedback-driven experience that captures real-world nuance.”
