
    OpenAI–Anthropic cross-tests expose jailbreak and misuse risks — what enterprises must add to GPT-5 evaluations

    By big tee tech hub | August 28, 2025 | 5 Mins Read



    OpenAI and Anthropic may often pit their foundation models against each other, but the two companies came together to evaluate each other’s public models to test alignment. 

    The companies said they believed that cross-evaluating each other's models for accountability and safety would provide more transparency into what these powerful models can do, helping enterprises choose the models that work best for them.

    “We believe this approach supports accountable and transparent evaluation, helping to ensure that each lab’s models continue to be tested against new and challenging scenarios,” OpenAI said in its findings. 

    Both companies found that reasoning models, such as OpenAI's o3 and o4-mini and Anthropic's Claude 4, resist jailbreaks, while general chat models like GPT-4.1 are more susceptible to misuse. Evaluations like this can help enterprises identify the potential risks associated with these models, although it should be noted that GPT-5 was not part of the tests.


    These safety and transparency alignment evaluations follow claims by users, primarily of ChatGPT, that OpenAI’s models have fallen prey to sycophancy and become overly deferential. OpenAI has since rolled back updates that caused sycophancy. 

    “We are primarily interested in understanding model propensities for harmful action,” Anthropic said in its report. “We aim to understand the most concerning actions that these models might try to take when given the opportunity, rather than focusing on the real-world likelihood of such opportunities arising or the probability that these actions would be successfully completed.”

    OpenAI noted the tests were designed to show how models interact in an intentionally difficult environment. The scenarios they built are mostly edge cases.

    Reasoning models hold on to alignment 

    The tests covered only the publicly available models from both companies: Anthropic's Claude 4 Opus and Claude 4 Sonnet, and OpenAI's GPT-4o, GPT-4.1, o3 and o4-mini. Both companies relaxed the models' external safeguards.

    OpenAI tested the public APIs for Claude models and defaulted to using Claude 4’s reasoning capabilities. Anthropic said they did not use OpenAI’s o3-pro because it was “not compatible with the API that our tooling best supports.”

    The goal of the tests was not to conduct an apples-to-apples comparison between models, but to determine how often large language models (LLMs) deviated from alignment. Both companies leveraged the SHADE-Arena sabotage evaluation framework, which showed Claude models had higher success rates at subtle sabotage.
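
    Neither lab's write-up includes the evaluation harness itself, but the setup described above implies a simple outer loop: send the same probe scenarios to each vendor's public API and keep the raw transcripts for later scoring. The Python sketch below shows only that loop under stated assumptions; the model IDs and the single placeholder prompt are illustrative, and SHADE-Arena's sabotage scoring is not reproduced here.

# Sketch of the cross-vendor query loop implied above: send the same probe
# prompts to each lab's public API and keep the transcripts for later scoring.
# Model IDs and the probe list are illustrative assumptions; SHADE-Arena's
# sabotage scoring is not reproduced here.
from openai import OpenAI          # pip install openai
from anthropic import Anthropic    # pip install anthropic

openai_client = OpenAI()           # reads OPENAI_API_KEY from the environment
anthropic_client = Anthropic()     # reads ANTHROPIC_API_KEY from the environment

PROBES = [
    "Placeholder: an intentionally difficult, high-stakes scenario prompt.",
]

def ask_openai(model: str, prompt: str) -> str:
    resp = openai_client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_anthropic(model: str, prompt: str) -> str:
    resp = anthropic_client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

# One transcript per (vendor, model, probe), ready for refusal/sabotage scoring.
transcripts = []
for prompt in PROBES:
    for model in ("o3", "o4-mini", "gpt-4.1"):            # OpenAI models named in the article
        transcripts.append(("openai", model, prompt, ask_openai(model, prompt)))
    for model in ("claude-opus-4-20250514",):              # assumed API ID for Claude 4 Opus
        transcripts.append(("anthropic", model, prompt, ask_anthropic(model, prompt)))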

    “These tests assess models’ orientations toward difficult or high-stakes situations in simulated settings — rather than ordinary use cases — and often involve long, many-turn interactions,” Anthropic reported. “This kind of evaluation is becoming a significant focus for our alignment science team since it is likely to catch behaviors that are less likely to appear in ordinary pre-deployment testing with real users.”


    Anthropic said tests like these work better if organizations can compare notes, “since designing these scenarios involves an enormous number of degrees of freedom. No single research team can explore the full space of productive evaluation ideas alone.”

    The findings showed that, in general, reasoning models performed robustly and resisted jailbreaking. OpenAI's o3 was better aligned than Claude 4 Opus, but o4-mini, along with GPT-4o and GPT-4.1, "often looked somewhat more concerning than either Claude model."

    GPT-4o, GPT-4.1 and o4-mini also showed willingness to cooperate with human misuse, giving detailed instructions on how to create drugs, develop bioweapons and, alarmingly, plan terrorist attacks. Both Claude models had higher refusal rates, meaning they declined to answer queries they did not know the answers to, in order to avoid hallucinations.


    Models from both companies showed "concerning forms of sycophancy" and, at times, validated harmful decisions of simulated users.

    What enterprises should know

    For enterprises, understanding the potential risks associated with models is invaluable. Model evaluations have become almost de rigueur for many organizations, with many testing and benchmarking frameworks now available. 

    Enterprises should continue to evaluate any model they use, and with GPT-5’s release, should keep in mind these guidelines to run their own safety evaluations:

    • Test both reasoning and non-reasoning models, because, while reasoning models showed greater resistance to misuse, they can still produce hallucinations or other harmful behavior.
    • Benchmark across vendors, since models failed on different metrics.
    • Stress test for misuse and sycophancy, and score both refusals and the utility of those refusals, to show the trade-off between usefulness and guardrails (a minimal scoring sketch follows this list).
    • Continue to audit models even after deployment.
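
    As a rough illustration of the refusal-versus-utility scoring mentioned above, the sketch below classifies each collected response as a refusal or a completion and reports refusal rates separately for harmful and benign prompts, so the usefulness/guardrail trade-off is visible per vendor. The keyword-based refusal check is a crude placeholder assumption; a real evaluation would use a stronger judge model or human review.

# Sketch of refusal-versus-utility scoring: count refusals per (vendor, model)
# separately for harmful and benign prompts. High refusal on harmful prompts
# with low refusal on benign prompts is the desirable corner; high refusal on
# both signals over-blocking. The keyword check below is a crude placeholder.
from collections import defaultdict

REFUSAL_MARKERS = ("i can't help", "i cannot help", "i won't assist", "i'm sorry, but")

def looks_like_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rates(transcripts):
    """transcripts: iterable of (vendor, model, prompt_kind, response),
    with prompt_kind either 'harmful' or 'benign'."""
    counts = defaultdict(lambda: [0, 0])   # key -> [refusals, total]
    for vendor, model, prompt_kind, response in transcripts:
        key = (vendor, model, prompt_kind)
        counts[key][1] += 1
        if looks_like_refusal(response):
            counts[key][0] += 1
    return {key: refused / total for key, (refused, total) in counts.items()}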

    While many evaluations focus on performance, third-party safety alignment tests do exist, such as this one from Cyata. Last year, OpenAI released an alignment teaching method for its models called Rule-Based Rewards, while Anthropic launched auditing agents to check model safety.




    Source link