    Less is more: UC Berkeley and Google unlock LLM potential through simple sampling

By big tee tech hub · March 22, 2025

    A new paper by researchers from Google Research and the University of California, Berkeley, demonstrates that a surprisingly simple test-time scaling approach can boost the reasoning abilities of large language models (LLMs). The key? Scaling up sampling-based search, a technique that relies on generating multiple responses and using the model itself to verify them. 

    The core finding is that even a minimalist implementation of sampling-based search, using random sampling and self-verification, can elevate the reasoning performance of models like Gemini 1.5 Pro beyond that of o1-Preview on popular benchmarks. The findings can have important implications for enterprise applications and challenge the assumption that highly specialized training or complex architectures are always necessary for achieving top-tier performance.

    The limits of current test-time compute scaling

    The current popular method for test-time scaling in LLMs is to train the model through reinforcement learning to generate longer responses with chain-of-thought (CoT) traces. This approach is used in models such as OpenAI o1 and DeepSeek-R1. While beneficial, these methods usually require substantial investment in the training phase.

Another test-time scaling method is “self-consistency,” where the model generates multiple responses to the query and chooses the answer that appears most often. Self-consistency reaches its limits on complex problems, where the most repeated answer is not necessarily the correct one.
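For illustration, self-consistency amounts to a majority vote over sampled answers. The sketch below assumes a hypothetical `generate(prompt)` callable that queries an LLM once (with non-zero temperature) and returns its final answer as a string.

```python
from collections import Counter

def self_consistency(generate, prompt: str, num_samples: int = 16) -> str:
    """Sample several answers and return the one that appears most often.

    `generate` is a hypothetical single LLM call returning an answer string.
    """
    answers = [generate(prompt) for _ in range(num_samples)]
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer
```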

    Sampling-based search offers a simpler and highly scalable alternative to test-time scaling: Let the model generate multiple responses and select the best one through a verification mechanism. Sampling-based search can complement other test-time compute scaling strategies and, as the researchers write in their paper, “it also has the unique advantage of being embarrassingly parallel and allowing for arbitrarily scaling: simply sample more responses.”

    More importantly, sampling-based search can be applied to any LLM, including those that have not been explicitly trained for reasoning.

    How sampling-based search works

    The researchers focus on a minimalist implementation of sampling-based search, using a language model to both generate candidate responses and verify them. This is a “self-verification” process, where the model assesses its own outputs without relying on external ground-truth answers or symbolic verification systems.

Figure: Sampling-based search (credit: VentureBeat)

The algorithm works in a few simple steps (a minimal code sketch follows the list): 

1. The algorithm begins by generating a set of candidate solutions to the given problem. This is done by giving the model the same prompt multiple times with a non-zero temperature setting to create a diverse set of responses.

    2. Each candidate response undergoes a verification process in which the LLM is prompted multiple times to determine whether the response is correct. The verification outcomes are then averaged to produce a final verification score for the response.

    3. The algorithm selects the highest-scoring response as the final answer. If multiple candidates score within close range of each other, the LLM is prompted to compare them pairwise, and the response that wins the most pairwise comparisons is chosen as the final answer.
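Here is a minimal sketch of those three steps. The `generate`, `verify`, and `compare` callables are hypothetical stand-ins for LLM calls (they are not from the paper), and the tie-breaking margin is an illustrative choice.

```python
import statistics

def sampling_based_search(generate, verify, compare, prompt: str,
                          num_samples: int = 8, num_verifications: int = 4,
                          tie_margin: float = 0.05) -> str:
    """Minimal sketch of sampling-based search with self-verification.

    Hypothetical LLM calls:
      - generate(prompt) -> candidate response (sampled with temperature > 0)
      - verify(prompt, response) -> 1.0 if judged correct, else 0.0
      - compare(prompt, response_a, response_b) -> the preferred response
    """
    # Step 1: sample a diverse set of candidate responses.
    candidates = [generate(prompt) for _ in range(num_samples)]

    # Step 2: score each candidate by averaging several verification calls.
    scores = [
        statistics.mean(verify(prompt, c) for _ in range(num_verifications))
        for c in candidates
    ]

    # Step 3: pick the top-scoring candidate; break near-ties with
    # pairwise comparisons and keep the candidate that wins the most.
    best = max(scores)
    finalist_idx = [i for i, s in enumerate(scores) if best - s <= tie_margin]
    if len(finalist_idx) == 1:
        return candidates[finalist_idx[0]]
    wins = {i: 0 for i in finalist_idx}
    for pos, i in enumerate(finalist_idx):
        for j in finalist_idx[pos + 1:]:
            preferred = compare(prompt, candidates[i], candidates[j])
            wins[i if preferred == candidates[i] else j] += 1
    return candidates[max(wins, key=wins.get)]
```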

    The researchers considered two key axes for test-time scaling:

    Sampling: The number of responses the model generates for each input problem.

Verification: The number of verification scores computed for each generated solution.
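As a rough illustration of how these two axes drive test-time compute, the count below assumes one LLM call per sampled response and one per verification score (pairwise tie-breaking adds a little on top):

```python
def approx_llm_calls(num_samples: int, num_verifications: int) -> int:
    """Rough call count: one generation per sample plus
    num_verifications verification calls per candidate."""
    return num_samples * (1 + num_verifications)

# The paper-scale setting discussed below: 200 samples, 50 verifications.
print(approx_llm_calls(200, 50))  # 10200 LLM calls for a single query
```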

    How sampling-based search compares to other techniques

    The study revealed that reasoning performance continues to improve with sampling-based search, even when test-time compute is scaled far beyond the point where self-consistency saturates. 

    At a sufficient scale, this minimalist implementation significantly boosts reasoning accuracy on reasoning benchmarks like AIME and MATH. For example, Gemini 1.5 Pro’s performance surpassed that of o1-Preview, which has explicitly been trained on reasoning problems, and Gemini 1.5 Flash surpassed Gemini 1.5 Pro.

    image 292ff6

    “This not only highlights the importance of sampling-based search for scaling capability, but also suggests the utility of sampling-based search as a simple baseline on which to compare other test-time compute scaling strategies and measure genuine improvements in models’ search capabilities,” the researchers write.

It is worth noting that while the results of sampling-based search are impressive, the costs can also become prohibitive. For example, with 200 samples and 50 verification steps per sample, a query from AIME will generate around 130 million tokens, which costs $650 with Gemini 1.5 Pro. However, this is a minimalist implementation of sampling-based search, and it is compatible with optimization techniques proposed in other studies. With smarter sampling and verification methods, the inference costs can be reduced considerably by using smaller models and generating fewer tokens. For example, by using Gemini 1.5 Flash to perform the verification, the costs drop to $12 per question.
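To see how those figures relate, the back-of-the-envelope below works only from the numbers quoted above; the per-million-token rate is implied by those numbers, not taken from an official price list.

```python
tokens_per_query = 130_000_000   # ~130M tokens for one AIME query
cost_pro = 650.0                 # quoted cost with Gemini 1.5 Pro
cost_flash_verify = 12.0         # quoted cost with Flash as the verifier

implied_rate = cost_pro / (tokens_per_query / 1_000_000)
print(f"implied blended rate: ${implied_rate:.2f} per million tokens")  # ~$5.00
print(f"cost reduction from Flash verification: {cost_pro / cost_flash_verify:.0f}x")  # ~54x
```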

    Effective self-verification strategies

    There is an ongoing debate on whether LLMs can verify their own answers. The researchers identified two key strategies for improving self-verification using test-time compute:

    Directly comparing response candidates: Disagreements between candidate solutions strongly indicate potential errors. By providing the verifier with multiple responses to compare, the model can better identify mistakes and hallucinations, addressing a core weakness of LLMs. The researchers describe this as an instance of “implicit scaling.”

    Task-specific rewriting: The researchers propose that the optimal output style of an LLM depends on the task. Chain-of-thought is effective for solving reasoning tasks, but responses are easier to verify when written in a more formal, mathematically conventional style. Verifiers can rewrite candidate responses into a more structured format (e.g., theorem-lemma-proof) before evaluation.
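A minimal sketch of how these two ideas might be combined in a single verifier prompt is shown below; the wording is purely illustrative and not taken from the paper.

```python
def build_verifier_prompt(question: str, candidate: str,
                          other_candidates: list[str]) -> str:
    """Illustrative verifier prompt combining the two strategies above:
    show competing candidates side by side (implicit scaling) and ask for
    the response to be rewritten in a structured, easier-to-check style."""
    others = "\n\n".join(
        f"Alternative {i + 1}:\n{c}" for i, c in enumerate(other_candidates)
    )
    return (
        f"Problem:\n{question}\n\n"
        f"Candidate response to verify:\n{candidate}\n\n"
        f"Other sampled responses (for comparison):\n{others}\n\n"
        "First, rewrite the candidate response as a structured argument "
        "(theorem, lemmas, proof steps). Then state whether the rewritten "
        "argument is correct, noting any step where it disagrees with the "
        "alternatives above. Answer 'correct' or 'incorrect' on the last line."
    )
```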

    “We anticipate model self-verification capabilities to rapidly improve in the short term, as models learn to leverage the principles of implicit scaling and output style suitability, and drive improved scaling rates for sampling-based search,” the researchers write.

    Implications for real-world applications

    The study demonstrates that a relatively simple technique can achieve impressive results, potentially reducing the need for complex and costly model architectures or training regimes.

    This is also a scalable technique, enabling enterprises to increase performance by allocating more compute resources to sampling and verification. It also enables developers to push frontier language models beyond their limitations on complex tasks.

    “Given that it complements other test-time compute scaling strategies, is parallelizable and allows for arbitrarily scaling, and admits simple implementations that are demonstrably effective, we expect sampling-based search to play a crucial role as language models are tasked with solving increasingly complex problems with increasingly large compute budgets,” the researchers write. 

