    IoT

    Reasoning reimagined: Introducing Phi-4-mini-flash-reasoning | Microsoft Azure Blog

    By big tee tech hub | July 14, 2025 | 5 Mins Read


    Unlock faster, more efficient reasoning with Phi-4-mini-flash-reasoning, optimized for edge, mobile, and real-time applications.

    State-of-the-art architecture redefines speed for reasoning models

    Microsoft is excited to unveil a new addition to the Phi model family: Phi-4-mini-flash-reasoning. Purpose-built for scenarios where compute, memory, and latency are tightly constrained, this new model is engineered to bring advanced reasoning capabilities to edge devices, mobile applications, and other resource-constrained environments. It follows Phi-4-mini but is built on a new hybrid architecture that achieves up to 10 times higher throughput and a 2 to 3 times average reduction in latency, enabling significantly faster inference without sacrificing reasoning performance. Ready to power real-world solutions that demand efficiency and flexibility, Phi-4-mini-flash-reasoning is available today on Azure AI Foundry, NVIDIA API Catalog, and Hugging Face.


    Efficiency without compromise 

    Phi-4-mini-flash-reasoning balances math reasoning ability with efficiency, making it potentially suitable for educational applications, real-time logic-based applications, and more. 

    Similar to its predecessor, Phi-4-mini-flash-reasoning is a 3.8 billion parameter open model optimized for advanced math reasoning. It supports a 64K token context length and is fine-tuned on high-quality synthetic data to deliver reliable, logic-intensive performance.  

    What’s new?

    At the core of Phi-4-mini-flash-reasoning is the newly introduced decoder-hybrid-decoder architecture, SambaY, whose central innovation is the Gated Memory Unit (GMU), a simple yet effective mechanism for sharing representations between layers.  The architecture includes a self-decoder that combines Mamba (a State Space Model) and Sliding Window Attention (SWA), along with a single layer of full attention. The architecture also involves a cross-decoder that interleaves expensive cross-attention layers with the new, efficient GMUs. This new architecture with GMU modules drastically improves decoding efficiency, boosts long-context retrieval performance and enables the architecture to deliver exceptional performance across a wide range of tasks. 
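    To make the idea concrete, the GMU can be pictured as an elementwise gate: a cross-decoder layer projects its current hidden state into a gate and uses it to modulate a representation already computed by the self-decoder, instead of running another expensive cross-attention. The sketch below is an illustration of that gating idea only; the layer sizes, the single-matrix projection, and all values are made up, and it is not the published SambaY implementation.

```python
import numpy as np

def gated_memory_unit(hidden, memory, w_gate):
    """Illustrative Gated Memory Unit (GMU) sketch.

    The current hidden state is projected and squashed into a sigmoid
    gate, which then modulates, elementwise, a "memory" representation
    shared from an earlier (self-decoder) layer. This replaces a
    recomputation with a cheap elementwise product.
    """
    gate = 1.0 / (1.0 + np.exp(-hidden @ w_gate))  # sigmoid gate in (0, 1)
    return gate * memory                           # elementwise gating of shared memory

rng = np.random.default_rng(0)
d = 8                                 # toy hidden size (made up)
hidden = rng.standard_normal(d)       # current cross-decoder hidden state
memory = rng.standard_normal(d)       # representation shared from the self-decoder
w_gate = rng.standard_normal((d, d))  # toy gate projection

out = gated_memory_unit(hidden, memory, w_gate)
print(out.shape)  # (8,)
```

    Because the gate stays in (0, 1), the unit can only attenuate or pass through the shared memory, which is what makes it so much cheaper than recomputing attention at that layer.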

    Key benefits of the SambaY architecture include: 

    • Enhanced decoding efficiency.
    • Preserved linear prefilling time complexity.
    • Increased scalability and enhanced long-context performance.
    • Up to 10 times higher throughput.
    Our decoder-hybrid-decoder architecture taking Samba [RLL+25] as the self-decoder. Gated Memory Units (GMUs) are interleaved with the cross-attention layers in the cross-decoder to reduce the decoding computation complexity. As in YOCO [SDZ+24], the full attention layer only computes the KV cache during the prefilling with the self-decoder, leading to linear computation complexity for the prefill stage.
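    The scaling claim in the caption can be illustrated with a toy operation count: sliding-window and SSM layers cost roughly sequence length times window size, while a full-attention layer costs sequence length squared unless, as in the YOCO-style setup above, it only fills its KV cache during prefill. The function below is a back-of-the-envelope illustration with made-up layer counts; it ignores constants, head counts, and hidden sizes entirely.

```python
def toy_prefill_flops(seq_len, window, kv_only_full_attention=True):
    """Toy operation count for prefilling one sequence.

    Sliding-window / SSM layers: O(seq_len * window), linear in seq_len.
    Full-attention layer: O(seq_len) if it only writes its KV cache
    during prefill (YOCO-style), O(seq_len**2) if it runs end to end.
    An illustration of scaling behavior, not a real FLOP model.
    """
    linear_layers = seq_len * window
    full_attn = seq_len if kv_only_full_attention else seq_len ** 2
    return linear_layers + full_attn

# Doubling the prompt exactly doubles the KV-cache-only prefill cost...
print(toy_prefill_flops(2000, 64) / toy_prefill_flops(1000, 64))  # 2.0
# ...but nearly quadruples it when the full-attention layer runs in full.
print(toy_prefill_flops(2000, 64, kv_only_full_attention=False)
      / toy_prefill_flops(1000, 64, kv_only_full_attention=False))
```

    This is why restricting the full-attention layer to KV-cache computation during prefill keeps the prefill stage linear even as prompts grow.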

    Phi-4-mini-flash-reasoning benchmarks 

    Like all models in the Phi family, Phi-4-mini-flash-reasoning is deployable on a single GPU, making it accessible for a broad range of use cases. However, what sets it apart is its architectural advantage. This new model achieves significantly lower latency and higher throughput compared to Phi-4-mini-reasoning, particularly in long-context generation and latency-sensitive reasoning tasks. 

    This makes Phi-4-mini-flash-reasoning a compelling option for developers and enterprises looking to deploy intelligent systems that require fast, scalable, and efficient reasoning—whether on premises or on-device. 

    The top plot shows inference latency as a function of generation length, while the bottom plot illustrates how inference latency varies with throughput. Both experiments were conducted using the vLLM inference framework on a single A100-80GB GPU with tensor parallelism (TP) set to 1.
    For a more robust evaluation, Pass@1 accuracy is averaged over 64 samples for AIME24/25 and over 8 samples for Math500 and GPQA Diamond. In this graph, Phi-4-mini-flash-reasoning outperforms Phi-4-mini-reasoning as well as models twice its size.
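    The averaging scheme described in the caption amounts to scoring each problem by the fraction of its sampled generations that are correct, then averaging across problems. The helper below sketches that computation on made-up data; it is not the evaluation code behind the published numbers.

```python
def mean_pass_at_1(results):
    """Average Pass@1 over k sampled generations per problem.

    `results` maps each problem to a list of booleans, one per sampled
    answer. Per-problem Pass@1 is the fraction of correct samples; the
    benchmark score is the mean of that fraction across problems.
    """
    per_problem = [sum(samples) / len(samples) for samples in results.values()]
    return sum(per_problem) / len(per_problem)

# Toy example: two problems, four samples each (made-up data)
toy = {
    "p1": [True, True, False, True],    # per-problem Pass@1 = 0.75
    "p2": [False, True, False, False],  # per-problem Pass@1 = 0.25
}
print(mean_pass_at_1(toy))  # 0.5
```

    Averaging over many samples per problem reduces the variance that a single greedy or sampled run would show on small benchmarks like AIME.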

    What are the potential use cases? 

    Thanks to its reduced latency, improved throughput, and focus on math reasoning, the model is ideal for: 

    • Adaptive learning platforms, where real-time feedback loops are essential.
    • On-device reasoning assistants, such as mobile study aids or edge-based logic agents.
    • Interactive tutoring systems that dynamically adjust content difficulty based on a learner’s performance.

    Its strength in math and structured reasoning makes it especially valuable for education technology, lightweight simulations, and automated assessment tools that require reliable logic inference with fast response times. 

    Developers are encouraged to connect with peers and Microsoft engineers through the Microsoft Developer Discord community to ask questions, share feedback, and explore real-world use cases together. 

    Microsoft’s commitment to trustworthy AI 

    Organizations across industries are leveraging Azure AI and Microsoft 365 Copilot capabilities to drive growth, increase productivity, and create value-added experiences. 

    We’re committed to helping organizations use and build AI that is trustworthy, meaning it is secure, private, and safe. We bring best practices and learnings from decades of researching and building AI products at scale to provide industry-leading commitments and capabilities that span our three pillars of security, privacy, and safety. Trustworthy AI is only possible when you combine our commitments, such as our Secure Future Initiative and our responsible AI principles, with our product capabilities to unlock AI transformation with confidence.  

    Phi models are developed in accordance with Microsoft AI principles: accountability, transparency, fairness, reliability and safety, privacy and security, and inclusiveness.  

    The Phi model family, including Phi-4-mini-flash-reasoning, employs a robust safety post-training strategy that integrates Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF). These techniques are applied to a combination of open-source and proprietary datasets, with a strong emphasis on ensuring helpfulness, minimizing harmful outputs, and addressing a broad range of safety categories. Developers are encouraged to apply responsible AI best practices tailored to their specific use cases and cultural contexts. 
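    Of the techniques named above, DPO is the most self-contained to illustrate: its per-example loss rewards the policy for preferring the chosen response over the rejected one more strongly than a frozen reference model does. The numbers below are toy values, not real model log-probabilities, and the beta value is an arbitrary choice for illustration.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example Direct Preference Optimization (DPO) loss.

    L = -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r)))

    The margin compares how much more the policy prefers the chosen
    response (vs. the reference model) than the rejected one.
    """
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy prefers the chosen answer more than the reference does: small loss.
low = dpo_loss(-5.0, -9.0, -6.0, -8.0)
# Policy prefers the rejected answer: larger loss.
high = dpo_loss(-9.0, -5.0, -6.0, -8.0)
print(low < high)  # True
```

    Minimizing this loss over a preference dataset nudges the policy toward helpful outputs without needing a separately trained reward model, which is one reason DPO is a common ingredient in safety post-training pipelines.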

    Read the model card to learn more about risks and mitigation strategies.  

    Learn more about the new model 

    Create with Azure AI Foundry




