Making Randomness Predictable

A defining characteristic of large language models is 'creativity': run the same prompt through an LLM, even with a constant "temperature" parameter, and you can get a different completion each time. This variability is what makes LLMs appear creative, and it is often described as a feature. In regulated industries, however, it can be a showstopper. Regulated applications require predictability: they need to be designed, tested and validated, and their outputs need to be reproducible. Reproducibility is also what gives end users confidence when the system is used to support clinical decisions or provide investment advice.

To date, it has not been possible to ensure LLM outputs are consistently reproducible, primarily because the root cause of the randomness had not been identified. Until now. A recent paper from Thinking Machines (https://lnkd.in/eax7ybuk) identifies the cause of the randomness and proposes an approach to mitigate it. The cause turns out to sit in the compute kernels that underpin the inference system, specifically in how they handle batching, a layer that can be reconfigured far more easily than the LLM itself.

So what does this mean? It means we now have a possible path to validating generative AI solutions. That is a huge step forward, and it brings the promise of generative AI even closer to enterprise production in sensitive, regulated industries.

Citation: He, Horace and Thinking Machines Lab, "Defeating Nondeterminism in LLM Inference", Thinking Machines Lab: Connectionism, Sep 2025.
How to make LLMs predictable for regulated industries
More Relevant Posts
-
I have to say, this is one of the best articles I’ve seen on the inner workings of AI. (I have repeatedly said: try running the same prompt multiple times and you will get a different response every time.) Having worked across dozens of platforms, operating systems, processors, and languages, this one just smacks you in the face. It really makes you stop and ask: what was the real objective here? Because it sure doesn’t look like it matches the expectations of how AI is actually being used. If you’re running anything critical on AI and you don’t have some form of post-response validation in place (and no, not AI checking itself, given what’s in this article), you might want to rethink your whole approach. https://lnkd.in/eDuB2rPy
-
Imagine asking the same person the same question twice, but getting slightly different answers.

Thinking Machines is a team with deep roots in creating some of the most transformative AI systems we know today — ChatGPT, Character.ai, and other leading platforms. They’ve now launched a brand-new research group, and their first blog post is already tackling a question that has puzzled both researchers and practitioners:

👉 Why do large language models sometimes give different answers, even when we set temperature to zero?

For many, the assumption has been simple: “temperature=0” should mean deterministic output — the same prompt, the same answer, every time. Yet in practice, that hasn’t always been true.

🔍 In their blog, the Thinking Machines team dives deep into this issue of nondeterminism in LLM inference and uncovers what’s really going on:
» Floating point math is non-associative. Even tiny changes in the order of addition or multiplication can cause slightly different results.
» Parallel computing on GPUs makes this worse. Operations may not always run in the same sequence, producing small but noticeable output differences.
» Inference engines add complexity. Optimizations like kernel fusion, graph compilers, and distributed execution can make the math path non-reproducible.
» Tie-breaking between equally likely tokens isn’t always consistent. This can lead to diverging outputs, even with identical settings.

💡 What’s powerful about this work is not just the diagnosis, but the solutions they propose. The team shows how to configure inference so that outputs really are deterministic:
» Using deterministic kernels and carefully chosen compute libraries.
» Controlling seeds and ensuring fixed tie-breaking behavior.
» Structuring execution so floating-point order is preserved.

This matters because scientific reproducibility and engineering trust depend on being able to reproduce results exactly. In research, we want to confirm findings without hidden randomness. In production, companies want users to have consistent, reliable interactions with AI systems.

✨ With this first blog, Thinking Machines has shown what their new research direction is all about: digging into hard, foundational issues in AI systems, and pushing for solutions that make large-scale models more reliable, transparent, and useful.

📖 You can read the full blog here: https://lnkd.in/e38jfiJt
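To make the floating-point point concrete, here is a minimal Python sketch (my own illustration, not code from the blog) showing that summing the exact same numbers in different groupings, the way a parallel reduction might, can produce slightly different results:

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal(100_000).astype(np.float32)

# Plain left-to-right accumulation.
sequential = np.float32(0.0)
for v in values:
    sequential += v

# "Parallel-style" accumulation: sum fixed-size chunks first, then combine
# the partial sums. Different chunk sizes mimic different reduction orders.
def chunked_sum(x, chunk):
    partials = [x[i:i + chunk].sum() for i in range(0, len(x), chunk)]
    return np.float32(sum(partials))

print(sequential, chunked_sum(values, 512), chunked_sum(values, 4096))
# Same data, same mathematical sum, yet the three printed values typically
# differ in their last digits.
```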
-
Thinking Machines Lab, the AI startup founded by former OpenAI CTO Mira Murati, has published research claiming to solve one of the most persistent problems in large language models: nondeterministic inference results.

The Problem: Current LLMs produce different outputs even when given identical inputs at temperature 0, which should theoretically be deterministic. The team identified that this goes beyond typical floating-point arithmetic issues - the root cause is lack of "batch invariance" in widely-used inference kernels.

What is Batch Invariance? A model should produce identical outputs for the same input regardless of batch size or how inputs are grouped together. Current systems fail this test because operations like matrix multiplication, attention mechanisms, and normalization change their computational strategies based on batch configuration, introducing small numerical differences that compound over long text generations.

Their Solution: The team developed custom batch-invariant kernels for critical operations including:
- RMSNorm (Root Mean Square Normalization)
- Matrix multiplication (matmul)
- Attention mechanisms

Results: Testing on the Qwen-3-8B model revealed striking improvements:
- Before: 1,000 identical prompts at temperature 0 produced 80 different completions
- After: All 1,000 runs produced identical results, achieving perfect reproducibility

The Trade-off: The batch-invariant approach runs slower than standard inference, but the researchers argue this performance cost is worthwhile for applications requiring deterministic behavior, particularly in research, safety testing, and debugging scenarios.

Why This Matters: As the team notes in their blog post: "Reproducibility is a bedrock of scientific progress. However, it's remarkably difficult to get reproducible results out of large language models." This work could influence the design of future inference engines where determinism becomes as important as raw performance.

Source Connection Link: https://lnkd.in/dCq4T7jC

Discussion: What are your thoughts on trading inference speed for deterministic outputs? Could this approach become standard for production LLM deployments where consistency is critical?

Note: This addresses a fundamental issue that affects LLM reliability across the industry. The technical approach of focusing on batch invariance rather than just floating-point precision represents a novel angle on solving nondeterminism.

#ai #artificialinteligence #datascience #ds #thinkingmachines
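A small, hedged illustration of what failing batch invariance can look like in practice. This is not the Thinking Machines code, just a numpy sketch: depending on the BLAS or GPU library behind the matmul, a request processed alone versus inside a larger batch may take different internal reduction paths and come out slightly different (on some installs the two results happen to match bit-for-bit):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 4096)).astype(np.float32)

x = rng.standard_normal((1, 4096)).astype(np.float32)        # one "request"
others = rng.standard_normal((31, 4096)).astype(np.float32)  # batch companions
batch = np.vstack([x, others])

alone = x @ W               # the request processed on its own
in_batch = (batch @ W)[:1]  # the same request processed inside a larger batch

print("bitwise identical:", np.array_equal(alone, in_batch))
print("max abs difference:", np.abs(alone - in_batch).max())
```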
-
The Subtle Randomness of AI: Why Identical Prompts Yield Different Answers

Ever passed the same prompt to an LLM — with temperature = 0 — and got different results? That’s not your imagination. It’s a subtle but critical problem in modern inference pipelines: nondeterminism.

Horace He and Thinking Machines Lab just published an excellent piece explaining why this happens, and how to address it. The usual suspects (floating-point rounding, concurrency) only tell part of the story. The real issue lies in the lack of batch invariance: dynamic batching can alter the numerical path of computations, making outputs depend on how requests are grouped, not on their content.

Their solution? Re-engineering key kernels — RMSNorm, MatMul, Attention — to become batch-invariant, ensuring bit-for-bit reproducibility even under load. Expected result: the same input, the same output — every single time.

Why it matters:
- Determinism improves debuggability and safety.
- It aligns training and inference behaviour.
- It helps build trustworthy agentic systems and reproducible research.

This is a quiet but foundational step toward reliable AI infrastructure; something that matters far more than hype.

Read the original post here: https://lnkd.in/ghSEM5Bd

#AI #LLM #MachineLearning #Reproducibility #AgenticAI #IdeasArtificiales
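For intuition on what a batch-invariant kernel means, here is a conceptual numpy sketch of an RMSNorm whose reduction order is fixed per row and therefore independent of batch size. It is only an illustration of the idea under my own assumptions, not the kernels Thinking Machines actually ship:

```python
import numpy as np

def rmsnorm_batch_invariant(x, weight, eps=1e-6, chunk=256):
    # Each row's sum of squares is accumulated over fixed-size chunks in a
    # fixed order, so the floating-point path never depends on how many rows
    # happen to share the batch. (Conceptual sketch, not the real kernels.)
    x = np.asarray(x, dtype=np.float32)
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        acc = np.float32(0.0)
        for start in range(0, x.shape[1], chunk):
            block = x[i, start:start + chunk]
            acc += np.float32(np.dot(block, block))
        rms = np.sqrt(acc / np.float32(x.shape[1]) + np.float32(eps))
        out[i] = x[i] / rms * weight
    return out

# The same row, alone or inside a bigger batch, comes out bit-identical.
rng = np.random.default_rng(0)
w = np.ones(1024, dtype=np.float32)
row = rng.standard_normal((1, 1024)).astype(np.float32)
big = np.vstack([row, rng.standard_normal((63, 1024)).astype(np.float32)])
print(np.array_equal(rmsnorm_batch_invariant(row, w)[0],
                     rmsnorm_batch_invariant(big, w)[0]))  # True
```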
-
New blog from Thinking Machines (Mira Murati’s new company) about defeating nondeterminism in LLM inference. Deterministic LLMs — models that always return the same answer to the same prompt 🤗🫡 This could completely change the rules of AI 👑 https://lnkd.in/dvPtwGVq
-
Defeating Nondeterminism in LLM Inference One of the biggest frustrations with large language models is how the same prompt can give different results—even when you set everything to “deterministic.” Thinking Machines just shared a deep dive into this issue and proposed solutions that make outputs truly reproducible. Their approach tackles the hidden sources of nondeterminism (like batch size variance) and introduces batch-invariant kernels to ensure bit-identical results across runs. This is a big step forward. Deterministic outputs mean more reliable debugging, safer deployment, and less technical debt for teams building real-world AI systems. Read more: https://lnkd.in/gKHbbJ_y
-
💡 Ever wonder why AI models like GPT-4 sometimes give different answers to the same question? It’s not just random noise—there’s a fascinating reason behind it called nondeterminism! The team at Thinking Machines uncovered a hidden culprit: floating-point math in computers. Even simple calculations can vary slightly depending on the order they’re done in. For example, (0.1 + 1e20) - 1e20 might give 0, while 0.1 + (1e20 - 1e20) gives 0.1. Tiny differences, big impact! But here’s the real twist: nondeterminism in AI models comes from how they process multiple user requests at once (in “batches”). If your question is handled in a different batch, the output can change, leading to inconsistent results. This can affect tasks like coding or complex reasoning where consistency is key. The good news? The team is working on a fix to make AI models more consistent by tweaking how these calculations are handled, aiming to reduce nondeterminism. A must-know for anyone curious about the inner workings of AI! You can explore the research blog here: https://lnkd.in/gwsqFqHu #AI #MachineLearning #Tech #Innovation #DeepLearning
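The arithmetic in the post is easy to check for yourself; here it is as a few lines of Python:

```python
# Floating-point addition is not associative: the grouping changes the answer.
a = (0.1 + 1e20) - 1e20   # 0.1 is swallowed by the huge number, then cancelled away
b = 0.1 + (1e20 - 1e20)   # the huge numbers cancel first, so 0.1 survives
print(a, b)  # 0.0 0.1
```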
-
One of the major problems with AI is that it gives different answers to the same question. This matters for trust, responsibility and many other issues when we consider how to manage, monitor and legislate AI. For most people it does not make much of a difference, but if you want to apply AI to research, medicine, law, or some other field where a precise answer is needed, you need the AI-generated answers to be reproducible. How do you know which of the answers is correct, if they are all a little different?

There is a brand new, groundbreaking research paper from the Thinking Machines Lab that offers a solution to some of these problems. They point out how consistency breaks down because (a+b) + c ≠ a + (b+c) when we are calculating with floating point numbers. If you did not get the mathematics, it is OK. Keep reading. They demonstrate a solution and, at the end of the paper, show experiments where an LLM writes the same text 1000 times.

This will be one of the game changers when we look back on the development of AI.

#AI #trustworthyAI #AIconsistency #research https://lnkd.in/epFtdXkF
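As a rough sketch of that kind of experiment (the names `generate` and `count_unique_completions` are placeholders of mine, not the paper's code), you could count distinct completions across repeated runs of the same prompt at temperature 0:

```python
from collections import Counter

def count_unique_completions(generate, prompt, runs=1000):
    # `generate` stands in for whatever inference call you actually use
    # (an HTTP client, vLLM, transformers, ...). A fully deterministic stack
    # should report exactly one distinct completion across all runs.
    completions = [generate(prompt, temperature=0) for _ in range(runs)]
    counts = Counter(completions)
    print(f"{len(counts)} distinct completions out of {runs} runs")
    return counts
```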
-
If you want to know where AI is today and what to expect when you build AI systems, you should read these two articles from Anthropic and Thinking Machines Lab: https://lnkd.in/dN32aad9 https://lnkd.in/dvHNsm_k Skip the very technical parts and focus on the key messages. PS: you can give the articles' links to ChatGPT, Claude or whatever tool you are using with a simple prompt such as: "I am a non-technical manager. Highlight in easy terms the issues with current AI systems raised in these two articles."
-
𝗘𝘃𝗲𝗿 𝘄𝗼𝗻𝗱𝗲𝗿𝗲𝗱 𝘄𝗵𝘆 𝗟𝗟𝗠𝘀 𝗰𝗮𝗻 𝗴𝗶𝘃𝗲 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁 𝗮𝗻𝘀𝘄𝗲𝗿𝘀 𝗲𝘃𝗲𝗻 𝘄𝗶𝘁𝗵 𝘁𝗵𝗲 𝘁𝗲𝗺𝗽𝗲𝗿𝗮𝘁𝘂𝗿𝗲 𝘀𝗲𝘁 𝘁𝗼 𝟬? A must-read for all AI/GenAI practitioners. It turns out the culprit isn’t just “GPU randomness.” 𝗧𝗵𝗲 𝗿𝗲𝗮𝗹 𝗶𝘀𝘀𝘂𝗲 𝗹𝗶𝗲𝘀 𝗶𝗻 𝗵𝗼𝘄 𝗯𝗮𝘁𝗰𝗵 𝘀𝗶𝘇𝗲𝘀 𝗰𝗵𝗮𝗻𝗴𝗲 𝘁𝗵𝗲 𝗺𝗮𝘁𝗵 𝗼𝗿𝗱𝗲𝗿 𝗶𝗻𝘀𝗶𝗱𝗲 𝗸𝗲𝘆 𝗼𝗽𝗲𝗿𝗮𝘁𝗶𝗼𝗻𝘀 𝗹𝗶𝗸𝗲 𝗥𝗠𝗦𝗡𝗼𝗿𝗺, 𝗠𝗮𝘁𝗺𝘂𝗹, 𝗮𝗻𝗱 𝗔𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻. The team at Thinking Machines breaks down how to defeat nondeterminism in LLM inference — and why making kernels batch-invariant is the key to consistent outputs. https://lnkd.in/gjGn_8_p
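If you want to test this property on your own stack, a simple hypothetical harness is to serve the identical prompt alongside batches of different sizes and check that the completion never changes; `generate_batch` below is a placeholder for your own batched inference entry point, not an API from the article:

```python
def check_batch_invariance(generate_batch, prompt, filler_prompts):
    # Serve the same prompt alone and inside progressively larger batches;
    # a batch-invariant stack returns the identical completion every time.
    reference = generate_batch([prompt])[0]
    for n in (1, 4, 16, 64):
        batch = [prompt] + list(filler_prompts[:n])
        out = generate_batch(batch)[0]
        if out != reference:
            return False, n   # first batch size at which the output diverged
    return True, None
```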