In today’s AI landscape, determinism and explainability are no longer nice-to-haves. They’re core requirements for any organization that wants to scale AI safely. Thinking Machines’ recent $2B raise and their public work on eliminating nondeterminism in LLM inference highlight how the industry is waking up to a simple truth: if you can’t explain or predict your AI’s behavior, you can’t trust it.

At Aiceberg, we took that truth to heart from day one. We built a system that doesn’t rely on generative models to judge generative models. Our classification engine is deterministic, explainable, and grounded in real, labeled examples. That means organizations don’t have to guess why an AI system acted the way it did; they can know.

In enterprise environments, where safety, compliance, and operational clarity are critical, this distinction matters. Explainability supports auditability. Determinism enables enforcement. Together, they build trust.

If your AI stack can’t answer the question “Why did this happen?”, it’s time to rethink the foundation. Aiceberg helps enterprises scale AI with confidence, because what you can’t explain, you can’t control.

#AIsecurity #XAI #AIAgents #LLM #compliance #Aiceberg #trustworthyAI #deterministicAI https://lnkd.in/gKHbbJ_y
Why Determinism and Explainability Are Key for Safe AI Scaling
More Relevant Posts
-
Imagine running the same experiment twice and getting two different results. That’s what today’s AI looks like. Same input, different output. For chatbots, that’s fine. For drug discovery, clinical decisions, or regulatory filings? It’s a disaster.

Reproducibility is the backbone of science. Without it, validation breaks down. Trust breaks down. Progress stalls.

Mira Murati’s new Thinking Machines Lab is pushing for deterministic AI models that behave the same way every time. It’s the right goal. But reproducibility doesn’t stop at the model.

🔹 Data pipelines must be versioned
🔹 Environments must be controlled
🔹 Outputs must be governed, logged, auditable

That’s why, at Manifold, we’ve built a platform where experiments can be run — and rerun — with confidence. Determinism isn’t just about the math. It’s about the system that surrounds it. Until AI can meet that standard, it doesn’t belong in critical science.

👉 How do you think the scientific community should balance AI’s creativity with the need for reproducibility?

https://lnkd.in/gKHbbJ_y

#AI #LifeSciences #Reproducibility #TrustedResearch #Manifold
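The post above argues that reproducibility depends on the system around the model as much as on the model itself. As a rough illustration of that idea (not Manifold's platform), a run can record the data it saw, the environment it ran in, and the seed it used, so it can be rerun and audited later:

```python
# Minimal sketch of a reproducible-run record: hash the data, pin the
# environment, log the seed and parameters. Illustrative only; a real
# platform would track far more (code revision, dependency lockfile, hardware).
import hashlib
import platform
import sys

def run_manifest(data_path: str, seed: int, params: dict) -> dict:
    with open(data_path, "rb") as f:
        data_hash = hashlib.sha256(f.read()).hexdigest()
    return {
        "data_sha256": data_hash,          # versioned data
        "python": sys.version,             # controlled environment
        "platform": platform.platform(),
        "seed": seed,                      # controlled randomness
        "params": params,                  # governed, logged, auditable inputs
    }

# Example: run_manifest("train.csv", seed=42, params={"lr": 1e-3})
```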
-
The Subtle Randomness of AI: Why Identical Prompts Yield Different Answers

Ever passed the same prompt to an LLM — with temperature = 0 — and got different results? That’s not your imagination. It’s a subtle but critical problem in modern inference pipelines: nondeterminism.

Horace He and Thinking Machines Lab just published an excellent piece explaining why this happens and how to address it. The usual suspects (floating-point rounding, concurrency) only tell part of the story. The real issue lies in batch invariance: dynamic batching can alter the numerical path of computations, making outputs depend on how requests are grouped, not on their content.

Their solution? Re-engineering key kernels — RMSNorm, MatMul, Attention — to become batch-invariant, ensuring bit-for-bit reproducibility even under load. Expected result: the same input, the same output — every single time.

Why it matters:
- Determinism improves debuggability and safety.
- It aligns training and inference behaviour.
- It helps build trustworthy agentic systems and reproducible research.

This is a quiet but foundational step toward reliable AI infrastructure; something that matters far more than hype.

Read the original post here: https://lnkd.in/ghSEM5Bd

#AI #LLM #MachineLearning #Reproducibility #AgenticAI #IdeasArtificiales
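The lack of batch invariance described above is easy to see with stock kernels. A minimal sketch, assuming PyTorch is installed; the effect is most visible on a GPU, and the exact difference varies by hardware and library version:

```python
import torch

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(2048, 4096, device=device)   # 2048 "requests"
b = torch.randn(4096, 4096, device=device)   # shared weight matrix

row_alone    = a[:1] @ b          # the first request processed on its own
row_in_batch = (a @ b)[:1]        # the same request processed with 2047 others

# A batch-invariant kernel would make this exactly 0. Stock kernels often
# don't, because the reduction strategy changes with batch size.
print((row_alone - row_in_batch).abs().max().item())
```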
-
“Defeating Nondeterminism in LLM Inference” by Horace He and Thinking Machines Lab is a great read. With deterministic LLMs, it’s possible to unlock reliability, compliance, and robust system design across the financial world. https://lnkd.in/gfcp33Re #AI #GenAI #Finance #LLM #Compliance #Innovation
-
What often gets overlooked in AI is not just when systems fail, but when they quietly behave differently under the same conditions. A chatbot that answers one way today and another tomorrow, or a digital sales assistant that sometimes breaks the rule “never mention price first”, may look like small quirks to engineers, but to customers, regulators, and the business they signal risk.

That’s why 𝗔𝗜 𝗲𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻𝘀 (𝗲𝘃𝗮𝗹𝘀) 𝗺𝗮𝘁𝘁𝗲𝗿: they prove consistency at the micro level (same input, same output), the macro level (business guardrails always respected), and the multi-agent level (agents coordinating predictably in complex systems).

The Thinking Machines article on defeating nondeterminism in LLM inference captures this well; a reminder that the real competitive edge isn’t just smarter AI, it’s AI you can trust, audit, and scale with confidence. #MakeAIPervasive

🔗 Defeating Nondeterminism in LLM Inference: https://lnkd.in/dC-8KXGE
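A minimal sketch of the micro- and macro-level checks the post describes, with `generate` as a hypothetical stand-in for whatever inference call your stack exposes (the guardrail phrase is illustrative only):

```python
def generate(prompt: str) -> str:
    raise NotImplementedError("wire this to your model or agent endpoint")

def micro_consistency(prompt: str, runs: int = 20) -> bool:
    # Micro level: the same input yields the same output on every run.
    return len({generate(prompt) for _ in range(runs)}) == 1

def macro_guardrail(prompt: str, forbidden_phrase: str = "price") -> bool:
    # Macro level: a business rule ("never mention price first") is respected.
    return forbidden_phrase not in generate(prompt).lower()
```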
-
Traditional ML models are deterministic at inference time: give one the same data and you get the same answer. Every time. This also means when they are wrong, they are consistently wrong.

GenAI/LLMs have not worked that way. The same prompt does not always produce the same answer. For a long time, we thought this was a decimal point problem. Turns out not! And even better, it’s solvable. This is a great post/paper on the details. https://lnkd.in/gkexCSX8

Eliminating non-determinism at inference doesn’t make LLMs hallucinate less, but it does make them act more consistently. Which is still helpful for detecting and mitigating those issues.

#AI #GenAI #ML
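For what the "decimal point problem" actually looks like, here is a tiny self-contained example: floating-point addition is not associative, so the order of a reduction changes the result. This alone does not explain LLM nondeterminism (the linked post argues batching is the real trigger), but it is why different computation orders can diverge at all.

```python
a, b, c = 0.1, 1e16, -1e16

print((a + b) + c)   # 0.0 -> the 0.1 is absorbed by the huge value first
print(a + (b + c))   # 0.1 -> the huge values cancel first, so 0.1 survives
```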
-
✨ Imagine asking the same question one thousand times on ChatGPT… and getting one thousand identical answers. That might sound obvious, but it’s not what happens today. Even with “deterministic” settings, large language models often produce slightly different answers to the same question.

The article Defeating Nondeterminism in LLM Inference explains why: tiny quirks in floating-point math, batching, and caching make outputs inconsistent. It’s a fascinating, step-by-step breakdown of why AI systems sometimes feel less predictable than we think.

For practitioners, the real value is in how the article unpacks nondeterminism at every layer of inference: from floating-point non-associativity to GPU reduction ordering to cache-vs-no-cache discrepancies in attention. The proposed solution, batch-invariant kernels that enforce deterministic reduction order and cache alignment, directly tackles these issues. Their prototype in vLLM demonstrates that we don’t have to trade reproducibility for performance.

The closing line stuck with me: “We reject this defeatism.” Too often, nondeterminism is accepted as the price of scale. This piece reframes reproducibility as a baseline requirement for trustworthy AI. Deterministic inference isn’t just about consistency, it’s about building AI systems we can debug, audit, and ultimately trust.

A must-read for anyone working at the intersection of research and production. https://lnkd.in/gPxmFFFt

#AI #MachineLearning #LLM #Reproducibility #MLOps #ArtificialIntelligence #DeepLearning
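A rough CPU/NumPy illustration of the "deterministic reduction order" idea mentioned above (this is not the vLLM kernels, just the principle): if a sum is always combined over the same fixed-size chunks in the same order, the numerical path, and therefore the bits, stays identical no matter how the work is scheduled or batched.

```python
import numpy as np

def fixed_order_sum(x: np.ndarray, chunk: int = 256) -> np.float32:
    # Partial sums may be computed in parallel, but they are always formed over
    # the same fixed-size chunks and combined left to right, so the result is
    # bitwise identical from run to run.
    partials = [np.float32(x[i:i + chunk].sum(dtype=np.float32))
                for i in range(0, len(x), chunk)]
    total = np.float32(0.0)
    for p in partials:
        total = np.float32(total + p)
    return total
```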
-
📖 A few good reads lately:

⚜️ Breakdown of the design of NotebookLM: https://lnkd.in/gJ-tMnNW. NotebookLM is one of my favourite AI products, and the thought that its creators put into its design underlines the importance of thinking deeply about the user journey and interaction patterns when designing AI products.

🕵️ Definition of agent: https://lnkd.in/giST7UUb We may finally have a definition of 'agent' that is non-buzzwordy, non-cringe-worthy, and non-jargony, and I can get behind it.
✅ An LLM agent runs tools in a loop to achieve a goal (a minimal sketch of such a loop follows below).
❌ Agents as human replacements, because the features that still remain unique to humans are accountability and agency.

🎡 LLMs being non-deterministic has long been a challenge in designing applications with them. New research from Thinking Machines explores the possibility of defeating nondeterminism in LLM inference: https://lnkd.in/gcmgnikG.
🌟 Key takeaway: A commonly held view blames floating-point non-associativity and concurrency for nondeterminism. While floating-point math can yield minute differences, the main culprit is how inference workloads are batched. A user's result depends on how many other user requests are being handled and how those are grouped into batches, which is inherently variable and hidden from product-level logic.

What have you read lately that you would recommend?

#ai #machinelearning #LLM #notebookLM #agents
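For the agent definition quoted above, a minimal sketch of "tools in a loop" (the `call_llm` callable and the decision format are hypothetical placeholders, not any particular framework's API):

```python
def run_agent(goal: str, tools: dict, call_llm, max_steps: int = 10) -> str:
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        decision = call_llm(history)          # model chooses a tool or finishes
        if decision["action"] == "finish":
            return decision["answer"]
        result = tools[decision["action"]](**decision["args"])
        history.append(f"{decision['action']} -> {result}")
    return "stopped: step limit reached"
```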
-
🤔 Ever asked ChatGPT the same question twice and gotten two different answers?

𝗚𝗲𝗻 𝗔𝗜 – It’s not just magic, it’s an incredible application of mathematics. The debate continues: is AI just a fad, or the next transformative shift? The answer often depends on our perspective, experiences, and exposure — especially for those not working with it day to day. But one frustration many people share: the outputs of large language models are non-deterministic.

👉 𝗘𝘃𝗲𝗻 𝘄𝗶𝘁𝗵 𝘁𝗵𝗲 𝘀𝗮𝗺𝗲 𝗽𝗿𝗼𝗺𝗽𝘁, 𝘆𝗼𝘂 𝗺𝗮𝘆 𝗻𝗼𝘁 𝗴𝗲𝘁 𝘁𝗵𝗲 𝘀𝗮𝗺𝗲 𝗮𝗻𝘀𝘄𝗲𝗿. That may soon change. Yes — that randomness might actually be fixable.

📄 A recent paper by Horace He and Thinking Machines Lab (Mira Murati’s next venture), “𝗗𝗲𝗳𝗲𝗮𝘁𝗶𝗻𝗴 𝗡𝗼𝗻𝗱𝗲𝘁𝗲𝗿𝗺𝗶𝗻𝗶𝘀𝗺 𝗶𝗻 𝗟𝗟𝗠 𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲” (𝗧𝗵𝗶𝗻𝗸𝗶𝗻𝗴 𝗠𝗮𝗰𝗵𝗶𝗻𝗲𝘀 𝗟𝗮𝗯: 𝗖𝗼𝗻𝗻𝗲𝗰𝘁𝗶𝗼𝗻𝗶𝘀𝗺, 𝗦𝗲𝗽 𝟮𝟬𝟮𝟱), digs into the root causes of this unpredictability. It explains how nondeterminism arises not only from concurrency and floating-point rounding, but also from server load and batching. Ironically, the same parallelism and concurrency that make modern systems fast and scalable also add unpredictability as a side effect.

The authors propose batch-invariant kernels to ensure reproducibility — paving the way for LLM inference that is deterministic, reliable, and scientifically auditable. We may soon see these innovations integrated into mainstream LLM APIs, marking a huge step forward for trust, reproducibility, and enterprise adoption in Gen AI.

👉 Read here: https://lnkd.in/gEf7AihM

#GenAI #Predictability #Reproducibility #LLM #AIResearch
-
Mira Murati, OpenAI’s former CTO, announced a breakthrough on Gen AI consistency, not long after launching her new business with a $2B seed round.

Her team at Thinking Machines Lab, led by Horace He, released a paper titled 'Defeating Nondeterminism in LLM Inference' (see: https://lnkd.in/e-K7nywB). It tackles the problem of “reproducibility” in large language models, to make AI more reliable. You've probably noticed that even when you set AI models to their most predictable setting (temperature = 0), you still get different answers to the same question.

Whether I'm discussing variance in AI outputs with board members or legal teams, business leaders keep raising the same trust problem: 'How can you trust a system that gives you a different answer each time you ask it the exact same question?' Thankfully, Mira's team has solved this.

💡 Their solution: new batch-invariant kernels. They found that the root cause isn't randomness; it's that the GPU kernels handling inference aren't consistent across different batch sizes, so your results can actually change depending on the load from other users on the same server. Mira has released their solution as open-source code - true to her promise of “science is better when shared”.

💎 The result? Same input, same output, every time. Great news for compliance teams in every business, particularly in healthcare, finance, and law, where reproducibility is essential. But Mira acknowledges that even if those answers were consistent, they could still be wrong. I admire her honesty. That means:

Today: same enquiry = five different responses, one may (or may not) be right.
Tomorrow (with determinism): same enquiry = the same response five times… but it may still be wrong.

Her team tested the solution 1,000 times and got identical results every run. It slows things down - taking 42 seconds versus 26 - but wouldn't you rather wait an extra 16 seconds for something you can rely on, especially if you're making executive decisions or treating patients?
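A sketch of the kind of 1,000-run check described above, written as hypothetical client code (the `client.complete` call is a placeholder, not Thinking Machines' test harness): with deterministic, batch-invariant inference the expectation is a single unique completion.

```python
from collections import Counter

def count_unique_completions(client, prompt: str, n: int = 1000) -> Counter:
    outputs = Counter()
    for _ in range(n):
        outputs[client.complete(prompt, temperature=0)] += 1   # placeholder API
    return outputs

# unique = count_unique_completions(my_client, "Summarise our Q3 risk report")
# print(len(unique))   # 1 if inference is deterministic, >1 otherwise
```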
-
Non-determinism in LLM inference is more than an implementation detail — it’s a core technical challenge that shapes how reliably AI can be deployed at scale. As the team at Thinking Machines Lab explains, unpredictable outputs can limit reproducibility, complicate evaluation, and slow down integration into real-world systems. At Planck AI, we approach this problem as foundational. Our deterministic RAG pipeline is designed to ensure consistent results under identical conditions, maintain a transparent evidence trail, and give developers precise control over model behavior. This engineering focus allows enterprises to build AI systems that are robust, auditable, and ready for production. https://lnkd.in/gKHbbJ_y
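One generic ingredient of deterministic retrieval, sketched here purely for illustration (this is not Planck AI's pipeline): rank with explicit tie-breaking so that equal scores never reorder between runs, assuming the scores themselves are computed reproducibly.

```python
def rank_passages(scored: list[tuple[str, float]], k: int = 5) -> list[str]:
    # Sort by score descending, then by passage id, so ties are broken the same
    # way on every run and the evidence trail stays stable.
    ordered = sorted(scored, key=lambda item: (-item[1], item[0]))
    return [passage_id for passage_id, _ in ordered[:k]]
```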
The ability to rationalise the output of an LLM with certainty is where the challenge lies. As long as the underlying context remains the same, I don't see why determinism is important for language models. The only exception I see is when their use case involves more complex operations than text generation. Most companies would welcome GenAI implementations with open arms if we could achieve that while minimising or deflecting the risks.