If you want to know where AI is today and what to expect when you build AI systems, you should read these two articles from Anthropic and Thinking Machines Lab: https://lnkd.in/dN32aad9 https://lnkd.in/dvHNsm_k Skip the very technical parts and focus on the key messages. PS: you can paste the articles' links into ChatGPT, Claude, or whatever tool you are using, with a simple prompt like: "I am a non-technical manager. Explain in plain terms the issues with current AI systems that these two articles highlight."
-
✨ Imagine asking the same question one thousand times on ChatGPT… and getting one thousand identical answers. That might sound obvious, but it's not what happens today. Even with "deterministic" settings, large language models often produce slightly different answers to the same question. The article Defeating Nondeterminism in LLM Inference explains why: tiny quirks in floating-point math, batching, and caching make outputs inconsistent. It's a fascinating, step-by-step breakdown of why AI systems sometimes feel less predictable than we think.

For practitioners, the real value is in how the article unpacks nondeterminism at every layer of inference: floating-point non-associativity, GPU reduction ordering, and cache-vs-no-cache discrepancies in attention. The proposed solution, batch-invariant kernels that enforce a deterministic reduction order and cache alignment, directly tackles these issues, and the prototype in vLLM shows that reproducibility is achievable at a modest performance cost.

The closing line stuck with me: "We reject this defeatism." Too often, nondeterminism is accepted as the price of scale. This piece reframes reproducibility as a baseline requirement for trustworthy AI. Deterministic inference isn't just about consistency; it's about building AI systems we can debug, audit, and ultimately trust. A must-read for anyone working at the intersection of research and production. https://lnkd.in/gPxmFFFt #AI #MachineLearning #LLM #Reproducibility #MLOps #ArtificialIntelligence #DeepLearning
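To make the floating-point point concrete, here is a tiny self-contained Python sketch (my own illustration, not code from the article) showing that the order of a reduction alone can change the result:

```python
# Floating-point addition is not associative, so the *order* of a reduction
# changes the result. Same numbers, different grouping, different answer.
import numpy as np

# Tiny explicit case: two groupings of the same three numbers.
print((0.1 + 1e20) - 1e20)   # 0.0  (the 0.1 is absorbed by the huge value)
print(0.1 + (1e20 - 1e20))   # 0.1

# The same effect at scale: summing identical float32 values in different orders.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000).astype(np.float32)
print(x.sum(), x[::-1].sum(), np.sort(x).sum())
# The three sums typically differ in their last bits. A GPU kernel that reorders
# its reduction (for example because the batch size changed) perturbs results in
# exactly this way, and those last bits can flip which token the model picks next.
```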
-
Ex-OpenAI CTO Mira Murati's secretive start-up, Thinking Machines Lab, has recently announced a new project that aims to create "AI models with reproducible responses." The project's objective is to understand what causes "randomness" in AI model responses. For instance, if you ask ChatGPT the same question multiple times, you're likely to get different answers.
-
🤔 Ever asked ChatGPT the same question twice and gotten two different answers? 𝗚𝗲𝗻 𝗔𝗜 – It's not just magic, it's an incredible application of mathematics.

The debate continues: is AI just a fad, or the next transformative shift? The answer often depends on our perspective, experiences, and exposure — especially for those not working with it day to day. But one frustration many people share: the outputs of large language models are non-deterministic. 👉 𝗘𝘃𝗲𝗻 𝘄𝗶𝘁𝗵 𝘁𝗵𝗲 𝘀𝗮𝗺𝗲 𝗽𝗿𝗼𝗺𝗽𝘁, 𝘆𝗼𝘂 𝗺𝗮𝘆 𝗻𝗼𝘁 𝗴𝗲𝘁 𝘁𝗵𝗲 𝘀𝗮𝗺𝗲 𝗮𝗻𝘀𝘄𝗲𝗿. That may soon change. Yes — that randomness might actually be fixable.

📄 A recent post by Horace He and Thinking Machines Lab (Mira Murati's next venture), "𝗗𝗲𝗳𝗲𝗮𝘁𝗶𝗻𝗴 𝗡𝗼𝗻𝗱𝗲𝘁𝗲𝗿𝗺𝗶𝗻𝗶𝘀𝗺 𝗶𝗻 𝗟𝗟𝗠 𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲" (𝗧𝗵𝗶𝗻𝗸𝗶𝗻𝗴 𝗠𝗮𝗰𝗵𝗶𝗻𝗲𝘀 𝗟𝗮𝗯: 𝗖𝗼𝗻𝗻𝗲𝗰𝘁𝗶𝗼𝗻𝗶𝘀𝗺, 𝗦𝗲𝗽 𝟮𝟬𝟮𝟱), digs into the root causes of this unpredictability. It explains that concurrency and floating-point rounding are only part of the story: the bigger driver is how server load and batching change the arithmetic applied to each request. Ironically, the same parallelism and concurrency that make modern systems fast and scalable also add unpredictability as a side effect.

The authors propose batch-invariant kernels to ensure reproducibility — paving the way for LLM inference that is deterministic, reliable, and scientifically auditable. We may soon see these innovations integrated into mainstream LLM APIs, marking a huge step forward for trust, reproducibility, and enterprise adoption in Gen AI.

👉 Read here: https://lnkd.in/gEf7AihM #GenAI #Predictability #Reproducibility #LLM #AIResearch
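To see the batching effect for yourself, here is a minimal PyTorch sketch in the spirit of the demonstration in the post (my paraphrase, not its exact code); on a GPU the mismatch is usually easy to observe, on CPU it may be smaller or even zero:

```python
# The same row multiplied by the same matrix can come out with slightly
# different bits depending on whether it is computed alone or inside a larger
# batch, because the kernel may choose a different internal reduction strategy.
import torch

torch.manual_seed(0)
A = torch.randn(2048, 2048)   # pretend these rows are 2048 queued requests
B = torch.randn(2048, 2048)   # model weights

row_alone    = A[:1] @ B      # request 0 processed in a batch of one
row_in_batch = (A @ B)[:1]    # request 0 processed inside the full batch

print(torch.equal(row_alone, row_in_batch))            # often False
print((row_alone - row_in_batch).abs().max().item())   # tiny but possibly non-zero
```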
-
The Subtle Randomness of AI: Why Identical Prompts Yield Different Answers

Ever passed the same prompt to an LLM — with temperature = 0 — and got different results? That's not your imagination. It's a subtle but critical problem in modern inference pipelines: nondeterminism.

Horace He and Thinking Machines Lab just published an excellent piece explaining why this happens and how to address it. The usual suspects (floating-point rounding, concurrency) only tell part of the story. The real issue lies in batch invariance: dynamic batching can alter the numerical path of computations, making outputs depend on how requests are grouped, not on their content.

Their solution? Re-engineering key kernels — RMSNorm, MatMul, Attention — to become batch-invariant, ensuring bit-for-bit reproducibility even under load. Expected result: the same input, the same output — every single time.

Why it matters:
- Determinism improves debuggability and safety.
- It aligns training and inference behaviour.
- It helps build trustworthy agentic systems and reproducible research.

This is a quiet but foundational step toward reliable AI infrastructure; something that matters far more than hype. Read the original post here: https://lnkd.in/ghSEM5Bd #AI #LLM #MachineLearning #Reproducibility #AgenticAI #IdeasArtificiales
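A toy sketch of the idea behind batch invariance (my simplification, not the authors' kernels): reduce every row with the same fixed chunk size and order, so the arithmetic for one request never depends on how many requests share the batch.

```python
# Batch-invariant reduction in miniature: each row is summed in fixed-size
# chunks, left to right, no matter how many rows are being processed.
import numpy as np

def fixed_order_row_sum(row: np.ndarray, chunk: int = 256) -> np.float32:
    """Sum one row in fixed-size chunks, always in the same order."""
    total = np.float32(0.0)
    for start in range(0, row.size, chunk):
        # The chunking never changes with batch size or server load.
        total += np.float32(row[start:start + chunk].sum(dtype=np.float32))
    return total

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4096)).astype(np.float32)

alone    = fixed_order_row_sum(x[0])                   # request processed on its own
in_batch = [fixed_order_row_sum(row) for row in x][0]  # same request inside a batch of 8
print(alone == in_batch)   # True: the reduction path for row 0 never changed
```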
-
Nondeterminism in Large Language Models is one of the primary barriers limiting their adoption in fields where repeatability and reproducibility are critical. This research from Thinking Machines Lab (founded by former OpenAI CTO Mira Murati) is a step toward eliminating this behavior. They've proposed a groundbreaking approach that enables computations to run with exact consistency given the same inputs. The ability to achieve deterministic outputs while maintaining LLM capabilities represents a significant step toward enterprise-grade AI reliability. https://lnkd.in/eBK6faB7 #AI #MachineLearning #LLM #Research #Innovation
-
Traditional ML models are deterministic at inference time: give them the same data and you get the same answer, every time. This also means that when they are wrong, they are consistently wrong. GenAI/LLMs have not worked that way; the same prompt does not always produce the same answer. For a long time we thought this was just a floating-point rounding problem. It turns out that is only part of the story, and better yet, the randomness is solvable. This is a great post on the details: https://lnkd.in/gkexCSX8 Fixing non-determinism at inference doesn't make LLMs hallucinate less, but it does make them behave more consistently, which still helps with detecting and mitigating those issues. #AI #GenAI #ML
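For readers who want to see where the "official" randomness enters, here is a minimal illustrative sketch (made-up numbers and token names) contrasting greedy decoding with temperature sampling:

```python
# Greedy decoding vs. temperature sampling over a made-up next-token distribution.
# Greedy always picks the argmax, so it repeats; sampling draws from the
# distribution, so it can differ run to run.
import numpy as np

logits = np.array([2.0, 1.5, 0.3, -1.0])        # pretend scores for 4 candidate tokens
tokens = ["Paris", "London", "Rome", "Cairo"]

def softmax(z, temperature=1.0):
    z = z / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

greedy  = tokens[int(np.argmax(logits))]         # deterministic given identical logits
sampled = np.random.default_rng().choice(tokens, p=softmax(logits, temperature=0.8))
print(greedy, sampled)
# The catch the article explains: under real serving, the logits themselves can
# shift slightly with batch size, so even greedy decoding may flip tokens.
```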
-
Reliability has always been the elephant in the room with LLMs. Hallucinations make it hard to trust AI in high-stakes settings, and that's a barrier we need to solve if we want AI to play a meaningful role in travel planning, visitor services or marketing. Two articles I read this week give me some real optimism that we're close to a breakthrough: OpenAI – Why Language Models Hallucinate: https://lnkd.in/gudk-wWN Thinking Machines – Defeating Nondeterminism in LLM Inference: https://lnkd.in/gz9kyhuA Both pieces dig into why AI sometimes "makes things up" and, more importantly, what's being done to make outputs more stable and trustworthy. Together, they suggest that the next six months could bring major improvements in reliability. For those of us in the travel and destination space, that's huge. Imagine AI tools you can trust to give visitors accurate information every time, or assistants that DMOs can confidently use in front of travelers without the risk of "creative" errors. If you've been experimenting with AI in your organization, how big a difference would true reliability make for you?
-
Why Doesn't AI Always Give the Same Answer? A Plain-English Explainer for Professionals

One question lawyers and business professionals frequently ask is why tools like ChatGPT, Copilot, or other AI assistants sometimes give different answers to the same question. This excellent article (https://lnkd.in/giTmjV7y) by Thinking Machines Lab digs into the technical reasons why today's AI models don't always behave in a fully "repeatable" way, even when you do everything possible to make them consistent.

In simple terms, most AI systems are designed to be a bit creative. Even if you ask exactly the same question twice, the AI might choose different words, examples, or details. Two technical factors add to this: first, computers use a kind of "shorthand" for numbers, which can introduce tiny rounding differences; second, when lots of people are using the system at once, the way requests are grouped and processed can change the results, sometimes in ways you won't notice, but which can matter in scientific or legal settings.

But can't you just "turn off" the randomness? In technical setups (like using the OpenAI API), there's a setting called "temperature" which controls how creative or random the AI can be. Setting the temperature to zero tells the model to always pick the most likely answer, which usually makes it very consistent. However, in most commercial AI interfaces (including ChatGPT's standard web and mobile versions) you cannot directly set this temperature. What you can do is write prompts that ask the AI to be as factual, precise, and consistent as possible. This approach reduces randomness, but, as the article explains, there are still subtle technical reasons why the output can occasionally change.

Why does this matter? For professionals in regulated or high-stakes environments (legal, scientific, compliance, etc.), having reproducible answers can be essential. If you need to ensure that an AI system gives the same answer every time, it's vital to be aware of these technical limitations.

What's being done about it? The article describes new engineering solutions that can help AI systems become more predictable and consistent. While this is mostly "under the hood," it's a step toward making AI more reliable for professional use.

Takeaway: If you need maximum consistency from AI, use clear, factual prompts to reduce creative variation, but understand that 100% repeatability can't be guaranteed due to the way these systems work. For those using AI in critical or regulated settings, it is essential to verify all outputs carefully and never rely solely on the AI's answer for important decisions. Always apply professional judgement and experience, especially where accuracy and accountability truly matter.
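For readers who do work with the API side, here is a minimal illustrative sketch using the OpenAI Python SDK; the model name is a placeholder, and, as explained above, even these settings reduce rather than eliminate variation:

```python
# Hedged sketch: temperature=0 asks for the most likely answer and seed is a
# best-effort reproducibility hint; neither guarantees bit-identical output,
# for the reasons the article explains.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder model name
    temperature=0,         # always prefer the most likely continuation
    seed=42,               # best-effort determinism hint, not a guarantee
    messages=[{"role": "user", "content": "Summarise clause 4.2 in one sentence."}],
)
print(response.choices[0].message.content)
```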
-
In today's AI landscape, determinism and explainability are no longer nice-to-haves. They're core requirements for any organization that wants to scale AI safely. Thinking Machines' recent $2B raise and their public work on eliminating nondeterminism in LLM inference highlight how the industry is waking up to a simple truth: if you can't explain or predict your AI's behavior, you can't trust it.

At Aiceberg, we took that truth to heart from day one. We built a system that doesn't rely on generative models to judge generative models. Our classification engine is deterministic, explainable, and grounded in real, labeled examples. That means organizations don't have to guess why an AI system acted the way it did; they can know.

In enterprise environments, where safety, compliance, and operational clarity are critical, this distinction matters. Explainability supports auditability. Determinism enables enforcement. Together, they build trust. If your AI stack can't answer the question "Why did this happen?" it's time to rethink the foundation. Aiceberg helps enterprises scale AI with confidence, because what you can't explain, you can't control. #AIsecurity #XAI #AIAgents #LLM #compliance #Aiceberg #trustworthyAI #deterministicAI https://lnkd.in/gKHbbJ_y
-
Mira Murati and her new company, Thinking Machines, have reportedly solved a pervasive issue in large language models (LLMs) known as nondeterminism, which causes models to give different answers to the exact same prompt even when settings are identical. The problem was previously blamed on factors like GPU rounding errors, but the true culprit identified is a lack of batch invariance: the subtle arithmetic inside key operations like matrix multiplications and attention shifts depending on how many requests are processed simultaneously.

Thinking Machines addressed this by developing batch-invariant kernels for these operations, ensuring the internal calculations occur in the same sequence regardless of batch size and leading to true reproducibility, a critical breakthrough for AI safety, scientific research, and auditing complex models. While these new kernels may slightly increase processing time, the ability to obtain identical, verifiable results every time is considered a necessary trade-off for building reliable and trustworthy AI. https://lnkd.in/g5cJFqNg Read More: https://lnkd.in/gCsd5dv8 #AISafety #Reproducibility
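A simple way to check "identical, verifiable results every time" in practice is to hash repeated completions; this is my own sketch of such a check, not Thinking Machines' code, and the stand-in model is hypothetical:

```python
# Reproducibility check: send the same prompt N times and compare outputs bit
# for bit. With deterministic, batch-invariant inference, the set of distinct
# answers should collapse to exactly one.
import hashlib

def check_repeatability(generate, prompt: str, n: int = 10) -> int:
    """`generate` is any callable mapping a prompt string to a completion string."""
    digests = {hashlib.sha256(generate(prompt).encode()).hexdigest() for _ in range(n)}
    return len(digests)   # 1 means every run produced a bit-identical answer

# Stand-in model for illustration; swap in a real client call to test a deployment.
fake_model = lambda p: p.upper()
print(check_repeatability(fake_model, "same prompt, same answer?"))   # 1
```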