Thinking Machines explores nondeterminism in LLM inference


Imagine asking the same person the same question twice, but getting slightly different answers.

Thinking Machines is a team with deep roots in creating some of the most transformative AI systems we know today, including ChatGPT, Character.ai, and other leading platforms. They've now launched a brand-new research group, and their first blog post already tackles a question that has puzzled researchers and practitioners alike:

👉 Why do large language models sometimes give different answers, even when we set temperature to zero?

For many, the assumption has been simple: temperature=0 should mean deterministic output, the same prompt producing the same answer every time. Yet in practice, that hasn't always held.

🔍 In their blog, the Thinking Machines team digs into this nondeterminism in LLM inference and uncovers what's really going on (illustrated in the sketches below):

» Floating-point math is non-associative. Even tiny changes in the order of additions or multiplications can produce slightly different results.
» Parallel computing on GPUs makes this worse. Operations may not always run in the same sequence, producing small but noticeable output differences.
» Inference engines add complexity. Optimizations like kernel fusion, graph compilers, and distributed execution can make the arithmetic path non-reproducible.
» Tie-breaking between equally likely tokens isn't always consistent, so outputs can diverge even with identical settings.

💡 What's powerful about this work is not just the diagnosis but the solutions they propose. The team shows how to configure inference so that outputs really are deterministic:

» Use deterministic kernels and carefully chosen compute libraries.
» Control seeds and ensure fixed tie-breaking behavior.
» Structure execution so the floating-point order is preserved.

This matters because scientific reproducibility and engineering trust depend on being able to reproduce results exactly. In research, we want to confirm findings without hidden randomness. In production, companies want users to have consistent, reliable interactions with AI systems.

✨ With this first blog, Thinking Machines has shown what their new research direction is all about: digging into hard, foundational issues in AI systems and pushing for solutions that make large-scale models more reliable, transparent, and useful.

📖 You can read the full blog here: https://lnkd.in/e38jfiJt
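
To make the non-associativity point concrete, here is a tiny Python sketch (my own illustration, not code from the blog) showing that merely regrouping the same three additions changes the result:

```python
# Floating-point addition is not associative: regrouping the same
# three numbers yields two different sums.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

print(left == right)  # False
print(repr(left), repr(right))
```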
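
The GPU-parallelism bullet follows from the same effect: a parallel reduction sums values in a tree order rather than sequentially. This pure-Python sketch is an analogy for that reordering, not an actual GPU kernel, and the two orders typically disagree in the trailing digits:

```python
import random

random.seed(0)
values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

def tree_sum(xs):
    # Pairwise ("tree") summation: the shape a parallel reduction takes.
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return tree_sum(xs[:mid]) + tree_sum(xs[mid:])

sequential = sum(values)
parallel_like = tree_sum(values)
print(sequential, parallel_like)   # usually differ in the last bits
print(sequential == parallel_like) # typically False
```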
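
The tie-breaking bullet is easiest to see with greedy decoding, which takes an argmax over the logits. In this sketch (hypothetical logit values, my illustration), a nudge far below any meaningful score difference, exactly the kind floating-point reordering produces, flips which token wins:

```python
import numpy as np

# NumPy's argmax returns the first index among tied maxima, so a
# last-bit perturbation changes the "deterministic" greedy choice.
logits_run_a = np.array([2.0, 2.0, 0.5])          # exact tie between tokens 0 and 1
logits_run_b = np.array([2.0, 2.0 + 1e-15, 0.5])  # tiny nudge from reordered math

print(np.argmax(logits_run_a))  # 0
print(np.argmax(logits_run_b))  # 1
```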
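
On the solutions side, the blog has the full recipe; as a rough flavor of what "deterministic kernels and controlled seeds" looks like in practice, these are standard PyTorch knobs for a typical setup (my assumption, not Thinking Machines' exact configuration):

```python
import os
import torch

# Must be set before any cuBLAS call: pins the workspace so GPU
# matrix-multiply reductions run in a reproducible order.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

torch.manual_seed(0)                      # fixed seed for any sampling
torch.use_deterministic_algorithms(True)  # error if only nondeterministic kernels exist
torch.backends.cudnn.benchmark = False    # stop autotuning from changing kernel choice
```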
