AI Radar: NVIDIA Nemotron 3 — Open Down to the Training Data
Open-weight models are no longer rare. With Nemotron 3, NVIDIA goes a step further and opens up the entire recipe, and for self-hosting, that is the real difference.
What Sets Nemotron 3 Apart
- Three sizes, one approach: Nano (31.6B parameters), Super (120B), and Ultra (550B, released in June) cover everything from an edge server to a multi-GPU node.
- Hybrid architecture: a Mamba-Transformer mixture-of-experts (MoE) model with a context window of up to one million tokens, tuned for efficiency rather than raw size.
- Genuinely open: NVIDIA releases not only the weights but also the training data, the reinforcement-learning environments, the post-training recipes, and the fine-tuning code. That makes the models auditable and adaptable by design.
- Multimodal: the Omni variants handle text, image, and audio in a single model.
Why It Interests Us
We already run the 120B Super tier in production on our own hardware. It hits the sweet spot between capability and a defensible GPU budget. The fact that NVIDIA ships the training data is not a marketing detail: anyone who wants to adapt a model to their own domain needs exactly these building blocks.
The Caveat
The 550B Ultra model will not fit on a single card; it needs a serious multi-GPU node. And "open" does not automatically mean "unlimited commercial use": the license terms belong on the reading list before any production deployment.
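To put a rough number on "serious multi-GPU node", here is a back-of-the-envelope sketch. It takes the parameter counts quoted above and multiplies them by the bytes each weight occupies in a given format. Everything else is our own assumption for illustration: the helper names, the 80 GB per-card figure, and the simplification that only the weights count. In practice the KV cache, activations, and runtime overhead come on top, so treat the results as a lower bound. Note that for an MoE model all experts typically have to sit in memory, so the total parameter count is the one that matters here.

```python
import math

# Total parameter counts in billions, as quoted in this article.
MODELS = {"Nano": 31.6, "Super": 120.0, "Ultra": 550.0}

# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * bytes_per_param


def min_gpus(params_billion: float, bytes_per_param: float,
             gpu_memory_gb: float = 80.0) -> int:
    """Lower bound on the card count; 80 GB per card is an assumed example.

    Ignores KV cache, activations, and framework overhead.
    """
    return math.ceil(weight_memory_gb(params_billion, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    for name, params in MODELS.items():
        for fmt, nbytes in BYTES_PER_PARAM.items():
            gb = weight_memory_gb(params, nbytes)
            print(f"{name:5s} {fmt:4s} {gb:7.1f} GB  >= {min_gpus(params, nbytes)} x 80 GB")
```

At 2 bytes per parameter, Ultra's weights alone come to 550 × 2 = 1,100 GB, which is why a single card is out of the question regardless of vendor. Quantized formats shrink that figure, but whether a given quantization is available or acceptable for your workload is a separate question.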
Our Take
Nemotron 3 is, for us, the most interesting open US model right now, less because of individual benchmark scores and more because of its traceability. A model whose origin you know is a model you can take responsibility for. In the SME world, that is the whole point.