Nemotron 3 Super + NemoClaw: NVIDIA and Ollama Just Made Local Agents Practical

By Prahlad Menon · 2 min read

Two weeks after NVIDIA announced NemoClaw at GTC 2026, the stack just got meaningfully more usable. Ollama and NVIDIA have shipped a set of updates that together make running a local, private, enterprise-safe AI agent practical for the first time without significant infrastructure overhead.

Here’s what changed.


Nemotron 3 Super: The Model That Matters

  • Architecture: 120B total parameters, 12B active (Mixture-of-Experts)
  • PinchBench: 85.6%, #1 among open models for OpenClaw tasks
  • Throughput: 5x faster than the previous Nemotron generation
  • Local requirement: 96GB VRAM (DGX Spark or RTX PRO)
  • Cloud: free on Ollama’s cloud (nemotron-3-super:cloud)

The MoE architecture is why the throughput jump is possible: 120B parameters but only 12B active per token, so the model gets the reasoning depth of a 120B model at a fraction of the inference cost.
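The ratio falls straight out of the stated parameter counts; a back-of-envelope check of how much of the model fires per token:

```shell
# Only 12B of the 120B parameters are active per token in the MoE,
# so per-token compute is roughly a tenth of a dense 120B model's.
awk 'BEGIN { printf "%.0f%% of parameters active per token\n", 12 / 120 * 100 }'
# → 10% of parameters active per token
```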

PinchBench is the number that matters. Generic LLM benchmarks (HumanEval, MMLU, SWE-bench) test coding ability and knowledge, not agentic performance. PinchBench specifically measures the tool-calling, multi-step planning, and task execution patterns that OpenClaw relies on. 85.6% and #1 among open models is a meaningful result — it means Nemotron 3 Super has been deliberately optimized for agent workflows, not just general capability.

To run it on Ollama’s cloud:

ollama launch openclaw --model nemotron-3-super:cloud

To run locally (96GB VRAM required):

ollama pull nemotron-3-super
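Before pulling the full model, it is worth confirming the GPU actually has the headroom. A minimal check using `nvidia-smi` (present on any machine with the NVIDIA driver installed):

```shell
# Report each GPU's name and total VRAM; the full nemotron-3-super needs ~96GB.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
  echo "nvidia-smi not found: no NVIDIA driver on this machine"
fi
```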

NemoClaw + Native Ollama: The Setup Is Now One Command

Our previous NemoClaw post covered the initial announcement — a privacy and security wrapper for OpenClaw with OpenShell sandboxing. The friction was configuration: wiring Ollama support in manually.

That’s gone. The updated installer handles it:

curl -fsSL https://nvidia.com/nemoclaw.sh | bash

During install, select 2 for Ollama when prompted for your runtime. When prompted for a model, select nemotron-3-super:cloud. Then connect:

nemoclaw my-assistant connect

What you get:

  • OpenClaw with OpenShell sandboxing (isolated agent execution)
  • Privacy guardrails on cloud model calls
  • Nemotron 3 Super as the model — optimized for agent tasks
  • Pre-configured Ollama runtime — no manual wiring

The result is a local (or cloud-backed) agent with enterprise-grade security controls that previously required significant manual configuration to achieve.
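If the installer's wiring ever needs sanity-checking, Ollama's local API answers on port 11434 by default; a quick reachability probe (assuming the stock port, which the post does not state NemoClaw changes):

```shell
# Ollama serves a REST API on localhost:11434 by default;
# /api/tags lists the models it has pulled.
if curl -fsS http://localhost:11434/api/tags >/dev/null 2>&1; then
  echo "ollama: reachable"
else
  echo "ollama: not reachable on :11434"
fi
```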


OpenShell: Now for Claude Code, Codex, and OpenCode Too

This is the underreported piece of the update.

OpenShell — NVIDIA’s sandboxed execution runtime — has extended beyond NemoClaw. It now works as a safety wrapper for other coding agents:

# Create a sandboxed environment from Ollama
openshell sandbox create --from ollama

# Launch any agent inside it
ollama

This means Claude Code, Codex, and OpenCode can now run inside an OpenShell sandbox — getting the same policy-based security, network controls, and execution isolation that NemoClaw has, without requiring NemoClaw itself.

The practical implication: if you’re already using Claude Code for development, you can wrap it in OpenShell to prevent it from accidentally exfiltrating credentials, hitting external endpoints it shouldn’t, or executing destructive commands outside the sandbox. Same agent, safer runtime.


The Hardware Picture

| Setup | Model | Hardware Required | Cost |
|---|---|---|---|
| Ollama cloud (free) | nemotron-3-super:cloud | None | Free |
| Ollama cloud (Pro/Max) | Any, multi-agent | None | Subscription |
| Local full | nemotron-3-super (full) | 96GB VRAM | Hardware |
| Local small | Nemotron 3 Nano 4B | GeForce RTX | Hardware |

Nemotron 3 Nano 4B — if you’re on a standard GeForce RTX without 96GB VRAM, NVIDIA shipped Nano 4B as a compact option for the same agent workflows on constrained hardware. Lower capability ceiling, but runs on consumer GPUs.

DGX Spark — NVIDIA’s desktop AI supercomputer with 128GB unified memory is the natural home for Nemotron 3 Super locally. At 120B parameters (quantized, per the 96GB VRAM requirement), the model fits within the memory envelope with room left for context.
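The arithmetic behind that fit, assuming roughly 6-bit quantized weights (an assumption consistent with the 96GB figure, not something the announcement states):

```shell
# 120B parameters at ~6 bits each; bytes = params * bits / 8
awk 'BEGIN { printf "%.0f GB of weights\n", 120e9 * 6 / 8 / 1e9 }'
# → 90 GB of weights
```

That leaves roughly 38GB of DGX Spark's 128GB unified memory for KV cache and context.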


What This Means for the Agent Stack

The picture that’s emerging from GTC 2026 and these follow-on updates:

NVIDIA is positioning itself as the infrastructure layer for agents. Not just GPU hardware — the full stack from model (Nemotron) to runtime (OpenShell) to security framework (NemoClaw) to hardware (DGX Spark, RTX PRO). The Ollama partnership plugs the distribution gap: Ollama handles discovery and delivery; NVIDIA provides the optimized model and security layer.

The cloud fallback is free. Nemotron 3 Super on Ollama’s cloud costs nothing at the base tier. This removes the barrier that previously made local-quality agents expensive — you get the PinchBench #1 model without paying per-token to Anthropic or OpenAI.

OpenShell extending to all coding agents is the move that matters for enterprise adoption. CISOs don’t block specific agents — they block the security risk that agents represent. OpenShell gives them a sandboxed runtime answer to that concern, regardless of which agent is running inside it.


Quick Start

# Full NemoClaw setup with Ollama (recommended)
curl -fsSL https://nvidia.com/nemoclaw.sh | bash
# → Select 2 (Ollama) → Select nemotron-3-super:cloud
nemoclaw my-assistant connect

# OpenShell sandbox for existing agents (Claude Code, Codex, etc.)
openshell sandbox create --from ollama

Full documentation: docs.nvidia.com/nemoclaw


Sources: NVIDIA Blog — GTC 2026 NemoClaw · Ollama announcement email, March 19 2026 · PinchBench

Related: NVIDIA NemoClaw: Jensen Huang Says Every Company Needs an OpenClaw Strategy · Unsloth Studio: Fine-Tune 500+ LLMs Without Code