OpenSpace: The Self-Evolving Engine That Makes Every AI Agent Smarter (And 46% Cheaper)

By Prahlad Menon · 5 min read

Every AI coding agent — Claude Code, Codex, Cursor — has the same fundamental weakness: it starts from zero every single time.

Your agent just spent 40,000 tokens figuring out how to parse a complex PDF with merged cells and fallback encodings? Great. Tomorrow it’ll burn 40,000 tokens learning the exact same lesson again. That solution isn’t going anywhere.

OpenSpace, from the University of Hong Kong’s Data Science lab (HKUDS), is a direct attack on this problem. It’s a self-evolving skill engine that plugs into any agent via MCP and gives it something agents have been missing: the ability to learn from experience and share that learning with others.

The Three Superpowers

OpenSpace adds three capabilities to any compatible agent:

Self-Evolution. Skills that monitor themselves and improve automatically. When a skill breaks (API changed, dependency updated), AUTO-FIX repairs it. When a task succeeds, AUTO-IMPROVE distills the winning pattern into a better skill version. When the agent discovers a novel workflow, AUTO-LEARN captures it for future reuse.

Collective Intelligence. One agent’s hard-won lesson becomes every agent’s advantage. Through the open-space.cloud community, evolved skills can be shared — with granular access controls for public, private, or team-only visibility. More agents using the system means richer data and faster evolution for everyone.

Token Efficiency. The compound effect of the above: agents stop repeating work. In OpenSpace’s GDPVal benchmark, token usage dropped 45.9% between cold start and warm rerun — while task quality increased by 30 percentage points.

The Numbers That Matter

The headline claims are backed by GDPVal — 220 real-world professional tasks spanning 44 occupations. These aren’t coding puzzles or chatbot benchmarks. They’re GDP-generating work: building payroll calculators from union contracts, preparing tax returns from 15 scattered PDFs, drafting legal memoranda on California privacy regulations.

OpenSpace’s two-phase evaluation design is what makes the results credible:

  • Phase 1 (Cold Start): Run 50 tasks sequentially. No prior skills exist. The agent builds its skill library from scratch as it works.
  • Phase 2 (Warm Rerun): Re-execute the same 50 tasks with the full evolved skill database from Phase 1.
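The two-phase comparison boils down to simple arithmetic: token reduction as a percentage of the cold-start cost, quality gain in percentage points. A quick sketch (the per-task numbers below are illustrative, not from the benchmark; only the −45.9% / +30pp aggregate is reported):

```python
def phase_delta(cold: dict, warm: dict) -> tuple[float, float]:
    """Compare a warm rerun against its cold start.

    Returns (token_reduction_pct, quality_gain_pp), rounded to one decimal."""
    token_reduction = (cold["tokens"] - warm["tokens"]) / cold["tokens"] * 100
    quality_gain = warm["quality"] - cold["quality"]
    return round(token_reduction, 1), round(quality_gain, 1)

# Illustrative numbers chosen to reproduce the reported aggregate.
cold = {"tokens": 40_000, "quality": 50.0}
warm = {"tokens": 21_640, "quality": 80.0}
print(phase_delta(cold, warm))  # (45.9, 30.0)
```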

The results across 6 work categories:

| Category | Quality Improvement | Token Reduction |
| --- | --- | --- |
| Documents & Correspondence | +3.3pp | −56% |
| Compliance & Forms | +18.5pp | −51% |
| Media Production | +5.8pp | −46% |
| Engineering | +8.7pp | −43% |
| Spreadsheets | +7.3pp | −37% |
| Strategy & Analysis | +1.0pp | −32% |

The bottom line: 4.2× higher income versus baseline agents using the identical backbone LLM (Qwen 3.5-Plus), earning $11,484 out of $15,764 in total task value. The improvement comes entirely from skill evolution — not from a better model.
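Worth sanity-checking the arithmetic behind those figures. The baseline income below is implied by the 4.2× multiplier rather than stated directly:

```python
openspace_income = 11_484   # stated: value earned by OpenSpace
total_value = 15_764        # stated: total task value available
multiplier = 4.2            # stated: income vs. baseline agents

baseline_income = openspace_income / multiplier       # implied, not stated
capture_rate = openspace_income / total_value * 100   # share of value captured

print(f"implied baseline income: ${baseline_income:,.0f}")         # ~$2,734
print(f"share of total task value captured: {capture_rate:.1f}%")  # ~72.8%
```

So the baseline agent, on the same model, captured under a fifth of the available value — the gap is the skill library.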

What the Agent Actually Learned

Here’s the most interesting finding: across 50 Phase 1 tasks, OpenSpace autonomously evolved 165 skills. But most of them aren’t about domain knowledge. They’re about execution reliability:

  • 44 File Format I/O skills — PDF extraction fallbacks, Excel merged-cell handling, DOCX parsing edge cases. 32 of 44 were captured from real failures.
  • 29 Execution Recovery skills — Layered fallback chains (sandbox → shell → file-write-then-run → heredoc). 28 of 29 captured from actual crashes.
  • 26 Document Generation skills — The document-gen-fallback skill family evolved through 13 versions, becoming the most deeply iterated lineage.
  • 23 Quality Assurance skills — Post-write verification: checking Excel row counts, validating PDF page counts, proof-gating spreadsheet formulas.
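What does a layered fallback chain actually look like? Here is a simplified two-layer stand-in for the sandbox → shell → file-write-then-run pattern the Execution Recovery skills encode — the function and strategies are illustrative, not OpenSpace's code:

```python
import subprocess
import sys
import tempfile

def run_python(code: str) -> str:
    """Execute a snippet via a layered fallback chain."""
    # Strategy 1: pass the code inline with -c (fast path).
    try:
        out = subprocess.run([sys.executable, "-c", code],
                             capture_output=True, text=True, check=True)
        return out.stdout
    except subprocess.CalledProcessError:
        pass  # fall through to the next strategy
    # Strategy 2: write the code to a temp file and run that
    # (survives shell-quoting and argument-length issues).
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    out = subprocess.run([sys.executable, path],
                         capture_output=True, text=True, check=True)
    return out.stdout

print(run_python("print(2 + 2)"))  # prints 4
```

The point of capturing such chains as skills is that each layer was added in response to a real crash, so the ordering encodes hard-won knowledge about which failure modes are most common.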

The agent didn’t just learn what to do — it learned how to reliably deliver results in an imperfect environment. That’s a more useful kind of intelligence than most benchmarks measure.

How It Connects to the Broader Landscape

If you’ve been following the Skills vs MCP token efficiency debate, OpenSpace is a compelling data point. It uses MCP as its integration layer but generates Skills as its output — evolved SKILL.md files that any compatible agent can consume. It’s not either/or; it’s MCP as transport, Skills as the learned artifact.

And if you read about Ouroboros — the self-evolving agent that refused to die — OpenSpace sits at a much safer point on the autonomy spectrum. Ouroboros rewrites its own core code and identity. OpenSpace evolves skills within a structured framework. The agent gets smarter without modifying its own architecture. That’s the difference between skill-autonomous and code-autonomous — and it’s a meaningful safety boundary.

Getting Started

Setup is straightforward for any agent that supports MCP:

{
  "mcpServers": {
    "openspace": {
      "command": "openspace-mcp",
      "toolTimeout": 600,
      "env": {
        "OPENSPACE_HOST_SKILL_DIRS": "/path/to/your/agent/skills",
        "OPENSPACE_WORKSPACE": "/path/to/OpenSpace",
        "OPENSPACE_API_KEY": "sk-xxx (optional, for cloud)"
      }
    }
  }
}

Copy two bootstrap skills (delegate-task and skill-discovery) into your agent’s skills directory, and you’re done. The agent learns when and how to use OpenSpace without additional prompting.

There’s also a standalone mode — openspace --query "your task" — if you want to use it as a direct co-worker rather than an evolution engine for an existing agent.

What This Means

The agent ecosystem is converging on a clear pattern: agents need persistent, evolvable skill libraries — not just bigger context windows or better models. OpenSpace is the most rigorous implementation of this idea so far, with real economic benchmarks on real professional tasks.

The 165 skills it evolved autonomously tell a story about where agent intelligence actually lives. It’s not in the model weights. It’s in the accumulated knowledge of what breaks, what works around those breaks, and how to verify the output. That’s the kind of learning that compounds — and that OpenSpace is designed to capture, evolve, and share.

GitHub: HKUDS/OpenSpace · Community: open-space.cloud · License: MIT