OpenSpace: The Self-Evolving Engine That Makes Every AI Agent Smarter (And 46% Cheaper)

By Prahlad Menon · 5 min read

Every AI coding agent — Claude Code, Codex, Cursor — has the same fundamental weakness: it starts from zero every single time.

Your agent just spent 40,000 tokens figuring out how to parse a complex PDF with merged cells and fallback encodings? Great. Tomorrow it’ll burn 40,000 tokens learning the exact same lesson again. That solution isn’t going anywhere.

OpenSpace, from the University of Hong Kong’s Data Science lab (HKUDS), is a direct attack on this problem. It’s a self-evolving skill engine that plugs into any agent via MCP and gives it something agents have been missing: the ability to learn from experience and share that learning with others.

The Three Superpowers

OpenSpace adds three capabilities to any compatible agent:

Self-Evolution. Skills that monitor themselves and improve automatically. When a skill breaks (API changed, dependency updated), AUTO-FIX repairs it. When a task succeeds, AUTO-IMPROVE distills the winning pattern into a better skill version. When the agent discovers a novel workflow, AUTO-LEARN captures it for future reuse.

Collective Intelligence. One agent’s hard-won lesson becomes every agent’s advantage. Through the open-space.cloud community, evolved skills can be shared — with granular access controls for public, private, or team-only visibility. More agents using the system means richer data and faster evolution for everyone.

Token Efficiency. The compound effect of the above: agents stop repeating work. In OpenSpace’s GDPVal benchmark, token usage dropped 45.9% between cold start and warm rerun — while task quality increased by 30 percentage points.

The Numbers That Matter

The headline claims are backed by GDPVal — 220 real-world professional tasks spanning 44 occupations. These aren’t coding puzzles or chatbot benchmarks. They’re GDP-generating work: building payroll calculators from union contracts, preparing tax returns from 15 scattered PDFs, drafting legal memoranda on California privacy regulations.

OpenSpace’s two-phase evaluation design is what makes the results credible:

  • Phase 1 (Cold Start): Run 50 tasks sequentially. No prior skills exist. The agent builds its skill library from scratch as it works.
  • Phase 2 (Warm Rerun): Re-execute the same 50 tasks with the full evolved skill database from Phase 1.
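The two-phase comparison boils down to simple arithmetic: token reduction as a percentage of the cold-start cost, quality gain in percentage points. A quick sketch (the per-task numbers below are illustrative, not from the benchmark; only the −45.9% / +30pp aggregate is reported):

```python
def phase_delta(cold: dict, warm: dict) -> tuple[float, float]:
    """Compare a warm rerun against its cold start.

    Returns (token_reduction_pct, quality_gain_pp), rounded to one decimal."""
    token_reduction = (cold["tokens"] - warm["tokens"]) / cold["tokens"] * 100
    quality_gain = warm["quality"] - cold["quality"]
    return round(token_reduction, 1), round(quality_gain, 1)

# Illustrative numbers chosen to reproduce the reported aggregate.
cold = {"tokens": 40_000, "quality": 50.0}
warm = {"tokens": 21_640, "quality": 80.0}
print(phase_delta(cold, warm))  # (45.9, 30.0)
```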

The results across 6 work categories:

| Category | Quality Improvement | Token Reduction |
| --- | --- | --- |
| Documents & Correspondence | +3.3pp | −56% |
| Compliance & Forms | +18.5pp | −51% |
| Media Production | +5.8pp | −46% |
| Engineering | +8.7pp | −43% |
| Spreadsheets | +7.3pp | −37% |
| Strategy & Analysis | +1.0pp | −32% |

The bottom line: 4.2× higher income versus baseline agents using the identical backbone LLM (Qwen 3.5-Plus), earning $11,484 out of $15,764 in total task value. The improvement comes entirely from skill evolution — not from a better model.
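Worth sanity-checking the arithmetic behind those figures. The baseline income below is implied by the 4.2× multiplier rather than stated directly:

```python
openspace_income = 11_484   # stated: value earned by OpenSpace
total_value = 15_764        # stated: total task value available
multiplier = 4.2            # stated: income vs. baseline agents

baseline_income = openspace_income / multiplier       # implied, not stated
capture_rate = openspace_income / total_value * 100   # share of value captured

print(f"implied baseline income: ${baseline_income:,.0f}")         # ~$2,734
print(f"share of total task value captured: {capture_rate:.1f}%")  # ~72.8%
```

So the baseline agent, on the same model, captured under a fifth of the available value — the gap is the skill library.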

What the Agent Actually Learned

Here’s the most interesting finding: across 50 Phase 1 tasks, OpenSpace autonomously evolved 165 skills. But most of them aren’t about domain knowledge. They’re about execution reliability:

  • 44 File Format I/O skills — PDF extraction fallbacks, Excel merged-cell handling, DOCX parsing edge cases. 32 of 44 were captured from real failures.
  • 29 Execution Recovery skills — Layered fallback chains (sandbox → shell → file-write-then-run → heredoc). 28 of 29 captured from actual crashes.
  • 26 Document Generation skills — The document-gen-fallback skill family evolved through 13 versions, becoming the most deeply iterated lineage.
  • 23 Quality Assurance skills — Post-write verification: checking Excel row counts, validating PDF page counts, proof-gating spreadsheet formulas.
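What does a layered fallback chain actually look like? Here is a simplified two-layer stand-in for the sandbox → shell → file-write-then-run pattern the Execution Recovery skills encode — the function and strategies are illustrative, not OpenSpace's code:

```python
import subprocess
import sys
import tempfile

def run_python(code: str) -> str:
    """Execute a snippet via a layered fallback chain."""
    # Strategy 1: pass the code inline with -c (fast path).
    try:
        out = subprocess.run([sys.executable, "-c", code],
                             capture_output=True, text=True, check=True)
        return out.stdout
    except subprocess.CalledProcessError:
        pass  # fall through to the next strategy
    # Strategy 2: write the code to a temp file and run that
    # (survives shell-quoting and argument-length issues).
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    out = subprocess.run([sys.executable, path],
                         capture_output=True, text=True, check=True)
    return out.stdout

print(run_python("print(2 + 2)"))  # prints 4
```

The point of capturing such chains as skills is that each layer was added in response to a real crash, so the ordering encodes hard-won knowledge about which failure modes are most common.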

The agent didn’t just learn what to do — it learned how to reliably deliver results in an imperfect environment. That’s a more useful kind of intelligence than most benchmarks measure.

How It Connects to the Broader Landscape

If you’ve been following the Skills vs MCP token efficiency debate, OpenSpace is a compelling data point. It uses MCP as its integration layer but generates Skills as its output — evolved SKILL.md files that any compatible agent can consume. It’s not either/or; it’s MCP as transport, Skills as the learned artifact.

And if you read about Ouroboros — the self-evolving agent that refused to die — OpenSpace sits at a much safer point on the autonomy spectrum. Ouroboros rewrites its own core code and identity. OpenSpace evolves skills within a structured framework. The agent gets smarter without modifying its own architecture. That’s the difference between skill-autonomous and code-autonomous — and it’s a meaningful safety boundary.

Getting Started

Setup is straightforward for any agent that supports MCP:

{
  "mcpServers": {
    "openspace": {
      "command": "openspace-mcp",
      "toolTimeout": 600,
      "env": {
        "OPENSPACE_HOST_SKILL_DIRS": "/path/to/your/agent/skills",
        "OPENSPACE_WORKSPACE": "/path/to/OpenSpace",
        "OPENSPACE_API_KEY": "sk-xxx (optional, for cloud)"
      }
    }
  }
}

Copy two bootstrap skills (delegate-task and skill-discovery) into your agent’s skills directory, and you’re done. The agent learns when and how to use OpenSpace without additional prompting.

There’s also a standalone mode — openspace --query "your task" — if you want to use it as a direct co-worker rather than an evolution engine for an existing agent.

What This Means

The agent ecosystem is converging on a clear pattern: agents need persistent, evolvable skill libraries — not just bigger context windows or better models. OpenSpace is the most rigorous implementation of this idea so far, with real economic benchmarks on real professional tasks.

The 165 skills it evolved autonomously tell a story about where agent intelligence actually lives. It’s not in the model weights. It’s in the accumulated knowledge of what breaks, what works around those breaks, and how to verify the output. That’s the kind of learning that compounds — and that OpenSpace is designed to capture, evolve, and share.

GitHub: HKUDS/OpenSpace · Community: open-space.cloud · License: MIT