WorldGen: Text or Image to Explorable 3D Scene in Seconds

By Prahlad Menon · 4 min read

Most text-to-3D tools give you an object. WorldGen gives you a scene — and then lets you walk around inside it.

The distinction matters. Generating a 3D chair is a solved problem. Generating a navigable “neon-lit cyberpunk street with rain reflections” that you can fly through from any angle, with consistent geometry across the full 360°, is considerably harder. WorldGen by Ziyang Xie (with model weights on HuggingFace) does exactly that.

Try the demos at worldgen.github.io — medieval castle at sunset, Mars terrain, Minecraft-style room, underwater coral city. These are navigable 3D scenes, not renders.

What It Generates

WorldGen supports two output modes:

Gaussian Splat (default) — the scene is represented as a 3D Gaussian Splatting model, an explicit point-based radiance-field representation (often discussed alongside NeRFs, but not itself a neural field) that captures photorealistic appearance and can be rendered in real time from any viewpoint. Fast to generate, excellent visual quality, navigable in real time.
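To make the representation concrete: each Gaussian in a splat is essentially a colored, semi-transparent 3D blob with a position and per-axis extent. A minimal sketch of that record, axis-aligned for brevity (real 3DGS also stores a rotation quaternion and spherical-harmonic color coefficients; this is illustrative, not WorldGen's internal format):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Gaussian3D:
    mean: np.ndarray   # (3,) center in world space
    scale: np.ndarray  # (3,) per-axis standard deviations
    opacity: float     # alpha used when blending splats front-to-back
    color: np.ndarray  # (3,) RGB (real splats store SH coefficients)

    def density(self, x: np.ndarray) -> float:
        """Unnormalized Gaussian falloff at point x."""
        d = (x - self.mean) / self.scale
        return float(np.exp(-0.5 * d @ d))

g = Gaussian3D(
    mean=np.zeros(3),
    scale=np.ones(3) * 0.5,
    opacity=0.8,
    color=np.array([1.0, 0.2, 0.2]),
)
print(g.density(np.zeros(3)))  # 1.0 at the center, falling off with distance
```

A full scene is simply millions of these, rasterized and alpha-blended per pixel, which is what makes real-time navigation possible.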

Mesh mode — generates actual polygon geometry, useful for applications that need explicit surfaces (game engines, physics simulations, export to standard 3D formats). The README notes mesh mode “should give better results than splat” for structural clarity.

Both modes support 360° free exploration with loop closure — meaning the scene is geometrically consistent when you navigate all the way around and come back to the starting point. Without loop closure, panoramic scenes develop seams and inconsistencies at the boundary where the camera returns to origin.
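The loop-closure idea is easy to check on a flat panorama: the two image columns that meet when the view wraps around should match. A minimal sketch in plain NumPy (independent of WorldGen; the threshold is illustrative):

```python
import numpy as np

def seam_error(pano: np.ndarray) -> float:
    """Mean absolute difference between the first and last columns of an
    equirectangular panorama. With proper loop closure, the scene wraps
    around with near-zero error at this boundary."""
    left = pano[:, 0].astype(np.float64)
    right = pano[:, -1].astype(np.float64)
    return float(np.abs(left - right).mean())

# A horizontally periodic panorama (one full sine cycle across the width,
# excluding the endpoint) wraps cleanly, so its seam error is tiny.
h, w = 64, 128
x = np.linspace(0.0, 2.0 * np.pi, w, endpoint=False)
pano = np.tile(np.sin(x), (h, 1))
print(seam_error(pano))  # small: the panorama wraps cleanly
```

A scene generated without loop closure behaves like a non-periodic image here: the mismatch at the wrap boundary shows up as a visible seam when you turn a full 360°.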

The Pipeline

WorldGen’s generation uses FLUX.1-dev (Black Forest Labs’ open-weight diffusion model) to generate the image content, then lifts it into 3D. The image-to-scene path accepts any photograph or rendered image as input — a painting, a street photo, a concept art piece — and constructs a navigable 3D space around it.

# Two lines of code (plus the import)
from worldgen import WorldGen

worldgen = WorldGen()
worldgen.generate_world("A medieval castle on a hill during sunset")

The demo script visualizes the result in a web browser via Viser, a 3D visualization library from the Nerfstudio team.

Hardware Requirements

This is not a browser tool. WorldGen requires:

  • CUDA-capable GPU
  • ~10GB VRAM minimum (a lower-VRAM mode was added in May 2025)
  • PyTorch with CUDA 12.8 support
  • pytorch3d (Facebook Research’s 3D ML library)
  • FLUX.1-dev model weights from HuggingFace (gated — requires accepting the license)

The setup is more involved than a pip install:

git clone --recursive https://github.com/ZiYang-xie/WorldGen.git
conda create -n worldgen python=3.11
conda activate worldgen
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install .
pip install git+https://github.com/facebookresearch/pytorch3d.git --no-build-isolation
# Accept FLUX.1-dev license on HuggingFace, then:
huggingface-cli login

Not a casual install, but straightforward for anyone running a local AI/ML setup. The 10GB VRAM floor puts it within reach of RTX 3080/4070 class hardware.
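A back-of-the-envelope way to reason about that floor: FLUX.1-dev alone has roughly 12B parameters (~24 GB in bf16), so fitting a 10 GB card implies offloading or reduced precision. A rough fit check with illustrative numbers (not WorldGen's actual memory breakdown):

```python
def fits_in_vram(model_gb: float, activations_gb: float,
                 vram_gb: float, headroom: float = 0.9) -> bool:
    """Rough check: does the working set fit in usable VRAM?

    `headroom` reserves a fraction of VRAM for the CUDA context,
    fragmentation, and the display, so we budget against 90% by default.
    """
    return model_gb + activations_gb <= vram_gb * headroom

# Illustrative: a low-VRAM configuration vs. full-precision FLUX weights.
print(fits_in_vram(7.5, 1.0, 10.0))   # offloaded/quantized working set on 10 GB
print(fits_in_vram(24.0, 4.0, 10.0))  # bf16 weights alone blow the budget
```

The arithmetic is the whole point: without the lower-VRAM mode's offloading, a 10 GB card never fits the full-precision diffusion weights.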

Use Cases

The repo lists games, simulations, robotics, and virtual reality. The practical applications are real:

Game development — rapid prototyping of environments. Generate 10 scene variants of “a broken-down factory with rusty pipes and graffiti” in the time it takes to hand-model one. Use as a reference or a starting point for art assets.
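One way to get those ten variants is to sample attribute combinations around a base description and feed each result to the generator. A hypothetical prompt-builder sketch (plain Python, not part of WorldGen's API):

```python
import itertools
import random

def scene_prompts(base: str, lightings, weathers, styles,
                  n: int = 10, seed: int = 0) -> list[str]:
    """Build n varied scene prompts from a base description by sampling
    distinct attribute combinations -- one cheap way to get environment
    diversity for prototyping or synthetic-data generation."""
    rng = random.Random(seed)
    combos = list(itertools.product(lightings, weathers, styles))
    rng.shuffle(combos)
    return [f"{base}, {l} lighting, {w}, {s} style" for l, w, s in combos[:n]]

prompts = scene_prompts(
    "a broken-down factory with rusty pipes and graffiti",
    lightings=["overcast", "harsh noon", "sunset"],
    weathers=["light rain", "fog", "clear sky"],
    styles=["photorealistic", "stylized"],
    n=10,
)
print(len(prompts))  # 10 distinct prompt variants from 18 possible combos
```

Each prompt would then go through `generate_world` in a loop; since generation takes seconds rather than hours, iterating over a batch like this is practical.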

Simulation and synthetic data — robotics and autonomous systems research needs diverse 3D environments at scale. WorldGen generates photorealistic scenes in seconds. The mesh output mode is particularly relevant here since simulation engines need explicit geometry.

VR/AR content — explorable 3D scenes that can be exported and viewed in headsets.

Concept visualization — architects, game designers, and filmmakers describing environments they want to explore before committing to production.

The ml-sharp Experimental Feature

The January 2026 update added support for ml-sharp integration (modified for 360° images) for improved Gaussian Splatting quality. It’s marked experimental but available for testing:

pip install -e submodules/ml-sharp
python demo.py -p "your prompt" --use_sharp

The March 2026 quality update also fixed a projection-scale issue that had been affecting Gaussian Splatting geometry coherence.

Context: Where This Fits

WorldGen is in a line of work connecting text-to-image (solved, commoditized), image-to-3D-object (increasingly solved), and text-to-3D-scene (still hard, getting there). The 360° consistency problem — making scenes that don’t fall apart when you explore them fully — has been the main technical barrier. WorldGen’s loop closure approach addresses this directly.

It’s worth comparing to what LeCun’s world models work is trying to do at a higher level — building persistent internal representations of 3D space. WorldGen is a practical, deployable instantiation of that direction today.

The codebase is MIT-licensed. FLUX.1-dev ships under its own non-commercial license (the Apache 2.0 variant of FLUX is FLUX.1-schnell). GitHub → | Demo → | Model weights →