SimuScene: Simulation-Ready Compositional
3D Scene Reconstruction from a Single Image

Inhee Lee*, Sangwon Baik*, Sungjoo Kim, Hyeonwoo Kim, Hyunsoo Cha, Hanbyul Joo

* Equal contribution · Corresponding author

Seoul National University

Preprint
[Figure: SimuScene teaser]

SimuScene reconstructs simulation-ready compositional 3D scenes from a single image, using physics simulation in the loop to correct object shape and pose.

ABSTRACT

Reconstructing interactive, simulation-ready 3D scenes from a single image is a critical bottleneck for robotic manipulation. While recent single-image lifters recover plausible per-object shapes, composing them yields scenes that collapse under physical simulation because objects interpenetrate, hover, or sink. Existing physics-aware methods treat this strictly as a post-hoc layout correction, leaving the underlying geometric errors unresolved. We introduce SimuScene, a compositional 3D reconstruction pipeline that puts physics in the loop of shape and layout estimation. Rather than using physics merely for layout cleanup, we use the physics engine as a diagnostic measurement tool during the generative process itself. By simulating reconstructed objects under gravity, we convert penetration and support failures into quantitative correction signals that drive gravity-axis stretching and amodal shape resampling. This physics-informed feedback loop mitigates accumulated reconstruction errors and produces a stable, simulation-ready compositional 3D scene. Extensive experiments demonstrate state-of-the-art performance on physical stability and geometric alignment benchmarks. We further highlight SimuScene's utility by deploying reconstructed environments in humanoid control and robot-arm manipulation tasks.

METHOD
[Figure: SimuScene pipeline overview]

From a single input image, we decompose the scene into a base structure mesh and per-object initial meshes via foundation priors. The result is a physically plausible 3D scene directly usable in a physics simulator.

Examples

Each object enters a sequential per-object cycle of pose initialization, diagnostic physics simulation, and shape correction; the simulation acts as a diagnostic loop whose displacement signal drives the shape correction.
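
The per-object cycle described above can be illustrated with a deliberately simplified sketch. It is not the SimuScene implementation: the physics engine is replaced by a closed-form gap computation along the gravity axis, objects are reduced to their vertical extent, and each object is assumed to rest on the previously corrected one. The names `Box`, `diagnose_drop`, `stretch_along_gravity`, and `correct_scene` are hypothetical, as is the choice to keep the top face fixed while moving the bottom face.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Toy stand-in for an object mesh: its extent along the gravity (z) axis."""
    z_min: float
    z_max: float


def diagnose_drop(obj: Box, support_top: float) -> float:
    """Signed vertical displacement the object would undergo when settling.

    Negative: the object hovers and would fall by that amount.
    Positive: the object penetrates its support and would be pushed up.
    (Assumption: a real pipeline would measure this with a physics engine.)
    """
    return support_top - obj.z_min


def stretch_along_gravity(obj: Box, displacement: float) -> Box:
    """Move only the bottom face by the displacement, keeping the top fixed."""
    new_z_min = obj.z_min + displacement
    if new_z_min >= obj.z_max:
        raise ValueError("penetration exceeds object height; stretching cannot fix it")
    return Box(z_min=new_z_min, z_max=obj.z_max)


def correct_scene(objects, floor=0.0, tol=1e-3):
    """Process objects one at a time, each resting on the previously corrected one."""
    corrected = []
    support_top = floor
    for obj in objects:
        displacement = diagnose_drop(obj, support_top)
        if abs(displacement) > tol:
            obj = stretch_along_gravity(obj, displacement)
        corrected.append(obj)
        support_top = obj.z_max  # the next object rests on this one
    return corrected


# First box hovers 5 cm above the floor; second sinks 5 cm into the first.
scene = correct_scene([Box(0.05, 0.50), Box(0.45, 0.90)])
```

Because the objects are processed sequentially, each correction is made against an already corrected support, which is one way to keep errors from accumulating up a stack of objects.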

RESULTS

All scenes are visualized after gravity-driven physics simulation. Baselines collapse or stay artificially stuck mid-air, while SimuScene stays physically stable and input-aligned.
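
One simple way to quantify "physically stable" is to compare object positions before and after the gravity-driven simulation. The sketch below is an illustration under stated assumptions, not the benchmark used in the paper: the function names, the dictionary layout, and the 1 cm threshold are all hypothetical. It flags scenes that collapse; on its own it would not detect objects that stay artificially stuck mid-air.

```python
import math


def settle_displacements(before, after):
    """Euclidean distance each object center moved during simulation."""
    return {name: math.dist(before[name], after[name]) for name in before}


def is_stable(before, after, threshold=0.01):
    """A scene counts as stable if no object moved more than the threshold (meters)."""
    moved = settle_displacements(before, after)
    return max(moved.values(), default=0.0) <= threshold


# Hypothetical object centers (x, y, z) before and after simulation.
before = {"mug": (0.0, 0.0, 0.50), "book": (0.3, 0.1, 0.42)}
settled = {"mug": (0.0, 0.0, 0.50), "book": (0.3, 0.1, 0.42)}
collapsed = {"mug": (0.0, 0.0, 0.10), "book": (0.3, 0.1, 0.42)}

stable_ok = is_stable(before, settled)
stable_bad = is_stable(before, collapsed)
```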

APPLICATIONS

Our object-complete, simulation-ready scenes provide assets for physics-based humanoid character control, enabling dexterous human–object interaction.

“Put apple in the basket”: Input | Ours
“Lift the kettlebell”: Input | Ours

Our simulation-ready, input-aligned reconstructions serve as a controllable test bed for text-guided cluttered robot-arm manipulation.

Scene 1: Input | Naive (SAM3D) | Ours
Scene 2: Input | Naive (SAM3D) | Ours

BIBTEX
@misc{simuscene2026,
    title         = {SimuScene: Simulation-Ready Compositional 3D Scene Reconstruction from a Single Image},
    author        = {Inhee Lee and Sangwon Baik and Sungjoo Kim and Hyeonwoo Kim and Hyunsoo Cha and Hanbyul Joo},
    year          = {2026},
    eprint        = {TODO: arXiv id},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}

ACKNOWLEDGMENTS

To be updated.