SimuScene

SimuScene: Simulation-Ready Compositional 3D Scene Reconstruction from a Single Image

Inhee Lee^*, Sangwon Baik^*, Sungjoo Kim, Hyeonwoo Kim, Hyunsoo Cha, Hanbyul Joo^†

^* Equal contribution · ^† Corresponding author

Seoul National University

Preprint

SimuScene reconstructs simulation-ready compositional 3D scenes from a single image, using physics simulation in the loop to correct object shape and pose.

ABSTRACT

Reconstructing interactive, simulation-ready 3D scenes from a single image is a critical bottleneck for robotic manipulation. While recent single-image lifters recover plausible per-object shapes, composing them yields scenes that collapse under physical simulation because objects interpenetrate, hover, or sink. Existing physics-aware methods treat this strictly as post-hoc layout correction, leaving the underlying geometric errors unresolved. We introduce SimuScene, a compositional 3D reconstruction pipeline that puts physics in the loop of shape and layout estimation. Rather than using physics merely for layout cleanup, we use the physics engine as a diagnostic measurement tool during the generative process itself. By simulating reconstructed objects under gravity, we convert penetration and support failures into quantitative correction signals that drive gravity-axis stretching and amodal shape resampling. This physics-informed feedback loop mitigates accumulated reconstruction errors and produces a stable, simulation-ready compositional 3D scene. Extensive experiments demonstrate state-of-the-art performance on physical stability and geometric alignment benchmarks. We further highlight SimuScene's utility by deploying reconstructed environments in humanoid control and robot-arm manipulation tasks.

METHOD

From a single input image, we decompose the scene into a base structure mesh and per-object initial meshes via foundation priors. The result is a physically plausible 3D scene directly usable in a physics simulator.

Examples

Each object enters a sequential per-object cycle of pose initialization, diagnostic physics simulation, and shape correction; the simulation acts as a diagnostic loop whose displacement signal drives the shape correction.
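The per-object cycle described above can be illustrated with a deliberately simplified sketch. It assumes a one-dimensional world along the gravity axis, where each object is a vertical interval and the "simulation" only measures how far the object must move to rest on its support. The names `Box`, `settle`, `correct_by_stretch`, and `reconstruct` are hypothetical and not part of SimuScene; keeping the top of the object fixed during correction is also an assumption. The actual pipeline uses a full physics engine and amodal shape resampling, neither of which is modelled here.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Toy object: a vertical interval [bottom, top] along the gravity axis."""
    bottom: float
    top: float


def settle(obj: Box, support_height: float) -> float:
    """Toy diagnostic simulation: signed displacement needed to rest on the support.

    Negative: the object was hovering and would fall by that amount.
    Positive: the object was sunk into the support and would be pushed up.
    """
    return support_height - obj.bottom


def correct_by_stretch(obj: Box, displacement: float) -> Box:
    """Absorb the displacement by moving only the bottom face (assumed choice),
    which stretches or shrinks the object along the gravity axis."""
    return Box(bottom=obj.bottom + displacement, top=obj.top)


def reconstruct(objects, support_height=0.0, tol=1e-6):
    """Sequential per-object cycle: take the initial pose, simulate, correct."""
    corrected = []
    for obj in objects:
        d = settle(obj, support_height)
        if abs(d) > tol:
            obj = correct_by_stretch(obj, d)
        corrected.append(obj)
    return corrected


# One hovering, one sinking, and one already supported object.
scene = [Box(0.10, 0.50), Box(-0.05, 0.30), Box(0.0, 0.20)]
fixed = reconstruct(scene)
```

In this toy setting every corrected object ends with its bottom on the support while its top stays where the initial estimate placed it, which mirrors the idea of using the displacement as a correction signal without moving the whole object away from the input.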

[Figure: eight examples, each pairing an Input image with its reconstruction Progress.]

RESULTS

All scenes are visualized after gravity-driven physics simulation. Baselines collapse or stay artificially stuck mid-air, while SimuScene stays physically stable and input-aligned.
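One simple way to quantify "physically stable" is to compare object positions before and after the gravity-driven simulation. The sketch below assumes each object is summarized by a 3D centroid and uses a hypothetical 5 cm threshold; the function names and the threshold are illustrative and are not claimed to match the metric used in the SimuScene benchmarks.

```python
import math


def displacements(before, after):
    """Euclidean distance each object centroid moved during simulation."""
    return [math.dist(p, q) for p, q in zip(before, after)]


def is_stable(before, after, max_shift=0.05):
    """Treat a scene as stable if no object moved more than max_shift
    (assumed to be in metres)."""
    return max(displacements(before, after), default=0.0) <= max_shift


before = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.2)]
settled = [(0.0, 0.0, 0.49), (1.0, 0.0, 0.2)]    # small settling only
collapsed = [(0.0, 0.0, 0.1), (1.3, 0.0, 0.0)]   # objects fell or slid away

stable_ok = is_stable(before, settled)
stable_bad = is_stable(before, collapsed)
```

A displacement check of this kind says nothing about alignment with the input image, which the comparison above also takes into account.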

[Figure: 13 scenes, each compared across Input, SAM3D, 3D-Re-Gen, 3D-Re-Gen (our bg), Gen3DSR, Gen3DSR (our bg), and Ours.]

For scenes where a baseline's own walls/floor fail, we apply our plane extraction logic to recover the floor and compare every method under equal conditions.
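The page does not describe how the plane extraction works, so the following is only a generic illustration of recovering a floor plane from 3D points, not SimuScene's method. It fits z = a*x + b*y + c by ordinary least squares, assuming a z-up coordinate frame and a floor that is not vertical; `det3` and `fit_floor_plane` are hypothetical helper names.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_floor_plane(points):
    """Least-squares fit of z = a*x + b*y + c; returns (a, b, c)."""
    sxx = sum(x * x for x, y, z in points)
    sxy = sum(x * y for x, y, z in points)
    syy = sum(y * y for x, y, z in points)
    sx = sum(x for x, y, z in points)
    sy = sum(y for x, y, z in points)
    sxz = sum(x * z for x, y, z in points)
    syz = sum(y * z for x, y, z in points)
    sz = sum(z for x, y, z in points)
    n = float(len(points))

    # Normal equations A @ (a, b, c) = rhs.
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("points are degenerate (collinear or too few)")

    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column of A with the right-hand side.
        m = [[rhs[r] if c == col else A[r][c] for c in range(3)]
             for r in range(3)]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


# Synthetic floor samples lying on the tilted plane z = 0.1 * x.
floor_points = [(0, 0, 0.0), (1, 0, 0.1), (0, 1, 0.0), (1, 1, 0.1), (2, 1, 0.2)]
a, b, c = fit_floor_plane(floor_points)
```

Giving every method the same recovered floor, as the text describes, removes differences in background geometry from the comparison so that only the object reconstructions differ.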

APPLICATIONS

Our object-complete, simulation-ready scenes provide assets for physics-based humanoid character control, enabling dexterous human–object interaction.

[Figure: two humanoid control tasks, “Put apple in the basket” and “Lift the kettlebell”, each shown as Input and Ours.]

Our simulation-ready, input-aligned reconstructions serve as a controllable test bed for text-guided cluttered robot-arm manipulation.

[Figure: Scene 1 and Scene 2, each shown as Input, Naive (SAM3D), and Ours.]

BIBTEX

@misc{simuscene2026,
    title         = {SimuScene: Simulation-Ready Compositional 3D Scene Reconstruction from a Single Image},
    author        = {Inhee Lee and Sangwon Baik and Sungjoo Kim and Hyeonwoo Kim and Hyunsoo Cha and Hanbyul Joo},
    year          = {2026},
    eprint        = {TODO: arXiv id},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}

ACKNOWLEDGMENTS

To be updated.