Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models

Seoul National University¹, Naver Webtoon AI²
*Indicates Equal Contribution
ECCV 2024 (Oral)

Given a 3D object mesh, we generate numerous 3D human-object interaction (HOI) samples and learn a novel affordance representation, Comprehensive Affordance (ComA), which models both contact and non-contact patterns.


Abstract

Understanding the inherent human knowledge in interacting with a given environment (e.g., affordance) is essential for improving AI to better assist humans. While existing approaches primarily focus on human-object contacts during interactions, such an affordance representation cannot fully address other important aspects of human-object interactions (HOIs), i.e., patterns of relative positions and orientations. In this paper, we introduce a novel affordance representation, named Comprehensive Affordance (ComA). Given a 3D object mesh, ComA models the distribution of relative orientation and proximity of vertices in interacting human meshes, capturing plausible patterns of contact, relative orientations, and spatial relationships. To construct the distribution, we present a novel pipeline that synthesizes diverse and realistic 3D HOI samples given any target 3D object mesh. The pipeline leverages a pre-trained 2D inpainting diffusion model to generate HOI images from object renderings and lifts them into 3D. To avoid generating false affordances, we propose a new inpainting framework, Adaptive Mask Inpainting. Since ComA is built on synthetic samples, it can extend to any object in an unbounded manner. Through extensive experiments, we demonstrate that ComA outperforms competitors that rely on human annotations in modeling contact-based affordance. Importantly, we also showcase the potential of ComA to reconstruct human-object interactions in 3D through an optimization framework, highlighting its advantage in incorporating both contact and non-contact properties.
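For intuition, here is a minimal Python sketch of the adaptive-mask idea, built on the off-the-shelf Hugging Face diffusers inpainting pipeline. It is a coarse approximation, not the authors' released code: it re-runs inpainting with a progressively tightened mask over a few rounds, rather than updating the mask inside a single denoising loop as the paper does, and segment_human is a hypothetical placeholder for any off-the-shelf human segmentation model.

import torch
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

def segment_human(image):
    """Hypothetical placeholder: returns a binary PIL mask of human pixels."""
    raise NotImplementedError

def adaptive_mask_inpaint(object_render, init_mask, prompt, rounds=4):
    """Inpaint a human around the object while protecting object pixels."""
    image, mask = object_render, init_mask
    for _ in range(rounds):
        # Inpaint a human into the currently allowed region ...
        image = pipe(prompt=prompt, image=image, mask_image=mask).images[0]
        # ... then shrink the mask to the predicted human region, so later
        # rounds cannot overwrite the object (the "false affordance" failure).
        mask = segment_human(image)
    return image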


Key Takeaways


Problem Description

Traditional affordance representations focus on contact in human-object interactions. However, important patterns such as relative orientations and positions cannot be expressed through contact alone, and these overlooked aspects are crucial for a fuller understanding of affordance.


Our Solution

Comprehensive Affordance (ComA) is the first affordance representation to capture both high-resolution contact and non-contact interaction patterns, offering a more complete view of object affordances.
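Concretely, a ComA-style field can be thought of as a per-object-vertex distribution over the proximity and relative orientation of nearby human-body vertices. The sketch below is a simplified, hypothetical illustration of accumulating such a field from generated 3D HOI samples; the bin counts and field layout are illustrative, not the paper's exact formulation.

import numpy as np

N_DIST_BINS, N_ANGLE_BINS = 16, 18
MAX_DIST = 1.0  # meters; beyond this we treat the vertex as non-interacting

def accumulate_coma(obj_verts, obj_normals, human_verts, human_normals, hist):
    """hist: (V_obj, N_DIST_BINS, N_ANGLE_BINS) running histogram."""
    for v, (p, n) in enumerate(zip(obj_verts, obj_normals)):
        d = np.linalg.norm(human_verts - p, axis=1)        # proximity
        j = d.argmin()                                      # closest human vertex
        if d[j] > MAX_DIST:
            continue                                        # no interaction here
        cos = np.clip(np.dot(n, human_normals[j]), -1, 1)   # relative orientation
        db = min(int(d[j] / MAX_DIST * N_DIST_BINS), N_DIST_BINS - 1)
        ab = min(int(np.arccos(cos) / np.pi * N_ANGLE_BINS), N_ANGLE_BINS - 1)
        hist[v, db, ab] += 1
    return hist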


We present a scalable method to learn ComA for any 3D object. In a nutshell, (1) we leverage a pre-trained diffusion model to generate large-scale samples of 3D humans interacting with the given object, and (2) we use the generated dataset to learn ComA.
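Tying the two sketches above together, the overall two-stage recipe could look like the following driver. All helper names here (sample_cameras, render_object, lift_human_to_3d) are hypothetical placeholders, not the authors' API; it reuses adaptive_mask_inpaint, accumulate_coma, and the bin constants from the earlier sketches.

import numpy as np

def build_coma_for_object(object_mesh, n_views=64):
    humans = []
    for cam in sample_cameras(n_views):                      # hypothetical helper
        render, init_mask = render_object(object_mesh, cam)  # hypothetical helper
        hoi_image = adaptive_mask_inpaint(
            render, init_mask, prompt="a person interacting with the object")
        # Lift the 2D human into 3D, e.g. with an HMR-style regressor.
        humans.append(lift_human_to_3d(hoi_image, cam))      # hypothetical helper
    hist = np.zeros((len(object_mesh.vertices), N_DIST_BINS, N_ANGLE_BINS))
    for human in humans:
        hist = accumulate_coma(object_mesh.vertices, object_mesh.vertex_normals,
                               human.vertices, human.vertex_normals, hist)
    return hist / max(hist.sum(), 1)  # normalize into a distribution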




ComA enables diverse applications, including reconstructing human-object interactions (see figure). These applications can be adapted to any 3D object using our dataset generation method.
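As one example, HOI reconstruction can be posed as fitting a body model to the object so that the resulting contact and spatial statistics are likely under the learned distribution. The sketch below is hypothetical: pose_human stands in for a differentiable body model (e.g. SMPL-style), and coma_log_prob for a differentiable lookup into the learned ComA field.

import torch

def fit_human_to_object(human_params, object_verts, coma_log_prob, iters=500):
    params = human_params.clone().requires_grad_(True)
    opt = torch.optim.Adam([params], lr=1e-2)
    for _ in range(iters):
        human_verts = pose_human(params)  # hypothetical differentiable body model
        # Maximize the likelihood of the pose under the ComA field; this pulls
        # the human toward plausible contact, orientation, and spatial patterns.
        loss = -coma_log_prob(object_verts, human_verts).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return params.detach()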

Video

Results


Contact-based Affordance


Motorcycle

Keyboard

Skateboard


Soccer Ball

Suitcase

Tennis Racket



Orientational Affordance


Stool

Chair



Spatial Affordance

Input

Full Body

Hand

Face

BibTeX

@misc{coma,
      title={Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models}, 
      author={Hyeonwoo Kim and Sookwan Han and Patrick Kwon and Hanbyul Joo},
      year={2024},
      eprint={2401.12978},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2401.12978},
}