Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models

Seoul National University, Naver Webtoon AI
ECCV 2024

Given a 3D object mesh, we generate numerous 3D Human-Object Interaction (HOI) samples and learn a novel affordance representation, Comprehensive Affordance (ComA), which models both contact and non-contact patterns.

Abstract

Understanding the inherent human knowledge in interacting with a given environment (e.g., affordance) is essential for improving AI to better assist humans. While existing approaches primarily focus on human-object contacts during interactions, such an affordance representation cannot fully address other important aspects of human-object interactions (HOIs), i.e., patterns of relative positions and orientations. In this paper, we introduce a novel affordance representation, named Comprehensive Affordance (ComA). Given a 3D object mesh, ComA models the distribution of relative orientation and proximity of vertices in interacting human meshes, capturing plausible patterns of contact, relative orientations, and spatial relationships. To construct the distribution, we present a novel pipeline that synthesizes diverse and realistic 3D HOI samples given any 3D target object mesh. The pipeline leverages a pre-trained 2D inpainting diffusion model to generate HOI images from object renderings and lifts them into 3D. To avoid the generation of false affordances, we propose a new inpainting framework, Adaptive Mask Inpainting. Since ComA is built on synthetic samples, it can extend to any object in an unbounded manner. Through extensive experiments, we demonstrate that ComA outperforms competitors that rely on human annotations in modeling contact-based affordance. Importantly, we also showcase the potential of ComA to reconstruct human-object interactions in 3D through an optimization framework, highlighting its advantage in incorporating both contact and non-contact properties.
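
To make the representation concrete, below is a minimal sketch of how a ComA-style distribution could be accumulated from generated 3D HOI samples. This is not the authors' released code: the histogram resolution, distance range, contact threshold, and nearest-neighbor formulation are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions throughout, not the released code):
# for each object vertex, accumulate a joint histogram of proximity (distance
# to the nearest human vertex) and relative orientation (angle between the
# object vertex normal and the direction toward that human vertex), averaged
# over many generated 3D HOI samples.
import numpy as np
from scipy.spatial import cKDTree

N_DIST_BINS, N_ANGLE_BINS = 32, 18   # assumed histogram resolution
MAX_DIST = 2.0                       # assumed proximity range (meters)
CONTACT_THRESH = 0.02                # assumed contact distance (2 cm)

def accumulate_coma(obj_vertices, obj_normals, human_vertex_samples):
    """obj_vertices/obj_normals: (V, 3) arrays; human_vertex_samples: list of
    (H, 3) arrays, one per generated HOI sample."""
    V = len(obj_vertices)
    hist = np.zeros((V, N_DIST_BINS, N_ANGLE_BINS))
    for human_verts in human_vertex_samples:
        dist, idx = cKDTree(human_verts).query(obj_vertices)
        direction = human_verts[idx] - obj_vertices
        direction /= np.linalg.norm(direction, axis=1, keepdims=True) + 1e-9
        cos_a = np.einsum("ij,ij->i", obj_normals, direction).clip(-1.0, 1.0)
        d_bin = np.clip((dist / MAX_DIST * N_DIST_BINS).astype(int),
                        0, N_DIST_BINS - 1)
        a_bin = np.clip((np.arccos(cos_a) / np.pi * N_ANGLE_BINS).astype(int),
                        0, N_ANGLE_BINS - 1)
        hist[np.arange(V), d_bin, a_bin] += 1.0
    return hist / max(len(human_vertex_samples), 1)

def contact_probability(hist):
    """Derive a contact-style affordance map: probability mass within the
    (assumed) contact distance threshold, per object vertex."""
    contact_bins = max(1, int(np.ceil(CONTACT_THRESH / MAX_DIST * N_DIST_BINS)))
    return hist[:, :contact_bins, :].sum(axis=(1, 2))
```

Under these assumptions, orientational and spatial affordances would be read off the same histograms, e.g., the angle marginal within contact-range bins for orientational tendency, or the full proximity field around the object for spatial relation.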

Method


Our method consists of two parts: (1) generating 3D HOI samples (upper box), and (2) learning Comprehensive Affordance from the generated 3D HOI samples (lower box). We introduce Adaptive Mask Inpainting and Depth Optimization via a Weak Auxiliary Cue to generate diverse and precise 3D HOI samples. From the generated samples, we learn the pointwise distribution of relative orientation and proximity, from which various forms of 3D affordance can be derived, including contact, orientational tendency, and spatial relation. A rough sketch of the adaptive-mask idea follows.
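
The sketch below assumes a simplified, blended-latent diffusers-style denoising loop (omitting classifier-free guidance and the 9-channel inpainting UNet input of Stable Diffusion inpainting); `segment_person` is a hypothetical stand-in for any off-the-shelf human segmentation model, and the update-step schedule is an assumption, not the authors' implementation.

```python
# Sketch of the adaptive-mask idea: during denoising, the inpainting mask is
# progressively shrunk to the region the intermediate prediction assigns to
# the person, so the object itself is never painted over (avoiding false
# affordances). Simplified; not the authors' implementation.
import torch
import torch.nn.functional as F

@torch.no_grad()
def adaptive_mask_inpaint(pipe, image_latents, mask, prompt_embeds, scheduler,
                          segment_person, update_steps=frozenset({10, 20, 30})):
    # mask: (B, 1, h, w) latent-resolution mask, 1 = region to inpaint.
    # scheduler.set_timesteps(...) is assumed to have been called already.
    latents = torch.randn_like(image_latents)
    for i, t in enumerate(scheduler.timesteps):
        # Standard blended inpainting step: denoise, then paste back the
        # known (unmasked) region from a matching noised copy of the source.
        noise_pred = pipe.unet(latents, t,
                               encoder_hidden_states=prompt_embeds).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample
        noised_src = scheduler.add_noise(image_latents,
                                         torch.randn_like(image_latents), t)
        latents = mask * latents + (1 - mask) * noised_src
        # Adaptive update: at selected steps, decode the current estimate,
        # segment the human, and intersect the mask with the human region so
        # pixels no longer needed for the person are frozen to the original.
        if i in update_steps:
            decoded = pipe.vae.decode(
                latents / pipe.vae.config.scaling_factor).sample
            person = segment_person(decoded)  # hypothetical: (B,1,H,W) in {0,1}
            person = F.interpolate(person, size=mask.shape[-2:], mode="nearest")
            mask = mask * person              # the mask only ever shrinks
    return latents
```

The key design choice is that the mask only shrinks and never grows: freezing everything outside the predicted person region keeps the model from repainting or deforming the object, which is the failure mode that produces false affordances.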

Video

Results

Generated 3D Affordance Samples



Contact-based Affordance


Motorcycle

Keyboard

Skateboard


Soccer Ball

Suitcase

Tennis Racket



Orientational Affordance


Stool

Chair



Spatial Affordance

Input

Full Body

Hand

Face

BibTeX

Coming Soon!