I am supervised by Francesc Moreno-Noguer, Philippe Weinzaepfel and Grégory Rogez. I am working on natural language texts and 3D human poses: my goal is to leverage text to improve the estimation and generation of human pose, shape and motion.
Papers
PoseFix: Correcting 3D Human Poses with Natural Language
We introduce the PoseFix dataset, which consists of over 6k triplets, each made of a pair of 3D human poses (source and target) and a text modifier describing how the source pose needs to be modified to obtain the target pose. We further train two models: a text-based pose editing model, which generates a corrected 3D body pose given a query pose and a text modifier; and a correctional text generation model, which generates corrective instructions from the differences between two body poses.
PoseScript: 3D Human Poses from Natural Language
We collect a dataset, PoseScript, pairing 3D human poses from AMASS with descriptions, both written by human annotators and generated automatically by our proposed pipeline. We use PoseScript to train text-to-pose models, both for retrieval and for generation. Pretraining on the automatically generated data boosts performance by a factor of 2.
ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity
We take inspiration from image retrieval and cross-modal retrieval to tackle the task of composed image retrieval, where the query consists of a reference image and a textual modifier. We design two complementary modules, each focusing on one modality of the query: the Explicit Matching module assesses how well potential targets fit the textual modifier, while the Implicit Similarity module compares potential target images to the reference image, assisted by the text. We validate our approach on FashionIQ, Shoes and CIRR.
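To make the two-module idea concrete, here is a minimal Python sketch of how the two scores could be combined to rank candidate images. It is an illustration only, not the ARTEMIS implementation: the embeddings and the text-conditioned attention weights (`em_attention`, `is_attention`) are assumed to be precomputed plain lists (in the real model they come from learned networks), and combining the two scores by simple addition is an assumption of this sketch. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def reweight(v: Vector, weights: Vector) -> Vector:
    """Element-wise product: emphasize the dimensions the text attends to."""
    return [x * w for x, w in zip(v, weights)]


def explicit_matching(target: Vector, text: Vector, em_attention: Vector) -> float:
    # How well does the candidate target fit the textual modifier?
    # Hypothetical: target features reweighted by text-conditioned attention,
    # then compared to the text embedding by cosine similarity.
    return dot(l2_normalize(reweight(target, em_attention)), l2_normalize(text))


def implicit_similarity(reference: Vector, target: Vector, is_attention: Vector) -> float:
    # How close is the candidate to the reference image, on the dimensions the
    # text suggests should be preserved? (attention weights assumed given)
    return dot(l2_normalize(reweight(reference, is_attention)),
               l2_normalize(reweight(target, is_attention)))


def composed_score(reference: Vector, text: Vector, target: Vector,
                   em_attention: Vector, is_attention: Vector) -> float:
    # Assumption of this sketch: the two module scores are simply summed.
    return (explicit_matching(target, text, em_attention)
            + implicit_similarity(reference, target, is_attention))


def rank_targets(reference: Vector, text: Vector, candidates: List[Vector],
                 em_attention: Vector, is_attention: Vector) -> List[int]:
    """Return candidate indices sorted from best to worst composed score."""
    scores = [composed_score(reference, text, c, em_attention, is_attention)
              for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    reference = [1.0, 0.0, 0.0]
    text = [0.0, 1.0, 0.0]
    candidates = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
    uniform = [1.0, 1.0, 1.0]
    print(rank_targets(reference, text, candidates, uniform, uniform))  # [1, 0, 2]
```

In this toy example, the second candidate ranks first because it both resembles the reference image and matches the text, whereas the first candidate only copies the reference and the third matches neither. This is the intuition behind using two complementary modules rather than a single fused score.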
Talks
Biography