I am a Ph.D. student at Nanjing University, supervised by Prof. Xun Cao. I am currently a visiting Ph.D. student at the CGL Lab, ETH Zurich, supervised by Prof. Barbara Solenthaler and Dr. Derek Bradley. I have also worked closely with Prof. Feng Xu at Tsinghua University. I received my Bachelor of Science degree in Electronic Science and Engineering from Nanjing University in 2019.
We propose to jointly learn visual appearance and depth in a diffusion-based portrait image generator. Once trained, our framework can be efficiently adapted to various downstream applications, such as facial depth-to-image and image-to-depth generation, portrait relighting, and audio-driven talking head animation with consistent 3D output.
AvatarBooth is a text-to-3D model that creates an animatable 3D avatar from a text description. It can also generate a customized model from 4–6 photos taken with your phone, or from a character design produced by a diffusion model. You can experiment with different prompts to change the final character while keeping its identity fixed.
Given a single portrait image, we synthesize emotional talking faces in which the mouth movements match the input audio and the facial emotion dynamics follow an emotion source video.
Given an audio clip and a target video, our Emotional Video Portraits (EVP) approach generates emotion-controllable talking portraits and can change their emotion smoothly by interpolating in the latent space.
Academic Service
Reviewer for CVPR, SIGGRAPH, ICCV, ECCV, 3DV, EG, TPAMI, etc.