I am an AI Researcher at Alibaba TongYi Lab. I graduated with an M.S. degree in Computer Science from Peking University in 2020 and a B.S. degree from Wuhan University in 2017.
My research focuses on computer vision and deep learning, particularly on generative AI models for 2D and 3D content.
We are now recruiting for Summer Internships, and Research Intern (RI) positions are continuously open for applications. Feel free to contact me with your CV and research statement!
A fine-grained, motion-controllable image animation framework that introduces motion control via a 3D-aware motion representation, leveraging unified perception signals.
Given an RGB video captured by a monocular camera, our method can generate an editable 3D avatar that enables both text- and image-guided 3D editing. A novel representation, Tetrahedron-constrained Gaussian Splatting (TetGS), is introduced to combine the structured nature of tetrahedral grids with the high-precision rendering capabilities of 3DGS.
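For intuition, here is a minimal sketch of what a tetrahedron-constrained Gaussian could look like in code. The barycentric parameterization and all names below are illustrative assumptions for exposition, not the paper's exact formulation:

```python
import numpy as np

class TetConstrainedGaussian:
    """A 3D Gaussian whose center is tied to one cell of a tetrahedral grid.

    The center is stored as barycentric coordinates w.r.t. the cell's four
    vertices, so editing or deforming the grid moves the Gaussian with it.
    (Hypothetical parameterization, for illustration only.)
    """

    def __init__(self, tet_vertices, barycentric, scale, rotation, opacity):
        self.tet_vertices = np.asarray(tet_vertices, float)  # (4, 3) cell corners
        self.barycentric = np.asarray(barycentric, float)    # (4,), non-negative, sums to 1
        self.scale = np.asarray(scale, float)                # (3,) per-axis extent
        self.rotation = np.asarray(rotation, float)          # (4,) unit quaternion
        self.opacity = float(opacity)

    @property
    def center(self):
        # A convex combination of the corners keeps the center inside its cell,
        # which is what gives the splats their structured, editable layout.
        return self.barycentric @ self.tet_vertices

# A Gaussian sitting at the centroid of a unit tetrahedron:
tet = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
g = TetConstrainedGaussian(tet, [0.25] * 4, scale=[0.05] * 3,
                           rotation=[1, 0, 0, 0], opacity=0.9)
print(g.center)  # -> [0.25 0.25 0.25]
```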
MIMO, a generalizable model for controllable video synthesis, can Mimic anyone anywhere in complex Motions with Object interactions. It simultaneously achieves advanced scalability to arbitrary characters, generality to novel 3D motions, and applicability to interactive real-world scenes in a unified framework.
A 3D generative model trained on millions of synthetic 2D images that relies on no pre-existing 3D or 2D assets, yet is capable of producing visually realistic 3D humans with diverse content.
Given a set of RGB portrait images captured by a monocular camera, our method learns a photorealistic representation in neural implicit fields and transfers it to artistic styles, changing the underlying 3D structure accordingly.
A 3D generative model that translates a real-world face image into its corresponding 3D avatar with only a single style example provided. The model is inherently 3D-aware and also supports attribute editing, such as smile and age, directly in the 3D domain.
DCT-Net is a novel image translation architecture for few-shot portrait stylization. It achieves strong content preservation, robust generality to complicated real-world scenes, and high scalability to full-body translation despite being trained with only head observations.
A general cartoon translator that can not only simultaneously render exaggerated anime faces and realistic cartoon scenes, but also provide flexible user controls for desired cartoon styles.
ADGAN is a novel generative model for controllable person image synthesis, producing realistic person images with desired human attributes (e.g., pose, head, upper clothes, and pants) provided by various source inputs.
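To illustrate the decomposition idea, here is a rough PyTorch sketch: each semantic component of the source image is encoded into its own style code, and the codes jointly modulate a target-pose feature map. The component list, layer sizes, and fusion scheme below are simplified assumptions, not the released architecture:

```python
import torch
import torch.nn as nn

COMPONENTS = ["head", "upper_clothes", "pants"]  # illustrative part list

class ComponentStyleEncoder(nn.Module):
    """Encodes one masked body part of the source image into a style code."""
    def __init__(self, code_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, code_dim),
        )

    def forward(self, masked_rgb):
        return self.net(masked_rgb)

class AttributeDecomposedSynthesis(nn.Module):
    """Fuses per-component style codes with a target-pose feature map."""
    def __init__(self, code_dim=64, num_keypoints=18):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {c: ComponentStyleEncoder(code_dim) for c in COMPONENTS})
        self.pose_encoder = nn.Conv2d(num_keypoints, 64, 3, padding=1)
        # The concatenated style codes predict AdaIN-like affine parameters.
        self.to_affine = nn.Linear(code_dim * len(COMPONENTS), 64 * 2)
        self.decoder = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, source_rgb, part_masks, pose_heatmaps):
        # Each component is encoded independently, which is what makes the
        # attributes individually swappable at inference time.
        codes = [self.encoders[c](source_rgb * part_masks[c]) for c in COMPONENTS]
        style = torch.cat(codes, dim=1)
        feat = self.pose_encoder(pose_heatmaps)
        gamma, beta = self.to_affine(style).chunk(2, dim=1)
        feat = feat * (1 + gamma[:, :, None, None]) + beta[:, :, None, None]
        return torch.tanh(self.decoder(feat))
```

Swapping, say, the upper_clothes code between two source images while keeping the other codes fixed is what yields attribute-level editing.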
DynTypo is a novel approach to dynamic text effects transfer based on example-based texture synthesis, producing high-quality results with temporal smoothness and sophisticated dynamic effects.
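To convey what example-based synthesis means here, below is a toy brute-force variant: each target pixel takes its color from the source style image at the location whose guidance patch matches best. The guidance maps and function are illustrative assumptions; DynTypo itself builds the correspondence far more efficiently (PatchMatch-style nearest-neighbor fields) and enforces temporal coherence across frames, which this sketch omits:

```python
import numpy as np

def guided_patch_transfer(source_style, source_guide, target_guide, patch=5):
    """Copy, for each target pixel, the style color at the source location
    whose guidance patch (e.g., a distance field of the glyph) matches best.
    Brute-force O(N^2) search -- a toy stand-in for PatchMatch."""
    half = patch // 2
    h, w = target_guide.shape
    sh, sw = source_guide.shape
    out = np.zeros((h, w) + source_style.shape[2:], dtype=source_style.dtype)

    # Flatten every valid source guidance patch once.
    coords = [(y, x) for y in range(half, sh - half)
                     for x in range(half, sw - half)]
    src_patches = np.stack([
        source_guide[y - half:y + half + 1, x - half:x + half + 1].ravel()
        for (y, x) in coords
    ])

    for y in range(half, h - half):
        for x in range(half, w - half):
            q = target_guide[y - half:y + half + 1, x - half:x + half + 1].ravel()
            best = np.argmin(((src_patches - q) ** 2).sum(axis=1))
            sy, sx = coords[best]
            out[y, x] = source_style[sy, sx]  # copy the best match's center pixel
    return out
```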
A general-purpose solution to interactive texture transfer problems, including turning doodles into artworks, editing decorative patterns, generating text with special effects, controlling effect distribution in text images, and swapping textures.