Hi! I am a 5th-year PhD student in the Machine Learning Program at the Georgia Institute of Technology, advised by Prof. James Rehg and co-advised by Prof. Zsolt Kira. I am currently also a visiting student at the CS department of UIUC. Prior to my PhD, I received my Master's degree in ECE and Bachelor's degree in Information Engineering from Shanghai Jiao Tong University, where I worked with Prof. Ya Zhang during my Master's.
I interned at Meta GenAI (now Meta Superintelligence Labs) three times, working on research in generative models. My projects span multimodal LLMs, image/video diffusion models, autoregressive architectures, and fundamental research on tokenizers. Please see my employment experience and publications below for details.
I'm looking for a full-time Research Scientist / Applied Scientist / ML Engineer position (available starting Dec. 2025). Please drop me an email if you think I'm a good fit for your team.
Research Interests
My research interests lie in Multimodal Learning, including Multimodal Understanding (e.g., VLMs, MLLMs) and Image/Video Generation (e.g., diffusion, flow matching).
My career goal is to build omni-modal systems that can understand, reason, and generate across text, image, video, and audio, by integrating LLM planning/reasoning agents and high-fidelity diffusion backends into a single autoregressive architecture.
I'm always looking for self-motivated graduate/undergraduate students to collaborate with. Don't hesitate to reach out to me if you are interested in my research.
News
Sep 2025: 🎉 One paper was accepted to NeurIPS 2025.
Aug 2025: 👨💻 I'm on the job market now! Seeking a position starting in Dec. 2025 or Jan. 2026.
Jun 2025: 🏅 Our LEGO paper was recognized as a Distinguished Paper by EgoVis at CVPR 2025.
May 2025: I started my internship at Meta AGI Foundations in Seattle, working with Dr. Ishan Misra.
Mar 2025: 🎉 I successfully passed my thesis proposal. Thanks to all committee members (James Rehg, Zsolt Kira, James Hays, Judy Hoffman)! I'll be on the job market in September 2025. 👨💻
Feb 2025: 🎉 One first-author paper (InstaManip) and two co-author papers (VideoMindPalace and SocialGesture) were accepted by CVPR 2025. Thanks to all collaborators. See you @Nashville.
Oct 2024: 🔍 We released a thorough survey on action anticipation. Please check it out if you are interested in this field.
Oct 2024: 🏅 Our LEGO paper was selected as a Best Paper Finalist @ECCV2024. Congratulations to all co-authors!
Aug 2024: 🎤 Our LEGO paper was selected for an Oral presentation.
Jul 2024: 🎉 Two first-author papers were accepted by ECCV! Please check out our latest work: LEGO (for action generation) and CSTS (for gaze forecasting). Thanks to all the co-authors!
May 2024: 👨💻👨💻 I started my second internship at GenAI, Meta in the Bay Area.
Mar 2024: 🎉 One co-author paper was accepted by CVPR (Oral). See you in Seattle!
Jul 2023: 🎉 Our extension of our prior work GLC was accepted by IJCV!
May 2023: 👨💻 I started my internship at GenAI, Meta in the Bay Area!
Apr 2023: 🎉 I successfully passed the qualifying exam.
Mar 2023: 🎉 One paper was accepted to the Findings of ACL 2023. Please check out our new dataset for social understanding: Werewolf Among Us.
Nov 2022: 🏅 We won the Best Student Paper Prize at BMVC. Thanks to all co-authors!
Sep 2022: 🎉 Our work GLC was accepted by BMVC 2022!
Jan 2022: I started working with Prof. James Rehg at Georgia Tech.
Employment
[May 2025 – Present] Research Scientist Intern, Meta Superintelligence Labs, Multimedia Core Video Generation Team
Analyzing and improving the diffusibility of high-dimensional latent spaces for image/video generation (in progress). [Role: project leader, first author]
Gained engineering experience with the large-scale MovieGen codebase and distributed training.