Hi! I am a 5th-year PhD student in the Machine Learning Program at Georgia Institute of Technology, advised by Prof. James Rehg and co-advised by Prof. Zsolt Kira. Currently, I'm also a visiting student in the CS department at UIUC. Prior to my PhD, I received my Master's degree in ECE and my Bachelor's degree in Information Engineering from Shanghai Jiao Tong University. I worked with Prof. Ya Zhang during my master's studies.
I interned at Meta GenAI (now Meta Superintelligence Lab) three times, working on generative model research. My projects span multimodal LLMs, image/video diffusion models, autoregressive architectures, and fundamental research on autoencoders. Please check my employment experience and publications below for details.
I'm looking for a full-time Research Scientist / Applied Scientist / ML Engineer position (available starting Dec. 2025). Please drop me an email if you think I'm a good fit for your team.
Research Interests
My research interests lie in Multimodal Learning, especially Generative Models (including Multimodal LLMs and Diffusion Models) and Video Understanding.
My career goal is to build unified multimodal systems that can understand, reason, and generate across text, image, video, and audio -- by integrating LLM planning/reasoning and high-fidelity diffusion backends into one autoregressive architecture.
I'm always looking for self-motivated graduate/undergraduate students to collaborate with. Don't hesitate to reach out to me if you are interested in my research.
News
Aug 2025: 👨💻 I'm on the job market now! Seeking a position starting in Dec. 2025 or Jan. 2026.
Jun 2025: 🏅 Our LEGO paper was recognized as a Distinguished Paper by EgoVis at CVPR 2025.
May 2025: I started my internship at Meta AGI Foundations in Seattle working with Dr. Ishan Misra.
Mar 2025: 🎉 I successfully passed my thesis proposal. Thanks to all committee members (James Rehg, Zsolt Kira, James Hays, Judy Hoffman)! I'll be on the job market in September of 2025. 👨💻
Feb 2025: 🎉 I have one first-author paper (InstaManip) and two co-author papers (VideoMindPalace and SocialGesture) accepted by CVPR 2025. Thanks to all collaborators. See you @Nashville.
Oct 2024: 🔍 We released a thorough survey on action anticipation. Please check it out if you are interested in this field.
Oct 2024: 🏅 Our LEGO paper was nominated as a Best Paper Finalist @ECCV2024. Congratulations to all co-authors!
Aug 2024: 🎤 Our LEGO paper was selected for an oral presentation.
July 2024: 🎉 Two first-author papers were accepted by ECCV! Please check out our latest work: LEGO (for action generation) and CSTS (for gaze forecasting). Thanks to all the co-authors!
May 2024: 👨💻👨💻 I started my second internship at GenAI, Meta in the Bay Area.
Mar 2024: 🎉 One co-author paper was accepted by CVPR (Oral). See you in Seattle!
Jul 2023: 🎉 Our expansion of prior work GLC was accepted by IJCV!
May 2023: 👨💻 I started my internship at GenAI, Meta in the Bay Area!
Apr 2023: 🎉 I successfully passed the qualifying exam.
Mar 2023: 🎉 One paper was accepted to the Findings of ACL2023. Please check out our new dataset for social understanding: Werewolf Among Us.
Nov 2022: 🏅 We won the Best Student Paper Prize at BMVC. Thanks to all co-authors!
Sep 2022: 🎉 Our work GLC was accepted by BMVC 2022!
Jan 2022: I started working with Prof. James Rehg at Georgia Tech.
Employment
[May 2025 – Present] Research Scientist Intern, Meta Superintelligence Lab, Multimedia Core Video Generation Team
Analyzing and improving the diffusibility of high-dimensional latent spaces for image/video generation (In Progress) [Role: project leader, first author]