Bolin Lai

Hi! I am a 5th-year PhD student in the Machine Learning Program at the Georgia Institute of Technology, advised by Prof. James Rehg and co-advised by Prof. Zsolt Kira. Currently, I'm also a visiting student in the CS department at UIUC. Prior to my PhD, I received my Master's degree in ECE and my Bachelor's degree in Information Engineering from Shanghai Jiao Tong University, where I worked with Prof. Ya Zhang during my Master's.

I interned at Meta GenAI (now Meta Superintelligence Lab) three times, working on research in generative models. My projects span multimodal LLMs, image/video diffusion models, autoregressive architectures, and fundamental research on autoencoders. Please see my employment experience and publications below for details.

Email  /  Google Scholar  /  Github  /  LinkedIn  /  Twitter


I'm looking for a full-time Research Scientist / Applied Scientist / ML Engineer position (available starting Dec. 2025). Please drop me an email if you think I'm a good fit for your team.

Research Interests

My research interests lie in Multimodal Learning, especially Generative Models (including Multimodal LLMs and Diffusion Models) and Video Understanding. My career goal is to build unified multimodal systems that can understand, reason, and generate across text, image, video, and audio, by integrating LLM planning/reasoning and high-fidelity diffusion backends into a single autoregressive architecture.

I'm always looking for self-motivated graduate/undergraduate students to collaborate with. Don't hesitate to reach out if you are interested in my research.

News

  • Aug 2025: 👨‍💻‍ I'm on the job market now! Seeking a position starting in Dec. 2025 or Jan. 2026.
  • Jun 2025: I was recognized as an outstanding reviewer at CVPR 2025.
  • Jun 2025: 🏅 Our LEGO paper received the EgoVis Distinguished Paper Award at CVPR 2025.
  • May 2025: I started my internship at Meta AGI Foundations in Seattle working with Dr. Ishan Misra.
  • Mar 2025: 🎉 I successfully passed my thesis proposal. Thanks to all my committee members (James Rehg, Zsolt Kira, James Hays, Judy Hoffman)! I'll be on the job market in September 2025. 👨‍💻‍
  • Feb 2025: 🎉 I have one first-author paper (InstaManip) and two co-author papers (VideoMindPalace and SocialGesture) accepted to CVPR 2025. Thanks to all collaborators. See you @Nashville.
  • Oct 2024: 🔍 We released a thorough survey on action anticipation. Please check it out if you are interested in this field.
  • Oct 2024: 🏅 Our LEGO paper was selected as a Best Paper Finalist at ECCV 2024. Congratulations to all co-authors!
  • Aug 2024: 🎤 Our LEGO paper was selected for an Oral presentation.
  • July 2024: 🎉 Two first-author papers were accepted by ECCV! Please check out our latest work: LEGO (for action generation) and CSTS (for gaze forecasting). Thanks to all the co-authors!
  • May 2024: 👨‍💻‍👨‍💻‍ I started my second internship at Meta GenAI in the Bay Area.
  • Mar 2024: 🎉 One co-author paper was accepted by CVPR (Oral). See you in Seattle!
  • Jul 2023: 🎉 The extended version of our prior work GLC was accepted by IJCV!
  • May 2023: 👨‍💻‍ I started my internship at Meta GenAI in the Bay Area!
  • Apr 2023: 🎉 I successfully passed the qualifying exam.
  • Mar 2023: 🎉 One paper was accepted to the Findings of ACL 2023. Please check out our new dataset for social understanding: Werewolf Among Us.
  • Nov 2022: 🏅 We won the Best Student Paper Prize at BMVC. Thanks to all co-authors!
  • Sep 2022: 🎉 Our work GLC was accepted by BMVC 2022!
  • Jan 2022: I started working with Prof. James Rehg at Georgia Tech.

Employment

  • [May 2025 – Present] Research Scientist Intern
    Meta Superintelligence Lab, Multimedia Core Video Generation Team

  • [May 2024 – Dec. 2024] Research Scientist Intern
    Meta GenAI, Llama Applied Multimodal Team

  • [May 2023 – Dec. 2023] Research Scientist Intern
    Meta GenAI, Llama Image/Video Data Team
    • Text-guided instructional image generation using LLM and diffusion model (ECCV 2024, Oral, Best Paper Finalist) [Role: project leader, first author]
    • Mentor: Miao Liu      Manager: Guan Pang
    • Collaborators: Xiaoliang Dai, Lawrence Chen

Publications

(The Selected tab highlights the publications that best represent my expertise.)
Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training

Bolin Lai, Sangmin Lee, Xu Cao, Xiang Li, James M. Rehg
Under Review, 2025
Webpage / Paper / Code
Learning Predictive Visuomotor Coordination

Wenqi Jia, Bolin Lai, Miao Liu, Danfei Xu, James M. Rehg
Under Review, 2025
Webpage / Paper
Towards Online Multi-Modal Social Interaction Understanding

Xinpeng Li, Shijian Deng, Bolin Lai, Weiguo Pian, James M. Rehg, Yapeng Tian
Under Review, 2025
Paper / Code
Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation

Bolin Lai, Felix Juefei-Xu, Miao Liu, Xiaoliang Dai, Nikhil Mehta, Chenguang Zhu, Zeyi Huang, James M. Rehg, Sangmin Lee, Ning Zhang, Tong Xiao
CVPR, 2025 (Highlight)
Webpage / Paper / Code / HuggingFace / Supplementary Video / Poster
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

Zeyi Huang, Yuyang Ji, Xiaofang Wang, Nikhil Mehta, Tong Xiao, Donghyun Lee, Sigmund VanValkenburgh, Shengxin Zha, Bolin Lai, Licheng Yu, Ning Zhang, Yong Jae Lee, Miao Liu
CVPR, 2025
[Paper]
SocialGesture: Delving into Multi-person Gesture Understanding

Xu Cao, Pranav Virupaksha, Wenqi Jia, Bolin Lai, Fiona Ryan, Sangmin Lee, James M. Rehg
CVPR, 2025
Webpage / Paper / Dataset
Human Action Anticipation: A Survey

Bolin Lai*, Sam Toyer*, Tushar Nagarajan, Rohit Girdhar, Shengxin Zha, James M. Rehg, Kris Kitani, Kristen Grauman, Ruta Desai, Miao Liu
Under Review at IJCV, 2024
[Paper]
What is the Visual Cognition Gap between Humans and Multimodal LLMs?

Xu Cao, Bolin Lai, Wenqian Ye, Yunsheng Ma, Joerg Heintz, Jintai Chen, Jianguo Cao, James M. Rehg
COLM, 2025
[Paper]
MM-SPUBENCH: Towards Better Understanding of Spurious Biases in Multimodal LLMs

Wenqian Ye, Guangtao Zheng, Yunsheng Ma, Xu Cao, Bolin Lai, James M. Rehg, Aidong Zhang
Under Review, 2024
[Paper]
Towards Social AI: A Survey on Understanding Social Interactions

Sangmin Lee, Minzhi Li, Bolin Lai, Wenqi Jia, Fiona Ryan, Xu Cao, Ozgur Kara, Bikram Boote, Weiyan Shi, Diyi Yang, James M. Rehg
Under Review at TPAMI, 2024
[Paper]
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning

Bolin Lai, Xiaoliang Dai, Lawrence Chen, Guan Pang, James M. Rehg, Miao Liu
ECCV, 2024 (Oral, Best Paper Finalist)
EgoVis Distinguished Paper Award
Webpage / Paper / Code / Dataset / HuggingFace / Supplementary / Video / Poster / Press: GT News
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation

Bolin Lai, Fiona Ryan, Wenqi Jia, Miao Liu*, James M. Rehg*
ECCV, 2024
Webpage / Paper / Code / Data Split / HuggingFace / Supplementary / Video / Poster
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations

Sangmin Lee, Bolin Lai, Fiona Ryan, Bikram Boote, James M. Rehg
CVPR, 2024 (Oral)
Webpage / Paper / Code / Split & Annotations / Supplementary
Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduction Games

Bolin Lai*, Hongxin Zhang*, Miao Liu*, Aryan Pariani*, Fiona Ryan, Wenqi Jia, Shirley Anugrah Hayati, James M. Rehg, Diyi Yang
ACL Findings, 2023
Webpage / Paper / Code / Dataset / Video
In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation and Beyond

Bolin Lai, Miao Liu, Fiona Ryan, James M. Rehg
International Journal of Computer Vision (IJCV), 2023
Webpage / Paper / Code
In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation

Bolin Lai, Miao Liu, Fiona Ryan, James M. Rehg
BMVC, 2022 (Spotlight, Best Student Paper)
Webpage / Paper / Code / Data Split / Supplementary / Video / Poster
Research before my PhD, mainly on medical image analysis
Semi-supervised Vein Segmentation of Ultrasound Images for Autonomous Venipuncture

Yu Chen, Yuxuan Wang, Bolin Lai, Zijie Chen, Xu Cao, Nanyang Ye, Zhongyuan Ren, Junbo Zhao, Xiao-Yun Zhou, Peng Qi
IROS, 2021
[Paper]
Hetero-Modal Learning and Expansive Consistency Constraints for Semi-Supervised Detection from Multi-Sequence Data

Bolin Lai, Yuhsuan Wu, Xiao-Yun Zhou, Peng Wang, Le Lu, Lingyun Huang, Mei Han, Jing Xiao, Heping Hu, Adam P. Harrison
Machine Learning in Medical Imaging, 2021
[Paper]
Liver Tumor Localization and Characterization from Multi-phase MR Volumes Using Key-Slice Prediction: A Physician-Inspired Approach

Bolin Lai*, Yuhsuan Wu*, Xiaoyu Bai*, Xiao-Yun Zhou, Peng Wang, Jinzheng Cai, Yuankai Huo, Lingyun Huang, Yong Xia, Jing Xiao, Le Lu, Heping Hu, Adam P. Harrison
International Workshop on PRedictive Intelligence In MEdicine, 2021
[Paper]
Spatial Regularized Classification Network for Spinal Dislocation Diagnosis

Bolin Lai, Shiqi Peng, Guangyu Yao, Ya Zhang, Xiaoyun Zhang, Yanfeng Wang, Hui Zhao
Machine Learning in Medical Imaging, 2019
[Paper]

Awards and Service

Reviewer for:
  • Computer Vision and Pattern Recognition Conference (CVPR)
  • International Conference on Computer Vision (ICCV)
  • European Conference on Computer Vision (ECCV)
  • Conference on Neural Information Processing Systems (NeurIPS)
  • The Association for Computational Linguistics (ACL)
  • Empirical Methods in Natural Language Processing (EMNLP)
  • Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
  • International Journal of Computer Vision (IJCV)
  • Association for the Advancement of Artificial Intelligence (AAAI)
  • International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)
  • ACM Multimedia (ACM MM)
  • Journal of Biomedical and Health Informatics (JBHI)
  • IEEE Signal Processing Letters (SPL)

This website is adapted from this source code.