Bolin Lai

Hi! I am a 5th-year PhD student in the Machine Learning Program at the Georgia Institute of Technology, advised by Prof. James Rehg and co-advised by Prof. Zsolt Kira. Currently, I'm also a visiting student in the CS department at UIUC. Prior to my PhD, I received my Master's degree in ECE and my Bachelor's degree in Information Engineering from Shanghai Jiao Tong University, where I worked with Prof. Ya Zhang during my Master's.

I interned at Meta GenAI (now Meta Superintelligence Lab) three times, working on research in generative models. My projects span multimodal LLMs, image/video diffusion models, autoregressive architectures, and fundamental research on autoencoders. Please see my employment experience and publications below for details.

Email  /  Google Scholar  /  Github  /  LinkedIn  /  Twitter


I'm looking for a full-time Research Scientist / Applied Scientist / ML Engineer position (available starting Dec. 2025). Please drop me an email if you think I'm a good fit for your team.

Research Interests

My research interests lie in Multimodal Learning, especially Generative Models (including Multimodal LLMs and Diffusion Models) and Video Understanding. My career goal is to build unified multimodal systems that can understand, reason, and generate across text, image, video, and audio, by integrating LLM planning/reasoning and high-fidelity diffusion backends into a single autoregressive architecture.

I'm always looking for self-motivated graduate/undergraduate students to collaborate with. Don't hesitate to reach out if you are interested in my research.

News

  • Aug 2025: 👨‍💻‍ I'm on the job market now! Seeking a position starting in Dec. 2025 or Jan. 2026.
  • Jun 2025: I was recognized as an outstanding reviewer at CVPR 2025.
  • Jun 2025: 🏅 Our LEGO paper received the EgoVis Distinguished Paper Award at CVPR 2025.
  • May 2025: I started my internship at Meta AGI Foundations in Seattle working with Dr. Ishan Misra.
  • Mar 2025: 🎉 I successfully passed my thesis proposal. Thanks to all my committee members (James Rehg, Zsolt Kira, James Hays, Judy Hoffman)! I'll be on the job market in September 2025. 👨‍💻‍
  • Feb 2025: 🎉 I have one first-author paper (InstaManip) and two co-author papers (VideoMindPalace and SocialGesture) accepted to CVPR 2025. Thanks to all collaborators. See you @Nashville.
  • Oct 2024: 🔍 We released a thorough survey on action anticipation. Please check it out if you are interested in this field.
  • Oct 2024: 🏅 Our LEGO paper was selected as a Best Paper Finalist at ECCV 2024. Congratulations to all co-authors!
  • Aug 2024: 🎤 Our LEGO paper was selected for an Oral presentation.
  • July 2024: 🎉 Two first-author papers were accepted by ECCV! Please check out our latest work: LEGO (for action generation) and CSTS (for gaze forecasting). Thanks to all the co-authors!
  • May 2024: 👨‍💻‍👨‍💻‍ I started my second internship at Meta GenAI in the Bay Area.
  • Mar 2024: 🎉 One co-author paper was accepted by CVPR (Oral). See you in Seattle!
  • Jul 2023: 🎉 The extended version of our prior work GLC was accepted by IJCV!
  • May 2023: 👨‍💻‍ I started my internship at Meta GenAI in the Bay Area!
  • Apr 2023: 🎉 I successfully passed the qualifying exam.
  • Mar 2023: 🎉 One paper was accepted to the Findings of ACL 2023. Please check out our new dataset for social understanding: Werewolf Among Us.
  • Nov 2022: 🏅 We won the Best Student Paper Prize at BMVC. Thanks to all co-authors!
  • Sep 2022: 🎉 Our work GLC was accepted by BMVC 2022!
  • Jan 2022: I started working with Prof. James Rehg at Georgia Tech.

Employment

  • [May 2025 – Present] Research Scientist Intern
    Meta Superintelligence Lab, Multimedia Core Video Generation Team

  • [May 2024 – Dec. 2024] Research Scientist Intern
    Meta GenAI, Llama Applied Multimodal Team

  • [May 2023 – Dec. 2023] Research Scientist Intern
    Meta GenAI, Llama Image/Video Data Team
    • Text-guided instructional image generation using LLM and diffusion model (ECCV 2024, Oral, Best Paper Finalist) [Role: project leader, first author]
    • Mentor: Miao Liu      Manager: Guan Pang
    • Collaborators: Xiaoliang Dai, Lawrence Chen

Publications

(The Selected tab highlights the publications that best represent my expertise.)
Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training

Bolin Lai, Sangmin Lee, Xu Cao, Xiang Li, James M. Rehg
Under Review, 2025
Webpage / Paper / Code
Learning Predictive Visuomotor Coordination

Wenqi Jia, Bolin Lai, Miao Liu, Danfei Xu, James M. Rehg
Under Review, 2025
Webpage / Paper
Towards Online Multi-Modal Social Interaction Understanding

Xinpeng Li, Shijian Deng, Bolin Lai, Weiguo Pian, James M. Rehg, Yapeng Tian
Under Review, 2025
Paper / Code
Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation

Bolin Lai, Felix Juefei-Xu, Miao Liu, Xiaoliang Dai, Nikhil Mehta, Chenguang Zhu, Zeyi Huang, James M. Rehg, Sangmin Lee, Ning Zhang, Tong Xiao
CVPR, 2025 (Highlight)
Webpage / Paper / Code / HuggingFace / Supplementary Video / Poster
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

Zeyi Huang, Yuyang Ji, Xiaofang Wang, Nikhil Mehta, Tong Xiao, Donghyun Lee, Sigmund VanValkenburgh, Shengxin Zha, Bolin Lai, Licheng Yu, Ning Zhang, Yong Jae Lee, Miao Liu
CVPR, 2025
[Paper]
SocialGesture: Delving into Multi-person Gesture Understanding

Xu Cao, Pranav Virupaksha, Wenqi Jia, Bolin Lai, Fiona Ryan, Sangmin Lee, James M. Rehg
CVPR, 2025
Webpage / Paper / Dataset
Human Action Anticipation: A Survey

Bolin Lai*, Sam Toyer*, Tushar Nagarajan, Rohit Girdhar, Shengxin Zha, James M. Rehg, Kris Kitani, Kristen Grauman, Ruta Desai, Miao Liu
Under Review at IJCV, 2024
[Paper]
What is the Visual Cognition Gap between Humans and Multimodal LLMs?

Xu Cao, Bolin Lai, Wenqian Ye, Yunsheng Ma, Joerg Heintz, Jintai Chen, Jianguo Cao, James M. Rehg
COLM, 2025
[Paper]
MM-SPUBENCH: Towards Better Understanding of Spurious Biases in Multimodal LLMs

Wenqian Ye, Guangtao Zheng, Yunsheng Ma, Xu Cao, Bolin Lai, James M. Rehg, Aidong Zhang
Under Review, 2024
[Paper]
Towards Social AI: A Survey on Understanding Social Interactions

Sangmin Lee, Minzhi Li, Bolin Lai, Wenqi Jia, Fiona Ryan, Xu Cao, Ozgur Kara, Bikram Boote, Weiyan Shi, Diyi Yang, James M. Rehg
Under Review at TPAMI, 2024
[Paper]
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning

Bolin Lai, Xiaoliang Dai, Lawrence Chen, Guan Pang, James M. Rehg, Miao Liu
ECCV, 2024 (Oral, Best Paper Finalist)
EgoVis Distinguished Paper Award
Webpage / Paper / Code / Dataset / HuggingFace / Supplementary / Video / Poster / Press: GT News
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation

Bolin Lai, Fiona Ryan, Wenqi Jia, Miao Liu*, James M. Rehg*
ECCV, 2024
Webpage / Paper / Code / Data Split / HuggingFace / Supplementary / Video / Poster
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations

Sangmin Lee, Bolin Lai, Fiona Ryan, Bikram Boote, James M. Rehg
CVPR, 2024 (Oral)
Webpage / Paper / Code / Split & Annotations / Supplementary
Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduction Games

Bolin Lai*, Hongxin Zhang*, Miao Liu*, Aryan Pariani*, Fiona Ryan, Wenqi Jia, Shirley Anugrah Hayati, James M. Rehg, Diyi Yang
ACL Findings, 2023
Webpage / Paper / Code / Dataset / Video
In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation and Beyond

Bolin Lai, Miao Liu, Fiona Ryan, James M. Rehg
International Journal of Computer Vision (IJCV), 2023
Webpage / Paper / Code
In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation

Bolin Lai, Miao Liu, Fiona Ryan, James M. Rehg
BMVC, 2022 (Spotlight, Best Student Paper)
Webpage / Paper / Code / Data Split / Supplementary / Video / Poster
Research before my PhD, mainly on medical image analysis
Semi-supervised Vein Segmentation of Ultrasound Images for Autonomous Venipuncture

Yu Chen, Yuxuan Wang, Bolin Lai, Zijie Chen, Xu Cao, Nanyang Ye, Zhongyuan Ren, Junbo Zhao, Xiao-Yun Zhou, Peng Qi
IROS, 2021
[Paper]
Hetero-Modal Learning and Expansive Consistency Constraints for Semi-Supervised Detection from Multi-Sequence Data

Bolin Lai, Yuhsuan Wu, Xiao-Yun Zhou, Peng Wang, Le Lu, Lingyun Huang, Mei Han, Jing Xiao, Heping Hu, Adam P. Harrison
Machine Learning in Medical Imaging, 2021
[Paper]
Liver Tumor Localization and Characterization from Multi-phase MR Volumes Using Key-Slice Prediction: A Physician-Inspired Approach

Bolin Lai*, Yuhsuan Wu*, Xiaoyu Bai*, Xiao-Yun Zhou, Peng Wang, Jinzheng Cai, Yuankai Huo, Lingyun Huang, Yong Xia, Jing Xiao, Le Lu, Heping Hu, Adam P. Harrison
International Workshop on PRedictive Intelligence In MEdicine, 2021
[Paper]
Spatial Regularized Classification Network for Spinal Dislocation Diagnosis

Bolin Lai, Shiqi Peng, Guangyu Yao, Ya Zhang, Xiaoyun Zhang, Yanfeng Wang, Hui Zhao
Machine Learning in Medical Imaging, 2019
[Paper]

Awards and Service

Reviewer for:
  • Computer Vision and Pattern Recognition Conference (CVPR)
  • International Conference on Computer Vision (ICCV)
  • European Conference on Computer Vision (ECCV)
  • Conference on Neural Information Processing Systems (NeurIPS)
  • The Association for Computational Linguistics (ACL)
  • Empirical Methods in Natural Language Processing (EMNLP)
  • Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
  • International Journal of Computer Vision (IJCV)
  • Association for the Advancement of Artificial Intelligence (AAAI)
  • International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)
  • ACM Multimedia (ACM MM)
  • Journal of Biomedical and Health Informatics (JBHI)
  • IEEE Signal Processing Letters (SPL)

This website is adapted from this source code.