Bolin Lai
Hi! I am a 4th-year PhD student in the Machine Learning Program at the Georgia Institute of Technology, advised by Prof. James Rehg and co-advised by Prof. Zsolt Kira. Currently, I'm also a visiting student at UIUC. Prior to starting my PhD, I received my Master's degree in ECE and my Bachelor's degree in Information Engineering from Shanghai Jiao Tong University. I worked with Prof. Ya Zhang during my Master's studies.
I was a research scientist intern at GenAI, Meta in 2023 (Llama Image/Video Data Team led by Guan Pang) and 2024 (Llama Applied Multi-modal Team led by Ning Zhang). I worked closely with Miao Liu, Tong Xiao, Xiaoliang Dai and Lawrence Chen on generative model projects.
I'm actively looking for internship opportunities in summer 2025.
Email /
Google Scholar /
Github /
LinkedIn /
Twitter
|
|
Research Interests
|
My research interests lie in Multi-modal Learning, Generative Models (including Multimodal LLMs and Diffusion Models) and Video Understanding.
Currently, I'm focusing on advancing multi-modal understanding and generation through the integration of Large Language Models (LLMs) and Diffusion Models (DMs), aiming to connect and leverage the latent representation spaces of these two model architectures.
I'm looking for self-motivated graduate/undergraduate students to collaborate with. Don't hesitate to reach out to me if you are interested.
|
News
|
- Oct 2024: We released a thorough survey on action anticipation. Please check it out if you are interested in this field.
- Oct 2024: Our LEGO paper was nominated as a Best Paper Finalist @ECCV2024. Congratulations to all co-authors!
- Aug 2024: Our LEGO paper was selected for an Oral presentation.
- July 2024: Two first-author papers were accepted by ECCV! Please check out our latest work: LEGO (for action generation) and CSTS (for gaze forecasting). Thanks to all the co-authors!
- May 2024: I started my second internship at GenAI, Meta in the Bay Area.
- Mar 2024: One co-authored paper was accepted by CVPR (Oral). See you in Seattle!
- Jul 2023: Our expansion of prior work GLC was accepted by IJCV!
|
|
Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation
Bolin Lai,
Felix Juefei-Xu,
Miao Liu,
Xiaoliang Dai,
Nikhil Mehta,
Chenguang Zhu,
Zeyi Huang,
James M. Rehg,
Sangmin Lee,
Ning Zhang,
Tong Xiao
Under review at CVPR
Webpage /
Paper /
Code /
Video
|
|
Human Action Anticipation: A Survey
Bolin Lai*,
Sam Toyer*,
Tushar Nagarajan,
Rohit Girdhar,
Shengxin Zha,
James M. Rehg,
Kris Kitani,
Kristen Grauman,
Ruta Desai,
Miao Liu
Under review at TPAMI
[Paper]
|
|
What is the Visual Cognition Gap between Humans and Multimodal LLMs?
Xu Cao,
Bolin Lai,
Wenqian Ye,
Yunsheng Ma,
Joerg Heintz,
Jintai Chen,
Jianguo Cao,
James M. Rehg
Under review at ICLR
[Paper]
|
|
Towards Social AI: A Survey on Understanding Social Interactions
Sangmin Lee,
Minzhi Li,
Bolin Lai,
Wenqi Jia,
Fiona Ryan,
Xu Cao,
Ozgur Kara,
Bikram Boote,
Weiyan Shi,
Diyi Yang,
James M. Rehg
Under review at TPAMI
[Paper]
|
|
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
Bolin Lai,
Xiaoliang Dai,
Lawrence Chen,
Guan Pang,
James M. Rehg,
Miao Liu
ECCV, 2024 (Oral, Best Paper Finalist)
Webpage /
Paper /
Code /
Dataset /
Supplementary /
Video /
Poster /
Press: GT News
|
|
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
Bolin Lai,
Fiona Ryan,
Wenqi Jia,
Miao Liu*,
James M. Rehg*
ECCV, 2024
Webpage /
Paper /
Code /
Data Split /
Supplementary /
Video /
Poster
|
|
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
Sangmin Lee,
Bolin Lai,
Fiona Ryan,
Bikram Boote,
James M. Rehg
CVPR, 2024 (Oral) [Acceptance Rate 0.8%]
Webpage /
Paper /
Code /
Split & Annotations /
Supplementary
|
|
Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduction Games
Bolin Lai*,
Hongxin Zhang*,
Miao Liu*,
Aryan Pariani*,
Fiona Ryan,
Wenqi Jia,
Shirley Anugrah Hayati,
James M. Rehg,
Diyi Yang
ACL Findings, 2023
Webpage /
Paper /
Code /
Dataset /
Video
|
|
In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation and Beyond
Bolin Lai,
Miao Liu,
Fiona Ryan,
James M. Rehg
International Journal of Computer Vision (IJCV), 2023
Webpage /
Paper /
Code
|
|
In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation
Bolin Lai,
Miao Liu,
Fiona Ryan,
James M. Rehg
BMVC, 2022 (Spotlight, Best Student Paper)
Webpage /
Paper /
Code /
Data Split /
Supplementary /
Video /
Poster
|
---------------- Research before my PhD, mainly about medical image analysis ----------------
|
|
Semi-supervised Vein Segmentation of Ultrasound Images for Autonomous Venipuncture
Yu Chen,
Yuxuan Wang,
Bolin Lai,
Zijie Chen,
Xu Cao,
Nanyang Ye,
Zhongyuan Ren,
Junbo Zhao,
Xiao-Yun Zhou,
Peng Qi
IROS, 2021
[Paper]
|
|
Hetero-Modal Learning and Expansive Consistency Constraints for Semi-Supervised Detection from Multi-Sequence Data
Bolin Lai, Yuhsuan Wu,
Xiao-Yun Zhou,
Peng Wang,
Le Lu,
Lingyun Huang, Mei Han,
Jing Xiao,
Heping Hu,
Adam P. Harrison
Machine Learning in Medical Imaging, 2021
[Paper]
|
|
Liver Tumor Localization and Characterization from Multi-phase MR Volumes Using Key-Slice Prediction: A Physician-Inspired Approach
Bolin Lai*, Yuhsuan Wu*, Xiaoyu Bai*,
Xiao-Yun Zhou,
Peng Wang,
Jinzheng Cai,
Yuankai Huo,
Lingyun Huang,
Yong Xia,
Jing Xiao,
Le Lu,
Heping Hu,
Adam P. Harrison
International Workshop on PRedictive Intelligence In MEdicine, 2021
[Paper]
|
|
Spatial Regularized Classification Network for Spinal Dislocation Diagnosis
Bolin Lai, Shiqi Peng, Guangyu Yao,
Ya Zhang,
Xiaoyun Zhang,
Yanfeng Wang,
Hui Zhao
Machine Learning in Medical Imaging, 2019
[Paper]
|
Reviewer for
- Computer Vision and Pattern Recognition Conference (CVPR)
- European Conference on Computer Vision (ECCV)
- The Association for Computational Linguistics (ACL)
- Empirical Methods in Natural Language Processing (EMNLP)
- International Journal of Computer Vision (IJCV)
- Association for the Advancement of Artificial Intelligence (AAAI)
- International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)
- Journal of Biomedical and Health Informatics (JBHI)
- IEEE Signal Processing Letters (SPL)
Taught ECE4871 as a teaching assistant at Georgia Tech in 2021 and 2022.
Taught CS7643 Deep Learning as a teaching assistant at Georgia Tech in 2024.