A Theory of Language Models

Speaker: Sadhika Malladi, Princeton University
Time: 2023-11-30, 16:00-17:00
Venue: FIT 1-222

Large language models (LLMs) have enjoyed unprecedented success in solving complex language understanding tasks. The standard paradigm is to perform large-scale generic pre-training (e.g., learning to predict the next word) followed by supervised fine-tuning with relatively few examples. This two-phase training is poorly understood theoretically. Some natural questions are: How does pre-training help models solve downstream tasks? How can we understand the optimization procedure in fine-tuning? In this talk, I will discuss several works from my PhD that address these questions. First, we show that autoregressive pre-training induces a meaningful representation for solving downstream tasks, even without any fine-tuning. Then, we show that fine-tuning with a suitable prompt can often be understood through the lens of the neural tangent kernel. Finally, we use these insights to design a highly memory-efficient fine-tuning algorithm that adapts zeroth order methods to optimize LLMs with the same memory as inference. Our algorithm preserves performance within 1% absolute of fine-tuning on most tasks while using up to 12x less memory and 2x fewer GPU-hours.
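The memory-efficient approach mentioned above can be illustrated with a small sketch. This is a hypothetical SPSA-style zeroth-order step, not the talk's exact algorithm: the function and parameter names (`zo_step`, `perturb`, `eps`, `lr`) are assumptions, and the key idea shown is that each random perturbation is regenerated from a seed rather than stored, so the update needs only forward (inference-like) memory.

```python
import numpy as np

def perturb(params, scale, seed):
    """Add scale * z to params in place, regenerating z from `seed`
    so the perturbation vector never has to be kept in memory."""
    rng = np.random.default_rng(seed)
    params += scale * rng.standard_normal(params.shape)

def zo_step(params, loss_fn, eps=1e-3, lr=1e-2, seed=0):
    """One zeroth-order update using two loss evaluations and no backprop.
    (Illustrative sketch only, under the assumptions stated above.)"""
    perturb(params, +eps, seed)
    loss_plus = loss_fn(params)
    perturb(params, -2 * eps, seed)
    loss_minus = loss_fn(params)
    perturb(params, +eps, seed)              # restore original parameters
    grad_scale = (loss_plus - loss_minus) / (2 * eps)
    perturb(params, -lr * grad_scale, seed)  # move along the estimated gradient

# Toy usage: minimize ||theta - 3||^2 using only loss evaluations.
theta = np.zeros(4)
loss = lambda p: float(np.sum((p - 3.0) ** 2))
for step in range(500):
    zo_step(theta, loss, seed=step)
```

Because the same seed reproduces the same perturbation, the optimizer stores no extra vectors beyond the parameters themselves, which is the intuition behind fine-tuning with inference-level memory.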


Sadhika Malladi is a PhD candidate at Princeton University advised by Sanjeev Arora. Her work builds a rigorous understanding of how language models work, with a focus on optimization. Previously, she earned her undergraduate degree in Math with Computer Science from MIT. She has worked at OpenAI on language modeling and at Cerebras Systems on specialized hardware for machine learning.