What computations underlie the behavior of large language models (LLMs) like GPT-4 and Gemini? This talk will introduce mechanistic interpretability and discuss recent advances in mapping functional components within a model. We will begin with the challenge of automatically describing simple units in LLMs, such as neurons. Current interpretability pipelines explain a neuron primarily by analyzing the inputs that activate it while overlooking its downstream effect on model outputs. We will show how augmenting these pipelines with output-based descriptions leads to substantially more faithful and useful interpretations. Next, we will move beyond neurons to examine attention heads — one of the building blocks of modern LLMs. We will present MAPS, an efficient method for inferring the functionality of attention heads directly from their parameters. MAPS enables comprehensive mapping of predefined operations across attention heads, revealing both cross-model universality patterns and architecture-specific biases. Lastly, we will show how LLMs can automatically interpret MAPS’ estimations, identifying the salient operations of most attention heads within a model.
Mor Geva is an assistant professor (senior lecturer) at the School of Computer Science and AI at Tel Aviv University and a research scientist at Google. Her research focuses on understanding the inner workings of large language models to increase their transparency and efficiency, control their operation, and improve their reasoning abilities. Mor completed a Ph.D. in computer science at Tel Aviv University and was a postdoctoral researcher at Google DeepMind and the Allen Institute for AI. She was nominated as an MIT Rising Star in EECS (2021) and received multiple awards, including Intel’s Rising Star Faculty Award (2024), an EMNLP Best Paper Award (2024), an EACL Outstanding Paper Award (2023), and the Dan David Prize for Graduate Students in the field of AI (2020).
8:45 | Reception |
---|---|
9:30 | Opening remarks by WiDS TLV ambassadors |
9:45 | Dr. Mor Geva, Tel Aviv University: “MRI for Large Language Models: Mechanistic Interpretability from Neurons to Attention Heads” |
10:15 | Panel: “Pioneering Progress: a strategic look at the GenAI revolution and the new role of data scientists”. Panelists: Shani Gershtein, Melingo; Mirit Elyada Bar, Intuit; Dr. Asi Messica, Lightricks. Moderated by Nitzan Gado, Intuit |
10:45 | Poster pitches |
10:55 | Break |
11:10 | Lightning talks session |
12:30 | Lunch & poster session |
13:30 | Roundtable session & poster session |
14:30 | Roundtable closing |
14:40 | Shunit Agmon, Technion: “Bridging the Gender Gap in Clinical AI: Temporal Adaptation with TeDi-BERT” |
15:00 | Shaked Naor Hoffmann, Apartment List: “Building Generative AI Agents for Production: Turning Ideas into Real-World Applications” |
15:20 | Closing remarks |
15:30 | The end |
WiDS Tel Aviv is an independent event organized by Intuit’s WiDS TLV ambassadors as part of the annual WiDS Worldwide conference, the WiDS Datathon, and an estimated 200 WiDS Regional Events worldwide. Everyone is invited to attend all WiDS conference and WiDS Datathon Workshop events, which feature outstanding women doing outstanding work.
© 2018-2024 WiDS TLV – Intuit. All rights reserved.
Design: Sharon Geva