Large Language Models (LLMs) encode vast amounts of factual knowledge, much of which is linked to real-world entities. While previous work has explored the mechanisms behind entity representation, open questions remain about their extent and consistency. This research investigates whether LLMs maintain a shared representation for different descriptions of the same entity, such as “Barack Obama” and “the 44th president of the United States.” Specifically, we examine whether knowledge is retrieved from the same source, or transferred through a shared mechanism, when answering questions about an entity and its description. Understanding whether and how LLMs internally link different representations of the same entity is crucial for improving interpretability, trust, and targeted model editing.
Our research uses datasets of paired entity descriptions, verifying that model responses remain consistent across naming variations. We employ interpretability techniques, including vocabulary projection and sparse autoencoders, to examine the internal retrieval mechanisms of LLMs. By analyzing activations across transformer layers, we determine whether knowledge about an entity is retrieved from the same parameters when queried through different descriptions. Additionally, we explore whether entity attributes are encoded in a shared manner, independent of the specific phrasing used to reference them.
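To illustrate the vocabulary-projection idea (often called the “logit lens”), here is a minimal toy sketch: an intermediate hidden state is multiplied by the unembedding matrix, and the resulting scores rank vocabulary tokens. The function name, the toy vocabulary, and the matrices below are illustrative assumptions, not the authors' actual setup or any real model's weights.

```python
import numpy as np

def vocabulary_projection(hidden_state, unembedding, vocab):
    """Project an intermediate hidden state onto the vocabulary:
    logits = W_U @ h, then return tokens ranked by score."""
    logits = unembedding @ hidden_state      # shape: (vocab_size,)
    ranked = np.argsort(logits)[::-1]        # highest score first
    return [vocab[i] for i in ranked]

# Toy example: 3-token vocabulary, 2-dimensional hidden states.
vocab = ["Obama", "president", "Paris"]
W_U = np.array([[2.0, 0.0],   # one row of unembedding weights per token
                [1.0, 1.0],
                [0.0, 2.0]])
h = np.array([1.0, 0.2])      # a hidden state "leaning toward" Obama
print(vocabulary_projection(h, W_U, vocab)[0])  # → Obama
```

Applied at every layer of a real transformer, this kind of projection shows at which depth an entity attribute becomes linearly decodable, which is what makes it useful for comparing retrieval across different descriptions of the same entity.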
Our findings indicate that different descriptions of the same entity tend to share representation structures, exhibit consistent knowledge recall patterns, and influence each other in knowledge editing tasks.
These findings provide evidence that entity representations in LLMs exhibit some degree of consistency across different descriptions. The mutual influence of knowledge editing between descriptions of the same entity suggests the presence of a shared component in the model’s fact-retrieval mechanism. These insights help improve trust in LLMs by revealing how real-world entities are internally represented.
Hail is an MSc student in computer science at Bar-Ilan University, specializing in NLP and model interpretability. Her research focuses on understanding how large language models represent and retrieve knowledge. Specifically, she investigates whether there is shared knowledge or a shared knowledge retrieval mechanism across entity descriptions in LLMs.
Time | Session
---|---
8:45 | Reception
9:30 | Opening remarks by WiDS TLV ambassadors
9:45 | Dr. Mor Geva, Tel Aviv University: “MRI for Large Language Models: Mechanistic Interpretability from Neurons to Attention Heads”
10:15 | Panel: “Pioneering Progress: a strategic look at the GenAI revolution and the new role of data scientists” with Shani Gershtein (Melingo), Mirit Elyada Bar (Intuit), and Dr. Asi Messica (Lightricks); moderated by Nitzan Gado (Intuit)
10:45 | Poster pitches
10:55 | Break
11:10 | Lightning talks session
12:30 | Lunch & poster session
13:30 | Roundtable session & poster session
14:30 | Roundtable closing
14:40 | Shunit Agmon, Technion: “Bridging the Gender Gap in Clinical AI: Temporal Adaptation with TeDi-BERT”
15:00 | Shaked Naor Hoffmann, Apartment List: “Building Generative AI Agents for Production: Turning Ideas into Real-World Applications”
15:20 | Closing remarks
15:30 | The end
WiDS Tel Aviv is an independent event organized by Intuit’s WiDS TLV ambassadors as part of the annual WiDS Worldwide conference, the WiDS Datathon, and an estimated 200 WiDS Regional Events worldwide. Everyone is invited to attend all WiDS conference and WiDS Datathon Workshop events, which feature outstanding women doing outstanding work.
© 2018-2024 WiDS TLV – Intuit. All rights reserved.
Scotty – By Nir Azoulay
Design: Sharon Geva