Shoval Messica

Beyond Text: Advancing Speech Tokenization for Generative Spoken Language Modeling

Abstract

Speech and audio modeling is a challenging task due to its rich, multi-layered nature. Unlike text, speech carries non-linguistic information such as speaker identity, emotion, and intonation, making representation learning more complex. Additionally, speech representations are continuous, with no predefined lexicon or boundaries, making modeling and sampling inherently difficult.

In this talk, I will introduce recent breakthroughs in generative spoken language modeling, where discrete representations are learned through unsupervised neural networks, entirely independent of text (textless NLP). This approach enables the adaptation of powerful NLP techniques for spoken language, capturing long-term dependencies and high-level semantics directly from raw audio. I will present my research on robust speech tokenization for spoken language modeling, emphasizing its ability to enhance speech understanding.

Bio

Shoval is a researcher in speech and audio processing, specializing in deep learning, NLP, and generative spoken language modeling. She holds a BSc in computer science from Tel Aviv University and recently completed her MSc in computer science at the Hebrew University of Jerusalem under the supervision of Dr. Yossi Adi, where she focused on improving speech tokenization and representation learning for textless NLP and spoken language modeling. Currently, she works at Mentee Robotics, where she develops and trains reinforcement learning policies for humanoid robots.

Agenda

8:45 Reception
9:30 Opening remarks by WiDS TLV ambassadors
9:45 Dr. Mor Geva, Tel Aviv University: “MRI for Large Language Models: Mechanistic Interpretability from Neurons to Attention Heads”
10:15 Panel: “Pioneering Progress: a strategic look at the GenAI revolution and the new role of data scientists”
Shani Gershtein, Melingo
Mirit Elyada Bar, Intuit
Dr. Asi Messica, Lightricks
Moderated by Nitzan Gado, Intuit
10:45 Poster pitches
10:55 Break
11:10 Lightning talks session
12:30 Lunch & poster session
13:30 Roundtable session & poster session
14:30 Roundtable closing
14:40 Shunit Agmon, Technion: “Bridging the Gender Gap in Clinical AI: Temporal Adaptation with TeDi-BERT”
15:00 Shaked Naor Hoffmann, Apartment List: “Building Generative AI Agents for Production: Turning Ideas into Real-World Applications”
15:20 Closing remarks
15:30 The end