Moran Beladev

Ben Gurion University
Speed Without Sacrifice: Fine-Tuning Language Models with Medusa and Knowledge Distillation

Abstract

In high-stakes industrial NLP applications, balancing generation quality with speed and efficiency presents significant challenges. Booking.com addresses these challenges by investigating two complementary optimization approaches: Medusa for speculative decoding and knowledge distillation (KD) for model compression. We demonstrate the practical application of these techniques in real-world travel domain tasks, including trip planning, smart filters, and accommodation description generation. We introduce two modifications to the Medusa implementation: starting from base pre-trained models rather than conversationally fine-tuned ones, and a simplified single-stage training process for Medusa-2 that maintains performance while reducing computational requirements. Lastly, we present a novel framework that combines Medusa with KD, achieving compounded benefits in both model size and inference speed and making it feasible to deploy the model in production for ultra-low-latency scenarios. Our experiments with TinyLlama-1.1B as the student model and Llama-3.1-70B as the teacher show that the combined approach maintains the teacher’s performance quality while reducing inference latency by 10-20x.

Bio

Moran Beladev is a Senior ML Manager at Booking.com. She leads the content intelligence track, which focuses on building, training, and deploying content models (computer vision, NLP, and generative AI) using the most advanced technologies and models. Moran is also a PhD candidate, researching the application of NLP models to social graphs.

Agenda

08:45  Reception & gathering
09:30  Opening remarks by WiDS TLV ambassadors
09:45  Keynote session: Prof. Michal Rosen Zvi
10:15  Keynote session: Hadas Grossmon Ella
10:45  Poster pitches
10:55  Break
11:10  Lightning talks session
12:45  Lunch & poster session
13:30  Roundtable session & poster session
14:20  Roundtable closing
14:30  Talk by Hila Paz
14:50  Talk by Dr. Moran Mizrahi
15:15  Closing remarks
15:30  End