Reinforcement fine-tuning can teach language models to reason — but the hardest part isn’t the model or the optimizer. It’s the reward function. A reward function looks like engineering: it compiles, it runs, it returns a number. But it behaves like a contract with an adversarial optimizer that will find every loophole you left open.
Through a real-world case study — teaching LLMs to match financial transactions across record-keeping systems — we iterated through three rounds of reward redesign across both OpenAI’s proprietary RFT API and open-weight GRPO training. Each iteration uncovered a new failure mode hiding behind the one we just fixed: the Abstention Trap (asymmetric penalties that make silence optimal), the Partial Credit Cliff (elegant math with catastrophic gradient signal), the Stratum Blind Spot (aggregate reward rising while subcategories collapse), the Entropy Death Spiral (mode collapse invisible behind an API), and the Reasoning Ceiling (performance walls that no reward engineering can break without chain-of-thought).
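To make the first failure mode concrete: the Abstention Trap arises whenever the penalty for a wrong answer is much larger than the penalty for declining to answer. A minimal sketch, using hypothetical penalty values (not the actual rewards from the case study):

```python
# Toy illustration of the "Abstention Trap": a hypothetical transaction-
# matching reward that penalizes a wrong match (-1.0) far more than an
# abstention (-0.1). The values are illustrative, not from the talk.

def reward(prediction, truth):
    """Per-example reward with asymmetric penalties (hypothetical)."""
    if prediction is None:   # model abstained
        return -0.1
    return 1.0 if prediction == truth else -1.0

def expected_reward(accuracy, attempt_rate):
    """Expected reward when the model attempts a fraction of examples."""
    attempt_value = accuracy * 1.0 + (1 - accuracy) * -1.0
    return attempt_rate * attempt_value + (1 - attempt_rate) * -0.1

# At 40% matching accuracy, always abstaining beats always attempting,
# so the optimizer learns that silence is optimal:
print(expected_reward(accuracy=0.40, attempt_rate=1.0))  # ≈ -0.2
print(expected_reward(accuracy=0.40, attempt_rate=0.0))  # ≈ -0.1
```

The trap disappears only once the model's accuracy exceeds the break-even point implied by the two penalties, which is why such asymmetries must be checked against realistic early-training accuracy, not final accuracy.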
We distill these failures into a Reward Stress Testing framework: five concrete diagnostic checks practitioners can run before and during any RL fine-tuning job. Your reward function has bugs. This talk shows you how to find them before the optimizer does.
Osnat Haj-Yahia is a Staff AI Scientist with 7 years in data science and over 15 years in the tech industry. She holds a BS in Computer Science and pursued graduate studies in Neurobiology — a detour that deepened her thinking about learning systems before she returned full-time to the AI field. Her current work focuses on reinforcement learning fine-tuning, building and developing LLMs and AI agents that help businesses around the world prosper.
Keynote session: Hadas Grossmon Ella
Break
Lightning talks session
Roundtable closing
Talk by Hila Paz
Talk by Dr. Moran Mizrahi
Closing remarks
End
Reception
Opening remarks by WiDS TLV ambassadors
Dr. Mor Geva, Tel Aviv University: “MRI for Large Language Models: Mechanistic Interpretability from Neurons to Attention Heads”
Panel: “Pioneering Progress: a strategic look at the GenAI revolution and the new role of data scientists” Panelists: Shani Gershtein, Melingo | Mirit Elyada Bar, Intuit | Dr. Asi Messica, Lightricks. Moderated by Nitzan Gado, Intuit
Poster pitches
Break
Lightning talks session
Lunch & poster session
Roundtable session & poster session
Roundtable closing
Shunit Agmon, Technion: “Bridging the Gender Gap in Clinical AI: Temporal Adaptation with TeDi-BERT”
Shaked Naor Hoffmann, Apartment List: “Building Generative AI Agents for Production: Turning Ideas into Real-World Applications”
Closing remarks
The end