Tailor-Made LLM Evaluation: How to Create Custom Evaluations for Your LLM
Linoy Cohen


Linoy Cohen is a Data Scientist on the NLP team at Intuit. As part of her job, she is responsible for creating automatic evaluations for LLMs that provide an objective method to measure their capabilities based on specific custom criteria and needs.


In the ever-changing world of Generative AI, new LLMs are released daily, and while there are standardized scoring approaches for evaluating them, they don't always measure what matters to us. In this talk, we will go over the two main approaches to evaluating LLMs: benchmarking and LLM-as-a-judge. We will discuss which one to choose and how to create custom evaluations that suit our own use cases. Lastly, we will go over a set of best practices for creating the best possible evaluation, one that produces an objective and deterministic score.
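To make the LLM-as-a-judge approach concrete, here is a minimal sketch of the pattern: a judge model is prompted to score an answer against a custom criterion, and the numeric score is parsed from its reply. The `call_judge_model` function and the prompt wording are illustrative assumptions, not the speaker's actual setup; in practice this function would call a real LLM API.

```python
import re

def build_judge_prompt(criterion: str, question: str, answer: str) -> str:
    """Ask the judge model to score an answer against a custom criterion."""
    return (
        f"You are an impartial judge. Criterion: {criterion}\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Rate the answer from 1 to 5 and reply exactly as 'Score: <n>'."
    )

def parse_score(judge_reply: str) -> int:
    """Extract the numeric score from the judge's reply."""
    match = re.search(r"Score:\s*(\d)", judge_reply)
    if match is None:
        raise ValueError(f"Could not parse a score from: {judge_reply!r}")
    return int(match.group(1))

def evaluate(criterion, examples, call_judge_model):
    """Average the judge's scores over (question, answer) pairs."""
    scores = [
        parse_score(call_judge_model(build_judge_prompt(criterion, q, a)))
        for q, a in examples
    ]
    return sum(scores) / len(scores)

# Stub judge for illustration only; a real evaluation would query an actual model.
def fake_judge(prompt: str) -> str:
    return "Score: 4"

avg = evaluate(
    "Answers must be factually accurate.",
    [("What is the capital of France?", "Paris")],
    fake_judge,
)
print(avg)  # 4.0
```

Constraining the judge to a fixed reply format ("Score: <n>") is one common way to push the evaluation toward the objective, deterministic scoring the abstract mentions.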


8:45 Reception
9:30 Opening remarks by WiDS TLV ambassadors Noah Eyal Altman, Or Basson, and Nitzan Gado
9:45 Dr. Aya Soffer, IBM: "Putting Generative AI to Work: What Have We Learned So Far?"
10:15 Prof. Reut Tsarfaty, Bar-Ilan University: "Will Hebrew Speakers Be Able to Use Generative AI in Their Native Tongue?"
10:45 Break
11:00 Lightning talks
12:20 Lunch & poster session
13:20 Roundtable session & poster session
14:05 Roundtable closing
14:20 Break
14:30 Dr. Orna Amir & Hila Kantor, Google: "A User-Centric Framework for Quantifying Notification Harm"
14:50 Naomi Ken Korem, Lightricks: "Mastering the Art of Generative Models: Training and Controlling Text-to-Video Models"
15:10 Dr. Yael Mathov, Intuit: "Surviving the AI-pocalypse: Your Guide to LLM Security"
15:30 Closing remarks
15:40 The end

WiDS TLV important update

Dear WiDS TLV attendees,

In light of recent developments, we regret to inform you that the WiDS TLV 2024 event scheduled for tomorrow has been postponed to June 3rd, 2024. We apologize for this last-minute change and look forward to seeing all of you on June 3rd.