Danit Shifman Abukasis

Bar-Ilan University
Optimizing Web Crawling in Resource-Constrained Environments: A Two-Phase Machine Learning and Operations Research Approach
Danit Abukasis

Bio

Danit is a Ph.D. candidate at the Faculty of Engineering, Bar-Ilan University. Her research combines machine learning with operations research and focuses on classification problems under resource constraints.

Abstract

This research presents a novel two-phase approach to optimizing website crawling for a financial software company, addressing the challenge of combining resource constraints with classification costs. In the first phase, a machine learning model predicts the probability that a crawl will be efficient. In the second phase, these probabilities are fed into an optimization model that allocates crawling resources to maximize efficiency subject to both global and local constraints. The global constraint limits the total number of crawls; the local constraints limit specific groups within the dataset, such as a cap for each website provider. We present a mathematical formulation and solution of the problem, using a cost matrix and constraint equations to minimize classification costs while complying with the global and local restrictions, and prove that the problem has an integer solution. An experimental study demonstrates the effectiveness of our approach in optimizing daily crawls. The proposed method offers a scalable solution for web crawling in resource-constrained scenarios and contributes to the fields of machine learning and operations research.
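
To make the second phase concrete, here is a minimal sketch of allocating crawls under one global cap and per-provider caps. It is not the formulation from the talk: the abstract describes a cost-matrix optimization model, whereas this sketch substitutes a simplified objective (maximize the sum of predicted probabilities) and solves it greedily. For this simplified case, with one cap per provider and one overall cap, choosing candidates in descending order of probability gives an optimal selection. That shortcut would not carry over to overlapping groups or a general cost matrix. The function name, the data layout, and all probabilities below are hypothetical placeholders, not results from the study.

```python
from collections import defaultdict


def allocate_crawls(candidates, global_cap, provider_caps):
    """Pick sites to crawl under a global cap and per-provider caps.

    candidates: list of (site_id, provider, probability) tuples, where
        probability stands in for the phase-one model output (hypothetical).
    global_cap: maximum total number of crawls (the global constraint).
    provider_caps: dict mapping provider -> maximum crawls for that provider
        (the local constraints). Providers missing from the dict are treated
        as having no local limit, which is an assumption of this sketch.
    """
    used = defaultdict(int)  # crawls already assigned to each provider
    chosen = []
    # Visit candidates from highest to lowest predicted probability.
    for site_id, provider, _prob in sorted(
        candidates, key=lambda c: c[2], reverse=True
    ):
        if len(chosen) >= global_cap:
            break  # global constraint reached
        if used[provider] < provider_caps.get(provider, float("inf")):
            chosen.append(site_id)
            used[provider] += 1
    return chosen


# Illustrative input only: made-up sites, providers, and probabilities.
candidates = [
    ("a1", "A", 0.9), ("a2", "A", 0.8), ("a3", "A", 0.7),
    ("b1", "B", 0.6), ("b2", "B", 0.2), ("c1", "C", 0.5),
]
selected = allocate_crawls(
    candidates, global_cap=4, provider_caps={"A": 2, "B": 2, "C": 1}
)
print(selected)  # ['a1', 'a2', 'b1', 'c1']
```

In this toy input, site a3 is skipped despite its high score because provider A has already used its two crawls, and b2 is dropped because the global cap of four is reached first.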

Agenda

8:45 Reception
9:30 Opening remarks by WiDS TLV ambassadors Noah Eyal Altman, Or Basson, and Nitzan Gado
9:45 Dr. Aya Soffer, IBM: "Putting Generative AI to Work: What Have We Learned So Far?"
10:15 Prof. Reut Tsarfaty, Bar-Ilan University: "Will Hebrew Speakers Be Able to Use Generative AI in Their Native Tongue?"
10:45 Poster Pitches
10:55 Break
11:10 Lightning talks
12:30 Lunch & poster session
13:30 Roundtable session & poster session
14:15 Roundtable closing
14:30 Break
14:40 Naomi Ken Korem, Lightricks: "Mastering the Art of Generative Models: Training and Controlling Text-to-Video Models"
15:00 Dr. Yael Mathov, Intuit: "Surviving the AI-pocalypse: Your Guide to LLM Security"
15:20 Closing remarks
15:30 The end