This episode is an interview with Honorable Mention Outstanding Paper Award winners Ibrahim Jubran and Alaa Maalouf from the University of Haifa, discussing highlights from their paper "Fast and Accurate Least-Mean-Squares Solvers," presented at the NeurIPS 2019 conference.
As the award committee states: "Least Mean-Square solvers operate at the core of many ML algorithms, from linear and Lasso regression to singular value decomposition and Elastic net. The paper shows how to reduce their computational complexity by one or two orders of magnitude, with no precision loss and improved numerical stability. The approach relies on the Caratheodory theorem, establishing that a coreset (set of d^2 + 1 points in dimension d) is sufficient to characterize all n points in a convex hull. The novelty lies in the divide-and-conquer algorithm proposed to extract a coreset with affordable complexity (O(nd + d^5 log n), granted that d << n). Reviewers emphasize the importance of the approach, for practitioners as the method can be easily implemented to improve existing algorithms, and for extension to other algorithms as the recursive partitioning principle of the approach lends itself to generalization."
Ibrahim Jubran received his B.Sc. (cum laude, Etgar program) and M.Sc. (summa cum laude) in Computer Science at the University of Haifa, Israel, in 2014 and 2018 respectively, and is now a Ph.D. candidate under the supervision of Dr. Dan Feldman. He has developed approximation algorithms and data reduction schemes (known as coresets) for pose-estimation and machine learning problems. His research interests include Computer Vision, Machine Learning, and Compression of Big Data.
Alaa Maalouf received his B.Sc. and M.Sc. in Computer Science from the University of Haifa and is now starting his Ph.D. under the supervision of Dr. Dan Feldman. His main research interests focus on Machine/Deep Learning, Computational Geometry and Coresets (data summarization) for Big Data.
Paper At A Glance:
Least-mean-squares (LMS) solvers such as Linear / Ridge / Lasso Regression, SVD and Elastic-Net not only solve fundamental machine learning problems, but are also the building blocks in a variety of other methods, such as decision trees and matrix factorizations. We suggest an algorithm that receives a finite set of n d-dimensional real vectors and returns a weighted subset of d+1 vectors whose weighted sum is *exactly* the same. The proof of Caratheodory's Theorem (1907) computes such a subset in O(n^2 d^2) time and is thus not used in practice. Our algorithm computes this subset in O(nd) time, using O(log n) calls to Caratheodory's construction on small but "smart" subsets. This is based on a novel paradigm of fusion between different data summarization techniques, known as sketches and coresets. As an example application, we show how it can be used to boost the performance of existing LMS solvers, such as those in the scikit-learn library, by up to x100. Generalization to streaming and distributed (big) data is trivial. Extensive experimental results and complete open source code are also provided.
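To make the core idea concrete, here is a minimal sketch of the *classic* Caratheodory construction that the abstract refers to (the slow O(n^2 d^2) baseline, not the authors' fast divide-and-conquer algorithm). Given n weighted points in R^d, it repeatedly finds a null-space direction and shifts weights along it until at most d+1 points with nonzero weight remain, while the weighted sum is preserved exactly. The function name and tolerance parameter are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def caratheodory(P, u, tol=1e-9):
    """Classic Caratheodory construction (illustrative sketch).

    P: (n, d) array of points, u: (n,) positive weights.
    Returns (Q, w) with at most d+1 rows such that
    w @ Q == u @ P and w.sum() == u.sum().
    """
    P = np.asarray(P, dtype=float)
    u = np.asarray(u, dtype=float)
    while len(u) > P.shape[1] + 1:
        # Find v != 0 with sum(v) = 0 and sum(v_i * p_i) = 0:
        # take a null-space vector of the d x (n-1) matrix of
        # differences, then set v_1 so the entries sum to zero.
        A = (P[1:] - P[0]).T                 # d x (n-1)
        _, _, Vt = np.linalg.svd(A)          # full SVD: Vt is (n-1, n-1)
        v_rest = Vt[-1]                      # A @ v_rest ~ 0 since rank <= d < n-1
        v = np.concatenate([[-v_rest.sum()], v_rest])
        pos = v > tol
        if not pos.any():                    # flip sign so some entry is positive
            v = -v
            pos = v > tol
        # Largest step that keeps all weights nonnegative;
        # at least one weight hits zero and its point is dropped.
        alpha = np.min(u[pos] / v[pos])
        w = u - alpha * v                    # sum(w) == sum(u), mean unchanged
        keep = w > tol
        P, u = P[keep], w[keep]
    return P, u
```

Each iteration removes at least one point, so the loop runs at most n - (d+1) times; the paper's contribution is to avoid this quadratic cost by calling the construction only O(log n) times on small, carefully chosen subsets.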
Poster: http://papers.nips.cc/paper/9040-fast-and-accurate-least-mean-squares-solvers
Paper: https://arxiv.org/abs/1906.04705