Tutorial at the International Conference on Machine Learning (ICML)
21. June 2014, 8:30AM, Beijing, China
for Learning with Big Data - Zaid Harchaoui & Martin Jaggi
(the page here mirrors of the contents of the above linked site)
Bibliography (PDF) and demo code (Ocave/Matlab) see below.
Additionally, here is a link to the videos, papers and slides from our past NIPS '13 workshop.
Abstract: We provide a unified overview of several families of algorithms proposed in different settings: Frank-Wolfe (a.k.a. Conditional Gradient) algorithms, greedy optimization methods, and related extensions. Frank-Wolfe methods have been successfully applied to a wide range of large-scale learning and signal processing applications, such as matrix factorization, multi-task learning, image denoising, and structured prediction. On the other hand, greedy optimization algorithms, which underlie several versions of boosting, appear in structured variable selection, metric learning, and training of sum-product networks.
All these algorithms have in common that they rely on the atomic decomposition of the variable of interest, that is expanding it as a linear combination of the elements of a dictionary. In this tutorial, we showcase these algorithms in a unified framework, and present simple proofs of convergence rates and illustrate their underlying assumptions.
We show how these families of algorithms relate to each other, illustrate several successful applications, and highlight current challenges.