Slides of a short seminar talk again about Support Vector Machines:
Kernels, Margins, Coresets and The Fight of High- vs. Low-Dimensional Spaces
Finding a separating hyperplane between two point clouds is the essence of kernel methods, which have become a standard tool in machine learning for all kinds of classification and regression problems (and it can read your thoughts ;-)). It is usually assumed that the points live in an implicit high-dimensional space, and that in order to solve the problem we are only allowed black-box access to the scalar product (called the kernel) in that space.
It has been entirely assumed that the success of these methods comes from just this ability to use very high-dimensional mappings, without increasing computational cost.
Well, not entirely... One small village paper of indomitable Balcan, Blum and Vempala '06 still holds out against the Roman invaders, claiming that a kernel should instead be interpreted as a good mapping to a low-dimensional space.
The idea comes from the Johnson-Lindenstrauss lemma applied to our setting: If the two point clouds are separable by some margin, then also in a random projection to a very low-dimensional subspace, the points should still be separable by roughly the same margin.
We will try to support this idea by advertising coresets as the deterministic analogon of such low-dimensional projections.
Based on a true story:
M. Balcan, A. Blum and S. Vempala. Kernels as features: On kernels, margins, and low-dimensional mappings. Machine Learning (2006) vol. 65 (1) pp. 79-94
or the survey on the same topic:
A. Blum. Random Projection, Margins, Kernels, and Feature-Selection. Subspace, Latent Structure and Feature Selection, LNCS (2006) pp. 52-68