Here is a simple algorithm for nuclear norm regularized problems, building upon on the recent approximate SDP solver of Hazan. This is a first-order method that only needs to compute one approximate eigenvector in each iteration. This algorithm is a natural instance of the generalized Frank-Wolfe algorithm (conditional gradient) which I have analyzed in more detail in my PhD thesis, or better see the more recent short overview article here.
Optimization problems with a nuclear norm regularization, such as e.g. low norm matrix factorizations, have seen many applications recently. We propose a new approximation algorithm building upon the recent sparse approximate SDP solver of (Hazan '08). The experimental efficiency of our method is demonstrated on large matrix completion problems such as the Netflix dataset. The algorithm comes with strong convergence guarantees, and can be interpreted as a first theoretically justified variant of Simon-Funk-type SVD heuristics. The method is free of tuning parameters, and very easy to parallelize.
This algorithm can be used to find matrix factorizations for an arbitrary convex loss function, if you want to do this under an additional nuclear norm constraint (regularization). Problems fitting into this framework are known under various names as for example robust PCA, kernel PCA, variants of sparse PCA, LDA, CCA, Locality Preserving Projections, and Spectral Clustering, see e.g. the overview by F. de la Torre. A nice side-effect of our approach is that by convexity we do not have problems of local minima, and obtain a guaranteed approximate solution which is simultaneously of low rank. Paper and talk slides are attached below.
Implementation: The following short
octave/matlab code implements the algorithm. This solves any convex minimization problem over (rectangular) matrices
X of nuclear norm at most
X = zeros(n,m);
for k = 0:numIt
[u,s,v] = svds(-nablaF(X),1); % linear subproblem: is solved by top singular vector pair
stepSize = 2/(k+2); % or better: perform line-search
X = (1-stepSize)*X + stepSize* t * u*v';
Here the only thing that needs to be added by the user is the function
nablaF(X), which must return the gradient of your chosen objective function for a given matrix
X. Note that e.g. for matrix completion, the gradient is always a sparse matrix, so each subproblem can be solved efficiently (also, the subproblem does not have to be solved exactly for the convergence guarantee to hold). The iterate
X can be represented either in dense form (like in the example here), or more space efficient in factorized form (as a rank-
k factorization after
k steps). For an overview over more algorithm variants, and also other regularizations and matrix factorizations, see the short overview article here.
java implementation for large scale matrix completion:
The .java file below contains the algorithm we used for the experiments in the ICML 2010 paper, using the power-method as the internal eigenvector procedure. The code is not yet completely self-contained, as the input data (the known/observed entries of the matrix which we want to complete) come in different formats for different datasets. The algorithm just assumes that all known entries are stored in a sequential list already. The algorithm never stores a full matrix, and therefore is able to scale to e.g. the size of the Netflix dataset.
Also attached below, you find my 10 slides of an earlier short seminar talk on the same topic:
Approximate SDP solvers, Matrix Factorizations, the Netflix Prize, and PageRank [ Mittags-Seminar Talk Abstract ]