SVD
Singular value decomposition (SVD) decomposes a matrix X into three matrices U, S, and V such that:
X = U*S*V'
The columns of U are mutually orthogonal unit-length vectors (these are called the left singular vectors); the columns of V are mutually orthogonal unit-length vectors (these are called the right singular vectors); and S is a matrix with zeros everywhere except for non-negative values along the diagonal (these values are called the singular values).
% Let's see an example.
X = randn(10,3);
[U,S,V] = svd(X);
figure; setfigurepos([100 100 500 150]);
subplot(1,3,1); imagesc(U'*U); axis image square; colorbar; title('U^TU');
subplot(1,3,2); imagesc(V'*V); axis image square; colorbar; title('V^TV');
subplot(1,3,3); imagesc(S); axis image square; colorbar; title('S');
Notice that U'*U and V'*V are identity matrices. This makes sense because the dot product of a unit-length vector with itself equals one (giving the ones on the diagonal) and the dot product of two orthogonal vectors equals zero (giving the zeros off the diagonal).
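As a quick sanity check, we can verify these properties numerically using the U, S, and V computed above. All three quantities below should be at the level of floating-point precision.
max(max(abs(X - U*S*V')))               % reconstruction error (~0)
max(max(abs(U'*U - eye(size(U,2)))))    % deviation of U'*U from the identity (~0)
max(max(abs(V'*V - eye(size(V,2)))))    % deviation of V'*V from the identity (~0)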
Covariance matrices
Suppose X represents a set of linear regressors. That is, suppose the dimensions of X are m x n, representing n regressors defined in an m-dimensional space. Then, X'*X is an n x n covariance matrix, representing the covariance of each regressor with each other regressor. (Technically, this is not quite correct: true covariance involves subtracting off the mean of each regressor before calculating the pairwise dot-products and dividing the results by the number of elements in each regressor (m). Nevertheless, it is still useful to think of X'*X as a covariance matrix.)
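To make the relationship to true covariance concrete, here is a small illustrative check on synthetic data (the variables m, n, and X0 below are introduced just for this check): once each regressor is mean-subtracted, X0'*X0 divided by m matches MATLAB's population covariance cov(X0,1) up to floating-point error.
m = 1000; n = 3;
X0 = randn(m,n);                       % synthetic regressors
X0 = bsxfun(@minus,X0,mean(X0,1));     % subtract the mean of each regressor
max(max(abs(X0'*X0/m - cov(X0,1))))    % should be ~0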
SVD applied to covariance matrices
SVD provides a useful decomposition of covariance matrices:
X'*X = (U*S*V')'*(U*S*V') = V*S'*U'*U*S*V' = V*(S'*S)*V'
where S'*S is an n x n diagonal matrix with the squares of the singular values along its diagonal (U'*U drops out of the middle because it equals the identity).
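We can check this identity numerically, reusing the X, S, and V from the example above; the result should be zero up to floating-point error.
max(max(abs(X'*X - V*(S'*S)*V')))   % should be ~0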
X'*X is an n x n matrix that can be interpreted as applying a linear operation to n-dimensional space. For instance, suppose we have a vector v with dimensions n x 1. This vector corresponds to a specific point in n-dimensional space. If we calculate (X'*X)*v, we obtain a new vector corresponding to a new point in n-dimensional space.
This decomposition of X'*X gives us insight into the nature of the linear operation. Specifically, it tells us that the operation consists of three successive steps: a rotation, a scaling, and then an undoing of the original rotation:
- The initial rotation comes from multiplication with V'. This is because the columns of V form a complete orthonormal basis (each column is a unit-length vector and all columns are mutually orthogonal). Multiplying a vector v by V' projects v onto this basis, and a change of orthonormal basis simply rotates the coordinate system.
- The scaling comes from multiplication with S'*S. Since S'*S is a diagonal matrix, this multiplication simply stretches the space along the coordinate axes.
- The final rotation comes from multiplication with V. This multiplication undoes the rotation that was performed by the initial multiplication with V'. (These three steps are sketched numerically below.)
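Concretely, reusing the X, S, and V from the first example (where n is 3), the three steps applied to an arbitrary vector v look like this (v, step1, step2, and step3 are new variable names used only for this illustration):
v = randn(3,1);        % an arbitrary point in n-dimensional space
step1 = V'*v;          % (1) rotate: project v onto the basis given by the columns of V
step2 = (S'*S)*step1;  % (2) scale: stretch along the coordinate axes
step3 = V*step2;       % (3) un-rotate: return to the original coordinate system
max(abs(step3 - (X'*X)*v))   % should be ~0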
X = unitlength(zeromean(randnmulti(1000,[],[1 .6; .6 1]),1),1);
figure; setfigurepos([100 100 300 300]);
imagesc(X'*X,[0 1]);
axis image square;
colorbar;
title('Covariance matrix: X^TX');
% In this example there are 2 regressors defined in a 1000-dimensional space.
% The regressors are moderately correlated with one another, as can be seen
% in the relatively high off-diagonal term in the covariance matrix.
% Let's use SVD to interpret the operation performed by the covariance matrix.
[U,S,V] = svd(X);
angs = linspace(0,2*pi,100);
vectors = {};
circlevectors = {};
vectors{1} = eye(2);
circlevectors{1} = [cos(angs); sin(angs)];
vectors{2} = V'*vectors{1};
circlevectors{2} = V'*circlevectors{1};
vectors{3} = S'*S*vectors{2};
circlevectors{3} = S'*S*circlevectors{2};
vectors{4} = V*vectors{3};
circlevectors{4} = V*circlevectors{3};
% now perform plotting
figure; setfigurepos([100 100 600 200]);
colors = 'rg';
titles = {'Original space' '(1) Rotate' '(2) Scale' '(3) Un-rotate'};
for p=1:4
  subplot(1,4,p); hold on;   % hold on so the arrows and the circle share the same axes
  axis([-1.8 1.8 -1.8 1.8]); axis square; axis off;
  for q=1:2
    drawarrow(zeros(1,2),vectors{p}(:,q)',[colors(q) '-'],[],5);
  end
  plot(circlevectors{p}(1,:),circlevectors{p}(2,:),'k-');
  title(titles{p});
end
% The first plot shows a circle in black and equal-length vectors oriented
% along the coordinate axes in red and green. The second plot shows what
% happens after the initial rotation operation. The third plot shows what
% happens after the scaling operation. The fourth plot shows what happens
% after the final rotation operation. Notice that in the end result, the
% red and green vectors are no longer orthogonal and the circle is now an ellipse.
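% As an approximate numerical check, the longest and shortest radii of the
% ellipse in the final plot should match the largest and smallest squared
% singular values (the diagonal of S'*S); the match is only approximate because
% the circle is sampled at 100 points. The variable "radii" below is introduced
% just for this check.
radii = sqrt(sum(circlevectors{4}.^2,1));
[max(radii) min(radii)]    % approximate semi-axis lengths of the ellipse
diag(S'*S)'                % squared singular values (eigenvalues of X'*X)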
% Let's visualize the linear operation once more, but this time let's use
% the red and green vectors to represent the columns of the V matrix.
vectors = {};
vectors{1} = V;
vectors{2} = V'*vectors{1};
vectors{3} = S'*S*vectors{2};
vectors{4} = V*vectors{3};
% run the "now perform plotting" section above
% Compare the first plot to the last plot. Notice that the red and green vectors
% have not changed their angle, only their magnitude. This reflects the fact
% that the columns of the V matrix are eigenvectors of the covariance matrix:
% multiplication by the covariance matrix scales the space along the directions
% of the eigenvectors, so only the magnitudes of the eigenvectors, and not their
% angles, are modified.
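% To make the eigenvector claim explicit, we can compare against MATLAB's eig.
% The eigenvalues of X'*X should equal the squared singular values (note that
% eig returns eigenvalues in ascending order whereas svd returns singular values
% in descending order), and the eigenvectors should match the columns of V up to
% sign and ordering. The variables Veig and Deig are introduced just for this check.
[Veig,Deig] = eig(X'*X);
[sort(diag(Deig),'descend') diag(S'*S)]   % the two columns should match
abs(Veig'*V)                              % should be approximately a permutation matrix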