**SVD**

Singular value decomposition (SVD) decomposes a matrix X into three matrices U, S, and V such that:

X = U*S*V'

The columns of U are mutually orthogonal unit-length vectors (these are called the left singular vectors); the columns of V are mutually orthogonal unit-length vectors (these are called the right singular vectors); and S is a matrix with zeros everywhere except for non-negative values along the diagonal (these values are called the singular values).

% Let's see an example.

X = randn(10,3);

[U,S,V] = svd(X);

figure; setfigurepos([100 100 500 150]);

subplot(1,3,1); imagesc(U'*U); axis image square; colorbar; title('U^TU');

subplot(1,3,2); imagesc(V'*V); axis image square; colorbar; title('V^TV');

subplot(1,3,3); imagesc(S); axis image square; colorbar; title('S');

Notice that U'*U and V'*V are identity matrices. This makes sense: the dot product of a unit-length vector with itself equals one, and the dot product of two orthogonal vectors equals zero.

**Covariance matrices**

Suppose X represents a set of linear regressors. That is, suppose the dimensions of X are m x n, representing n regressors defined in an m-dimensional space. Then, X'*X is an n x n matrix representing the covariance of each regressor with each other regressor. (Technically, this is not quite correct: true covariance involves subtracting off the mean of each regressor before computing the pairwise dot products, and then dividing the results by the number of elements in each regressor (m), or by m-1 for the unbiased estimate. Nevertheless, it is still useful to think of X'*X as a covariance matrix.)
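To make the distinction concrete, here is a short sketch (using made-up random data) comparing the raw cross-products X'*X with a true sample covariance; MATLAB's built-in cov function computes the latter directly:

```matlab
% sketch: raw cross-products vs. true sample covariance
X = randn(100,3);                % 3 regressors in a 100-dimensional space
m = size(X,1);
Xz = X - repmat(mean(X,1),m,1);  % subtract off the mean of each regressor
C1 = (Xz'*Xz)/(m-1);             % cross-products of zero-mean regressors, scaled
C2 = cov(X);                     % MATLAB's sample covariance (also uses m-1)
% C1 and C2 agree to numerical precision; X'*X differs from these by
% the mean subtraction and the 1/(m-1) scaling.
```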

**SVD applied to covariance matrices**

SVD provides a useful decomposition of covariance matrices:

X'*X = (U*S*V')'*(U*S*V') = V*S'*U' * U*S*V' = V*(S'*S)*V'

where S'*S is an n x n diagonal matrix with zeros everywhere except for non-negative values along the diagonal (these values are the squares of the singular values in the original S matrix).

X'*X is an n x n matrix that can be interpreted as applying a linear operation to n-dimensional space. For instance, suppose we have a vector v with dimensions n x 1. This vector corresponds to a specific point in n-dimensional space. If we calculate (X'*X)*v, we obtain a new vector corresponding to a new point in n-dimensional space.

The SVD decomposition of X'*X gives us insight into the nature of the linear operation. Specifically, the SVD decomposition tells us that the linear operation consists of three successive operations: a rotation, a scaling, and then an undoing of the original rotation:

- The initial rotation comes from multiplication with V'. This is because the columns of V form a complete orthonormal basis (each column is a unit-length vector and all columns are orthogonal to one another). Multiplying v by V' projects the vector onto this basis, and a change of orthonormal basis simply rotates the coordinate system.
- The scaling comes from multiplication with S'*S. Since S'*S is a diagonal matrix, multiplication with S'*S simply stretches space along the axes of the coordinate system.
- The final rotation comes from multiplication with V. This multiplication undoes the rotation that was performed by the initial multiplication with V'.
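As a quick numerical check (a sketch using a fresh random X), applying the three operations in sequence matches direct multiplication by X'*X:

```matlab
X = randn(10,3);
[U,S,V] = svd(X);
v = randn(3,1);            % an arbitrary point in n-dimensional space
a = (X'*X)*v;              % direct application of the linear operation
b = V*((S'*S)*(V'*v));     % rotate, scale, then un-rotate
% a and b agree to numerical precision
```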

X = unitlength(zeromean(randnmulti(1000,[],[1 .6; .6 1]),1),1);

figure; setfigurepos([100 100 300 300]);

imagesc(X'*X,[0 1]);

axis image square;

colorbar;

title('Covariance matrix: X^TX');

% In this example there are 2 regressors defined in a 1000-dimensional space.

% The regressors are moderately correlated with one another, as can be seen

% in the relatively high off-diagonal term in the covariance matrix.

% Let's use SVD to interpret the operation performed by the covariance matrix.

[U,S,V] = svd(X);

angs = linspace(0,2*pi,100);

vectors = {};

circlevectors = {};

vectors{1} = eye(2);

circlevectors{1} = [cos(angs); sin(angs)];

vectors{2} = V'*vectors{1};

circlevectors{2} = V'*circlevectors{1};

vectors{3} = S'*S*vectors{2};

circlevectors{3} = S'*S*circlevectors{2};

vectors{4} = V*vectors{3};

circlevectors{4} = V*circlevectors{3};

% now perform plotting

figure; setfigurepos([100 100 600 200]);

colors = 'rg';

titles = {'Original space' '(1) Rotate' '(2) Scale' '(3) Un-rotate'};

for p=1:4

subplot(1,4,p);

axis([-1.8 1.8 -1.8 1.8]); axis square; axis off; hold on;

for q=1:2

drawarrow(zeros(1,2),vectors{p}(:,q)',[colors(q) '-'],[],5);

end

plot(circlevectors{p}(1,:),circlevectors{p}(2,:),'k-');

title(titles{p});

end

% The first plot shows a circle in black and equal-length vectors oriented

% along the coordinate axes in red and green. The second plot shows what

% happens after the initial rotation operation. The third plot shows what

% happens after the scaling operation. The fourth plot shows what happens

% after the final rotation operation. Notice that in the end result, the

% red and green vectors are no longer orthogonal and the circle is now an ellipse.

% Let's visualize the linear operation once more, but this time let's use

% the red and green vectors to represent the columns of the V matrix.

vectors = {};

vectors{1} = V;

vectors{2} = V'*vectors{1};

vectors{3} = S'*S*vectors{2};

vectors{4} = V*vectors{3};

% run the "now perform plotting" section above

% Compare the first plot to the last plot. Notice that the red and green vectors

% have not changed their angle but only their magnitude; this reflects the

% fact that the columns of the V matrix are eigenvectors of the covariance matrix.

% Multiplication by the covariance matrix scales the space along the directions

% of the eigenvectors, which has the consequence that only the magnitudes of

% the eigenvectors, and not their angles, are modified.
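We can verify the eigenvector claim directly (a sketch, assuming X, S, and V are still in the workspace from the example above):

```matlab
C = X'*X;
v1 = V(:,1);                      % first right singular vector
% C*v1 should equal S(1,1)^2 * v1, i.e. v1 is an eigenvector of C
% whose eigenvalue is the corresponding squared singular value
err = norm(C*v1 - S(1,1)^2*v1)    % close to zero
```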