Wednesday, December 28, 2011

SVD and covariance matrices

This post describes singular value decomposition (SVD) and how it applies to covariance matrices.  As we will see in later posts, SVD and covariance matrices are central to understanding principal components analysis (PCA), linear regression, and multivariate Gaussian probability distributions.


Singular value decomposition (SVD) decomposes a matrix X into three matrices U, S, and V such that:
  X = U*S*V'
The columns of U are mutually orthogonal unit-length vectors (these are called the left singular vectors); the columns of V are mutually orthogonal unit-length vectors (these are called the right singular vectors); and S is a matrix with zeros everywhere except for non-negative values along the diagonal (these values are called the singular values).

% Let's see an example.
X = randn(10,3);
[U,S,V] = svd(X);
figure; setfigurepos([100 100 500 150]);
subplot(1,3,1); imagesc(U'*U); axis image square; colorbar; title('U^TU');
subplot(1,3,2); imagesc(V'*V); axis image square; colorbar; title('V^TV');
subplot(1,3,3); imagesc(S); axis image square; colorbar; title('S');

Notice that U'*U and V'*V are identity matrices.  This makes sense because the sum of the squares of the coordinates of a unit-length vector equals one and because the dot product of orthogonal vectors equals zero.
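The same check can be done outside MATLAB. Here is a small NumPy sketch (the random seed is arbitrary) verifying that U and V have orthonormal columns and that the three factors reassemble into X:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))

# full_matrices=True mirrors MATLAB's svd(X): U is 10x10, Vt is 3x3,
# and the 3 singular values go on the diagonal of a 10x3 matrix S
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# U'*U and V'*V are identity matrices (orthonormal columns)
assert np.allclose(U.T @ U, np.eye(10))
assert np.allclose(Vt @ Vt.T, np.eye(3))

# reassembling U*S*V' recovers X
S = np.zeros((10, 3))
S[:3, :3] = np.diag(s)
assert np.allclose(U @ S @ Vt, X)
```

Note that NumPy returns V transposed (`Vt`) and the singular values as a vector `s`, whereas MATLAB returns V and a full S matrix.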

Covariance matrices

Suppose X represents a set of linear regressors.  That is, suppose the dimensions of X are m x n, representing n regressors defined in an m-dimensional space.  Then, X'*X is an n x n covariance matrix, representing the covariance of each regressor with each other regressor.  (Technically, this is not quite correct: true covariance involves subtracting off the mean of each regressor before calculating the pairwise dot-products and dividing the results by the number of elements in each regressor (m).  Nevertheless, it is still useful to think of X'*X as a covariance matrix.)

SVD applied to covariance matrices

SVD provides a useful decomposition of covariance matrices:
  X'*X = (U*S*V')'*(U*S*V') = V*S'*U' * U*S*V' = V*(S'*S)*V'
where S'*S is an n x n diagonal matrix whose diagonal entries are the squares of the singular values.  (Note that U'*U drops out because it is the identity.)
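This identity is easy to verify numerically; a NumPy sketch (arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))

# full_matrices=False gives the compact SVD: Vt is 3x3, s has 3 entries
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# V * (S'*S) * V' reproduces X'*X; the diagonal of S'*S holds
# the squared singular values
assert np.allclose(Vt.T @ np.diag(s**2) @ Vt, X.T @ X)
```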

X'*X is an n x n matrix that can be interpreted as applying a linear operation to n-dimensional space.  For instance, suppose we have a vector v with dimensions n x 1.  This vector corresponds to a specific point in n-dimensional space.  If we calculate (X'*X)*v, we obtain a new vector corresponding to a new point in n-dimensional space.

The SVD of X'*X gives us insight into the nature of the linear operation.  Specifically, it tells us that the linear operation consists of three successive operations: a rotation, a scaling, and then an undoing of the original rotation:
  1. The initial rotation comes from multiplication with V'.  This is because the columns of V form a complete orthonormal basis (each column is a unit-length vector and all columns are orthogonal to one another).  Multiplying v by V' projects v onto this basis, and a change to an orthonormal basis simply rotates the coordinate system.
  2. The scaling comes from multiplication with S'*S.  Since S'*S is a diagonal matrix, multiplication with S'*S simply stretches the coordinate system along its axes.
  3. The final rotation comes from multiplication with V.  This undoes the rotation that was performed by the initial multiplication with V'.
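The three steps can be checked against direct multiplication by X'*X.  A NumPy sketch (arbitrary seed and test vector):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
v = rng.standard_normal(3)  # an arbitrary point in n-dimensional space

# step 1: rotate into the basis given by the columns of V
step1 = Vt @ v
# step 2: scale each coordinate by the squared singular values (S'*S)
step2 = (s**2) * step1
# step 3: rotate back
step3 = Vt.T @ step2

# the three steps together equal a single multiplication by X'*X
assert np.allclose(step3, (X.T @ X) @ v)
```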
% Let's see an example.
X = unitlength(zeromean(randnmulti(1000,[],[1 .6; .6 1]),1),1);
figure; setfigurepos([100 100 300 300]);
imagesc(X'*X,[0 1]);
axis image square;
title('Covariance matrix: X^TX');

% In this example there are 2 regressors defined in a 1000-dimensional space.
% The regressors are moderately correlated with one another, as can be seen
% in the relatively high off-diagonal term in the covariance matrix.

% Let's use SVD to interpret the operation performed by the covariance matrix.
[U,S,V] = svd(X);
angs = linspace(0,2*pi,100);
vectors = {};
circlevectors = {};
vectors{1} = eye(2);
circlevectors{1} = [cos(angs); sin(angs)];
vectors{2} = V'*vectors{1};
circlevectors{2} = V'*circlevectors{1};
vectors{3} = S'*S*vectors{2};
circlevectors{3} = S'*S*circlevectors{2};
vectors{4} = V*vectors{3};
circlevectors{4} = V*circlevectors{3};
  % now perform plotting
figure; setfigurepos([100 100 600 200]);
colors = 'rg';
titles = {'Original space' '(1) Rotate' '(2) Scale' '(3) Un-rotate'};
for p=1:4
  subplot(1,4,p); hold on;
  plot(circlevectors{p}(1,:),circlevectors{p}(2,:),'k-');
  for q=1:2
    drawarrow(zeros(1,2),vectors{p}(:,q)',[colors(q) '-'],[],5);
  end
  axis([-1.8 1.8 -1.8 1.8]); axis square; axis off; title(titles{p});
end

% The first plot shows a circle in black and equal-length vectors oriented
% along the coordinate axes in red and green.  The second plot shows what
% happens after the initial rotation operation.  The third plot shows what
% happens after the scaling operation.  The fourth plot shows what happens
% after the final rotation operation.  Notice that in the end result, the
% red and green vectors are no longer orthogonal and the circle is now an ellipse.

% Let's visualize the linear operation once more, but this time let's use
% the red and green vectors to represent the columns of the V matrix.
vectors = {};
vectors{1} = V;
vectors{2} = V'*vectors{1};
vectors{3} = S'*S*vectors{2};
vectors{4} = V*vectors{3};
% run the "now perform plotting" section above

% Compare the first plot to the last plot.  Notice that the red and green vectors
% have not changed their angle but only their magnitude --- this reflects the
% fact that the columns of the V matrix are eigenvectors of the covariance matrix.
% Multiplication by the covariance matrix scales the space along the directions
% of the eigenvectors, which has the consequence that only the magnitudes of
% the eigenvectors, and not their angles, are modified.
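The eigenvector property is worth confirming directly: multiplying a column of V by the covariance matrix returns the same vector scaled by the corresponding squared singular value.  A NumPy sketch (arbitrary seed; dimensions match the example above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
C = X.T @ X  # the covariance matrix (up to demeaning and scaling)

# each column of V is an eigenvector of X'*X with eigenvalue s**2:
# multiplication by C changes its length but not its direction
for i in range(2):
    v = Vt[i]  # i-th right singular vector (NumPy stores them as rows of Vt)
    assert np.allclose(C @ v, (s[i]**2) * v)
```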

