Sunday, October 23, 2011

Error in two dimensions

Ordinary regression attributes error only to the dependent variable, but it is possible to fit regression models that attribute error to both the dependent and independent variables (this is sometimes called total least squares or orthogonal regression).
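
(Note: fitline2derror in the code below is presumably a user-supplied utility whose implementation is not shown in this post. A minimal sketch of one standard way to fit such a line (total least squares, which minimizes the summed squared perpendicular distances by taking the first principal axis of the mean-centered data) might look like the following; the function name and details are assumptions for illustration, not necessarily what the post actually uses.)

function params = fitline2derror_sketch(x,y)
% Fit a line to (x,y) by minimizing the summed squared perpendicular distances
% from the points to the line.  Returns params in the same [slope intercept]
% convention as polyfit(x,y,1).  (Illustrative sketch only.)
xm = mean(x);  ym = mean(y);
[~,~,V] = svd([x(:)-xm y(:)-ym],'econ');  % columns of V are the principal axes
slope = V(2,1)/V(1,1);                    % first principal axis = direction of maximum variance
params = [slope ym-slope*xm];             % the fitted line passes through the centroid
end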

CODE

% Generate some random data points.
x = randn(1,1000);
y = randn(1,1000);

% Fit a line that minimizes the squared error between the y-coordinate of the data
% and the y-coordinate of the fitted line.  Bootstrap to see the variability of the fitted line.
paramsA = bootstrp(20,@(a,b) polyfit(a,b,1),x',y');

% Fit a line that minimizes the sum of the squared distances between the data points
% and the line.  Bootstrap to see the variability of the fitted line.
paramsB = bootstrp(20,@fitline2derror,x',y');

% Visualize the results.
figure(999); hold on;
scatter(x,y,'k.');
ax = axis;
for p=1:size(paramsA,1)
  h1 = plot(ax(1:2),polyval(paramsA(p,:),ax(1:2)),'r-');
  h2 = plot(ax(1:2),polyval(paramsB(p,:),ax(1:2)),'b-');
end
axis(ax);
xlabel('x');
ylabel('y');
title('Red minimizes squared error on y; blue minimizes squared error on both x and y');



% Now, let's repeat for a different dataset in which x and y are correlated
% (samples from a bivariate normal with unit variances and correlation 0.5).
temp = randnmulti(1000,[],[1 .5; .5 1]);
x = temp(:,1)';
y = temp(:,2)';
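
% (randnmulti appears to be a user-supplied utility for drawing multivariate
% normal samples; with the Statistics Toolbox, a comparable draw would be
% temp = mvnrnd([0 0],[1 .5; .5 1],1000);)

% Re-run the fitting and visualization code above (e.g., in a new figure) to
% compare the two approaches on this correlated dataset.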



OBSERVATIONS

First example: Since x and y are independent, when minimizing error on y the fitted lines tend to be horizontal; the best the model can do is essentially to predict the mean y-value regardless of the x-value. When minimizing error on both x and y, every orientation of the line through the isotropic cloud is roughly equally bad, giving rise to wildly different line fits across different bootstraps.
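
To quantify how different the fits are, one could compare the spread of the fitted line angles across bootstraps right after running the first example (a sketch, assuming fitline2derror returns parameters in the same [slope intercept] convention that the polyval call above implies):

angA = atand(paramsA(:,1));   % line angles (deg) when minimizing error on y only
angB = atand(paramsB(:,1));   % line angles (deg) when minimizing error on both x and y
std(angA)                     % small: the lines hover near horizontal
std(angB)                     % large: the line orientation is essentially unconstrained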

Second example: Here x and y are correlated, and minimizing error on y produces lines that are closer to horizontal than the lines produced by minimizing error on both x and y. Notice that the lines produced by minimizing error on both x and y are aligned with the major (intrinsic) axis of the Gaussian cloud; with unit variances and correlation 0.5, the ordinary regression slope is about 0.5, whereas the major axis has slope 1.
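
As a rough check on this observation, one could compare the bootstrapped slopes against the slope of the first principal axis of the correlated data (a sketch, to be run after repeating the fitting code on the second dataset):

[V,D] = eig(cov([x' y']));
[~,ix] = max(diag(D));           % index of the largest eigenvalue
slopePC = V(2,ix)/V(1,ix)        % slope of the major axis of the Gaussian cloud (expect ~1)
slopeOLS = median(paramsA(:,1))  % typical slope when minimizing error on y only (expect ~0.5)
slopeTLS = median(paramsB(:,1))  % typical slope when minimizing error on both x and y (expect ~1)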
