Regression normally attributes all of the error to the dependent variable, but it is also possible to fit regression models that attribute error to both the dependent and independent variables (this is sometimes called total least squares or orthogonal regression).
CODE
% Generate some random data points.
x = randn(1,1000);
y = randn(1,1000);
% Fit a line that minimizes the squared error between the y-coordinate of the data
% and the y-coordinate of the fitted line. Bootstrap to see the variability of the fitted line.
paramsA = bootstrp(20,@(a,b) polyfit(a,b,1),x',y');
% Fit a line that minimizes the sum of the squared distances between the data points
% and the line. Bootstrap to see the variability of the fitted line.
% (fitline2derror is not a built-in MATLAB function; it is from the author's toolbox.)
paramsB = bootstrp(20,@fitline2derror,x',y');
% Visualize the results.
figure(999); hold on;
scatter(x,y,'k.');
ax = axis;
for p=1:size(paramsA,1)
h1 = plot(ax(1:2),polyval(paramsA(p,:),ax(1:2)),'r-');
h2 = plot(ax(1:2),polyval(paramsB(p,:),ax(1:2)),'b-');
end
axis(ax);
xlabel('x');
ylabel('y');
title('Red minimizes squared error on y; blue minimizes squared error on both x and y');
% Now, let's repeat for a different dataset: a correlated Gaussian cloud.
% (randnmulti is from the author's toolbox; it draws samples from a
% multivariate normal with the given mean and covariance matrix.)
temp = randnmulti(1000,[],[1 .5; .5 1]);
x = temp(:,1)';
y = temp(:,2)';
% Re-run the two fits and the visualization on the new data.
paramsA = bootstrp(20,@(a,b) polyfit(a,b,1),x',y');
paramsB = bootstrp(20,@fitline2derror,x',y');
figure; hold on;
scatter(x,y,'k.');
ax = axis;
for p=1:size(paramsA,1)
  h1 = plot(ax(1:2),polyval(paramsA(p,:),ax(1:2)),'r-');
  h2 = plot(ax(1:2),polyval(paramsB(p,:),ax(1:2)),'b-');
end
axis(ax);
xlabel('x');
ylabel('y');
title('Red minimizes squared error on y; blue minimizes squared error on both x and y');
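The implementation of fitline2derror is not shown above. As a rough sketch of what such a fit might compute, here is a hypothetical, stdlib-only Python version (the function name and interface are my own): it minimizes the summed squared perpendicular distances to the line by taking the principal axis of the data's covariance, whose angle has the closed form tan(2*theta) = 2*Sxy/(Sxx - Syy).

```python
import math

def fit_line_2d_error(x, y):
    """Fit y = m*x + b minimizing summed squared perpendicular distances
    (total least squares). Hypothetical stand-in for fitline2derror."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Scatter (unnormalized covariance) terms about the centroid.
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Angle of the principal axis of the covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    m = math.tan(theta)
    b = my - m * mx  # the total-least-squares line passes through the centroid
    return m, b

# Points lying exactly on y = 2x + 1 should be recovered exactly.
m, b = fit_line_2d_error([0, 1, 2, 3], [1, 3, 5, 7])
# -> m ~= 2, b ~= 1
```

Note the asymmetry with ordinary regression: this fit treats x and y interchangeably, so swapping the two inputs yields the mirror-image line, which is not true of a fit that minimizes error on y only.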
OBSERVATIONS
First example: When minimizing error on y, the fitted lines tend to be horizontal. Since x carries no information about y here, the best the model can do is essentially predict the mean y-value regardless of the x-value. When minimizing error on both x and y, all lines through the center of the cloud are about equally bad, giving rise to wildly different line fits across different bootstraps.
Second example: Minimizing error on y produces lines that are closer to horizontal than the lines produced by minimizing error on both x and y. Notice that the lines produced by minimizing error on both x and y are aligned with the intrinsic axes of the Gaussian cloud.
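The second observation can be checked numerically. In a stdlib-only Python sketch (standing in for the MATLAB code above; the variable names are my own), a cloud with unit variances and correlation 0.5 gives a y-error slope near the correlation (0.5), while the both-axes slope tracks the principal axis of the symmetric cloud, which sits at 45 degrees (slope near 1):

```python
import math
import random

random.seed(0)
n = 20000
# Correlated Gaussian cloud: unit variances, correlation 0.5
# (same covariance as the [1 .5; .5 1] passed to randnmulti above).
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + math.sqrt(1 - 0.5 ** 2) * random.gauss(0, 1) for xi in x]

mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

# Minimizing error on y only: slope = Sxy/Sxx, which for unit-variance
# data tends toward the correlation (0.5), i.e. flatter than the cloud.
ols_slope = sxy / sxx
# Minimizing error on both x and y: slope of the principal axis,
# which tends toward 1 (the 45-degree axis of this symmetric cloud).
tls_slope = math.tan(0.5 * math.atan2(2 * sxy, sxx - syy))
```

This makes the flattening concrete: the y-error fit shrinks the slope toward zero by the correlation, while the both-axes fit follows the geometry of the cloud itself.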