**CODE**

```matlab
% Generate some random data points.
x = randn(1,1000);
y = randn(1,1000);

% Fit a line that minimizes the squared error between the y-coordinate of the data
% and the y-coordinate of the fitted line. Bootstrap to see the variability of the
% fitted line. (bootstrp requires the Statistics and Machine Learning Toolbox.)
paramsA = bootstrp(20,@(a,b) polyfit(a,b,1),x',y');

% Fit a line that minimizes the sum of the squared distances between the data points
% and the line. Bootstrap to see the variability of the fitted line.
% (fitline2derror is not a built-in function; it must be on the MATLAB path.)
paramsB = bootstrp(20,@fitline2derror,x',y');

% Visualize the results.
figure(999); hold on;
scatter(x,y,'k.');
ax = axis;
for p=1:size(paramsA,1)
  h1 = plot(ax(1:2),polyval(paramsA(p,:),ax(1:2)),'r-');
  h2 = plot(ax(1:2),polyval(paramsB(p,:),ax(1:2)),'b-');
end
axis(ax);
xlabel('x');
ylabel('y');
title('Red minimizes squared error on y; blue minimizes squared error on both x and y');

% Now, let's repeat for a different dataset.
% (randnmulti is also not a built-in function.)
temp = randnmulti(1000,[],[1 .5; .5 1]);
x = temp(:,1)';
y = temp(:,2)';
% Rerun the fitting and plotting steps above with the new x and y.
```
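
The helper `fitline2derror` is not defined in this post. To give a rough idea of what a fit that minimizes squared perpendicular distances can involve, here is a minimal sketch that uses the SVD of the centered data. This is an assumption about the general technique, not the actual implementation of `fitline2derror`, and the toy data and variable names are made up for illustration.

```matlab
% Sketch of an orthogonal-distance line fit (assumed technique; not the
% actual code of fitline2derror). The toy points lie near y = 2*x + 1.
x = [0 1 2 3 4]';
y = [1.1 2.9 5.2 6.8 9.0]';

% Center the data; the best-fitting line passes through the centroid.
mx = mean(x);
my = mean(y);
[~,~,V] = svd([x-mx y-my],0);

% The first right singular vector gives the direction of the line.
slope = V(2,1) / V(1,1);
intercept = my - slope*mx;
params = [slope intercept];   % same [slope intercept] layout as polyfit(x,y,1)
```

Note that this sketch breaks down for an exactly vertical line, where `V(1,1)` is zero and the slope is undefined.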

**OBSERVATIONS**

First example: Here x and y are uncorrelated. When minimizing error on y, the fitted lines tend to be horizontal, because the best the model can do is predict the mean y-value regardless of the x-value. When minimizing error on both x and y, the data cloud is roughly circular, so lines through its center fit about equally badly in every direction. This gives rise to wildly different line fits across bootstraps.
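
One way to check this reasoning numerically is sketched below, using only base MATLAB functions. The fixed seed and the eigenvalue ratio are choices made for this illustration, not part of the original analysis. A slope near zero corresponds to the horizontal red lines, and an eigenvalue ratio near one means the cloud has no preferred direction for the blue lines to lock onto.

```matlab
rng(0);                      % fix the seed so the sketch is repeatable
x = randn(1000,1);
y = randn(1000,1);

% Slope of the line that minimizes squared error on y.
p = polyfit(x,y,1);

% Eigenvalues of the 2x2 sample covariance matrix. Nearly equal values
% mean that no direction is strongly preferred by a fit that minimizes
% perpendicular distance.
ev = eig(cov(x,y));
ratio = max(ev) / min(ev);

fprintf('slope = %.3f, eigenvalue ratio = %.3f\n', p(1), ratio);
```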

Second example: Here x and y are positively correlated. Minimizing error on y produces lines that are shallower (closer to horizontal) than the lines produced by minimizing error on both x and y. Notice that the lines produced by minimizing error on both x and y are aligned with the major axis of the Gaussian cloud, that is, its direction of greatest variance.
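
The difference can be worked out directly from the covariance matrix `[1 .5; .5 1]` used to generate the data. The sketch below works with the population covariance rather than a sample, and the variable names are illustrative. It computes the slope that minimizes squared error on y as cov(x,y)/var(x), and takes the slope that minimizes squared perpendicular distance from the eigenvector with the largest eigenvalue.

```matlab
% Population covariance used in the second example.
C = [1 .5; .5 1];

% Slope of the line that minimizes squared error on y: cov(x,y)/var(x).
slopeY = C(1,2) / C(1,1);

% Slope of the line that minimizes squared perpendicular distance:
% the direction of the eigenvector with the largest eigenvalue.
[V,D] = eig(C);
[~,ix] = max(diag(D));
slopeXY = V(2,ix) / V(1,ix);

fprintf('slope (error on y):       %.2f\n', slopeY);
fprintf('slope (error on x and y): %.2f\n', slopeXY);
```

For this covariance the two slopes are 0.5 and 1, so the lines that minimize error on y alone are expected to be noticeably shallower than the major axis of the cloud.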
