CODE
% Let's distill the distinction between accuracy and reliability
% down to its core and look at a very simple example.
figure(999); clf; hold on;
h1 = scatter(1,7,'ro');
h2 = scatter(1,4,'bo');
h3 = errorbar2(1,4,1,'v','b-');
axis([0 2 0 12]);
legend([h1 h2 h3],{'True model' 'Estimated model' 'Error bars'});
ylabel('Value');
set(gca,'XTick',[]);

% In this example, we have a single number indicated by the red dot,
% and we are trying to match this number with a model. Through some
% means we have estimated a specific model, and the prediction of the
% model is indicated by the blue dot. Moreover, through some means we
% have estimated error bars on the model's prediction, and this is
% indicated by the blue line.
%
% Now let's consider the accuracy and reliability of the estimated
% model. The accuracy of the model corresponds to how far the
% estimated model is from the true model. The reliability of the
% model corresponds to how variable the estimated model is (i.e., the
% size of its error bars).
h4 = drawarrow([1.3 4.5],[1.03 4.52],'k-',[],10);
h5 = text(1.33,4.5,'Reliability');
h6 = plot([.95 .9 .9 .95],[7 7 4 4],'k-');
h7 = text(.88,5.5,'Accuracy','HorizontalAlignment','Right');
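
% As a rough numerical counterpart to the figure, here is a minimal sketch
% that uses the values plotted above. The variable names are illustrative
% and are not part of the original example. We also assume that the third
% argument to errorbar2 is the half-length of the error bar.
truevalue = 7;                              % red dot (true model)
estvalue = 4;                               % blue dot (estimated model)
errbar = 1;                                 % assumed half-length of the blue error bar
misfit = abs(estvalue - truevalue);         % accuracy: distance from the true model
ratio = misfit / errbar;                    % size of the misfit in units of the error bar
fprintf('misfit = %.1f, error bar = %.1f, ratio = %.1f\n',misfit,errbar,ratio);
% A ratio well above 1 suggests that the misfit is not easily explained
% by the variability of the estimate alone.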

% Accuracy and reliability are not the same thing, although they do bear
% certain relationships to one another. For example, if reliability is
% low, then it is likely that accuracy is low. (Imagine that the error bar
% on a given model is very large. Then, we would expect that any given
% estimate of the model would not be well matched to the true model.)
% Conversely, if accuracy is high, then it is likely that reliability
% is also high. (If a model estimate predicts responses extremely
% well, then it is likely that the parameters of the model are well
% estimated.)
%
% However, it is important to keep in mind that a model can have high
% reliability but low accuracy. To see how this can occur, let's examine
% each possible configuration of accuracy and reliability.

% CASE 1: MODEL IS RELIABLE AND ACCURATE.
% In this case, there are enough data to obtain good estimates of
% the parameters of the model, and the model is a good description
% of the data. Let's see an example (quadratic model fitted to
% quadratic data).
x = rand(1,100)*14 - 8;
y = -x.^2 + 2*x + 4 + 6*randn(1,100);
rec = fitprfstatic([x.^2; x; ones(1,length(x))]',y',0,0,[],100,[],[],[],@calccod);
figure(998); clf; hold on;
h1 = scatter(x,y,'k.');
ax = axis;
xx = linspace(ax(1),ax(2),100);
X = [xx.^2; xx; ones(1,length(xx))]';
modelfits = [];
for p=1:size(rec.params,1)
  modelfits(p,:) = X*rec.params(p,:)';
end
mn = median(modelfits,1);
se = stdquartile(modelfits,1,1);
h2 = errorbar3(xx,mn,se,'v',[.8 .8 1]);
h3 = plot(xx,mn,'b-');
h4 = plot(xx,-xx.^2 + 2*xx + 4,'r-');
uistack(h1,'top');
xlabel('x'); ylabel('y');
legend([h1 h4 h3 h2],{'Data' 'True model' 'Estimated model' 'Error bars'});
title('Model is reliable and accurate');

% CASE 2: MODEL IS RELIABLE BUT INACCURATE.
% In this case, there are enough data to obtain good estimates of
% the parameters of the model, but the model is a bad description
% of the data. Let's see an example (linear model fitted to
% quadratic data).
x = rand(1,100)*10 - 5;
y = x.^2 - 3*x + 4 + 1*randn(1,100);
rec = fitprfstatic([x; ones(1,length(x))]',y',0,0,[],100,[],[],[],@calccod);
figure(997); clf; hold on;
h1 = scatter(x,y,'k.');
ax = axis;
xx = linspace(ax(1),ax(2),100);
X = [xx; ones(1,length(xx))]';
modelfits = [];
for p=1:size(rec.params,1)
  modelfits(p,:) = X*rec.params(p,:)';
end
mn = median(modelfits,1);
se = stdquartile(modelfits,1,1);
h2 = errorbar3(xx,mn,se,'v',[.8 .8 1]);
h3 = plot(xx,mn,'b-');
h4 = plot(xx,xx.^2 - 3*xx + 4,'r-');
uistack(h1,'top');
xlabel('x'); ylabel('y');
legend([h1 h4 h3 h2],{'Data' 'True model' 'Estimated model' 'Error bars'});
title('Model is reliable but inaccurate');
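
% To put rough numbers on this case, here is a minimal sketch that uses
% only built-in functions (polyfit, polyval, randi). It is a stand-in
% bootstrap and is not necessarily what fitprfstatic does internally; the
% variable names are hypothetical. We compare the average bootstrap spread
% of the fitted line (reliability) against its average distance from the
% true quadratic (accuracy).
nb = 200;                                   % number of bootstrap resamples
xs = rand(1,100)*10 - 5;                    % same design as the example above
ys = xs.^2 - 3*xs + 4 + randn(1,100);       % quadratic data with unit noise
xg = linspace(-5,5,50);                     % grid on which to evaluate the fits
bootfits = zeros(nb,length(xg));
for p=1:nb
  ix = randi(length(xs),1,length(xs));      % resample data points with replacement
  cf = polyfit(xs(ix),ys(ix),1);            % fit a line (the wrong model)
  bootfits(p,:) = polyval(cf,xg);           % prediction of this bootstrap fit
end
spread = mean(std(bootfits,0,1));           % average spread across bootstraps
miss = mean(abs(mean(bootfits,1) - (xg.^2 - 3*xg + 4)));  % average distance from true model
fprintf('bootstrap spread = %.2f, distance from true model = %.2f\n',spread,miss);
% We would expect the spread to be small relative to the distance from the
% true model, which is the signature of a reliable but inaccurate model.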

% CASE 3: MODEL IS UNRELIABLE BUT ACCURATE.
% This is not a likely situation. Suppose there are insufficient data to
% obtain good estimates of the parameters of a model. This implies that
% the parameters would fluctuate widely from dataset to dataset, which in
% turn implies that the predictions of the model would also fluctuate widely
% from dataset to dataset. Thus, for any given dataset, it would be unlikely
% that the predictions of the estimated model would be well matched to the data.
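%
% The argument above can be checked with a small simulation. This is a
% sketch under stated assumptions: we reuse the quadratic model from case 1,
% draw many independent datasets, and fit each with the built-in polyfit.
% The dataset sizes (8 and 500) and all variable names are hypothetical
% choices made for illustration.
ndatasets = 200;                            % number of simulated datasets
ns = [8 500];                               % few data points vs. many data points
xg3 = linspace(-8,6,50);                    % grid on which to evaluate the fits
truth3 = -xg3.^2 + 2*xg3 + 4;               % true model on the grid
spread3 = zeros(1,length(ns));              % variability of fits across datasets
err3 = zeros(1,length(ns));                 % typical distance of a single fit from the truth
for q=1:length(ns)
  fits3 = zeros(ndatasets,length(xg3));
  for p=1:ndatasets
    xd = rand(1,ns(q))*14 - 8;
    yd = -xd.^2 + 2*xd + 4 + 6*randn(1,ns(q));
    fits3(p,:) = polyval(polyfit(xd,yd,2),xg3);
  end
  spread3(q) = mean(std(fits3,0,1));
  err3(q) = mean(mean(abs(bsxfun(@minus,fits3,truth3))));
end
fprintf('n = %d: spread = %.2f, error = %.2f\n',[ns; spread3; err3]);
% With few data points, both the spread and the typical error should be
% much larger, consistent with the claim that an unreliable model is
% unlikely to be accurate.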

% CASE 4: MODEL IS UNRELIABLE AND INACCURATE.
% In this case, there are insufficient data to obtain good estimates of
% the parameters of the model, and this supplies a plausible explanation
% for why the model does not describe the data well. (Of course, it could
% be the case that even with sufficient data, the estimated model would
% still be a poor description of the data; see case 2 above.) Let's see
% an example of an unreliable and inaccurate model (Gaussian model
% fitted to Gaussian data, but only a few noisy data points are available).
x = linspace(1,100,20);
y = evalgaussian1d([40 10 10 2],x) + 10*randn(1,20);
model = {[30 20 5 0] [-Inf 0 -Inf -Inf; Inf Inf Inf Inf] @(pp,xx) evalgaussian1d(pp,xx)};
rec = fitprfstatic(x',y',model,[],[],100,[],[],[],@calccod);
figure(996); clf; hold on;
h1 = scatter(x,y,'k.');
ax = axis;
xx = linspace(ax(1),ax(2),100);
modelfits = [];
for p=1:size(rec.params,1)
  modelfits(p,:) = evalgaussian1d(rec.params(p,:),xx);
end
mn = median(modelfits,1);
se = stdquartile(modelfits,1,1);
h2 = errorbar3(xx,mn,se,'v',[.8 .8 1]);
h3 = plot(xx,mn,'b-');
h4 = plot(xx,evalgaussian1d([40 10 10 2],xx),'r-');
uistack(h1,'top');
xlabel('x'); ylabel('y');
legend([h1 h4 h3 h2],{'Data' 'True model' 'Estimated model' 'Error bars'});
title('Model is unreliable and inaccurate');
