Improve a Discriminant Analysis Classifier
Deal with Singular Data
Discriminant analysis needs data sufficient to fit Gaussian
models with invertible covariance matrices. If your data is not
sufficient to fit such a model uniquely, fitcdiscr
fails. This section shows methods for handling failures.
Tip: To obtain a discriminant analysis classifier without failure,
set the DiscrimType name-value pair to 'pseudoLinear' or
'pseudoQuadratic' in fitcdiscr.
"Pseudo" discriminants never fail, because they use the
pseudoinverse of the covariance matrix Σk (see
pinv).
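To make the role of the pseudoinverse concrete, the following minimal sketch uses only base MATLAB functions (cov, rank, pinv) on a small made-up matrix, not on any data set from this page. It assumes a predictor matrix whose third column is constant, so the sample covariance matrix is rank deficient and has no ordinary inverse, while pinv still returns a usable matrix.

```matlab
% Toy predictor matrix (hypothetical values); the third column is constant.
X = [1 2 0; 2 1 0; 3 5 0; 4 3 0];

S = cov(X);      % 3-by-3 sample covariance; third row and column are zero
r = rank(S);     % rank is less than 3, so S is singular

P = pinv(S);     % Moore-Penrose pseudoinverse, defined even when S is singular

% Defining property of the pseudoinverse: S*P*S reproduces S.
residual = norm(S*P*S - S);
fprintf('rank(S) = %d, norm(S*P*S - S) = %.2e\n', r, residual);
```

The residual should be at the level of floating-point round-off, which is what lets the pseudo discriminant types proceed where an ordinary inverse does not exist.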
Example: Singular Covariance Matrix. When the covariance
matrix of the fitted classifier is singular, fitcdiscr
can fail:
load popcorn
X = popcorn(:,[1 2]);
X(:,3) = 0; % a zero-variance column
Y = popcorn(:,3);
ppcrn = fitcdiscr(X,Y);
Error using ClassificationDiscriminant (line 635)
Predictor x3 has zero variance. Either exclude this predictor or set 'discrimType' to
'pseudoLinear' or 'diagLinear'.
Error in classreg.learning.FitTemplate/fit (line 243)
obj = this.MakeFitObject(X,Y,W,this.ModelParameters,fitArgs{:});
Error in fitcdiscr (line 296)
this = fit(temp,X,Y);
To proceed with linear discriminant analysis, use a
pseudoLinear or diagLinear discriminant
type:
ppcrn = fitcdiscr(X,Y,...
'discrimType','pseudoLinear');
meanpredict = predict(ppcrn,mean(X))
meanpredict =
3.5000
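If you do not know in advance whether the data is singular, one possible pattern is to attempt the default fit and fall back to a pseudo variant only when it errors. The sketch below reuses the popcorn data and the zero-variance column from this example; the variable names mdl and err are illustrative, and the fallback to 'pseudoLinear' is just one of the two options the error message suggests.

```matlab
load popcorn
X = popcorn(:,[1 2]);
X(:,3) = 0;                 % a zero-variance column, as in the example above
Y = popcorn(:,3);

try
    mdl = fitcdiscr(X,Y);   % default 'linear' type; errors for this data
catch err
    % Report why the default fit failed, then refit with a pseudo variant.
    fprintf('Default fit failed: %s\n', err.message);
    mdl = fitcdiscr(X,Y,'DiscrimType','pseudoLinear');
end

disp(mdl.DiscrimType)
```

Catching the error keeps the stricter 'linear' fit whenever the data allows it, and uses the pseudoinverse only when needed.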
Choose a Discriminant Type
There are six types of discriminant analysis classifiers: linear
and quadratic, with diagonal and pseudo variants of
each type.
Tip: To see if your covariance matrix is singular, set DiscrimType
to 'linear' or 'quadratic'. If the matrix is singular, fitcdiscr
fails for 'quadratic', and the Gamma property is nonzero for
'linear'.
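The check described in the tip can be sketched as follows. This is only an illustration: it assumes the Fisher iris data with an extra, deliberately redundant predictor (the sum of the first two columns), which is not part of the original examples. Whether Gamma ends up nonzero depends on how the software judges the matrix numerically, so the code reports the value instead of assuming it.

```matlab
load fisheriris
% Hypothetical redundant predictor: the sum of the first two columns.
X = [meas, meas(:,1) + meas(:,2)];

mdl = fitcdiscr(X,species,'DiscrimType','linear');

if mdl.Gamma > 0
    fprintf('Gamma = %g: the covariance matrix was treated as singular.\n', mdl.Gamma);
else
    disp('Gamma = 0: the covariance matrix was inverted as is.');
end
```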
To obtain a quadratic classifier even when your covariance
matrix is singular, set DiscrimType to
'pseudoQuadratic' or 'diagQuadratic'.
obj = fitcdiscr(X,Y,'DiscrimType','pseudoQuadratic') % or 'diagQuadratic'
Choose a classifier type by setting the DiscrimType
name-value pair to one of:
'linear' (default) — Estimate one covariance matrix
for all classes.
'quadratic' — Estimate one covariance matrix for
each class.
'diagLinear' — Use the diagonal of the
'linear' covariance matrix, and use its pseudoinverse
if necessary.
'diagQuadratic' — Use the diagonals of the
'quadratic' covariance matrices, and use their
pseudoinverses if necessary.
'pseudoLinear' — Use the pseudoinverse of the
'linear' covariance matrix if necessary.
'pseudoQuadratic' — Use the pseudoinverses of the
'quadratic' covariance matrices if necessary.
fitcdiscr can fail for the 'linear' and 'quadratic'
classifiers. When it fails, it throws an error that explains the
cause, as shown in Deal with Singular Data.
fitcdiscr always succeeds with the diagonal and pseudo variants.
For information about pseudoinverses, see pinv.
You can set the discriminant type using dot notation after
constructing a classifier:
obj.DiscrimType = 'discrimType'
You can change between linear types or between quadratic types,
but cannot change between a linear and a quadratic type.
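A short sketch of this behavior, assuming the Fisher iris data that the later examples on this page use. The try/catch block is there only to show the message produced by the disallowed change; the exact wording of that message is not reproduced here.

```matlab
load fisheriris
obj = fitcdiscr(meas,species);      % default 'linear' type

obj.DiscrimType = 'diagLinear';     % linear to another linear type: allowed
disp(obj.DiscrimType)

try
    obj.DiscrimType = 'quadratic';  % linear to quadratic: not allowed
catch err
    fprintf('Could not switch type: %s\n', err.message);
end
```

To move from a linear to a quadratic type, refit with fitcdiscr and the desired DiscrimType instead.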
Examine the Resubstitution Error and Confusion Matrix
The resubstitution error is the difference between the
response training data and the predictions the classifier makes of
the response based on the input training data. If the
resubstitution error is high, you cannot expect the predictions of
the classifier to be good. However, having low resubstitution error
does not guarantee good predictions for new data. Resubstitution
error is often an overly optimistic estimate of the predictive
error on new data.
The confusion matrix shows how many errors, and which
types, arise in resubstitution. When there are K
classes, the confusion matrix R is a
K-by-K matrix with
R(i,j) = the number of
observations of class i that the classifier predicts
to be of class j.
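The definition of R(i,j) can be illustrated with base MATLAB alone. The sketch below builds a confusion matrix by counting (true class, predicted class) pairs with accumarray; the label vectors are made-up values for illustration, and this is not necessarily how confusionmat is implemented.

```matlab
% Hypothetical true and predicted class indices for 7 observations, K = 3.
trueIdx = [1 1 2 2 2 3 3];
predIdx = [1 1 2 3 2 3 2];
K = 3;

% R(i,j) counts observations of true class i predicted as class j.
R = accumarray([trueIdx(:) predIdx(:)], 1, [K K]);

% Correct predictions lie on the diagonal; everything else is an error.
errRate = 1 - trace(R)/sum(R(:));
disp(R)
fprintf('Misclassification rate: %.4f\n', errRate);
```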
Example: Resubstitution Error of a Discriminant Analysis
Classifier. Examine the resubstitution error of the default
discriminant analysis classifier for the Fisher iris data:
load fisheriris
obj = fitcdiscr(meas,species);
resuberror = resubLoss(obj)
resuberror =
0.0200
The resubstitution error is very low, meaning obj
classifies nearly all the Fisher iris data correctly. The total
number of misclassifications is:
resuberror * obj.NumObservations
ans =
3.0000
To see the details of the three misclassifications, examine the
confusion matrix:
R = confusionmat(obj.Y,resubPredict(obj))
R =
50 0 0
0 48 2
0 1 49
obj.ClassNames
ans =
'setosa'
'versicolor'
'virginica'
R(1,:) = [50 0 0] means obj classifies
all 50 setosa irises correctly.
R(2,:) = [0 48 2] means obj classifies
48 versicolor irises correctly, and misclassifies two versicolor
irises as virginica.
R(3,:) = [0 1 49] means obj classifies
49 virginica irises correctly, and misclassifies one virginica iris
as versicolor.
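The same reading of the confusion matrix can be done numerically. This sketch copies R from the output above and uses only base MATLAB; the variable names nWrong, errRate, and perClass are illustrative.

```matlab
R = [50 0 0; 0 48 2; 0 1 49];      % confusion matrix copied from the output above
classNames = {'setosa','versicolor','virginica'};

nWrong   = sum(R(:)) - trace(R);   % total of the off-diagonal entries
errRate  = nWrong / sum(R(:));     % fraction of observations misclassified
perClass = diag(R) ./ sum(R,2);    % fraction of each true class classified correctly

for i = 1:numel(classNames)
    fprintf('%-10s %.2f\n', classNames{i}, perClass(i));
end
fprintf('%d errors, error rate %.4f\n', nWrong, errRate);
```

The off-diagonal total is 3 and the error rate is 0.02, which agrees with the resubstitution error reported earlier in this example.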
Cross Validation
Typically, discriminant analysis classifiers are robust and do
not exhibit overtraining when the number of predictors is much less
than the number of observations. Nevertheless, it is good practice
to cross validate your classifier to ensure its stability.
Cross Validating a Discriminant Analysis Classifier
This example shows how to perform five-fold cross validation of
a quadratic discriminant analysis classifier.
Load the sample data.
load fisheriris
Create a quadratic discriminant analysis classifier for the
data.
quadisc = fitcdiscr(meas,species,'DiscrimType','quadratic');
Find the resubstitution error of the classifier.
qerror = resubLoss(quadisc)
qerror =
0.0200
The classifier does an excellent job. Nevertheless,
resubstitution error can be an optimistic estimate of the error
when classifying new data. So proceed to cross validation.
Create a cross-validation model.
cvmodel = crossval(quadisc,'kfold',5);
Find the cross-validation loss for the model, meaning the error
of the out-of-fold observations.
cverror = kfoldLoss(cvmodel)
cverror =
0.0200
The cross-validated loss is as low as the original
resubstitution loss. Therefore, you can have confidence that the
classifier is reasonably accurate.
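To see what the out-of-fold error measures, the following sketch performs five-fold cross validation by hand with cvpartition. It is a hand-rolled illustration, not a description of how crossval works internally; the seed passed to rng is arbitrary, and because the folds are assigned randomly, the result can differ slightly from the value shown above.

```matlab
load fisheriris
rng(1);                                  % arbitrary seed, for repeatable folds
c = cvpartition(species,'KFold',5);      % five-fold partition of the observations

nWrong = 0;
for i = 1:c.NumTestSets
    trIdx = training(c,i);               % rows used to fit in this fold
    teIdx = test(c,i);                   % held-out rows for this fold
    mdl  = fitcdiscr(meas(trIdx,:),species(trIdx),'DiscrimType','quadratic');
    pred = predict(mdl,meas(teIdx,:));
    nWrong = nWrong + sum(~strcmp(pred,species(teIdx)));
end

cvErr = nWrong / numel(species);         % pooled out-of-fold error rate
fprintf('Manual five-fold error: %.4f\n', cvErr);
```

Each observation is predicted exactly once, by a model that never saw it during fitting, which is why this estimate is less optimistic than the resubstitution error.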
Change Costs and Priors
Sometimes you want to avoid certain misclassification errors
more than others. For example, it might be better to have
oversensitive cancer detection instead of undersensitive cancer
detection. Oversensitive detection gives more false positives
(unnecessary testing or treatment). Undersensitive detection gives
more false negatives (preventable illnesses or deaths). The
consequences of underdetection can be high. Therefore, you might
want to set costs to reflect the consequences.
Similarly, the training data Y can have a
distribution of classes that does not represent their true
frequency. If you have a better estimate of the true frequency, you
can include this knowledge in the classification Prior
property.
Example: Setting Custom Misclassification Costs. Consider the
Fisher iris data. Suppose that the cost of classifying a versicolor
iris as virginica is 10 times as large as making any other
classification error. Create a classifier from the data, incorporate
this cost, and then view the resulting classifier.
Load the Fisher iris data and create a default (linear)
classifier as in Example: Resubstitution Error of a Discriminant Analysis
Classifier:
load fisheriris
obj = fitcdiscr(meas,species);
resuberror = resubLoss(obj)
resuberror =
0.0200
R = confusionmat(obj.Y,resubPredict(obj))
R =
50 0 0
0 48 2
0 1 49
obj.ClassNames
ans =
'setosa'
'versicolor'
'virginica'
R(2,:) = [0 48 2] means obj classifies
48 versicolor irises correctly, and misclassifies two versicolor
irises as virginica.
Change the cost matrix to make fewer mistakes in classifying
versicolor irises as virginica:
obj.Cost(2,3) = 10;
R2 = confusionmat(obj.Y,resubPredict(obj))
R2 =
50 0 0
0 50 0
0 7 43
obj now classifies all versicolor irises correctly,
at the expense of increasing the number of misclassifications of
virginica irises from 1 to 7.
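The shift in errors follows from predicting the class with the smallest expected misclassification cost. The sketch below illustrates that rule in base MATLAB for a single observation; the posterior probabilities are made-up values, and the code assumes the cost convention used above, where C(i,j) is the cost of predicting class j when the true class is i.

```matlab
% Hypothetical posterior probabilities: [setosa versicolor virginica].
posterior = [0 0.4 0.6];

C = ones(3) - eye(3);                  % default cost: 0 on the diagonal, 1 elsewhere
[~, kDefault] = min(posterior * C);    % expected cost of predicting each class

C(2,3) = 10;                           % versicolor predicted as virginica costs 10
[~, kCostly] = min(posterior * C);

fprintf('Default cost predicts class %d; modified cost predicts class %d\n', ...
    kDefault, kCostly);
```

With the default costs the observation is assigned to class 3 (virginica); with the higher cost it is assigned to class 2 (versicolor), even though virginica is more probable. Borderline virginica irises are affected in the same way.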
Example: Setting Alternative Priors. Consider the Fisher
iris data. There are 50 irises of each kind in the data. Suppose
that, in a particular region, you have historical data that shows
virginica are five times as prevalent as the other kinds. Create a
classifier that incorporates this information.
Load the Fisher iris data and make a default (linear) classifier
as in Example: Resubstitution Error of a Discriminant Analysis
Classifier:
load fisheriris
obj = fitcdiscr(meas,species);
resuberror = resubLoss(obj)
resuberror =
0.0200
R = confusionmat(obj.Y,resubPredict(obj))
R =
50 0 0
0 48 2
0 1 49
obj.ClassNames
ans =
'setosa'
'versicolor'
'virginica'
R(3,:) = [0 1 49] means obj classifies
49 virginica irises correctly, and misclassifies one virginica iris
as versicolor.
Change the prior to match your historical data, and examine the
confusion matrix of the new classifier:
obj.Prior = [1 1 5];
R2 = confusionmat(obj.Y,resubPredict(obj))
R2 =
50 0 0
0 46 4
0 0 50
The new classifier classifies all virginica irises correctly, at
the expense of increasing the number of misclassifications of
versicolor irises from 2 to 4.
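The effect of the prior can be illustrated with Bayes' rule, where the posterior probability of each class is proportional to its prior times the class-conditional likelihood. In this base MATLAB sketch the likelihood values are made up for illustration, and the priors are normalized explicitly so that they sum to 1.

```matlab
% Hypothetical class-conditional likelihoods: [setosa versicolor virginica].
likelihood = [0 0.5 0.3];

priorEqual  = [1 1 1] / 3;        % equal class frequencies
priorRegion = [1 1 5] / 7;        % virginica five times as prevalent

postEqual  = priorEqual  .* likelihood;
postEqual  = postEqual  / sum(postEqual);
postRegion = priorRegion .* likelihood;
postRegion = postRegion / sum(postRegion);

[~, kEqual]  = max(postEqual);    % most probable class under equal priors
[~, kRegion] = max(postRegion);   % most probable class under the regional priors
fprintf('Equal priors: class %d; regional priors: class %d\n', kEqual, kRegion);
```

Under equal priors the observation goes to class 2 (versicolor); under the regional priors it goes to class 3 (virginica), which mirrors how borderline versicolor irises move to virginica in the example.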